
Showing papers on "Gaussian process published in 2018"


Proceedings Article
20 Jun 2018
TL;DR: This paper introduces the Neural Tangent Kernel (NTK) formalism and presents a number of results that give insight into the dynamics of neural networks during training and into their generalization behaviour.
Abstract: At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function (which maps input vectors to output vectors) follows the so-called kernel gradient associated with a new object, which we call the Neural Tangent Kernel (NTK). This kernel is central to describing the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping. Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit.
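As an illustration of the object defined above (not code from the paper), the empirical NTK of a finite network is the Gram matrix of parameter gradients of the network outputs, NTK(x, x') = <df(x)/dtheta, df(x')/dtheta>. Below is a minimal NumPy sketch for a one-hidden-layer network with hand-derived gradients; the network, scalings, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fully-connected network f(x) = w2 @ tanh(W1 @ x) with NTK-style scaling.
d_in, width = 3, 512
W1 = rng.standard_normal((width, d_in))
w2 = rng.standard_normal(width)

def forward_and_grads(x):
    """Return f(x) and the gradient of f with respect to all parameters."""
    pre = W1 @ x / np.sqrt(d_in)            # pre-activations
    h = np.tanh(pre)                        # hidden layer
    f = w2 @ h / np.sqrt(width)             # scalar output
    # Gradients computed by hand (chain rule) for this tiny network.
    grad_w2 = h / np.sqrt(width)
    grad_W1 = np.outer(w2 / np.sqrt(width) * (1 - np.tanh(pre) ** 2),
                       x / np.sqrt(d_in))
    return f, np.concatenate([grad_W1.ravel(), grad_w2])

def empirical_ntk(X):
    """NTK(x, x') = inner product of parameter gradients, for all pairs in X."""
    J = np.stack([forward_and_grads(x)[1] for x in X])
    return J @ J.T

X = rng.standard_normal((5, d_in))
print(empirical_ntk(X))   # 5x5 positive semi-definite kernel matrix
```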

1,787 citations


Proceedings Article
15 Feb 2018
TL;DR: The exact equivalence between infinitely wide deep networks and GPs is derived; test performance is found to increase as finite-width trained networks are made wider and more similar to a GP, so that GP predictions typically outperform those of finite-width networks.
Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.
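The recursion behind this correspondence can be sketched compactly: for a fully-connected ReLU network, the layer-wise covariance has a closed form (the arc-cosine kernel). The following NumPy sketch of that recursion is a hedged stand-in, not the authors' pipeline; the depth and the variance hyperparameters sigma_w2 and sigma_b2 are illustrative.

```python
import numpy as np

def nngp_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """Recursive NNGP covariance for an infinitely wide ReLU network (sketch).

    K^0(x, x') = sigma_b2 + sigma_w2 * x.x'/d, and for each layer
    K^{l+1}(x, x') = sigma_b2 + sigma_w2 * E[relu(u) relu(v)],
    where (u, v) is Gaussian with covariance K^l and the expectation has a
    closed form for ReLU (arc-cosine kernel).
    """
    d = X.shape[1]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed-form E[relu(u) relu(v)] for a bivariate Gaussian:
        expect = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)
        K = sigma_b2 + sigma_w2 * expect
    return K

X = np.random.default_rng(1).standard_normal((4, 10))
print(nngp_kernel(X))
```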

757 citations


Journal ArticleDOI
TL;DR: This tutorial introduces the reader to Gaussian process regression as an expressive tool to model, actively explore and exploit unknown functions and describes a situation modelling risk-averse exploration in which an additional constraint needs to be accounted for.
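For readers new to the topic, the core computation the tutorial builds on is the GP regression posterior, which is available in closed form. A minimal sketch with an RBF kernel and synthetic data; the kernel, noise level, and data are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of GP regression with an RBF kernel."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    K_ss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.1 * np.random.default_rng(2).standard_normal(20)
mu, var = gp_posterior(x, y, np.linspace(0, 5, 50))
```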

585 citations


Posted Content
TL;DR: This work presents an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set, and reduces the computation time of self-attention from quadratic to linear in the number of elements in the set.
Abstract: Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set, models used to address them should be permutation invariant. We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. The model consists of an encoder and a decoder, both of which rely on attention mechanisms. In an effort to reduce computational complexity, we introduce an attention scheme inspired by inducing point methods from sparse Gaussian process literature. It reduces the computation time of self-attention from quadratic to linear in the number of elements in the set. We show that our model is theoretically attractive and we evaluate it on a range of tasks, demonstrating the state-of-the-art performance compared to recent methods for set-structured data.
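The inducing-point attention idea can be sketched in a few lines: attend from m inducing points to the n set elements and back, so the cost is O(nm) rather than O(n^2). This is a simplified stand-in (single head, no layer norm or feed-forward blocks, random rather than learned inducing points), not the full Set Transformer block.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def induced_set_attention(X, I):
    """Attend from m inducing points to the n set elements and back.

    Cost is O(n*m) instead of the O(n^2) of full self-attention.
    """
    H = attention(I, X, X)      # (m, d): inducing points summarise the set
    return attention(X, H, H)   # (n, d): elements attend to the summary

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 16))   # a set of 1000 elements
I = rng.standard_normal((32, 16))     # 32 inducing points (random stand-ins)
out = induced_set_attention(X, I)
print(out.shape)                      # (1000, 16)
```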

500 citations


Posted Content
TL;DR: In this paper, the authors study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition and show that, under broad conditions, as they make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process.
Abstract: Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks. To evaluate convergence rates empirically, we use maximum mean discrepancy. We then compare finite Bayesian deep networks from the literature to Gaussian processes in terms of the key predictive quantities of interest, finding that in some cases the agreement can be very close. We discuss the desirability of Gaussian process behaviour and review non-Gaussian alternative models from the literature.

257 citations


Journal ArticleDOI
TL;DR: An FD technique combining the generalized CCA with threshold-setting based on the randomized algorithm is proposed and applied to the simulated traction drive control system of high-speed trains; the results show that the proposed method is able to improve the detection performance significantly in comparison with the standard generalized CCA-based FD method.
Abstract: In this paper, we first study a generalized canonical correlation analysis (CCA)-based fault detection (FD) method aiming at maximizing the fault detectability under an acceptable false alarm rate. More specifically, two residual signals are generated for detecting faults in the input and output subspaces, respectively. The minimum covariances of the two residual signals are achieved by taking the correlation between input and output into account. Considering the limited application scope of the generalized CCA due to the Gaussian assumption on the process noises, an FD technique combining the generalized CCA with the threshold-setting based on the randomized algorithm is proposed and applied to the simulated traction drive control system of high-speed trains. The achieved results show that the proposed method is able to improve the detection performance significantly in comparison with the standard generalized CCA-based FD method.

252 citations


Posted Content
TL;DR: This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other.
Abstract: This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
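The equivalence mentioned in the abstract can be checked numerically: the kernel ridge regression estimator with regularisation parameter identified with the noise variance coincides with the GP posterior mean. A small sketch under those assumptions, with illustrative data and kernel.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 5, 30))
y = np.sin(x) + 0.1 * rng.standard_normal(30)
x_star = np.linspace(0, 5, 7)
noise_var = 0.01                       # GP noise variance = KRR regularisation

K = rbf(x, x)
k_star = rbf(x, x_star)

# GP posterior mean: k_*^T (K + sigma^2 I)^{-1} y, computed via Cholesky.
L = np.linalg.cholesky(K + noise_var * np.eye(len(x)))
gp_mean = k_star.T @ np.linalg.solve(L.T, np.linalg.solve(L, y))

# Kernel ridge regression: dual coefficients alpha = (K + lambda I)^{-1} y.
alpha = np.linalg.solve(K + noise_var * np.eye(len(x)), y)
krr_mean = k_star.T @ alpha

print(np.allclose(gp_mean, krr_mean))  # True: the two estimators coincide
```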

224 citations


Journal ArticleDOI
TL;DR: A Gaussian process with a quasi-periodic covariance kernel is used to infer stellar rotation periods; because the method delivers posterior probability density functions, it will enable hierarchical studies involving stellar rotation, particularly those involving population modelling, such as inferring stellar ages, obliquities in exoplanet systems, or characterising star-planet interactions.
Abstract: Variability in the light curves of spotted, rotating stars is often non-sinusoidal and quasi-periodic --- spots move on the stellar surface and have finite lifetimes, causing stellar flux variations to slowly shift in phase. A strictly periodic sinusoid therefore cannot accurately model a rotationally modulated stellar light curve. Physical models of stellar surfaces have many drawbacks preventing effective inference, such as highly degenerate or high-dimensional parameter spaces. In this work, we test an appropriate effective model: a Gaussian Process with a quasi-periodic covariance kernel function. This highly flexible model allows sampling of the posterior probability density function of the periodic parameter, marginalising over the other kernel hyperparameters using a Markov Chain Monte Carlo approach. To test the effectiveness of this method, we infer rotation periods from 333 simulated stellar light curves, demonstrating that the Gaussian process method produces periods that are more accurate than both a sine-fitting periodogram and an autocorrelation function method. We also demonstrate that it works well on real data, by inferring rotation periods for 275 Kepler stars with previously measured periods. We provide a table of rotation periods for these 1132 Kepler objects of interest and their posterior probability density function samples. Because this method delivers posterior probability density functions, it will enable hierarchical studies involving stellar rotation, particularly those involving population modelling, such as inferring stellar ages, obliquities in exoplanet systems, or characterising star-planet interactions. The code used to implement this method is available online.
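A common form of quasi-periodic kernel in this literature is a periodic term damped by a squared-exponential envelope, so the phase of the variability can drift slowly (spot evolution). The sketch below is illustrative and is not necessarily the exact parameterisation or hyperparameter values used in the paper.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, amp=1.0, ell=20.0, gamma=1.0, period=10.0):
    """Quasi-periodic covariance: periodic term times a squared-exponential envelope."""
    tau = t1[:, None] - t2[None, :]
    return amp ** 2 * np.exp(
        -0.5 * tau ** 2 / ell ** 2
        - gamma * np.sin(np.pi * tau / period) ** 2
    )

t = np.linspace(0, 90, 500)                       # e.g. days of a light curve
K = quasi_periodic_kernel(t, t) + 1e-8 * np.eye(t.size)
# One draw from the GP prior: a quasi-periodic synthetic "light curve".
sample = np.linalg.cholesky(K) @ np.random.default_rng(5).standard_normal(t.size)
```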

171 citations


Journal ArticleDOI
TL;DR: A new algorithm, TSEMO, is proposed that uses Gaussian processes as surrogates; it is simple, requires no a priori knowledge, reduces hypervolume calculations to approach linear scaling with respect to the number of objectives, can handle noise, and supports batch-sequential usage.
Abstract: Many engineering problems require the optimization of expensive, black-box functions involving multiple conflicting criteria, such that commonly used methods like multiobjective genetic algorithms are inadequate. To tackle this problem several algorithms have been developed using surrogates. However, these often have disadvantages such as the requirement of a priori knowledge of the output functions or exponentially scaling computational cost with respect to the number of objectives. In this paper a new algorithm is proposed, TSEMO, which uses Gaussian processes as surrogates. The Gaussian processes are sampled using spectral sampling techniques to make use of Thompson sampling in conjunction with the hypervolume quality indicator and NSGA-II to choose a new evaluation point at each iteration. The reference point required for the hypervolume calculation is estimated within TSEMO. Further, a simple extension was proposed to carry out batch-sequential design. TSEMO was compared to ParEGO, an expected hypervolume implementation, and NSGA-II on nine test problems with a budget of 150 function evaluations. Overall, TSEMO shows promising performance, while giving a simple algorithm without the requirement of a priori knowledge, reduced hypervolume calculations to approach linear scaling with respect to the number of objectives, the capacity to handle noise and lastly the ability for batch-sequential usage.
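The spectral-sampling step can be sketched for a single objective: approximate the GP with random Fourier features, draw one posterior weight vector (a Thompson sample), and optimise the resulting explicit function. This is a hedged stand-in; TSEMO additionally combines such samples with the hypervolume indicator and NSGA-II for the multiobjective case, which is omitted here, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def rff_features(X, W, b):
    """Random Fourier features approximating an RBF kernel."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

def sample_posterior_function(X, y, n_features=200, lengthscale=0.3, noise=1e-2):
    """Draw one approximate GP posterior sample as an explicit function (sketch)."""
    d = X.shape[1]
    W = rng.standard_normal((n_features, d)) / lengthscale
    b = rng.uniform(0, 2 * np.pi, n_features)
    Phi = rff_features(X, W, b)
    # Bayesian linear regression on the features (unit prior variance).
    A = Phi.T @ Phi + noise * np.eye(n_features)
    mean = np.linalg.solve(A, Phi.T @ y)
    cov = noise * np.linalg.inv(A)
    cov = (cov + cov.T) / 2
    theta = rng.multivariate_normal(mean, cov)     # one posterior weight sample
    return lambda X_new: rff_features(X_new, W, b) @ theta

# Thompson sampling step: optimise the sampled function over candidate points.
X = rng.uniform(0, 1, (20, 2))
y = np.sin(6 * X[:, 0]) + X[:, 1]
f_sample = sample_posterior_function(X, y)
candidates = rng.uniform(0, 1, (1000, 2))
x_next = candidates[np.argmax(f_sample(candidates))]
```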

167 citations


Journal ArticleDOI
TL;DR: A unified view of likelihood based Gaussian process regression for simulation experiments exhibiting input-dependent noise is presented, and a latent-variable idea from machine learning is borrowed to address heteroscedasticity, thereby simultaneously leveraging the computational and statistical efficiency of designs with replication.
Abstract: We present a unified view of likelihood based Gaussian process regression for simulation experiments exhibiting input-dependent noise. Replication plays an important role in that context, however ...

166 citations


Journal ArticleDOI
TL;DR: This article investigates the state-of-the-art multi-output Gaussian processes (MOGPs) that can transfer the knowledge across related outputs in order to improve prediction quality and gives some recommendations regarding the usage of MOGPs.
Abstract: Multi-output regression problems arise extensively in the modern engineering community. This article investigates the state-of-the-art multi-output Gaussian processes (MOGPs) that can transfer the knowledge across related outputs in order to improve prediction quality. We classify existing MOGPs into two main categories: (1) symmetric MOGPs that improve the predictions for all the outputs, and (2) asymmetric MOGPs, particularly the multi-fidelity MOGPs, that focus on the improvement of high fidelity output via the useful information transferred from related low fidelity outputs. We review existing symmetric/asymmetric MOGPs and analyze their characteristics, e.g., the covariance functions (separable or non-separable), the modeling process (integrated or decomposed), the information transfer (bidirectional or unidirectional), and the hyperparameter inference (joint or separate). Besides, we assess the performance of ten representative MOGPs thoroughly on eight examples in symmetric/asymmetric scenarios by considering, e.g., different training data (heterotopic or isotopic), different training sizes (small, moderate and large), different output correlations (low or high), and different output sizes (up to four outputs). Based on the qualitative and quantitative analysis, we give some recommendations regarding the usage of MOGPs and highlight potential research directions.
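The simplest symmetric MOGP with a separable covariance, the intrinsic coregionalisation model, can be written in a few lines. The sketch below uses an illustrative RBF base kernel and coregionalisation matrix and is not tied to any specific model from the review.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def icm_covariance(x, B, ell=1.0):
    """Intrinsic coregionalisation model: K((x,i),(x',j)) = B[i,j] * k(x,x').

    B is a positive semi-definite coregionalisation matrix coupling the outputs;
    the full covariance over all outputs is the Kronecker product B (x) k(X, X).
    """
    return np.kron(B, rbf(x, x, ell))

x = np.linspace(0, 1, 25)
A = np.array([[1.0, 0.0], [0.9, 0.5]])     # low-rank factor (illustrative)
B = A @ A.T                                 # couples the two outputs
K = icm_covariance(x, B) + 1e-8 * np.eye(2 * x.size)
samples = np.linalg.cholesky(K) @ np.random.default_rng(7).standard_normal(2 * x.size)
y1, y2 = samples[:x.size], samples[x.size:]   # correlated samples of the two outputs
```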

Journal ArticleDOI
TL;DR: In this article, a Gaussian process-based surrogate model of the laser powder-bed-fusion (L-PBF) process is used to predict melt pool depth in single-track experiments given a laser power, scan speed, and laser beam size combination.
Abstract: Laser Powder-Bed Fusion (L-PBF) metal-based additive manufacturing (AM) is complex and not fully understood. Successful processing for one material might not necessarily apply to a different material. This paper describes a workflow process that aims at creating a material data sheet standard that describes regimes where the process can be expected to be robust. The procedure consists of building a Gaussian process-based surrogate model of the L-PBF process that predicts melt pool depth in single-track experiments given a laser power, scan speed, and laser beam size combination. The predictions are then mapped onto a power versus scan speed diagram delimiting the conduction from the keyhole melting controlled regimes. This statistical framework is shown to be robust even for cases where experimental training data might be suboptimal in quality, if appropriate physics-based filters are applied. Additionally, it is demonstrated that a high-fidelity simulation model of L-PBF can equally be successfully used for building a surrogate model, which is beneficial since simulations are becoming more efficient and it is more practical to study the response of different materials in simulation than to re-tool an AM machine for a new material powder.

Journal ArticleDOI
TL;DR: This paper model the shape variations with a Gaussian process, which they represent using the leading components of its Karhunen-Loève expansion, and introduces a simple algorithm for fitting a GPMM to a surface or image, which results in a non-rigid registration approach whose regularization properties are defined by a G PMM.
Abstract: Models of shape variations have become a central component for the automated analysis of images. An important class of shape models are point distribution models (PDMs). These models represent a class of shapes as a normal distribution of point variations, whose parameters are estimated from example shapes. Principal component analysis (PCA) is applied to obtain a low-dimensional representation of the shape variation in terms of the leading principal components. In this paper, we propose a generalization of PDMs, which we refer to as Gaussian Process Morphable Models (GPMMs). We model the shape variations with a Gaussian process, which we represent using the leading components of its Karhunen-Loeve expansion. To compute the expansion, we make use of an approximation scheme based on the Nystrom method. The resulting model can be seen as a continuous analog of a standard PDM. However, while for PDMs the shape variation is restricted to the linear span of the example data, with GPMMs we can define the shape variation using any Gaussian process. For example, we can build shape models that correspond to classical spline models and thus do not require any example data. Furthermore, Gaussian processes make it possible to combine different models. For example, a PDM can be extended with a spline model, to obtain a model that incorporates learned shape characteristics but is flexible enough to explain shapes that cannot be represented by the PDM. We introduce a simple algorithm for fitting a GPMM to a surface or image. This results in a non-rigid registration approach whose regularization properties are defined by a GPMM. We show how we can obtain different registration schemes, including methods for multi-scale or hybrid registration, by constructing an appropriate GPMM. As our approach strictly separates modeling from the fitting process, this is all achieved without changes to the fitting algorithm. To demonstrate the applicability and versatility of GPMMs, we perform a set of experiments in typical usage scenarios in medical image analysis and computer vision: The model-based segmentation of 3D forearm images and the building of a statistical model of the face. To complement the paper, we have made all our methods available as open source.
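The Nyström step can be illustrated in one dimension: eigendecompose the kernel on a small set of landmark points and extend the eigenfunctions to arbitrary inputs, giving a low-rank Karhunen-Loève basis. This is a hedged 1-D stand-in for the surface-valued Gaussian processes used in GPMMs; the kernel, landmark layout, and truncation are illustrative.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def nystrom_basis(landmarks, n_components, ell=0.2):
    """Approximate the leading Karhunen-Loeve eigenpairs from m landmark points."""
    m = landmarks.size
    K_mm = rbf(landmarks, landmarks, ell)
    evals, evecs = np.linalg.eigh(K_mm)
    idx = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[idx], evecs[:, idx]

    def phi(x):
        # Nystrom extension of the discrete eigenvectors to new points x.
        return rbf(x, landmarks, ell) @ evecs * np.sqrt(m) / evals

    return evals / m, phi          # approximate eigenvalues and eigenfunctions

x = np.linspace(0, 1, 200)
lams, phi = nystrom_basis(np.linspace(0, 1, 30), n_components=10)
# Low-rank GP sample: f(x) = sum_i sqrt(lam_i) * a_i * phi_i(x), a_i ~ N(0, 1).
a = np.random.default_rng(8).standard_normal(10)
f = phi(x) @ (np.sqrt(lams) * a)
```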

Journal ArticleDOI
TL;DR: This paper re-fit an accurate PES of formaldehyde and compares PES errors on the entire point set used to solve the vibrational Schrödinger equation, i.e., the only error that matters in quantum dynamics calculations.
Abstract: For molecules with more than three atoms, it is difficult to fit or interpolate a potential energy surface (PES) from a small number of (usually ab initio) energies at points. Many methods have been proposed in recent decades, each claiming a set of advantages. Unfortunately, there are few comparative studies. In this paper, we compare neural networks (NNs) with Gaussian process (GP) regression. We re-fit an accurate PES of formaldehyde and compare PES errors on the entire point set used to solve the vibrational Schrodinger equation, i.e., the only error that matters in quantum dynamics calculations. We also compare the vibrational spectra computed on the underlying reference PES and the NN and GP potential surfaces. The NN and GP surfaces are constructed with exactly the same points, and the corresponding spectra are computed with the same points and the same basis. The GP fitting error is lower, and the GP spectrum is more accurate. The best NN fits to 625/1250/2500 symmetry unique potential energy poin...

Proceedings Article
16 Aug 2018
TL;DR: It is shown that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many Convolutional filters, extending similar results for dense networks.
Abstract: We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.

Journal ArticleDOI
TL;DR: The authors present a scheme to construct classical $n$-body force fields using Gaussian Process (GP) Regression, appropriately mapped over explicit n-body functions (M-FFs), which are as fast as classical parametrized potentials, since they avoid lengthy summations over database entries or weight parameters.
Abstract: The authors present a scheme to construct classical $n$-body force fields using Gaussian Process (GP) Regression, appropriately mapped over explicit n-body functions (M-FFs). The procedure is possible, and will yield accurate forces, whenever prior knowledge allows to restrict the interactions to a finite order $n$, so that the ``universal approximator'' resolving power of standard GPs or Neural Networks is not needed. Under these conditions, the proposed construction preserves flexibility of training, systematically improvable accuracy, and a clear framework for validation of the underlying machine learning technique. Moreover, the M-FFs are as fast as classical parametrized potentials, since they avoid lengthy summations over database entries or weight parameters.

Journal ArticleDOI
TL;DR: In this article, the performance of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials (GAPs) in representing water two-body and three-body interaction energies was investigated.
Abstract: The accurate representation of multidimensional potential energy surfaces is a necessary requirement for realistic computer simulations of molecular systems. The continued increase in computer power accompanied by advances in correlated electronic structure methods nowadays enables routine calculations of accurate interaction energies for small systems, which can then be used as references for the development of analytical potential energy functions (PEFs) rigorously derived from many-body (MB) expansions. Building on the accuracy of the MB-pol many-body PEF, we investigate here the performance of permutationally invariant polynomials (PIPs), neural networks, and Gaussian approximation potentials (GAPs) in representing water two-body and three-body interaction energies, denoting the resulting potentials PIP-MB-pol, Behler-Parrinello neural network-MB-pol, and GAP-MB-pol, respectively. Our analysis shows that all three analytical representations exhibit similar levels of accuracy in reproducing both two-body and three-body reference data as well as interaction energies of small water clusters obtained from calculations carried out at the coupled cluster level of theory, the current gold standard for chemical accuracy. These results demonstrate the synergy between interatomic potentials formulated in terms of a many-body expansion, such as MB-pol, that are physically sound and transferable, and machine-learning techniques that provide a flexible framework to approximate the short-range interaction energy terms.

Posted Content
TL;DR: This work introduces a class of neural latent variable models called Neural Processes (NPs) that combine the strengths of Gaussian processes and neural networks: like GPs they define distributions over functions and estimate the uncertainty in their predictions, while remaining computationally efficient to train and evaluate like NNs.
Abstract: A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible, however they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature.

Journal ArticleDOI
TL;DR: In this article, the authors consider the information contained in the residuals in the regions where the experimental information exists and evaluate the predictive power of global mass models towards more unstable neutron-rich nuclei and provide uncertainty quantification of predictions.
Abstract: Background: The mass, or binding energy, is the basic property of the atomic nucleus. It determines its stability and reaction and decay rates. Quantifying the nuclear binding is important for understanding the origin of elements in the universe. The astrophysical processes responsible for the nucleosynthesis in stars often take place far from the valley of stability, where experimental masses are not known. In such cases, missing nuclear information must be provided by theoretical predictions using extreme extrapolations. To take full advantage of the information contained in mass model residuals, i.e., deviations between experimental and calculated masses, one can utilize Bayesian machine-learning techniques to improve predictions. Purpose: To improve the quality of model-based predictions of nuclear properties of rare isotopes far from stability, we consider the information contained in the residuals in the regions where the experimental information exists. As a case in point, we discuss two-neutron separation energies S2n of even-even nuclei. Through this observable, we assess the predictive power of global mass models towards more unstable neutron-rich nuclei and provide uncertainty quantification of predictions. Methods: We consider 10 global models based on nuclear density functional theory with realistic energy density functionals as well as two more phenomenological mass models. The emulators of S2n residuals and credibility intervals (Bayesian confidence intervals) defining theoretical error bars are constructed using Bayesian Gaussian processes and Bayesian neural networks. We consider a large training dataset pertaining to nuclei whose masses were measured before 2003. For the testing datasets, we considered those exotic nuclei whose masses have been determined after 2003. By establishing statistical methodology and parameters, we carried out extrapolations toward the 2n dripline. Results: While both Gaussian processes and Bayesian neural networks reduce the root-mean-square (rms) deviation from experiment significantly, GP offers a better and much more stable performance. The increase in the predictive power of microscopic models aided by the statistical treatment is quite astonishing: The resulting rms deviations from experiment on the testing dataset are similar to those of more phenomenological models. We found that Bayesian neural network results are prone to instabilities caused by the large number of parameters in this method. Moreover, since the classical sigmoid activation function used in this approach has linear tails that do not vanish, it is poorly suited for a bounded extrapolation. The empirical coverage probability curves we obtain match the reference values very well, in a slightly conservative way in most cases, which is highly desirable to ensure honesty of uncertainty quantification. The estimated credibility intervals on predictions make it possible to evaluate predictive power of individual models and also make quantified predictions using groups of models. Conclusions: The proposed robust statistical approach to extrapolation of nuclear model results can be useful for assessing the impact of current and future experiments in the context of model developments. The new Bayesian capability to evaluate residuals is also expected to impact research in the domains where experiments are currently impossible, for instance, in simulations of the astrophysical r process.
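The residual-emulation idea generalises beyond nuclear masses and can be sketched on a 1-D toy problem: fit a GP to the deviations between "experiment" and a theoretical model where data exist, then add the predicted correction and its uncertainty band to the model elsewhere. All functions and values below are hypothetical stand-ins, not the paper's mass models.

```python
import numpy as np

def rbf(a, b, ell=1.0, var=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_predict(x_train, y_train, x_test, ell=1.0, var=1.0, noise=1e-2):
    K = rbf(x_train, x_train, ell, var) + noise * np.eye(x_train.size)
    k_s = rbf(x_train, x_test, ell, var)
    mean = k_s.T @ np.linalg.solve(K, y_train)
    cov = rbf(x_test, x_test, ell, var) - k_s.T @ np.linalg.solve(K, k_s)
    return mean, np.sqrt(np.clip(np.diag(cov), 0, None))

# Hypothetical "theory" and "experiment" on a 1-D proxy for the nuclear chart.
model = lambda x: 0.1 * x ** 2
truth = lambda x: 0.1 * x ** 2 + 0.3 * np.sin(2 * x)
x_exp = np.linspace(0, 5, 30)                      # region with measurements
y_exp = truth(x_exp)

residuals = y_exp - model(x_exp)                   # deviations to be emulated
x_new = np.linspace(0, 7, 100)                     # includes an extrapolation region
corr, sigma = gp_predict(x_exp, residuals, x_new, ell=1.5)
prediction = model(x_new) + corr                   # statistically corrected model
lower, upper = prediction - 2 * sigma, prediction + 2 * sigma   # credibility band
```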

Proceedings Article
01 Jan 2018
TL;DR: This work proposes a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the function evaluations: one Bayesianlinear regression model is associated to each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net.
Abstract: Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process (GP) regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, GP-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs. We propose a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the function evaluations: one Bayesian linear regression model is associated to each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net. Experiments show that the neural net learns a representation suitable for warm-starting the black-box optimization problems and that BO runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than methods recently published in the literature.
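The per-task Bayesian linear regression head has a closed-form posterior whose cost is linear in the number of evaluations. Below is a hedged sketch using a fixed random feature map as a stand-in for the shared neural network; the prior, noise precision, data, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

def features(X, W, b):
    """Stand-in for the shared neural-net representation (random tanh features)."""
    return np.tanh(X @ W.T + b)

def blr_posterior(Phi, y, alpha=1.0, beta=100.0):
    """Closed-form Bayesian linear regression on the shared features.

    Prior w ~ N(0, alpha^{-1} I), noise precision beta.  The cost is linear in
    the number of observations n (one pass to form Phi^T Phi) and cubic only
    in the feature dimension d.
    """
    d = Phi.shape[1]
    A = alpha * np.eye(d) + beta * Phi.T @ Phi
    mean = beta * np.linalg.solve(A, Phi.T @ y)
    return mean, np.linalg.inv(A)      # posterior mean and covariance of w

def predict(X_new, W, b, mean, cov, beta=100.0):
    Phi = features(X_new, W, b)
    mu = Phi @ mean
    var = np.sum(Phi @ cov * Phi, axis=1) + 1.0 / beta
    return mu, var

d_in, d_feat = 3, 50
W, b = rng.standard_normal((d_feat, d_in)), rng.standard_normal(d_feat)
X = rng.uniform(0, 1, (200, d_in))                # past hyperparameter evaluations
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2            # e.g. observed validation losses
mean, cov = blr_posterior(features(X, W, b), y)
mu, var = predict(rng.uniform(0, 1, (5, d_in)), W, b, mean, cov)
```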

Posted Content
TL;DR: In this article, an equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs) was derived for CNNs both with and without pooling layers, and achieved state-of-the-art results on CIFAR10 for GPs without trainable kernels.
Abstract: There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance, beneficial in finite channel CNNs trained with stochastic gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment of the infinite channel limit - a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally, that while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation.
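The Monte Carlo estimate mentioned in the abstract can be sketched directly: instantiate many finite random networks and average the outer products of their outputs over initializations. The code below is a simplified fully-connected stand-in (the paper's construction applies to CNNs); widths, depths, and weight scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)

def random_relu_net(X, width=256, depth=3, sigma_w=np.sqrt(2.0), sigma_b=0.1):
    """One random finite-width network (forward pass only, no training)."""
    h = X
    for _ in range(depth):
        W = rng.standard_normal((h.shape[1], width)) * sigma_w / np.sqrt(h.shape[1])
        b = rng.standard_normal(width) * sigma_b
        h = np.maximum(h @ W + b, 0.0)
    w = rng.standard_normal((h.shape[1], 1)) * sigma_w / np.sqrt(h.shape[1])
    return (h @ w).ravel()

def monte_carlo_nngp(X, n_samples=500, **kwargs):
    """Estimate the GP kernel as E[f(x) f(x')] over random initializations."""
    outs = np.stack([random_relu_net(X, **kwargs) for _ in range(n_samples)])
    return outs.T @ outs / n_samples

X = rng.standard_normal((6, 8))
K_mc = monte_carlo_nngp(X)   # approaches the analytic kernel as width and samples grow
```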

Journal ArticleDOI
TL;DR: It is shown that the dynamic GP produces sharper prediction intervals (PIs) than the static GP with significantly lower computational burden, but at the cost of the ability to capture sharp peaks.

Journal ArticleDOI
TL;DR: This paper presents a framework for creating a lightweight thermal prediction system suitable for run-time management decisions, and develops alternative neural network and linear regression-based methods to perform a comprehensive comparative study of prediction methods.
Abstract: Elevated temperatures limit the peak performance of systems because of frequent interventions by thermal throttling. Non-uniform thermal states across system nodes also cause performance variation within seemingly equivalent nodes leading to significant degradation of overall performance. In this paper we present a framework for creating a lightweight thermal prediction system suitable for run-time management decisions. We pursue two avenues to explore optimized lightweight thermal predictors. First, we use feature selection algorithms to improve the performance of previously designed machine learning methods. Second, we develop alternative methods using neural network and linear regression-based methods to perform a comprehensive comparative study of prediction methods. We show that our optimized models achieve improved performance with better prediction accuracy and lower overhead as compared with the Gaussian process model proposed previously. Specifically we present a reduced version of the Gaussian process model, a neural network–based model, and a linear regression–based model. Using the optimization methods, we are able to reduce the average prediction errors in the Gaussian process from $4.2^\circ$ C to $2.9^\circ$ C. We also show that the newly developed models using neural network and Lasso linear regression have average prediction errors of $2.9^\circ$ C and $3.8^\circ$ C respectively. The prediction overheads are 0.22, 0.097, and 0.026 ms per prediction for reduced Gaussian process, neural network, and Lasso linear regression models, respectively, compared with 0.57 ms per prediction for the previous Gaussian process model. We have implemented our proposed thermal prediction models on a two-node system configuration to help identify the optimal task placement. The task placement identified by the models reduces the average system temperature by up to $11.9^\circ$ C without any performance degradation. Furthermore, these models respectively achieve 75, 82.5, and 74.17 percent success rates in correctly pointing to those task placements with better thermal response, compared with 72.5 percent success for the original model in achieving the same objective. Finally, we extended our analysis to a 16-node system and we were able to train models and execute them in real time to guide task migration and achieve on average 17 percent reduction in the overall system cooling power.

Journal ArticleDOI
TL;DR: In this paper, a systematic study of how ordering affects the accuracy of Vecchia's approximation of Gaussian process parameters is presented, showing that random orderings can give dramatically sharper approximations than default coordinate-based orderings.
Abstract: Vecchia’s approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the article includes a systematic study of how ordering affects the accuracy of Vecchia’s approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this article is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback–Leibler divergence from the target model by more th...
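Vecchia's approximation replaces the joint Gaussian likelihood with a product of univariate conditionals, each conditioning on at most m previously ordered points, so the ordering directly shapes the approximation. A hedged sketch that makes this explicit; the kernel, conditioning-set rule (nearest previously ordered points), and data are illustrative.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def vecchia_loglik(X, y, order, m=10, ell=0.2, noise=1e-4):
    """Vecchia approximate Gaussian log-likelihood.

    log p(y) ~= sum_i log p(y_i | y_{c(i)}), where c(i) contains at most m
    previously ordered points (here: the nearest ones among them).
    """
    X, y = X[order], y[order]
    loglik = 0.0
    for i in range(len(y)):
        if i == 0:
            mu, var = 0.0, rbf(X[:1], X[:1], ell)[0, 0] + noise
        else:
            past = np.arange(i)
            d = np.linalg.norm(X[past] - X[i], axis=1)
            c = past[np.argsort(d)[:m]]                 # conditioning set
            K_cc = rbf(X[c], X[c], ell) + noise * np.eye(len(c))
            k_ci = rbf(X[c], X[i:i + 1], ell).ravel()
            sol = np.linalg.solve(K_cc, k_ci)
            mu = sol @ y[c]
            var = rbf(X[i:i + 1], X[i:i + 1], ell)[0, 0] + noise - sol @ k_ci
        loglik += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return loglik

rng = np.random.default_rng(11)
X = rng.uniform(0, 1, (300, 2))
y = np.sin(4 * X[:, 0]) * np.cos(4 * X[:, 1]) + 0.05 * rng.standard_normal(300)
print(vecchia_loglik(X, y, order=np.argsort(X[:, 0])))   # coordinate-based ordering
print(vecchia_loglik(X, y, order=rng.permutation(300)))  # random ordering
```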

Journal ArticleDOI
TL;DR: A single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent” Gaussian process, and is augmented with flexible parametric correlation structures that parsimoniously describe observed correlations.

Journal ArticleDOI
TL;DR: This work proposes to learn individual surrogate models on the observations of each data set and then combine all surrogates into a joint one using ensembling techniques, and extends the framework to directly estimate the acquisition function in the same setting, using a novel technique named the “transfer acquisition function”.
Abstract: Algorithm selection as well as hyperparameter optimization are tedious tasks that have to be dealt with when applying machine learning to real-world problems. Sequential model-based optimization (SMBO), based on so-called “surrogate models”, has been employed to allow for faster and more direct hyperparameter optimization. A surrogate model is a machine learning regression model which is trained on the meta-level instances in order to predict the performance of an algorithm on a specific data set given the hyperparameter settings and data set descriptors. Gaussian processes, for example, make good surrogate models as they provide probability distributions over labels. Recent work on SMBO also includes meta-data, i.e. observed hyperparameter performances on other data sets, in the process of hyperparameter optimization. This can, for example, be accomplished by learning transfer surrogate models on all available instances of meta-knowledge; however, the increasing amount of meta-information can make Gaussian processes infeasible, as they require the inversion of a large covariance matrix which grows with the number of instances. Consequently, instead of learning a joint surrogate model on all of the meta-data, we propose to learn individual surrogate models on the observations of each data set and then combine all surrogates into a joint one using ensembling techniques. The final surrogate is a weighted sum of all data set specific surrogates plus an additional surrogate that is solely learned on the target observations. Within our framework, any surrogate model can be used; we explore Gaussian processes in this scenario. We present two different strategies for finding the weights used in the ensemble: the first is based on a probabilistic product of experts approach, and the second is based on kernel regression. Additionally, we extend the framework to directly estimate the acquisition function in the same setting, using a novel technique which we name the “transfer acquisition function”. In an empirical evaluation including comparisons to the current state-of-the-art on two publicly available meta-data sets, we are able to demonstrate that our proposed approach not only scales to large meta-data, but also finds stronger prediction models.
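The ensembling idea can be sketched compactly: one GP surrogate per meta-data set plus one fitted only to the target observations, combined as a weighted sum of predictive means. The softmax-over-error weighting below is a simple stand-in for the paper's product-of-experts and kernel-regression weightings, and all data are synthetic.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_mean(x_tr, y_tr, x_te, noise=1e-2):
    K = rbf(x_tr, x_tr) + noise * np.eye(x_tr.size)
    return rbf(x_tr, x_te).T @ np.linalg.solve(K, y_tr)

rng = np.random.default_rng(12)

# Meta-data: hyperparameter/performance observations from three previous data sets.
meta_tasks = []
for k in range(3):
    x = rng.uniform(0, 1, 40)
    y = np.sin(5 * x) + 0.2 * k + 0.05 * rng.standard_normal(40)
    meta_tasks.append((x, y))

# A few observations on the new target data set.
x_t = rng.uniform(0, 1, 5)
y_t = np.sin(5 * x_t) + 0.1
x_grid = np.linspace(0, 1, 200)

# One surrogate per meta-data set, plus one trained only on the target observations.
surrogates = meta_tasks + [(x_t, y_t)]
preds_on_target = [gp_mean(x, y, x_t) for x, y in surrogates]
preds_on_grid = [gp_mean(x, y, x_grid) for x, y in surrogates]

# Weight each surrogate by how well it explains the target observations.
errors = np.array([np.mean((p - y_t) ** 2) for p in preds_on_target])
weights = np.exp(-errors / errors.mean())
weights /= weights.sum()

ensemble_mean = sum(w * p for w, p in zip(weights, preds_on_grid))
```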

Journal ArticleDOI
TL;DR: In this article, a Gaussian process (a nonparametric machine learning approach) based algorithm for condition monitoring is proposed, which uses the standard IEC binned power curve together with individual bin probability distributions to identify operational anomalies.
Abstract: The penetration of wind energy into power systems is steadily increasing; this highlights the importance of operations and maintenance, and specifically the role of condition monitoring. Wind turbine power curves based on supervisory control and data acquisition data provide a cost-effective approach to wind turbine health monitoring. This study proposes a Gaussian process (a non-parametric machine learning approach) based algorithm for condition monitoring. The standard IEC binned power curve together with individual bin probability distributions can be used to identify operational anomalies. The IEC approach can also be modified to create a form of real-time power curve. Both of these approaches will be compared with a Gaussian process model to assess both speed and accuracy of anomaly detection. Significant yaw misalignment, reflecting a yaw control error or fault, results in a loss of power. Such a fault is quite common and early detection is important to prevent loss of power generation. Yaw control error provides a useful case study to demonstrate the effectiveness of the proposed algorithms and allows the advantages and limitations of the proposed methods to be determined.
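The binned power-curve baseline described above can be sketched as follows: bin SCADA data by wind speed, compute per-bin statistics, and flag observations falling outside a per-bin band. The data, power-curve shape, and thresholds are illustrative and do not reproduce the IEC 61400-12 procedure in full.

```python
import numpy as np

rng = np.random.default_rng(13)

# Hypothetical SCADA history: wind speed (m/s) and power output (kW).
wind = rng.uniform(3, 15, 5000)
power = 3000 / (1 + np.exp(-(wind - 9))) + 50 * rng.standard_normal(5000)

bin_edges = np.arange(3, 15.5, 0.5)               # 0.5 m/s bins, as in IEC-style binning
bin_idx = np.digitize(wind, bin_edges) - 1
bin_mean = np.array([power[bin_idx == b].mean() for b in range(len(bin_edges) - 1)])
bin_std = np.array([power[bin_idx == b].std() for b in range(len(bin_edges) - 1)])

def is_anomalous(wind_new, power_new, n_sigma=3.0):
    """Flag observations outside an n-sigma band of their wind-speed bin."""
    b = np.clip(np.digitize(wind_new, bin_edges) - 1, 0, len(bin_mean) - 1)
    return np.abs(power_new - bin_mean[b]) > n_sigma * bin_std[b]

# A yaw misalignment shows up as a systematic power deficit at a given wind speed.
print(is_anomalous(np.array([10.0]), np.array([1600.0])))
```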

Journal ArticleDOI
19 Feb 2018
TL;DR: In this paper, the authors present a derivation and implementation of efficient and scalable gradient computations using the celerite algorithm for Gaussian Process (GP) modeling, which can be easily integrated into existing automatic differentiation frameworks to provide a scalable method for evaluating the gradients of the GP likelihood with respect to all input parameters.
Abstract: This research note presents a derivation and implementation of efficient and scalable gradient computations using the celerite algorithm for Gaussian Process (GP) modeling. The algorithms are derived in a "reverse accumulation" or "backpropagation" framework and they can be easily integrated into existing automatic differentiation frameworks to provide a scalable method for evaluating the gradients of the GP likelihood with respect to all input parameters. The algorithm derived in this note uses less memory and is more efficient than versions using automatic differentiation and the computational cost scales linearly with the number of data points.

Journal ArticleDOI
TL;DR: This study proposes a complete method based on Gaussian process data pre-filtering and ANN modeling of wind turbine power curves that improves the network performance significantly, and saves substantial time and resources.