
Showing papers on "Gaussian process" published in 2016


Journal ArticleDOI
TL;DR: In this paper, the authors show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion.
Abstract: A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal{O}(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.

545 citations
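For orientation, the dense $\mathcal{O}(n^3)$ baseline that the abstract above contrasts against can be sketched as follows. This is only the standard Cholesky route to the Gaussian log-density with $C = \sigma^2 I + K$, not the paper's hierarchical factorization. The squared-exponential kernel, the inputs, and all names (`rbf_kernel`, `gaussian_logpdf_dense`, `sigma2`) are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays x and y (assumed kernel)."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gaussian_logpdf_dense(y, C):
    """log N(y | 0, C) using a dense Cholesky factorization, O(n^3) work."""
    n = y.shape[0]
    L = np.linalg.cholesky(C)                    # C = L L^T, the cubic-cost step
    alpha = np.linalg.solve(L, y)                # alpha = L^{-1} y, so alpha.alpha = y^T C^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log det(C) from the Cholesky diagonal
    return -0.5 * (alpha @ alpha + log_det + n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, size=200))    # hypothetical 1-D inputs
sigma2 = 0.1                                     # hypothetical noise variance
C = sigma2 * np.eye(x.size) + rbf_kernel(x, x)   # C = sigma^2 I + K
y = rng.standard_normal(x.size)
logp = gaussian_logpdf_dense(y, C)
print(logp)
```

Both the solve and the determinant come from the same factorization, which is why a faster factorization of $C$ speeds up both quantities at once.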


Journal ArticleDOI
TL;DR: A class of highly scalable nearest-neighbor Gaussian process (NNGP) models is developed to provide fully model-based inference for large geostatistical datasets, and it is established that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices.
Abstract: Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze fores...

543 citations
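The nearest-neighbor idea in the abstract above can be illustrated with a small sketch. The joint density is written as a product of conditionals, and each location is conditioned only on at most `m` of the nearest previously ordered locations, so no $n \times n$ matrix is ever factorized. This is a simplified illustration under assumptions, not the authors' implementation: the exponential covariance, the ordering by array index, the brute-force neighbor search, and the names `exp_cov` and `nn_logdensity` are all hypothetical.

```python
import numpy as np

def exp_cov(a, b, phi=3.0, sigma2=1.0):
    """Exponential covariance between location arrays a (p, d) and b (q, d) (assumed form)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * dist)

def nn_logdensity(w, locs, m):
    """Sum over i of log p(w_i | w_N(i)), with N(i) the (at most) m nearest earlier locations."""
    total = 0.0
    for i in range(len(w)):
        var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0]   # marginal variance at location i
        mean = 0.0
        if i > 0:
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nbrs = np.argsort(d)[:m]                        # brute-force neighbor search
            C_nn = exp_cov(locs[nbrs], locs[nbrs])          # small (<= m x m) system
            c_in = exp_cov(locs[i:i + 1], locs[nbrs])[0]
            b = np.linalg.solve(C_nn, c_in)                 # kriging weights
            mean = b @ w[nbrs]
            var = var - b @ c_in                            # conditional variance
        total += -0.5 * (np.log(2.0 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return total

rng = np.random.default_rng(1)
locs = rng.uniform(0.0, 1.0, size=(60, 2))                  # hypothetical spatial locations
C = exp_cov(locs, locs)
w = np.linalg.cholesky(C) @ rng.standard_normal(60)         # one draw from the full GP
approx = nn_logdensity(w, locs, m=10)                       # sparse approximation
exact = nn_logdensity(w, locs, m=len(w))                    # all earlier points: exact joint
print(approx, exact)
```

With `m` fixed, each term costs a solve of size at most `m`, so the density evaluation is linear in the number of locations apart from the neighbor search, which is quadratic in this naive version.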


MonographDOI
01 Jan 2016
TL;DR: This monograph covers nonparametric statistical models, Gaussian processes, empirical processes, function spaces and approximation theory, linear nonparametric estimators, the minimax paradigm, likelihood-based procedures, and adaptive inference.
Abstract: 1. Nonparametric statistical models 2. Gaussian processes 3. Empirical processes 4. Function spaces and approximation theory 5. Linear nonparametric estimators 6. The minimax paradigm 7. Likelihood-based procedures 8. Adaptive inference.

534 citations


Proceedings Article
02 May 2016
TL;DR: In this article, the authors introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods, and jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process.
Abstract: We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost O(n) for n training points, and predictions cost O(1) per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.

325 citations
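The core construction in the abstract above, a base kernel applied to inputs transformed by a deep architecture, can be sketched in a few lines. This is a minimal illustration under assumptions: a squared-exponential base kernel stands in for the spectral mixture kernel, a two-layer tanh network with random weights stands in for the deep architecture, and none of the scalable machinery (local kernel interpolation, inducing points, Kronecker or Toeplitz algebra) is included. All function names are hypothetical.

```python
import numpy as np

def mlp_features(X, W1, b1, W2, b2):
    """Small feed-forward network g(x) mapping inputs to a feature space."""
    hidden = np.tanh(X @ W1 + b1)
    return hidden @ W2 + b2

def rbf(Z1, Z2, lengthscale=1.0):
    """Squared-exponential base kernel (assumed stand-in for a spectral mixture kernel)."""
    sq = np.sum(Z1**2, axis=1)[:, None] + np.sum(Z2**2, axis=1)[None, :] - 2.0 * Z1 @ Z2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def deep_kernel(X1, X2, params):
    """k_deep(x, x') = k_base(g(x), g(x'))."""
    return rbf(mlp_features(X1, *params), mlp_features(X2, *params))

def neg_log_marginal_likelihood(X, y, params, noise_var=0.1):
    """GP objective through which network and kernel parameters would be learned jointly."""
    n = len(y)
    K = deep_kernel(X, X, params) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, y)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))                 # hypothetical inputs
y = np.sin(X[:, 0])                              # hypothetical targets
params = (rng.standard_normal((5, 16)), np.zeros(16),
          rng.standard_normal((16, 2)), np.zeros(2))
K = deep_kernel(X, X, params)
nll = neg_log_marginal_likelihood(X, y, params)
print(nll)
```

Because the network only reparameterizes the inputs, the result is still a valid positive semi-definite kernel. Gradient-based training of `params` is omitted from this sketch.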


Book
23 Aug 2016
TL;DR: Fractional Brownian motion (fBm) is a stochastic process which deviates significantly from Brownian motion, semimartingales, and other processes classically used in probability theory.
Abstract: Fractional Brownian motion (fBm) is a stochastic process which deviates significantly from Brownian motion and semimartingales, and others classically used in probability theory. As a centered Gaussian process, it is characterized by the stationarity of its increments and a medium- or long-memory property which is in sharp contrast with martingales and Markov processes. FBm has become a popular choice for applications where classical processes cannot model these non-trivial properties; for instance long memory, which is also known as persistence, is of fundamental importance for financial data and in internet traffic. The mathematical theory of fBm is currently being developed vigorously by a number of stochastic analysts, in various directions, using complementary and sometimes competing tools. This book is concerned with several aspects of fBm, including the stochastic integration with respect to it, the study of its supremum and its appearance as limit of partial sums involving stationary sequences, to name but a few. The book is addressed to researchers and graduate students in probability and mathematical statistics. With very few exceptions (where precise references are given), every stated result is proved.

234 citations
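As a concrete companion to the description above, the standard covariance of fBm with Hurst index $H$ is $\mathbb{E}[B_H(s)B_H(t)] = \tfrac{1}{2}\left(s^{2H} + t^{2H} - |t-s|^{2H}\right)$ for $s, t \ge 0$. The formula is a textbook fact rather than something quoted from the abstract. The sketch below draws a path on a grid by Cholesky factorization of that covariance; the grid, the jitter value, and the function names are illustrative assumptions.

```python
import numpy as np

def fbm_cov(t, H):
    """Covariance matrix of fBm with Hurst index H at strictly positive times t."""
    s = t[:, None]
    u = t[None, :]
    return 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))

def sample_fbm(t, H, rng):
    """Draw one fBm path at times t via a dense Cholesky factor (O(n^3), small grids only)."""
    C = fbm_cov(t, H)
    L = np.linalg.cholesky(C + 1e-12 * np.eye(len(t)))   # tiny jitter for numerical safety
    return L @ rng.standard_normal(len(t))

rng = np.random.default_rng(0)
t = np.arange(1, 101) / 100.0        # grid on (0, 1], excluding t = 0 where the variance is 0
path = sample_fbm(t, H=0.8, rng=rng) # H > 1/2 gives persistent (long-memory) increments
print(path[:5])
```

Setting `H=0.5` reduces the covariance to `min(s, t)`, which recovers ordinary Brownian motion.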


Journal ArticleDOI
TL;DR: The main contribution is to derive the optimal control in this case which in fact is given in closed-form (Theorem 1), and in the zero-noise limit, the solution of a (deterministic) mass transport problem with general quadratic cost.
Abstract: We address the problem of steering the state of a linear stochastic system to a prescribed distribution over a finite horizon with minimum energy, and the problem to maintain the state at a stationary distribution over an infinite horizon with minimum power. For both problems the control and Gaussian noise channels are allowed to be distinct, thereby, placing the results of this paper outside of the scope of previous work both in probability and in control. The special case where the disturbance and control enter through the same channels has been addressed in the first part of this work that was presented as Part I. Herein, we present sufficient conditions for optimality in terms of a system of dynamically coupled Riccati equations in the finite horizon case and in terms of algebraic conditions for the stationary case. We then address the question of feasibility for both problems. For the finite-horizon case, provided the system is controllable, we prove that without any restriction on the directionality of the stochastic disturbance it is always possible to steer the state to any arbitrary Gaussian distribution over any specified finite time-interval. For the stationary infinite horizon case, it is not always possible to maintain the state at an arbitrary Gaussian distribution through constant state-feedback. It is shown that covariances of admissible stationary Gaussian distributions are characterized by a certain Lyapunov-like equation and, in fact, they coincide with the class of stationary state covariances that can be attained by a suitable stationary colored noise as input. We finally address the question of how to compute suitable controls numerically. We present an alternative to solving the system of coupled Riccati equations, by expressing the optimal controls in the form of solutions to (convex) semi-definite programs for both cases. 
We conclude with an example to steer the state covariance of the distribution of inertial particles to an admissible stationary Gaussian distribution over a finite interval, to be maintained at that stationary distribution thereafter by constant-gain state-feedback control.

229 citations


Proceedings Article
19 Jun 2016
TL;DR: A variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices is introduced and "pseudo-data" (Snelson & Ghahramani, 2005) is incorporated in this model, which allows for more efficient posterior sampling while maintaining the properties of the original model.
Abstract: We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian (Gupta & Nagar, 1999) parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the "local reparametrization trick" (Kingma et al., 2015) on this posterior distribution we arrive at a Gaussian Process (Rasmussen, 2006) interpretation of the hidden units in each layer and we, similarly with (Gal & Ghahramani, 2015), provide connections with deep Gaussian processes. We continue in taking advantage of this duality and incorporate "pseudo-data" (Snelson & Ghahramani, 2005) in our model, which in turn allows for more efficient posterior sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments.

200 citations


Proceedings ArticleDOI
24 Jul 2016
TL;DR: Manifold Gaussian Processes is a novel supervised method that jointly learns a transformation of the data into a feature space and a GP regression from the feature space to observed space, so that the learned data representations are useful for the overall regression task.
Abstract: Off-the-shelf Gaussian Process (GP) covariance functions encode smoothness assumptions on the structure of the function to be modeled. To model complex and non-differentiable functions, these smoothness assumptions are often too restrictive. One way to alleviate this limitation is to find a different representation of the data by introducing a feature space. This feature space is often learned in an unsupervised way, which might lead to data representations that are not useful for the overall regression task. In this paper, we propose Manifold Gaussian Processes, a novel supervised method that jointly learns a transformation of the data into a feature space and a GP regression from the feature space to observed space. The Manifold GP is a full GP and allows learning data representations that are useful for the overall regression task. As a proof-of-concept, we evaluate our approach on complex non-smooth functions where standard GPs perform poorly, such as step functions and robotics tasks with contacts.

195 citations


Proceedings ArticleDOI
19 Jun 2016
TL;DR: A new approximate Bayesian learning scheme is developed that enables DGPs to be applied to a range of medium to large scale regression problems for the first time and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks.
Abstract: Deep Gaussian processes (DGPs) are multilayer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.

175 citations


Proceedings Article
01 Jan 2016
TL;DR: In this paper, a simple heuristic based on an estimate of the Lipschitz constant was proposed to capture the most important aspect of this interaction (i.e., local repulsion) at negligible computational overhead.
Abstract: The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These facilities could be computational or physical facets of the process being optimized. For example, in biological experiments many experimental setups allow several samples to be simultaneously processed. Batch methods, however, require modeling of the interaction between the evaluations in the batch, which can be expensive in complex scenarios. We investigate a simple heuristic based on an estimate of the Lipschitz constant that captures the most important aspect of this interaction (i.e. local repulsion) at negligible computational overhead. The resulting algorithm compares well, in running time, with much more elaborate alternatives. The approach assumes that the function of interest, $f$, is a Lipschitz continuous function. A wrap-loop around the acquisition function is used to collect batches of points of certain size minimizing the non-parallelizable computational effort. The speed-up of our method with respect to previous approaches is significant in a set of computationally expensive experiments.

167 citations
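The "local repulsion" idea in the abstract above rests on a simple consequence of Lipschitz continuity. If $f$ is $L$-Lipschitz with maximum value $M$, then a maximizer cannot lie within distance $(M - f(x_j))/L$ of any point $x_j$. The sketch below uses that fact in a deliberately simplified way, with hard exclusion balls over a finite candidate set and the GP posterior mean plugged in for the unknown $f(x_j)$. It is an assumption-laden illustration of the principle and not the paper's penalization scheme. The maximization convention and the names `select_batch`, `lipschitz`, and `f_max` are hypothetical.

```python
import numpy as np

def select_batch(candidates, acq, mu, lipschitz, f_max, batch_size):
    """Greedy batch selection: pick the best remaining candidate, then exclude its Lipschitz ball."""
    available = np.ones(len(candidates), dtype=bool)
    batch = []
    for _ in range(batch_size):
        if not available.any():
            break
        scores = np.where(available, acq, -np.inf)
        j = int(np.argmax(scores))
        batch.append(j)
        # Radius inside which the maximizer cannot lie if f(x_j) equals the plug-in mean mu[j].
        radius = max(f_max - mu[j], 0.0) / lipschitz
        dist = np.linalg.norm(candidates - candidates[j], axis=1)
        available &= dist > radius
        available[j] = False
    return batch

x = np.linspace(0.0, 1.0, 101)
candidates = x[:, None]                    # hypothetical 1-D candidate grid
mu = np.sin(6.0 * x)                       # hypothetical GP posterior mean
std = np.full_like(x, 0.1)                 # hypothetical GP posterior standard deviation
acq = mu + 2.0 * std                       # upper-confidence-bound style acquisition
batch = select_batch(candidates, acq, mu, lipschitz=6.0, f_max=1.2, batch_size=3)
print(x[batch])
```

Each pick costs one pass over the candidates and needs no joint model of the batch, which reflects the "negligible computational overhead" the abstract describes.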


Proceedings ArticleDOI
15 Mar 2016
TL;DR: This paper considers an approach that learns the ROA from experiments on a real system, without ever leaving the true ROA and, thus, without risking safety-critical failures.
Abstract: Control theory can provide useful insights into the properties of controlled, dynamic systems. One important property of nonlinear systems is the region of attraction (ROA), a safe subset of the state space in which a given controller renders an equilibrium point asymptotically stable. The ROA is typically estimated based on a model of the system. However, since models are only an approximation of the real world, the resulting estimated safe region can contain states outside the ROA of the real system. This is not acceptable in safety-critical applications. In this paper, we consider an approach that learns the ROA from experiments on a real system, without ever leaving the true ROA and, thus, without risking safety-critical failures. Based on regularity assumptions on the model errors in terms of a Gaussian process prior, we use an underlying Lyapunov function in order to determine a region in which an equilibrium point is asymptotically stable with high probability. Moreover, we provide an algorithm to actively and safely explore the state space in order to expand the ROA estimate. We demonstrate the effectiveness of this method in simulation.

Journal ArticleDOI
TL;DR: In this article, an online optimization process based on machine learning is applied to the production of Bose-Einstein condensates (BEC), which is typically created with an exponential evaporation ramp that is optimal for ergodic dynamics with two-body s-wave interactions and no other loss rates, but likely suboptimal for real experiments.
Abstract: We apply an online optimization process based on machine learning to the production of Bose-Einstein condensates (BEC). BEC is typically created with an exponential evaporation ramp that is optimal for ergodic dynamics with two-body s-wave interactions and no other loss rates, but likely sub-optimal for real experiments. Through repeated machine-controlled scientific experimentation and observations our ‘learner’ discovers an optimal evaporation ramp for BEC production. In contrast to previous work, our learner uses a Gaussian process to develop a statistical model of the relationship between the parameters it controls and the quality of the BEC produced. We demonstrate that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique. Furthermore, we show the internal model developed can be used to determine which parameters are essential in BEC creation and which are unimportant, providing insight into the optimization process of the system.

Journal ArticleDOI
TL;DR: This work discusses an implementation of local approximate Gaussian process models, in the laGP package for R, that offers a particular sparse-matrix remedy uniquely positioned to leverage modern parallel computing architectures.
Abstract: Gaussian process (GP) regression models make for powerful predictors in out of sample exercises, but cubic runtimes for dense matrix decompositions severely limit the size of data - training and testing - on which they can be deployed. That means that in computer experiment, spatial/geo-physical, and machine learning contexts, GPs no longer enjoy privileged status as data sets continue to balloon in size. We discuss an implementation of local approximate Gaussian process models, in the laGP package for R, that offers a particular sparse-matrix remedy uniquely positioned to leverage modern parallel computing architectures. The laGP approach can be seen as an update on the spatial statistical method of local kriging neighborhoods. We briefly review the method, and provide extensive illustrations of the features in the package through worked-code examples. The appendix covers custom building options for symmetric multi-processor and graphical processing units, and built-in wrapper routines that automate distribution over a simple network of workstations.
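The local-neighborhood principle mentioned in the abstract above can be sketched as follows: for each test point, fit a small GP to a nearby subset of the training data only, so that each prediction costs a solve of fixed size. This Python sketch uses plain nearest neighbors and fixed hyperparameters as simplifying assumptions. It does not reproduce the laGP package's own sub-design selection or its R interface, and every name in it is hypothetical.

```python
import numpy as np

def sq_exp(A, B, lengthscale):
    """Squared-exponential kernel between point sets A (p, d) and B (q, d)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def local_gp_predict(X, y, x_star, n_local=30, lengthscale=0.2, nugget=1e-4):
    """Zero-mean GP prediction at x_star using only its n_local nearest training points."""
    dist = np.linalg.norm(X - x_star, axis=1)
    idx = np.argsort(dist)[:n_local]                  # local neighborhood of x_star
    X_loc, y_loc = X[idx], y[idx]
    K = sq_exp(X_loc, X_loc, lengthscale) + nugget * np.eye(len(idx))
    k = sq_exp(X_loc, x_star[None, :], lengthscale)[:, 0]
    weights = np.linalg.solve(K, k)                   # n_local x n_local solve, not n x n
    mean = weights @ y_loc
    var = max(1.0 + nugget - weights @ k, 0.0)        # clipped at zero against round-off
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 2))             # hypothetical training inputs
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])     # hypothetical noise-free response
x_star = np.array([0.5, 0.5])
mean, var = local_gp_predict(X, y, x_star)
print(mean, var)
```

Predictions at different test points are independent of one another, which is what makes this kind of scheme easy to distribute across cores or machines.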

Journal ArticleDOI
TL;DR: In this article, a Gaussian process-based predictive model was developed to predict porosity of metal parts produced using a selective laser melting (SLM) additive manufacturing (AM) process.
Abstract: Additive manufacturing (AM) is a set of emerging technologies that can produce physical objects with complex geometrical shapes directly from a digital model. With many unique capabilities, such as design freedom, it has recently gained increasing attention from researchers, practitioners, and public media. However, achieving the full potential of AM is hampered by many challenges, including the lack of predictive models that correlate processing parameters with the properties of the processed part. We develop a Gaussian process-based predictive model for the learning and prediction of the porosity in metallic parts produced using selective laser melting (SLM – a laser-based AM process). More specifically, a spatial Gaussian process regression model is first developed to model part porosity as a function of SLM process parameters. Next, a Bayesian inference framework is used to estimate the statistical model parameters, and the porosity of the part at any given setting is predicted using the Kriging method. A case study is conducted to validate this predictive framework through predicting the porosity of 17-4 PH stainless steel manufacturing on a ProX 100 selective laser melting system.

Posted Content
TL;DR: An efficient form of stochastic variational inference is derived that leverages local kernel interpolation, inducing points, and structure-exploiting algebra, generalizing deep kernel learning to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training.
Abstract: Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure exploiting algebra. We show improved performance over stand alone deep networks, SVMs, and state of the art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.

Journal ArticleDOI
TL;DR: A Bayesian method for training GP-LVMs is presented, based on a non-standard variational inference framework that allows the latent variables to be approximately integrated out, so that a GP-LVM can be trained by maximising an analytic lower bound on the exact marginal likelihood.
Abstract: The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows the latent variables to be approximately integrated out and a GP-LVM to subsequently be trained by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.

Journal ArticleDOI
TL;DR: A probabilistic version of the active subspace (AS) method is developed that is gradient-free and robust to observational noise, and it is shown to discover the same AS as the classical approach in a challenging one-hundred-dimensional problem involving an elliptic stochastic partial differential equation with random conductivity.

Book ChapterDOI
TL;DR: The performance of the two techniques is compared on three well-known analytical benchmarks (Ishigami, G-Sobol and Morris functions) as well as on a realistic engineering application (deflection of a truss structure).
Abstract: Global sensitivity analysis is now established as a powerful approach for determining the key random input parameters that drive the uncertainty of model output predictions. Yet the classical computation of the so-called Sobol’ indices is based on Monte Carlo simulation, which is not affordable when computationally expensive models are used, as is the case in most applications in engineering and applied sciences. In this respect metamodels such as polynomial chaos expansions (PCE) and Gaussian processes (GP) have received tremendous attention in the last few years, as they allow one to replace the original, taxing model by a surrogate which is built from an experimental design of limited size. Then the surrogate can be used to compute the sensitivity indices in negligible time. In this chapter an introduction to each technique is given, with an emphasis on their strengths and limitations in the context of global sensitivity analysis. In particular, Sobol’ (resp. total Sobol’) indices can be computed analytically from the PCE coefficients. In contrast, confidence intervals on sensitivity indices can be derived straightforwardly from the properties of GPs. The performance of the two techniques is finally compared on three well-known analytical benchmarks (Ishigami, G-Sobol and Morris functions) as well as on a realistic engineering application (deflection of a truss structure).
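To make the cost argument in the abstract above concrete, the sketch below shows the classical Monte Carlo computation of first-order Sobol' indices on the Ishigami benchmark. It uses a pick-freeze estimator with the common parameter choice $a = 7$, $b = 0.1$ and inputs uniform on $[-\pi, \pi]$. This is the sampling baseline that surrogates are meant to replace, not the chapter's PCE or GP method. The sample size, seed, and function names are assumptions.

```python
import numpy as np

def ishigami(X, a=7.0, b=0.1):
    """Ishigami function with inputs in columns of X, each uniform on [-pi, pi]."""
    return np.sin(X[:, 0]) + a * np.sin(X[:, 1]) ** 2 + b * X[:, 2] ** 4 * np.sin(X[:, 0])

def first_order_sobol(f, dim, n, rng):
    """Pick-freeze Monte Carlo estimate of first-order Sobol' indices; uses (dim + 2) * n model runs."""
    A = rng.uniform(-np.pi, np.pi, size=(n, dim))
    B = rng.uniform(-np.pi, np.pi, size=(n, dim))
    fA, fB = f(A), f(B)
    total_var = np.var(np.concatenate([fA, fB]))
    S = np.empty(dim)
    for i in range(dim):
        AB = A.copy()
        AB[:, i] = B[:, i]                  # share only input i with sample B
        S[i] = np.mean(fB * (f(AB) - fA)) / total_var
    return S

rng = np.random.default_rng(0)
S = first_order_sobol(ishigami, dim=3, n=100_000, rng=rng)
print(S)
```

The estimate here needs half a million evaluations of a function that is cheap to call. For an expensive simulator the same budget is out of reach, which is the motivation for surrogate-based approaches given in the abstract.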

Posted Content
TL;DR: Gaussian Process Morphable Models (GPMMs) model shape variations with a Gaussian process, represented using the leading components of its Karhunen-Loève expansion.
Abstract: Statistical shape models (SSMs) represent a class of shapes as a normal distribution of point variations, whose parameters are estimated from example shapes. Principal component analysis (PCA) is applied to obtain a low-dimensional representation of the shape variation in terms of the leading principal components. In this paper, we propose a generalization of SSMs, called Gaussian Process Morphable Models (GPMMs). We model the shape variations with a Gaussian process, which we represent using the leading components of its Karhunen-Loève expansion. To compute the expansion, we make use of an approximation scheme based on the Nyström method. The resulting model can be seen as a continuous analogue of an SSM. However, while for SSMs the shape variation is restricted to the span of the example data, with GPMMs we can define the shape variation using any Gaussian process. For example, we can build shape models that correspond to classical spline models, and thus do not require any example data. Furthermore, Gaussian processes make it possible to combine different models. For example, an SSM can be extended with a spline model, to obtain a model that incorporates learned shape characteristics, but is flexible enough to explain shapes that cannot be represented by the SSM. We introduce a simple algorithm for fitting a GPMM to a surface or image. This results in a non-rigid registration approach, whose regularization properties are defined by a GPMM. We show how we can obtain different registration schemes, including methods for multi-scale, spatially-varying or hybrid registration, by constructing an appropriate GPMM. As our approach strictly separates modelling from the fitting process, this is all achieved without changes to the fitting algorithm. We show the applicability and versatility of GPMMs on a clinical use case, where the goal is the model-based segmentation of 3D forearm images.

Proceedings ArticleDOI
16 May 2016
TL;DR: This paper develops the Gaussian Process Motion Planner (GPMP), a gradient-based optimization technique that optimizes continuous-time trajectories with respect to a cost functional by exploiting GP interpolation.
Abstract: Motion planning is a fundamental tool in robotics, used to generate collision-free, smooth, trajectories, while satisfying task-dependent constraints. In this paper, we present a novel approach to motion planning using Gaussian processes. In contrast to most existing trajectory optimization algorithms, which rely on a discrete state parameterization in practice, we represent the continuous-time trajectory as a sample from a Gaussian process (GP) generated by a linear time-varying stochastic differential equation. We then provide a gradient-based optimization technique that optimizes continuous-time trajectories with respect to a cost functional. By exploiting GP interpolation, we develop the Gaussian Process Motion Planner (GPMP), that finds optimal trajectories parameterized by a small number of states. We benchmark our algorithm against recent trajectory optimization algorithms by solving 7-DOF robotic arm planning problems in simulation and validate our approach on a real 7-DOF WAM arm.

Posted Content
TL;DR: GPflow is a Gaussian process library that uses TensorFlow for its core computations and Python for its front end and has a particular emphasis on software testing and is able to exploit GPU hardware.
Abstract: GPflow is a Gaussian process library that uses TensorFlow for its core computations and Python for its front end. The distinguishing features of GPflow are that it uses variational inference as the primary approximation method, provides concise code through the use of automatic differentiation, has been engineered with a particular emphasis on software testing and is able to exploit GPU hardware.

Proceedings ArticleDOI
01 Jan 2016
TL;DR: A substantial generalization of the literature on the variational framework for learning inducing variables is given, together with a new proof of the result for infinite index sets that allows inducing points that are not data points and likelihoods that depend on all function values.
Abstract: The variational framework for learning inducing variables (Titsias, 2009a) has had a large impact on the Gaussian process literature. The framework may be interpreted as minimizing a rigorously defined Kullback-Leibler divergence between the approximating and posterior processes. To our knowledge this connection has thus far gone unremarked in the literature. In this paper we give a substantial generalization of the literature on this topic. We give a new proof of the result for infinite index sets which allows inducing points that are not data points and likelihoods that depend on all function values. We then discuss augmented index sets and show that, contrary to previous works, marginal consistency of augmentation is not enough to guarantee consistency of variational inference with the original model. We then characterize an extra condition where such a guarantee is obtainable. Finally we show how our framework sheds light on interdomain sparse approximations and sparse approximations for Cox processes.
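For readers unfamiliar with the framework being generalized in the abstract above, the sketch below shows the well-known collapsed lower bound for sparse GP regression with inducing inputs $Z$: $\log \mathcal{N}(y \mid 0, Q_{nn} + \sigma^2 I) - \tfrac{1}{2\sigma^2}\operatorname{tr}(K_{nn} - Q_{nn})$, where $Q_{nn} = K_{nm} K_{mm}^{-1} K_{mn}$. The code compares it numerically with the exact log marginal likelihood. It illustrates the regression special case only, not the paper's infinite-index-set analysis. The kernel, the data, the jitter, and all names are assumptions.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel between 1-D arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def log_mvn(y, C):
    """log N(y | 0, C) via Cholesky."""
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, y)
    return -0.5 * (alpha @ alpha) - np.sum(np.log(np.diag(L))) - 0.5 * len(y) * np.log(2.0 * np.pi)

def collapsed_bound(x, y, z, noise_var, jitter=1e-6):
    """Variational lower bound on log p(y) for inducing inputs z (points z need not be data points)."""
    Kmm = rbf(z, z) + jitter * np.eye(len(z))
    Knm = rbf(x, z)
    Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)               # Nystrom approximation of Knn
    trace_term = np.sum(np.diag(rbf(x, x)) - np.diag(Qnn))
    return log_mvn(y, Qnn + noise_var * np.eye(len(x))) - 0.5 * trace_term / noise_var

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)                             # hypothetical inputs
noise_var = 0.1
y = np.sin(x) + np.sqrt(noise_var) * rng.standard_normal(x.size)
exact = log_mvn(y, rbf(x, x) + noise_var * np.eye(x.size))
bound_small = collapsed_bound(x, y, np.linspace(0.0, 5.0, 6), noise_var)
bound_large = collapsed_bound(x, y, np.linspace(0.0, 5.0, 11), noise_var)
print(bound_small, bound_large, exact)
```

The gap between each bound and `exact` is the Kullback-Leibler divergence that the variational approximation minimizes. The gap shrinks as inducing points are added, here by refining a grid that contains the coarser one.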

Proceedings ArticleDOI
16 May 2016
TL;DR: An automatic controller tuning framework based on linear optimal control combined with Bayesian optimization that shall yield improved controllers with fewer evaluations compared to alternative approaches is proposed.
Abstract: This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework shall yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Results of two- and four-dimensional tuning problems highlight the method's potential for automatic controller tuning on robotic platforms.

Proceedings ArticleDOI
18 Jun 2016
TL;DR: This work considers smooth continuous-time trajectories as samples from a Gaussian process and formulates the planning problem as probabilistic inference; it uses factor graphs and numerical optimization to perform inference quickly, and shows how GP interpolation can further increase the speed of the algorithm.
Abstract: With the increased use of high degree-of-freedom robots that must perform tasks in real-time, there is a need for fast algorithms for motion planning. In this work, we view motion planning from a probabilistic perspective. We consider smooth continuous-time trajectories as samples from a Gaussian process (GP) and formulate the planning problem as probabilistic inference. We use factor graphs and numerical optimization to perform inference quickly, and we show how GP interpolation can further increase the speed of the algorithm. Our framework also allows us to incrementally update the solution of the planning problem to contend with changing conditions. We benchmark our algorithm against several recent trajectory optimization algorithms on planning problems in multiple environments. Our evaluation reveals that our approach is several times faster than previous algorithms while retaining robustness. Finally, we demonstrate the incremental version of our algorithm on replanning problems, and show that it often can find successful solutions in a fraction of the time required to replan from scratch.

Journal ArticleDOI
16 Jun 2016-Nature
TL;DR: An analytical approach is introduced to calculate the mean first-passage time of a Gaussian non-Markovian random walker to a target in the limit of a large confining volume, and reveals, on the basis of Gaussian processes, the importance of memory effects in first-passage statistics of non-Markovian random walkers in confinement.
Abstract: The first-passage time, defined as the time a random walker takes to reach a target point in a confining domain, is a key quantity in the theory of stochastic processes. Its importance comes from its crucial role in quantifying the efficiency of processes as varied as diffusion-limited reactions, target search processes or the spread of diseases. Most methods of determining the properties of first-passage time in confined domains have been limited to Markovian (memoryless) processes. However, as soon as the random walker interacts with its environment, memory effects cannot be neglected: that is, the future motion of the random walker does not depend only on its current position, but also on its past trajectory. Examples of non-Markovian dynamics include single-file diffusion in narrow channels, or the motion of a tracer particle either attached to a polymeric chain or diffusing in simple or complex fluids such as nematics, dense soft colloids or viscoelastic solutions. Here we introduce an analytical approach to calculate, in the limit of a large confining volume, the mean first-passage time of a Gaussian non-Markovian random walker to a target. The non-Markovian features of the dynamics are encompassed by determining the statistical properties of the fictitious trajectory that the random walker would follow after the first-passage event takes place, which are shown to govern the first-passage time kinetics. This analysis is applicable to a broad range of stochastic processes, which may be correlated at long times. Our theoretical predictions are confirmed by numerical simulations for several examples of non-Markovian processes, including the case of fractional Brownian motion in one and higher dimensions. These results reveal, on the basis of Gaussian processes, the importance of memory effects in first-passage statistics of non-Markovian random walkers in confinement.

Journal ArticleDOI
TL;DR: A novel class of scalable Dynamic Nearest Neighbor Gaussian Process (DNNGP) models that can provide a sparse approximation to any spatio-temporal GP, and is used to analyze a massive air quality dataset to substantially improve predictions of PM levels across Europe in conjunction with the LOTOS-EUROS chemistry transport models.
Abstract: Particulate matter (PM) is a class of malicious environmental pollutants known to be detrimental to human health. Regulatory efforts aimed at curbing PM levels in different countries often require high resolution space–time maps that can identify red-flag regions exceeding statutory concentration limits. Continuous spatio-temporal Gaussian Process (GP) models can deliver maps depicting predicted PM levels and quantify predictive uncertainty. However, GP-based approaches are usually thwarted by computational challenges posed by large datasets. We construct a novel class of scalable Dynamic Nearest Neighbor Gaussian Process (DNNGP) models that can provide a sparse approximation to any spatio-temporal GP (e.g., with nonseparable covariance structures). The DNNGP we develop here can be used as a sparsity-inducing prior for spatio-temporal random effects in any Bayesian hierarchical model to deliver full posterior inference. Storage and memory requirements for a DNNGP model are linear in the size of the dataset, thereby delivering massive scalability without sacrificing inferential richness. Extensive numerical studies reveal that the DNNGP provides substantially superior approximations to the underlying process than low-rank approximations. Finally, we use the DNNGP to analyze a massive air quality dataset to substantially improve predictions of PM levels across Europe in conjunction with the LOTOS-EUROS chemistry transport models (CTMs). © Institute of Mathematical Statistics, 2016.

Posted Content
TL;DR: In this paper, an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm are proposed for learning deep Gaussian processes, enabling their application to medium to large scale regression problems.
Abstract: Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.

Proceedings Article
01 Jan 2016
TL;DR: This work develops MF-GP-UCB, a novel method based on upper confidence bound techniques that outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments and achieves better regret than strategies which ignore multi-fidelity information.
Abstract: In many scientific and engineering applications, we are tasked with the optimisation of an expensive to evaluate black box function $f$. Traditional methods for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of $f$ in a small but promising region and speedily identify the optimum. We formalise this task as a multi-fidelity bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour, and achieves better regret than strategies which ignore multi-fidelity information. MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.

Journal ArticleDOI
TL;DR: The ergodicity of the approximate Markov chain is proved, showing that it samples asymptotically from the exact posterior distribution of interest, and variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors are described.
Abstract: We construct a new framework for accelerating Markov chain Monte Carlo in posterior sampling problems where standard methods are limited by the computational cost of the likelihood, or of numerical models embedded therein. Our approach introduces local approximations of these models into the Metropolis–Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Previous efforts at integrating approximate models into inference typically sacrifice either the sampler’s exactness or efficiency; our work seeks to address these limitations by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain, showing that it samples asymptotically from the exact posterior distribution of interest. We describe variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors. Our theoretical results reinforce the key observation underlying this...

Book ChapterDOI
01 Jan 2016
TL;DR: Bayesian optimization, as discussed by the authors, is a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets; in materials design and discovery, it guides the choice of experiments to find good material designs in as few experiments as possible.
Abstract: We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we describe two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality.
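
To make the two ingredients named in this abstract concrete, the following is a minimal sketch of Gaussian process regression followed by the expected improvement criterion for noise-free evaluations. It is not the chapter's own code. Everything in it is an assumption chosen for illustration: a one-dimensional design variable, a zero-mean prior, a squared-exponential kernel with unit variance and unit length scale, a small diagonal jitter for numerical stability, a maximisation convention, and the hypothetical function names `rbf`, `gp_posterior`, and `expected_improvement`.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar designs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transposed(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_new, jitter=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))  # K^{-1} y
    k_star = [rbf(x, x_new) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over `best` for maximisation, noise-free case."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

# Illustrative usage with made-up data: three tested designs and their scores.
x_train, y_train = [0.0, 1.0, 2.0], [0.0, 0.8, 0.3]
best = max(y_train)
candidates = [0.5, 1.5, 2.5]
scores = [expected_improvement(*gp_posterior(x_train, y_train, x), best)
          for x in candidates]
next_design = candidates[scores.index(max(scores))]
```

Under these assumptions, the next experiment would be run at the candidate with the largest expected improvement. An already-tested design scores close to zero because its posterior standard deviation is nearly zero. The knowledge-gradient method for noisy evaluations is not covered by this sketch.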