
Showing papers on "Gaussian process published in 2015"


Posted Content
TL;DR: In this article, the authors explore the use of neural networks as an alternative to GPs to model distributions over functions, and show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically.
Abstract: Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.

524 citations
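A minimal sketch of the adaptive basis function regression idea summarized above: Bayesian linear regression on a fixed set of nonlinear features standing in for a trained network's last hidden layer. This is not the authors' implementation, and the random tanh feature map below is only an illustrative stand-in for the neural network; the point is the cost, which is O(n d^2) for n observations and d basis functions, i.e. linear in n rather than the O(n^3) of an exact GP.

import numpy as np

def bayesian_basis_regression(X, y, phi, alpha=1.0, noise=0.1):
    """Bayesian linear regression on basis functions phi(X).

    Cost is O(n d^2) for n observations and d basis functions,
    i.e. linear in n, unlike the O(n^3) cost of an exact GP.
    """
    Phi = phi(X)                                          # n x d design matrix
    d = Phi.shape[1]
    A = Phi.T @ Phi / noise**2 + alpha * np.eye(d)        # posterior precision of weights
    A_inv = np.linalg.inv(A)
    mean_w = A_inv @ Phi.T @ y / noise**2                 # posterior mean of weights

    def predict(Xs):
        Ps = phi(Xs)
        mu = Ps @ mean_w                                  # predictive mean
        var = np.sum(Ps @ A_inv * Ps, axis=1) + noise**2  # predictive variance
        return mu, var

    return predict

# Illustrative stand-in for a trained network's last hidden layer.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(1, 50)), rng.normal(size=50)
phi = lambda X: np.tanh(X @ W + b)

X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
predict = bayesian_basis_regression(X, y, phi)
mu, var = predict(np.array([[0.5]]))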


Proceedings Article
06 Jul 2015
TL;DR: This work shows that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically, which allows for a previously intractable degree of parallelism.
Abstract: Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.

503 citations


Proceedings Article
21 Feb 2015
TL;DR: This work shows how to scale Gaussian process classification within a variational inducing point framework, outperforming the state of the art on benchmark datasets; the variational formulation can be exploited to allow classification in problems with millions of data points.
Abstract: Gaussian process classification is a popular method with a number of appealing properties. We show how to scale the model within a variational inducing point framework, outperforming the state of the art on benchmark datasets. Importantly, the variational formulation can be exploited to allow classification in problems with millions of data points, as we demonstrate in experiments.

489 citations
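The scalability in this family of methods comes from summarizing the kernel matrix through a small set of inducing points. The sketch below shows only that mechanism, a Nyström-style low-rank approximation of a squared-exponential kernel in plain numpy; it is not the paper's variational bound or its treatment of the non-Gaussian classification likelihood, and all sizes and names are illustrative.

import numpy as np

def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))      # n training inputs
Z = np.linspace(0, 10, 20)[:, None]         # m << n inducing inputs

Kuu = rbf(Z, Z) + 1e-8 * np.eye(len(Z))     # m x m inducing covariance
Kuf = rbf(Z, X)                             # m x n cross-covariance
# Low-rank approximation K_ff ~ Kfu Kuu^{-1} Kuf: downstream algebra works
# with these m x n and m x m blocks, giving O(n m^2) cost instead of the
# O(n^3) cost of factorizing the full n x n kernel matrix.
L = np.linalg.cholesky(Kuu)
V = np.linalg.solve(L, Kuf)                 # m x n
Kff_approx_diag = (V ** 2).sum(axis=0)      # diagonal of the approximate kernel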


Journal ArticleDOI
TL;DR: The Gaussian approximation potentials (GAP) framework is described, along with a variety of descriptors, how to train the model on total energies and derivatives, and the simultaneous use of multiple models of different complexity.
Abstract: We present a swift walk-through of our recent work that uses machine learning to fit interatomic potentials based on quantum mechanical data. We describe our Gaussian approximation potentials (GAP) framework, discuss a variety of descriptors, how to train the model on total energies and derivatives, and the simultaneous use of multiple models of different complexity. We also show a small example using QUIP, the software sandbox implementation of GAP that is available for noncommercial use. © 2015 Wiley Periodicals, Inc.

470 citations


Posted Content
TL;DR: A new structured kernel interpolation (SKI) framework is introduced, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs) and naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability.
Abstract: We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, interpolation strategy, and GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI, with local cubic kernel interpolation, we introduce KISS-GP, which is 1) more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.

358 citations
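A toy one-dimensional illustration of the structured kernel interpolation idea (not the authors' KISS-GP code): the kernel between arbitrary inputs is approximated by interpolating a kernel evaluated on a regular grid, K ~ W K_UU W^T, where W holds sparse interpolation weights. For simplicity the sketch uses linear rather than local cubic interpolation, and dense matrices where an efficient implementation would exploit sparsity and Toeplitz structure.

import numpy as np

def rbf(A, B, ls=1.0):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

def interp_weights(x, grid):
    """Sparse linear interpolation weights of points x onto a regular 1-D grid."""
    W = np.zeros((len(x), len(grid)))
    h = grid[1] - grid[0]
    idx = np.clip(((x - grid[0]) / h).astype(int), 0, len(grid) - 2)
    frac = (x - grid[idx]) / h
    W[np.arange(len(x)), idx] = 1.0 - frac
    W[np.arange(len(x)), idx + 1] = frac
    return W

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 500))
grid = np.linspace(0, 10, 100)              # inducing (interpolation) points on a grid

K_uu = rbf(grid, grid)                      # grid kernel; Toeplitz for regular grids
W = interp_weights(x, grid)                 # sparse n x m interpolation weights
K_ski = W @ K_uu @ W.T                      # SKI approximation of the n x n kernel
err = np.abs(K_ski - rbf(x, x)).max()       # approximation error on this toy problem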


Journal ArticleDOI
TL;DR: A family of local sequential design schemes is derived that dynamically defines the support of a Gaussian process predictor based on a local subset of the data, enabling a global predictor able to take advantage of modern multicore architectures.
Abstract: We provide a new approach to approximate emulation of large computer experiments. By focusing expressly on desirable properties of the predictive equations, we derive a family of local sequential design schemes that dynamically define the support of a Gaussian process predictor based on a local subset of the data. We further derive expressions for fast sequential updating of all needed quantities as the local designs are built up iteratively. Then we show how independent application of our local design strategy across the elements of a vast predictive grid facilitates a trivially parallel implementation. The end result is a global predictor able to take advantage of modern multicore architectures, providing a nonstationary modeling feature as a bonus. We demonstrate our method on two examples using designs with thousands of data points, and compare to the method of compactly supported covariances. Supplementary materials for this article are available online.

358 citations
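The local design idea can be caricatured with a plain nearest-neighbor variant, sketched below: for each prediction location, fit an independent GP to only its closest training points and predict from it. The paper's sequential design criterion for choosing each local subset and its fast updating formulas are not reproduced here; this only shows the "local subset per prediction point" skeleton, which is trivially parallel across prediction locations.

import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def local_gp_predict(Xs, X, y, n_local=50, noise=1e-4):
    """Predict at each row of Xs with a GP fit only to its n_local nearest neighbors."""
    preds = np.empty(len(Xs))
    for i, xs in enumerate(Xs):
        # Local subset of the data (plain nearest neighbors here; the paper
        # builds the subset sequentially via a design criterion).
        idx = np.argsort(((X - xs) ** 2).sum(axis=1))[:n_local]
        Xl, yl = X[idx], y[idx]
        K = rbf(Xl, Xl) + noise * np.eye(n_local)
        k = rbf(xs[None, :], Xl)[0]
        preds[i] = k @ np.linalg.solve(K, yl)   # GP predictive mean on the local subset
    return preds

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(5000, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])
Xs = rng.uniform(-2, 2, size=(10, 2))
mu = local_gp_predict(Xs, X, y)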


Journal ArticleDOI
TL;DR: In this article, a measure of stability for stationary processes based on their spectral properties is introduced, which provides insight into the effect of dependence on the accuracy of the regularized estimates.
Abstract: Many scientific and economic problems involve the analysis of high-dimensional time series datasets. However, theoretical studies in high-dimensional statistics to date rely primarily on the assumption of independent and identically distributed (i.i.d.) samples. In this work, we focus on stable Gaussian processes and investigate the theoretical properties of $\ell_{1}$-regularized estimates in two important statistical problems in the context of high-dimensional time series: (a) stochastic regression with serially correlated errors and (b) transition matrix estimation in vector autoregressive (VAR) models. We derive nonasymptotic upper bounds on the estimation errors of the regularized estimates and establish that consistent estimation under high-dimensional scaling is possible via $\ell_{1}$-regularization for a large class of stable processes under sparsity constraints. A key technical contribution of the work is to introduce a measure of stability for stationary processes using their spectral properties that provides insight into the effect of dependence on the accuracy of the regularized estimates. With this proposed stability measure, we establish some useful deviation bounds for dependent data, which can be used to study several important regularized estimates in a time series setting.

346 citations


Journal ArticleDOI
TL;DR: A multiresolution model is developed to predict two-dimensional spatial fields based on irregularly spaced observations; it gives a good approximation to standard covariance functions such as the Matérn and also has the flexibility to fit more complicated shapes.
Abstract: We develop a multiresolution model to predict two-dimensional spatial fields based on irregularly spaced observations. The radial basis functions at each level of resolution are constructed using a Wendland compactly supported correlation function with the nodes arranged on a rectangular grid. The grid at each finer level increases by a factor of two and the basis functions are scaled to have a constant overlap. The coefficients associated with the basis functions at each level of resolution are distributed according to a Gaussian Markov random field (GMRF) and take advantage of the fact that the basis is organized as a lattice. Several numerical examples and analytical results establish that this scheme gives a good approximation to standard covariance functions such as the Matérn and also has flexibility to fit more complicated shapes. The other important feature of this model is that it can be applied to statistical inference for large spatial datasets because key matrices in the computations are sparse.

331 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work uses the Fisher Vector as a sentence representation by pooling the word2vec embedding of each word in the sentence, and achieves state-of-the-art image annotation and image search results using Fisher Vectors derived from a Hybrid Gaussian-Laplacian Mixture Model (HGLMM).
Abstract: In recent years, the problem of associating a sentence with an image has gained a lot of attention. This work continues to push the envelope and makes further progress in the performance of image annotation and image search by a sentence tasks. In this work, we are using the Fisher Vector as a sentence representation by pooling the word2vec embedding of each word in the sentence. The Fisher Vector is typically taken as the gradients of the log-likelihood of descriptors, with respect to the parameters of a Gaussian Mixture Model (GMM). In this work we present two other Mixture Models and derive their Expectation-Maximization and Fisher Vector expressions. The first is a Laplacian Mixture Model (LMM), which is based on the Laplacian distribution. The second Mixture Model presented is a Hybrid Gaussian-Laplacian Mixture Model (HGLMM) which is based on a weighted geometric mean of the Gaussian and Laplacian distribution. Finally, by using the new Fisher Vectors derived from HGLMMs to represent sentences, we achieve state-of-the-art results for both the image annotation and the image search by a sentence tasks on four benchmarks: Pascal1K, Flickr8K, Flickr30K, and COCO.

326 citations
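For context on the pooling step described above, a minimal sketch of a Fisher Vector with respect to a diagonal GMM is given below, using scikit-learn's GaussianMixture and keeping only the gradient with respect to the means, without the power and L2 normalizations typically applied. The random vectors stand in for word2vec embeddings, and this is the generic GMM Fisher Vector rather than the paper's LMM or HGLMM variants.

import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(descriptors, gmm):
    """Fisher Vector of a descriptor set w.r.t. the means of a diagonal-covariance GMM."""
    gamma = gmm.predict_proba(descriptors)                  # n x K responsibilities
    n = len(descriptors)
    diff = (descriptors[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)[None]
    fv = (gamma[:, :, None] * diff).sum(axis=0)             # K x d gradient w.r.t. means
    fv /= n * np.sqrt(gmm.weights_)[:, None]
    return fv.ravel()                                       # pooled representation

rng = np.random.default_rng(3)
word_vectors = rng.normal(size=(10000, 50))                 # stand-in for word2vec embeddings
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(word_vectors)

sentence = rng.normal(size=(12, 50))                        # embeddings of one sentence's words
fv = fisher_vector_means(sentence, gmm)                     # length K * d sentence vector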


Posted Content
TL;DR: In this paper, the authors introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods, and jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process.
Abstract: We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost $O(n)$ for $n$ training points, and predictions cost $O(1)$ per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.

288 citations


Posted Content
TL;DR: The robust Bayesian Committee Machine (rBCM) is introduced, a practical and scalable product-of-experts model for large-scale distributed GP regression that can be used on heterogeneous computing infrastructures, ranging from laptops to clusters.
Abstract: To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources our distributed GP model can handle arbitrarily large data sets.
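The recombination step of this kind of product-of-experts model can be sketched directly from the abstract: each expert returns a Gaussian predictive mean and variance, and the experts are fused by precision weighting, with per-expert weights based on how much each expert's variance has shrunk relative to the prior. A minimal numpy version of that fusion rule (the experts themselves, e.g. GPs on random data subsets, are assumed to have been fit elsewhere) is:

import numpy as np

def rbcm_combine(means, variances, prior_var):
    """Precision-weighted fusion of per-expert GP predictions (rBCM-style).

    means, variances: arrays of shape (n_experts, n_test).
    prior_var: prior predictive variance of the GP.
    """
    beta = 0.5 * (np.log(prior_var) - np.log(variances))    # per-expert weights
    precision = (beta / variances).sum(axis=0) + (1.0 - beta.sum(axis=0)) / prior_var
    var = 1.0 / precision
    mean = var * (beta * means / variances).sum(axis=0)
    return mean, var

# Toy usage: three experts disagree; the confident ones dominate the fused prediction.
means = np.array([[1.0], [1.2], [0.2]])
variances = np.array([[0.05], [0.10], [0.90]])
mu, var = rbcm_combine(means, variances, prior_var=1.0)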

Journal ArticleDOI
TL;DR: This paper proposes a new model-based method for representing and searching nondominated solutions that alleviates the requirement on solution diversity; in principle, as many solutions as needed can be generated.
Abstract: To approximate the Pareto front, most existing multiobjective evolutionary algorithms store the nondominated solutions found so far in the population or in an external archive during the search. Such algorithms often require a high degree of diversity of the stored solutions and only a limited number of solutions can be achieved. By contrast, model-based algorithms can alleviate the requirement on solution diversity and in principle, as many solutions as needed can be generated. This paper proposes a new model-based method for representing and searching nondominated solutions. The main idea is to construct Gaussian process-based inverse models that map all found nondominated solutions from the objective space to the decision space. These inverse models are then used to create offspring by sampling the objective space. To facilitate inverse modeling, the multivariate inverse function is decomposed into a group of univariate functions, where the number of inverse models is reduced using a random grouping technique. Extensive empirical simulations demonstrate that the proposed algorithm exhibits robust search performance on a variety of medium to high dimensional multiobjective optimization test problems. Additional nondominated solutions are generated a posteriori using the constructed models to increase the density of solutions in the preferred regions at a low computational cost.

Journal ArticleDOI
TL;DR: A novel way to represent and make predictions about diffusion MRI data is described, based on a Gaussian process on one or several spheres similar to the Geostatistical method of “Kriging”.

Posted Content
TL;DR: A simple heuristic based on an estimate of the Lipschitz constant is investigated that captures the most important aspect of this interaction at negligible computational overhead and compares well, in running time, with much more elaborate alternatives.
Abstract: The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These facilities could be computational or physical facets of the process being optimized. For example, in biological experiments many experimental setups allow several samples to be simultaneously processed. Batch methods, however, require modeling of the interaction between the evaluations in the batch, which can be expensive in complex scenarios. We investigate a simple heuristic based on an estimate of the Lipschitz constant that captures the most important aspect of this interaction (i.e. local repulsion) at negligible computational overhead. The resulting algorithm compares well, in running time, with much more elaborate alternatives. The approach assumes that the function of interest, $f$, is a Lipschitz continuous function. A wrap-loop around the acquisition function is used to collect batches of points of a certain size, minimizing the non-parallelizable computational effort. The speed-up of our method with respect to previous approaches is significant in a set of computationally expensive experiments.
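A stripped-down sketch of the local-repulsion heuristic, not the paper's exact penalizer (which also accounts for the posterior variance): after choosing a batch point, damp the acquisition function inside a ball around it whose radius follows from the estimated Lipschitz constant, then re-maximize for the next batch member. All names and the toy acquisition below are illustrative assumptions.

import numpy as np

def select_batch(acq, candidates, mu, best_y, lipschitz, batch_size):
    """Greedy batch selection with a simplified Lipschitz-based local penalty.

    acq:    acquisition values on a finite candidate set
    mu:     GP posterior mean at the candidates
    best_y: current best (largest) observed value
    """
    penalized = acq.astype(float).copy()
    batch = []
    for _ in range(batch_size):
        j = int(np.argmax(penalized))
        batch.append(candidates[j])
        # Under a Lipschitz bound L, the maximizer cannot lie closer than
        # (best_y - mu_j) / L to the chosen point; damp the acquisition there.
        radius = max((best_y - mu[j]) / lipschitz, 1e-6)
        dist = np.linalg.norm(candidates - candidates[j], axis=1)
        penalized = penalized * np.clip(dist / radius, 0.0, 1.0)
    return np.array(batch)

# Toy usage with made-up posterior quantities on a 1-D candidate grid.
cands = np.linspace(0.0, 1.0, 200)[:, None]
mu = np.sin(6.0 * cands[:, 0])
acq = mu + 1.96 * 0.1                        # e.g. an upper-confidence-bound surrogate
batch = select_batch(acq, cands, mu, best_y=mu.max(), lipschitz=6.0, batch_size=3)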

Journal ArticleDOI
TL;DR: A discriminative shared Gaussian process latent variable model (DS-GPLVM) for multiview and view-invariant classification of facial expressions from multiple views is proposed and validated.
Abstract: Images of facial expressions are often captured from various views as a result of either head movements or variable camera position. Existing methods for multiview and/or view-invariant facial expression recognition typically perform classification of the observed expression using either classifiers learned separately for each view or a single classifier learned for all views. However, these approaches ignore the fact that different views of a facial expression are just different manifestations of the same facial expression. By accounting for this redundancy, we can design more effective classifiers for the target task. To this end, we propose a discriminative shared Gaussian process latent variable model (DS-GPLVM) for multiview and view-invariant classification of facial expressions from multiple views. In this model, we first learn a discriminative manifold shared by multiple views of a facial expression. Subsequently, we perform facial expression classification in the expression manifold. Finally, classification of an observed facial expression is carried out either in the view-invariant manner (using only a single view of the expression) or in the multiview manner (using multiple views of the expression). The proposed model can also be used to perform fusion of different facial features in a principled manner. We validate the proposed DS-GPLVM on both posed and spontaneously displayed facial expressions from three publicly available datasets (MultiPIE, labeled face parts in the wild, and static facial expressions in the wild). We show that this model outperforms the state-of-the-art methods for multiview and view-invariant facial expression classification, and several state-of-the-art methods for multiview learning and feature fusion.

Journal ArticleDOI
TL;DR: This article reviews the design and analysis of simulation experiments and focuses on analysis via two types of metamodel, namely low-order polynomial regression and Kriging (Gaussian process) metamodels.
Abstract: This article reviews the design and analysis of simulation experiments. It focusses on analysis via either low-order polynomial regression or Kriging (also known as Gaussian process) metamodels. The type of metamodel determines the design of the experiment, which determines the input combinations of the simulation experiment. For example, a first-order polynomial metamodel requires a "resolution-III" design, whereas Kriging may use Latin hypercube sampling. Polynomials of first or second order require resolution III, IV, V, or "central composite" designs. Before applying either regression or Kriging, sequential bifurcation may be applied to screen a great many inputs. Optimization of the simulated system may use either a sequence of low-order polynomials known as response surface methodology (RSM) or Kriging models fitted through sequential designs including efficient global optimization (EGO). The review includes robust optimization, which accounts for uncertain simulation inputs.
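For instance, the "Latin hypercube design plus Kriging metamodel" workflow mentioned in the review can be sketched with standard libraries: scipy's quasi-Monte Carlo module for the design and scikit-learn's GP regressor as the Kriging metamodel. The simulator below is a stand-in analytic function, and the bounds and kernel choice are illustrative.

import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def simulator(x):
    """Stand-in for an expensive simulation model."""
    return np.sin(3 * x[:, 0]) + x[:, 1] ** 2

# Space-filling Latin hypercube design for the simulation inputs.
sampler = qmc.LatinHypercube(d=2, seed=0)
X = qmc.scale(sampler.random(n=40), l_bounds=[0, 0], u_bounds=[1, 1])
y = simulator(X)

# Kriging (Gaussian process) metamodel fitted to the design points.
metamodel = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
y_hat, y_std = metamodel.predict(np.array([[0.3, 0.7]]), return_std=True)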

Journal ArticleDOI
TL;DR: Computer simulation results show that the proposed system reliability analysis method can accurately give the system failure probability with a relatively small number of deterministic slope stability analyses.

Journal ArticleDOI
TL;DR: This paper proposes using Gaussian processes to track an extended object or group of objects that generates multiple measurements at each scan, yielding a model that describes both the shape and the kinematics of the object.
Abstract: In this paper, we propose using Gaussian processes to track an extended object or group of objects, that generates multiple measurements at each scan. The shape and the kinematics of the object are ...

Journal ArticleDOI
TL;DR: A multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space is proposed, which can capture spatial structure from very fine to very large scales.
Abstract: Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method.

Proceedings ArticleDOI
15 Jul 2015
TL;DR: This work considers a stabilization task, linearizes the nonlinear GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller that provides robust stability and performance guarantees during learning.
Abstract: This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.

Journal ArticleDOI
TL;DR: This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions.
Abstract: Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.

Journal ArticleDOI
TL;DR: This work investigates MTGPs for physiological monitoring with synthetic data sets and two real-world problems from the field of patient monitoring and radiotherapy, and shows that the framework learned the correlation between physiological time series efficiently, outperforming the existing state of the art.
Abstract: Gaussian process (GP) models are a flexible means of performing nonparametric Bayesian regression. However, GP models in healthcare are often only used to model a single univariate output time series, denoted as single-task GPs (STGP). Due to an increasing prevalence of sensors in healthcare settings, there is an urgent need for robust multivariate time-series tools. Here, we propose a method using multitask GPs (MTGPs) which can model multiple correlated multivariate physiological time series simultaneously. The flexible MTGP framework can learn the correlation between multiple signals even though they might be sampled at different frequencies and have training sets available for different intervals. Furthermore, prior knowledge of any relationship between the time series such as delays and temporal behavior can be easily integrated. A novel normalization is proposed to allow interpretation of the various hyperparameters used in the MTGP. We investigate MTGPs for physiological monitoring with synthetic data sets and two real-world problems from the field of patient monitoring and radiotherapy. The results are compared with standard Gaussian processes and other existing methods in the respective biomedical application areas. In both cases, we show that our framework learned the correlation between physiological time series efficiently, outperforming the existing state of the art.
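One common way to build the kind of multitask covariance this abstract describes is the intrinsic coregionalization form: a task covariance matrix combined with a shared temporal kernel via a Kronecker product. The numpy sketch below draws correlated signals from such a prior; it is not the paper's exact model, normalization, or treatment of different sampling frequencies.

import numpy as np

def rbf(t1, t2, ls=1.0):
    return np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(4)
t = np.linspace(0, 10, 50)                    # shared time grid
K_time = rbf(t, t)                            # temporal covariance
A = rng.normal(size=(3, 2))
K_tasks = A @ A.T + 0.1 * np.eye(3)           # inter-signal (task) covariance
K = np.kron(K_tasks, K_time)                  # joint covariance over tasks and times

# Joint samples: three correlated physiological-style signals from the MTGP prior.
L = np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))
samples = (L @ rng.normal(size=K.shape[0])).reshape(3, len(t))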

Journal ArticleDOI
TL;DR: It is demonstrated that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique.
Abstract: Machine-designed control of complex devices or experiments can discover strategies superior to those developed via simplified models. We describe an online optimization algorithm based on Gaussian processes and apply it to optimization of the production of Bose-Einstein condensates (BEC). BEC is typically created with an exponential evaporation ramp that is approximately optimal for s-wave, ergodic dynamics with two-body interactions and no other loss rates, but likely sub-optimal for many real experiments. Machine learning using a Gaussian process, in contrast, develops a statistical model of the relationship between the parameters it controls and the quality of the BEC produced. This is an online process, and an active one, as the Gaussian process model updates on the basis of each subsequent experiment and proposes a new set of parameters as a result. We demonstrate that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique. Furthermore, we show the internal model developed can be used to determine which parameters are essential in BEC creation and which are unimportant, providing insight into the optimization process.
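The online loop described here (fit a GP to all evaluations so far, propose the next setting by maximizing an acquisition function, run the experiment, repeat) can be sketched in a few lines of numpy. The toy objective stands in for the BEC experiment, and the upper-confidence-bound acquisition is an illustrative choice rather than the criterion used by the machine learner in the paper.

import numpy as np

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a unit-amplitude RBF GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k = rbf(x_query, x_train)
    mu = k @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

objective = lambda x: -(x - 0.7) ** 2        # stand-in for BEC quality vs. ramp parameter
grid = np.linspace(0, 1, 200)
rng = np.random.default_rng(5)
x_obs = rng.uniform(0, 1, 3)                 # a few initial random experiments
y_obs = objective(x_obs)

for _ in range(15):                          # online optimization loop
    mu, sd = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(mu + 2.0 * sd)]  # upper-confidence-bound acquisition
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))  # "run the experiment"

best_setting = x_obs[np.argmax(y_obs)]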

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Compared with previous CRT-based face alignment methods that have shown state-of-the-art performance, cGPRT with shape-indexed DoG features performs best on the HELEN and 300-W datasets, which are among the most challenging datasets today.
Abstract: In this paper, we propose a face alignment method that uses cascade Gaussian process regression trees (cGPRT) constructed by combining Gaussian process regression trees (GPRT) in a cascade stage-wise manner. Here, GPRT is a Gaussian process with a kernel defined by a set of trees. The kernel measures the similarity between two inputs as the number of trees where the two inputs fall in the same leaves. Without increasing prediction time, the prediction of cGPRT can be performed in the same framework as the cascade regression trees (CRT) but with better generalization. Features for GPRT are designed using shape-indexed difference of Gaussian (DoG) filter responses sampled from local retinal patterns to increase stability and to attain robustness against geometric variances. Compared with the previous CRT-based face alignment methods that have shown state-of-the-art performance, cGPRT using shape-indexed DoG features performed best on the HELEN and 300-W datasets, which are among the most challenging datasets today.

Journal ArticleDOI
TL;DR: In this article, a new simulation method based on a first order approximation and series expansions is proposed to improve the accuracy and efficiency of the Rice/FORM method; the approximation maps the general stochastic process of the response into a Gaussian process, whose samples are then generated by the Expansion Optimal Linear Estimation if the response is stationary or by the Orthogonal Series Expansion if the response is non-stationary.
Abstract: Time-variant reliability is often evaluated by Rice's formula combined with the First Order Reliability Method (FORM). To improve the accuracy and efficiency of the Rice/FORM method, this work develops a new simulation method with the first order approximation and series expansions. The approximation maps the general stochastic process of the response into a Gaussian process, whose samples are then generated by the Expansion Optimal Linear Estimation if the response is stationary or by the Orthogonal Series Expansion if the response is non-stationary. As the computational cost largely comes from estimating the covariance of the response at expansion points, a cheaper surrogate model of the covariance is built and allows for significant reduction in computational cost. In addition to its superior accuracy and efficiency over the Rice/FORM method, the proposed method can also produce the failure rate and probability of failure with respect to time for a given period of time within only one reliability analysis.

Journal ArticleDOI
TL;DR: In this article, an approach based on peaks over thresholds is introduced that provides several new estimators for processes η in the max-domain of attraction of the frequently used Hüsler–Reiss model and its spatial extension, Brown–Resnick processes.
Abstract: Estimation of extreme value parameters from observations in the max-domain of attraction of a multivariate max-stable distribution commonly uses aggregated data such as block maxima. Multivariate peaks-over-threshold methods, in contrast, exploit additional information from the non-aggregated ‘large’ observations. We introduce an approach based on peaks over thresholds that provides several new estimators for processes η in the max-domain of attraction of the frequently used Hüsler–Reiss model and its spatial extension: Brown–Resnick processes. The method relies on increments η(·)−η(t0) conditional on η(t0) exceeding a high threshold, where t0 is a fixed location. When the marginals are standardized to the Gumbel distribution, these increments asymptotically form a Gaussian process, which results in computationally simple estimates of the Hüsler–Reiss parameter matrix and, in particular, enables parametric inference for Brown–Resnick processes based on (high dimensional) multivariate densities. This is a major advantage over composite likelihood methods that are commonly used in spatial extreme value statistics since they rely only on bivariate densities. A simulation study compares the performance of the new estimators with other commonly used methods. As an application, we fit a non-isotropic Brown–Resnick process to the extremes of 12-year data of daily wind speed measurements.

Posted Content
TL;DR: The Variational Gaussian Process (VGP) as discussed by the authors generates approximate posterior samples by generating latent inputs and warping them through random nonlinear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
Abstract: Variational inference is a powerful tool for approximate inference, and it has been recently applied for representation learning with deep generative models. We develop the variational Gaussian process (VGP), a Bayesian nonparametric variational family, which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity. We prove a universal approximation theorem for the VGP, demonstrating its representative power for learning any model. For inference we present a variational objective inspired by auto-encoders and perform black box inference over a wide class of models. The VGP achieves new state-of-the-art results for unsupervised learning, inferring models such as the deep latent Gaussian model and the recently proposed DRAW.

Proceedings Article
07 Dec 2015
TL;DR: A Hybrid Monte-Carlo sampling scheme is presented which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs.
Abstract: Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been made into attacking three issues with GP models: how to compute efficiently when the number of data is large; how to approximate the posterior when the likelihood is not Gaussian and how to estimate covariance function parameter posteriors. This paper simultaneously addresses these, using a variational approximation to the posterior which is sparse in support of the function but otherwise free-form. The result is a Hybrid Monte-Carlo sampling scheme which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper is available at github.com/sparseMCMC.

Journal ArticleDOI
Fan Li, Jiuping Xu
TL;DR: A novel integrated approach based on a mixture of Gaussian process (MGP) model and particle filtering (PF) is presented for lithium-ion battery SOH estimation under uncertain conditions, where the distribution of the degradation process is learnt from the inputs based on the available capacity monitoring data.

Journal ArticleDOI
TL;DR: The method is tested on several numerical examples and on an agronomy problem, showing that it provides an efficient trade-off between exploration and intensification.
Abstract: Optimization of expensive computer models with the help of Gaussian process emulators is now commonplace. However, when several (competing) objectives are considered, choosing an appropriate sampling strategy remains an open question. We present here a new algorithm based on stepwise uncertainty reduction principles. Optimization is seen as a sequential reduction of the volume of the excursion sets below the current best solutions (Pareto set), and our sampling strategy chooses the points that give the highest expected reduction. The method is tested on several numerical examples and on an agronomy problem, showing that it provides an efficient trade-off between exploration and intensification.