
Showing papers on "Gaussian process published in 2014"


Book ChapterDOI
06 Sep 2014
TL;DR: Rather than formulating the probability of target appearance as exponentially related to the confidence of a classifier output, this paper directly analyzes it using Gaussian Processes Regression (GPR) and introduces a latent variable to assist the tracking decision.
Abstract: Modeling the target appearance is critical in many modern visual tracking algorithms. Many tracking-by-detection algorithms formulate the probability of target appearance as exponentially related to the confidence of a classifier output. By contrast, in this paper we directly analyze this probability using Gaussian Processes Regression (GPR), and introduce a latent variable to assist the tracking decision. Our observation model for regression is learnt in a semi-supervised fashion by using both labeled samples from previous frames and the unlabeled samples that are tracking candidates extracted from the current frame. We further divide the labeled samples into two categories: auxiliary samples collected from the very early frames and target samples from most recent frames. The auxiliary samples are dynamically re-weighted by the regression, and the final tracking result is determined by fusing decisions from two individual trackers, one derived from the auxiliary samples and the other from the target samples. All these ingredients together enable our tracker, denoted as TGPR, to alleviate the drifting issue from various aspects. The effectiveness of TGPR is clearly demonstrated by its excellent performances on three recently proposed public benchmarks, involving 161 sequences in total, in comparison with state-of-the-arts.

479 citations
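The tracker above scores candidate windows with a GP regression observation model. As a point of reference only (plain GPR with an RBF kernel and placeholder features and labels, not the authors' semi-supervised, re-weighted TGPR), a minimal sketch of scoring tracking candidates by posterior confidence might look like this:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gpr_posterior(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean/variance of a GP regressor at X_test."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    Kss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(Kss) - (v ** 2).sum(0), 0.0, None)
    return mean, var

# Toy usage: labeled samples (features with +1 target / -1 background labels)
# and unlabeled candidate windows extracted from the current frame.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 5))
y_lab = np.sign(X_lab[:, 0])            # stand-in labels
X_cand = rng.normal(size=(10, 5))       # tracking candidates
score, _ = gpr_posterior(X_lab, y_lab, X_cand)
best = X_cand[np.argmax(score)]         # candidate with highest predicted confidence
```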


Journal ArticleDOI
TL;DR: The theory of Gaussian multiplicative chaos, introduced in Kahane's seminal 1985 work, is reviewed by the authors, along with its many applications ranging from finance to 2d-Liouville quantum gravity.
Abstract: In this article, we review the theory of Gaussian multiplicative chaos initially introduced by Kahane’s seminal work in 1985. Though this beautiful paper faded from memory until recently, it already contains ideas and results that are nowadays under active investigation, like the construction of the Liouville measure in $2d$-Liouville quantum gravity or thick points of the Gaussian Free Field. Also, we mention important extensions and generalizations of this theory that have emerged ever since and discuss a whole family of applications, ranging from finance, through the Kolmogorov-Obukhov model of turbulence to $2d$-Liouville quantum gravity. This review also includes new results like the convergence of discretized Liouville measures on isoradial graphs (thus including the triangle and square lattices) towards the continuous Liouville measures (in the subcritical and critical case) or multifractal analysis of the measures in all dimensions.

469 citations


Journal ArticleDOI
TL;DR: This work presents a method to first detect the directions of the strongest variability using evaluations of the gradient and subsequently exploit these directions to construct a response surface on a low-dimensional subspace---i.e., the active subspace---of the inputs.
Abstract: Many multivariate functions in engineering models vary primarily along a few directions in the space of input parameters. When these directions correspond to coordinate directions, one may apply global sensitivity measures to determine the most influential parameters. However, these methods perform poorly when the directions of variability are not aligned with the natural coordinates of the input space. We present a method to first detect the directions of the strongest variability using evaluations of the gradient and subsequently exploit these directions to construct a response surface on a low-dimensional subspace---i.e., the active subspace---of the inputs. We develop a theoretical framework with error bounds, and we link the theoretical quantities to the parameters of a kriging response surface on the active subspace. We apply the method to an elliptic PDE model with coefficients parameterized by 100 Gaussian random variables and compare it with a local sensitivity analysis method for dimension reduction.

435 citations
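As a rough illustration of the gradient-based construction described above (with a synthetic test function standing in for an engineering model and the subspace dimension chosen by hand, so everything below is an assumption for illustration), one might estimate an active subspace like this:

```python
import numpy as np

def active_subspace(grads, k):
    """Estimate a k-dimensional active subspace from gradient samples.

    grads: (M, m) array of gradient evaluations of f at M input samples.
    Returns the m x k basis W1 of the active subspace and the eigenvalues
    of the gradient outer-product matrix C = E[grad grad^T].
    """
    C = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]], eigvals[order]

# Toy example: f(x) = sin(w.x) varies only along w = (1, 1, 0, ..., 0).
rng = np.random.default_rng(1)
m, M = 10, 200
X = rng.uniform(-1, 1, size=(M, m))
w = np.zeros(m); w[:2] = 1.0
grads = np.cos(X @ w)[:, None] * w[None, :]       # exact gradient of sin(w.x)
W1, lam = active_subspace(grads, k=1)
Y = X @ W1    # reduced coordinates on which a kriging response surface could be fit
```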


Journal ArticleDOI
TL;DR: A new framework is developed and used in GPEME, which carefully coordinates the surrogate modeling and the evolutionary search, so that the search can focus on a small promising area and is supported by the constructed surrogate model.
Abstract: Surrogate model assisted evolutionary algorithms (SAEAs) have recently attracted much attention due to the growing need for computationally expensive optimization in many real-world applications. Most current SAEAs, however, focus on small-scale problems. SAEAs for medium-scale problems (i.e., 20-50 decision variables) have not yet been well studied. In this paper, a Gaussian process surrogate model assisted evolutionary algorithm for medium-scale computationally expensive optimization problems (GPEME) is proposed and investigated. Its major components are a surrogate model-aware search mechanism for expensive optimization problems when a high-quality surrogate model is difficult to build and dimension reduction techniques for tackling the “curse of dimensionality.” A new framework is developed and used in GPEME, which carefully coordinates the surrogate modeling and the evolutionary search, so that the search can focus on a small promising area and is supported by the constructed surrogate model. Sammon mapping is introduced to transform the decision variables from tens of dimensions to a few dimensions, in order to take advantage of Gaussian process surrogate modeling in a low-dimensional space. Empirical studies on benchmark problems with 20, 30, and 50 variables and a real-world power amplifier design automation problem with 17 variables show the high efficiency and effectiveness of GPEME. Compared to three state-of-the-art SAEAs, better or similar solutions can be obtained with 12% to 50% exact function evaluations.

369 citations
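A generic lower-confidence-bound prescreening step, loosely in the spirit of surrogate-assisted search (this is not GPEME itself: `gp_predict` and `mutate` are user-supplied stand-ins, and the Sammon-mapping dimension reduction and model-management machinery of the paper are omitted), could be sketched as:

```python
import numpy as np

def lcb_prescreen(gp_predict, parents, n_offspring, mutate, w=2.0):
    """One surrogate-assisted iteration (sketch): generate offspring, score
    them with a GP surrogate's lower confidence bound mean - w*std (for
    minimisation), and return the single most promising candidate for exact
    (expensive) evaluation."""
    rng = np.random.default_rng()
    offspring = np.array([mutate(parents[rng.integers(len(parents))])
                          for _ in range(n_offspring)])
    mean, std = gp_predict(offspring)   # surrogate prediction: arrays of length n_offspring
    lcb = mean - w * std
    return offspring[np.argmin(lcb)]
```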


Journal ArticleDOI
TL;DR: The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results, for which the same regret bounds hold.
Abstract: How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to execute experiments simultaneously or in batches, rather than only performing one at a time. Additionally, observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an upper confidence bound-based algorithm, which models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel. We prove a general regret bound for GP-BUCB, as well as the surprising result that for some common kernels, the asymptotic average regret can be made independent of the batch size. The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results, for which the same regret bounds hold. We also introduce Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB), a variant of GP-BUCB which can exploit parallelism in an adaptive manner. We evaluate GP-BUCB and GP-AUCB on several simulated and real data sets. These experiments show that GP-BUCB and GP-AUCB are competitive with state-of-the-art heuristics.

338 citations
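The batch-selection idea, picking points by UCB while "hallucinating" outcomes for pending experiments so that only the posterior variance shrinks within a batch, can be sketched as below. This is a simplified illustration with an RBF kernel and arbitrary constants, not the authors' implementation:

```python
import numpy as np

def rbf(A, B, ell=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def gp_post(X, y, Xs, noise=1e-2):
    """GP posterior mean and variance at candidate points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    Kinv_Ks = np.linalg.solve(K, Ks)
    mean = Kinv_Ks.T @ y
    var = np.diag(Kss) - np.einsum('ij,ij->j', Ks, Kinv_Ks)
    return mean, var

def select_batch(X_obs, y_obs, X_cand, batch_size, beta=2.0):
    """Pick a batch by UCB with hallucinated observations: the mean is frozen
    at the real-data posterior, but each selected point is fed back (with its
    own posterior mean as a fake outcome) so the variance shrinks and the
    batch spreads out."""
    mean0, _ = gp_post(X_obs, y_obs, X_cand)     # mean from real data only
    X_aug, y_aug = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(batch_size):
        _, var = gp_post(X_aug, y_aug, X_cand)   # variance also uses hallucinations
        ucb = mean0 + np.sqrt(beta * np.maximum(var, 0.0))
        i = int(np.argmax(ucb))
        batch.append(i)
        X_aug = np.vstack([X_aug, X_cand[i]])
        y_aug = np.append(y_aug, mean0[i])       # hallucinated outcome = posterior mean
    return batch
```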


Journal ArticleDOI
TL;DR: This paper explicitly formulates the extension of the HGF's hierarchy to any number of levels, and discusses how various forms of uncertainty are accommodated by the minimization of variational free energy as encoded in the update equations.
Abstract: In its full sense, perception rests on an agent’s model of how its sensory input comes about and the inferences it draws based on this model. These inferences are necessarily uncertain. Here, we illustrate how the hierarchical Gaussian filter (HGF) offers a principled and generic way to deal with the several forms that uncertainty in perception takes. The HGF is a recent derivation of one-step update equations from Bayesian principles that rests on a hierarchical generative model of the environment and its (in)stability. It is computationally highly efficient, allows for online estimates of hidden states, and has found numerous applications to experimental data from human subjects. In this paper, we generalize previous descriptions of the HGF and its account of perceptual uncertainty. First, we explicitly formulate the extension of the HGF’s hierarchy to any number of levels; second, we discuss how various forms of uncertainty are accommodated by the minimization of variational free energy as encoded in the update equations; third, we combine the HGF with decision models and demonstrate the inversion of this combination; finally, we report a simulation study that compared four optimization methods for inverting the HGF/decision model combination at different noise levels. These four methods (Nelder-Mead simplex algorithm, Gaussian process-based global optimization, variational Bayes and Markov chain Monte Carlo sampling) all performed well even under considerable noise, with variational Bayes offering the best combination of efficiency and informativeness of inference. Our results demonstrate that the HGF provides a principled, flexible, and efficient - but at the same time intuitive - framework for the resolution of perceptual uncertainty in behaving agents.

294 citations


ReportDOI
TL;DR: An abstract approximation theorem applicable to a wide variety of problems, primarily in statistics, is proved; the bound in the main approximation theorem is non-asymptotic, and the theorem does not require uniform boundedness of the class of functions.
Abstract: This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably, the bound in the main approximation theorem is nonasymptotic and the theorem allows for functions that index the empirical process to be unbounded and have entropy divergent with the sample size. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein’s method for normal approximation, and some new empirical process techniques. We study applications of this approximation theorem to local and series empirical processes arising in nonparametric estimation via kernel and series methods, where the classes of functions change with the sample size and are non-Donsker. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples.

257 citations


Book
16 Feb 2014
TL;DR: In this book, the author presents an overview of Gaussian processes and their applications to Banach space theory, covering Bernoulli processes, random Fourier series and trigonometric sums, and the fundamental conjectures.
Abstract: 0. Introduction.- 1. Philosophy and Overview of the Book.- 2. Gaussian Processes and the Generic Chaining.- 3. Random Fourier Series and Trigonometric Sums, I.- 4. Matching Theorems I.- 5. Bernoulli Processes.- 6. Trees and the Art of Lower Bounds.- 7. Random Fourier Series and Trigonometric Sums, II.- 8. Processes Related to Gaussian Processes.- 9. Theory and Practice of Empirical Processes.- 10. Partition Scheme for Families of Distances.- 11. Infinitely Divisible Processes.- 12. The Fundamental Conjectures.- 13. Convergence of Orthogonal Series; Majorizing Measures.- 14. Matching Theorems, II: Shor's Matching Theorem.- 15. The Ultimate Matching Theorem in Dimension ≥ 3.- 16. Applications to Banach Space Theory.- 17. Appendix: What this Book is Really About.- 18. Appendix: Continuity.- References.- Index.

216 citations


Journal ArticleDOI
TL;DR: An approximation in which observations are split into contiguous blocks, with independence assumed across blocks, often provides a much better approximation to the likelihood than a low-rank approximation requiring similar memory and calculations.
Abstract: Evaluating the likelihood function for Gaussian models when a spatial process is observed irregularly is problematic for larger datasets due to constraints of memory and calculation. If the covariance structure can be approximated by a diagonal matrix plus a low rank matrix, then both the memory and calculations needed to evaluate the likelihood function are greatly reduced. When neighboring observations are strongly correlated, much of the variation in the observations can be captured by low frequency components, so the low rank approach might be thought to work well in this setting. Through both theory and numerical results, where the diagonal matrix is assumed to be a multiple of the identity, this paper shows that the low rank approximation sometimes performs poorly in this setting. In particular, an approximation in which observations are split into contiguous blocks and independence across blocks is assumed often provides a much better approximation to the likelihood than a low rank approximation requiring similar memory and calculations. An example with satellite-based measurements of total column ozone shows that these results are relevant to real data and that the low rank models also can be highly statistically inefficient for spatial interpolation.

199 citations
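The block-independence approximation discussed above is simple to state in code: evaluate the exact Gaussian log-likelihood within contiguous blocks and sum across blocks as if they were independent. The sketch below uses an exponential covariance on synthetic 1-d locations; all specific choices (covariance, block size, jitter) are illustrative rather than taken from the paper:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gauss_loglik(y, C):
    """Exact zero-mean Gaussian log-likelihood with covariance C."""
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * (y @ alpha) - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def block_indep_loglik(y, locs, cov_fn, block_size):
    """Approximate log-likelihood assuming independence across contiguous
    blocks of observations (covariance within each block is exact)."""
    total = 0.0
    for start in range(0, len(y), block_size):
        idx = slice(start, start + block_size)
        yb = y[idx]
        Cb = cov_fn(locs[idx], locs[idx]) + 1e-8 * np.eye(len(yb))
        total += gauss_loglik(yb, Cb)
    return total

# Toy spatial example with an exponential covariance on irregular 1-d sites.
cov_fn = lambda a, b: np.exp(-cdist(a, b) / 0.2)
rng = np.random.default_rng(2)
locs = np.sort(rng.uniform(0, 1, size=(300, 1)), axis=0)
C = cov_fn(locs, locs) + 1e-8 * np.eye(len(locs))
y = np.linalg.cholesky(C) @ rng.normal(size=len(locs))
print(gauss_loglik(y, C), block_indep_loglik(y, locs, cov_fn, block_size=50))
```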


Posted Content
TL;DR: In this article, the authors develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function and further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space.
Abstract: Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions which can be queried efficiently, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of non-stationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space," to mitigate the effects of spatially-varying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.

198 citations
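The key ingredient, warping each (rescaled) input dimension through a Beta CDF before applying a stationary kernel, is easy to sketch. The parameter values below are arbitrary; in the method they are treated as extra hyperparameters and learned, e.g. by marginal-likelihood optimization:

```python
import numpy as np
from scipy.stats import beta

def warp(X, a, b):
    """Warp each input dimension in [0, 1] through a Beta CDF with
    per-dimension parameters a[d], b[d]; a = b = 1 recovers the identity."""
    return np.column_stack([beta.cdf(X[:, d], a[d], b[d]) for d in range(X.shape[1])])

def rbf(A, B, ell=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

# A stationary kernel applied to warped inputs behaves non-stationarily
# in the original input space.
rng = np.random.default_rng(3)
X = rng.uniform(size=(50, 2))
a, b = np.array([0.5, 2.0]), np.array([0.5, 1.0])
K_warped = rbf(warp(X, a, b), warp(X, a, b))
```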


Proceedings Article
27 Jul 2014
TL;DR: The beginnings of an automatic statistician, focusing on regression problems, are presented: the system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural language text.
Abstract: This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e.g. smoothness, trends, periodicity, changepoints). Taken together with the compositional structure of our language of models this allows us to automatically describe functions in simple terms. Second, the use of flexible nonparametric models and a rich language for composing them in an open-ended manner also results in state-of-the-art extrapolation performance evaluated over 13 real time series data sets from various domains.
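The compositional modelling language referred to above is built from sums and products of a small set of base kernels. A minimal illustration (the kernel forms and parameters are generic textbook choices, not the system's exact grammar or search procedure) is:

```python
import numpy as np

# Base kernels on 1-d inputs given as column vectors.
def SE(x, z, ell=1.0, s=1.0):
    return s**2 * np.exp(-0.5 * (x - z.T) ** 2 / ell**2)

def PER(x, z, period=1.0, ell=1.0, s=1.0):
    return s**2 * np.exp(-2 * np.sin(np.pi * np.abs(x - z.T) / period) ** 2 / ell**2)

def LIN(x, z, s=1.0):
    return s**2 * (x @ z.T)

# Compositional structure: sums model superposition, products modulate.
# E.g. LIN + SE * PER reads as "a linear trend plus a locally periodic component".
x = np.linspace(0, 5, 200)[:, None]
K = LIN(x, x) + SE(x, x, ell=1.5) * PER(x, x, period=1.0)
f = np.linalg.cholesky(K + 1e-8 * np.eye(len(x))) @ np.random.default_rng(4).normal(size=len(x))
```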

Journal ArticleDOI
TL;DR: Gaussian Process Regression (GPR), an effective kernel-based machine learning algorithm, is applied to probabilistic streamflow forecasting and indicates relatively strong persistence of streamflow predictability in the extended period, although the low-predictability basins tend to show more variations.

Journal ArticleDOI
TL;DR: This work presents a new method based on Dirichlet process Gaussian mixture models, which are used to estimate per-pixel background distributions, followed by probabilistic regularisation, and develops novel model learning algorithms for continuous update of the model in a principled fashion as the scene changes.
Abstract: Video analysis often begins with background subtraction. This problem is often approached in two steps-a background model followed by a regularisation scheme. A model of the background allows it to be distinguished on a per-pixel basis from the foreground, whilst the regularisation combines information from adjacent pixels. We present a new method based on Dirichlet process Gaussian mixture models, which are used to estimate per-pixel background distributions. It is followed by probabilistic regularisation. Using a non-parametric Bayesian method allows per-pixel mode counts to be automatically inferred, avoiding over-/under- fitting. We also develop novel model learning algorithms for continuous update of the model in a principled fashion as the scene changes. These key advantages enable us to outperform the state-of-the-art alternatives on four benchmarks.

Journal ArticleDOI
TL;DR: A Bayesian analysis of inverse Gaussian process models for degradation modeling and inference is presented, with a classic example demonstrating the applicability of the Bayesian method for degradation analysis with inverse Gaussian process models.

Journal ArticleDOI
TL;DR: In this paper, the scaled Brownian motion (SBM) model is shown to be weakly non-ergodic but does not exhibit a significant amplitude scatter of the time-averaged mean squared displacement.
Abstract: Anomalous diffusion is frequently described by scaled Brownian motion (SBM), a Gaussian process with a power-law time dependent diffusion coefficient. Its mean squared displacement is $\langle x^2(t)\rangle \simeq 2D(t)t$ with $D(t) \simeq t^{\alpha-1}$ for $0 < \alpha < 2$. SBM may provide a seemingly adequate description in the case of unbounded diffusion, for which its probability density function coincides with that of fractional Brownian motion. Here we show that free SBM is weakly non-ergodic but does not exhibit a significant amplitude scatter of the time averaged mean squared displacement. More severely, we demonstrate that under confinement, the dynamics encoded by SBM is fundamentally different from both fractional Brownian motion and continuous time random walks. SBM is highly non-stationary and cannot provide a physical description for particles in a thermalised stationary system. Our findings have direct impact on the modelling of single particle tracking experiments, in particular, under confinement inside cellular compartments or when optical tweezers tracking methods are used.
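A quick way to probe the behaviour described above is to simulate SBM and compare ensemble-averaged with time-averaged mean squared displacements. The sketch below uses the normalisation D(t) = αt^(α−1), so that the ensemble MSD is 2t^α; this is a convenient stand-in rather than the paper's exact constants:

```python
import numpy as np

def simulate_sbm(alpha, T=100.0, dt=0.01, n_traj=500, rng=None):
    """Simulate scaled Brownian motion: dx = sqrt(2*D(t)) dW with
    D(t) = alpha * t**(alpha - 1), so the ensemble MSD is <x^2(t)> = 2 t^alpha."""
    rng = rng or np.random.default_rng(5)
    t = np.arange(1, int(T / dt) + 1) * dt
    D = alpha * t ** (alpha - 1)
    steps = np.sqrt(2 * D * dt) * rng.normal(size=(n_traj, len(t)))
    return t, np.cumsum(steps, axis=1)

def time_averaged_msd(x, dt, lag):
    """Time-averaged MSD at a single lag, for each trajectory."""
    n = int(lag / dt)
    return ((x[:, n:] - x[:, :-n]) ** 2).mean(axis=1)

t, x = simulate_sbm(alpha=0.5)
ens_msd = (x ** 2).mean(axis=0)                    # ensemble average, ~ 2 t^alpha
ta_msd = time_averaged_msd(x, dt=0.01, lag=1.0)    # per-trajectory time averages to compare
```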

Journal ArticleDOI
24 Mar 2014-PLOS ONE
TL;DR: The quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis, for the prediction of residue-residue contacts in proteins and the identification of protein-protein interaction partners in bacterial signal transduction.
Abstract: In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
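The continuous-Gaussian relaxation makes the inference step essentially a single regularized matrix inversion. The toy sketch below scores site pairs by the Frobenius norm of blocks of the inverse covariance of a one-hot-encoded alignment; the encoding, ridge regularizer and scoring rule are common DCA-style choices and not the exact pipeline of the paper:

```python
import numpy as np

def gaussian_dca_scores(msa_onehot, L, q, lam=0.1):
    """Toy multivariate-Gaussian coupling scores.

    msa_onehot: (N, L*q) one-hot encoded alignment (N sequences, L sites,
    q amino-acid states); lam is a ridge/shrinkage weight.  The score for a
    site pair (i, j) is the Frobenius norm of the corresponding q x q block
    of the inverse covariance matrix."""
    X = msa_onehot - msa_onehot.mean(0)
    C = X.T @ X / len(X) + lam * np.eye(L * q)
    J = np.linalg.inv(C)                       # couplings in the Gaussian model
    scores = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            block = J[i*q:(i+1)*q, j*q:(j+1)*q]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)
    return scores
```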

Journal ArticleDOI
TL;DR: In this paper, the random hypersurface model (RHM) is introduced for estimating a shape approximation of an extended object in addition to its kinematic state, where the shape parameters and measurements are related via a measurement equation that serves as the basis for a Gaussian state estimator.
Abstract: The random hypersurface model (RHM) is introduced for estimating a shape approximation of an extended object in addition to its kinematic state. An RHM represents the spatial extent by means of randomly scaled versions of the shape boundary. In doing so, the shape parameters and the measurements are related via a measurement equation that serves as the basis for a Gaussian state estimator. Specific estimators are derived for elliptic and star-convex shapes.

Posted Content
TL;DR: In this paper, the Student-t process is proposed as an alternative to the Gaussian process as a nonparametric prior over functions, and closed form expressions for the marginal likelihood and predictive distribution of a Student-T process are derived by integrating away an inverse Wishart process prior over the covariance kernel.
Abstract: We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process -- a nonparametric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels -- but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications like Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes.
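For reference, the Student-t process predictive distribution keeps the GP mean but rescales the covariance by a factor that depends on the training targets. The sketch below follows the closed forms reported for TP regression; treat the exact constants as my reading of the paper rather than a verified implementation:

```python
import numpy as np

def tp_predict(K, Ks, Kss, y, nu):
    """Student-t process predictive mean, covariance and degrees of freedom.

    K: n x n train kernel, Ks: n x m train/test kernel, Kss: m x m test kernel,
    nu > 2: degrees of freedom.  The mean matches the GP, but the covariance is
    rescaled by a data-dependent factor, so it reacts to the observed y."""
    Kinv_y = np.linalg.solve(K, y)
    mean = Ks.T @ Kinv_y
    beta = float(y @ Kinv_y)
    n = len(y)
    cov_gp = Kss - Ks.T @ np.linalg.solve(K, Ks)
    cov = (nu + beta - 2.0) / (nu + n - 2.0) * cov_gp
    return mean, cov, nu + n
```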

Journal ArticleDOI
TL;DR: The generalized likelihood ratio test (GLRT), Rao test, and Wald test, as well as their two-step variations, are derived in homogeneous environments, and three types of spectral norm tests (SNTs) are introduced.
Abstract: In this two-part paper, we consider the problem of adaptive multidimensional/multichannel signal detection in Gaussian noise with unknown covariance matrix. The test data (primary data) is assumed as a collection of sample vectors, arranged as the columns of a rectangular data array. The rows and columns of the signal matrix are both assumed to lie in known subspaces, but with unknown coordinates. Due to this feature of the signal structure, we name this kind of signal as the double subspace signal. Part I of this paper focuses on the adaptive detection in homogeneous environments, while Part II deals with the adaptive detection in partially homogeneous environments. Precisely, in this part, we derive the generalized likelihood ratio test (GLRT), Rao test, Wald test, as well as their two-step variations, in homogeneous environments. Three types of spectral norm tests (SNTs) are also introduced. All these detectors are shown to possess the constant false alarm rate (CFAR) property. Moreover, we discuss the differences between them and show how they work. Another contribution is that we investigate various special cases of these detectors. Remarkably, some of them are well-known existing detectors, while some others are still new. At the stage of performance evaluation, conducted by Monte Carlo simulations, both matched and mismatched signals are dealt with. For each case, more than one scenario is considered.

Posted Content
TL;DR: This work identifies four desirable properties that are important for scalability, expressiveness and robustness, when learning and inferring with a combination of multiple models and shows that gPoE of Gaussian processes has these qualities, while no other existing combination schemes satisfy all of them at the same time.
Abstract: In this work, we propose a generalized product of experts (gPoE) framework for combining the predictions of multiple probabilistic models. We identify four desirable properties that are important for scalability, expressiveness and robustness, when learning and inferring with a combination of multiple models. Through analysis and experiments, we show that gPoE of Gaussian processes (GP) have these qualities, while no other existing combination schemes satisfy all of them at the same time. The resulting GP-gPoE is highly scalable as individual GP experts can be independently learned in parallel; very expressive as the way experts are combined depends on the input rather than fixed; the combined prediction is still a valid probabilistic model with natural interpretation; and finally robust to unreliable predictions from individual experts.
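The combination rule itself is a weighted product of Gaussians: weighted precisions add, and the mean is the precision-weighted average. A minimal sketch, with uniform default weights as one simple choice (the paper discusses more informative weightings), is:

```python
import numpy as np

def gpoe_combine(means, variances, weights=None):
    """Generalized product-of-experts combination of per-expert Gaussian
    predictions at the same test points.

    means, variances: (M, T) arrays for M experts at T test points.
    weights: (M, T) non-negative weights; defaults to uniform 1/M, which
    keeps the combined variance from collapsing merely because there are
    many experts."""
    M = means.shape[0]
    if weights is None:
        weights = np.full_like(means, 1.0 / M)
    prec = (weights / variances).sum(axis=0)            # combined precision
    mean = (weights * means / variances).sum(axis=0) / prec
    return mean, 1.0 / prec
```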

Proceedings Article
08 Dec 2014
TL;DR: It is shown that a distinct combination of expressive kernels, a fully non-parametric representation, and scalable inference that exploits existing model structure is critical for large scale multidimensional pattern extrapolation.
Abstract: The ability to automatically discover patterns and perform extrapolation is an essential quality of intelligent systems. Kernel methods, such as Gaussian processes, have great potential for pattern extrapolation, since the kernel flexibly and interpretably controls the generalisation properties of these methods. However, automatically extrapolating large scale multidimensional patterns is in general difficult, and developing Gaussian process models for this purpose involves several challenges. A vast majority of kernels, and kernel learning methods, currently only succeed in smoothing and interpolation. This difficulty is compounded by the fact that Gaussian processes are typically only tractable for small datasets, and scaling an expressive kernel learning approach poses different challenges than scaling a standard Gaussian process model. One faces additional computational constraints, and the need to retain significant model structure for expressing the rich information available in a large dataset. In this paper, we propose a Gaussian process approach for large scale multidimensional pattern extrapolation. We recover sophisticated out of class kernels, perform texture extrapolation, inpainting, and video extrapolation, and long range forecasting of land surface temperatures, all on large multidimensional datasets, including a problem with 383,400 training points. The proposed method significantly outperforms alternative scalable and flexible Gaussian process methods, in speed and accuracy. Moreover, we show that a distinct combination of expressive kernels, a fully non-parametric representation, and scalable inference which exploits existing model structure, are critical for large scale multidimensional pattern extrapolation.

Journal ArticleDOI
TL;DR: In this paper, dimensionality reduction targeting the preservation of multimodal structures is proposed to counter the parameter-space issue, where locality-preserving nonnegative matrix factorization, as well as local Fisher's discriminant analysis, is deployed as preprocessing to reduce the dimensionality of data for the Gaussian-mixture-model classifier.
Abstract: The Gaussian mixture model is a well-known classification tool that captures non-Gaussian statistics of multivariate data. However, the impractically large size of the resulting parameter space has hindered widespread adoption of Gaussian mixture models for hyperspectral imagery. To counter this parameter-space issue, dimensionality reduction targeting the preservation of multimodal structures is proposed. Specifically, locality-preserving nonnegative matrix factorization, as well as local Fisher's discriminant analysis, is deployed as preprocessing to reduce the dimensionality of data for the Gaussian-mixture-model classifier, while preserving multimodal structures within the data. In addition, the pixel-wise classification results from the Gaussian mixture model are combined with spatial-context information resulting from a Markov random field. Experimental results demonstrate that the proposed classification system significantly outperforms other approaches even under limited training data.

Proceedings Article
08 Dec 2014
TL;DR: A novel re-parametrisation of variational inference for sparse GP regression and latent variable models is introduced that allows for an efficient distributed algorithm, and the results show that GPs perform better than many common models often used for big data.
Abstract: Gaussian processes (GPs) are a powerful tool for probabilistic inference over functions. They have been applied to both regression and non-linear dimensionality reduction, and offer desirable properties such as uncertainty estimates, robustness to over-fitting, and principled ways for tuning hyper-parameters. However the scalability of these models to big datasets remains an active topic of research. We introduce a novel re-parametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm. This is done by exploiting the decoupling of the data given the inducing points to re-formulate the evidence lower bound in a Map-Reduce setting. We show that the inference scales well with data and computational resources, while preserving a balanced distribution of the load among the nodes. We further demonstrate the utility in scaling Gaussian processes to big data. We show that GP performance improves with increasing amounts of data in regression (on flight data with 2 million records) and latent variable modelling (on MNIST). The results show that GPs perform better than many common models often used for big data.

Journal ArticleDOI
TL;DR: Two approaches for on-line Gaussian process regression with low computational and memory demands are proposed; one assumes known hyperparameters and performs regression on a set of basis vectors that stores mean and covariance estimates of the latent function.

Proceedings ArticleDOI
12 Jul 2014
TL;DR: This paper revisits batch state estimation through the lens of Gaussian process (GP) regression, and shows that this class of prior results in an inverse kernel matrix that is exactly sparse (block-tridiagonal) and that this can be exploited to carry out GP regression (and interpolation) very efficiently.
Abstract: In this paper, we revisit batch state estimation through the lens of Gaussian process (GP) regression. We consider continuous-discrete estimation problems wherein a trajectory is viewed as a one-dimensional GP, with time as the independent variable. Our continuous-time prior can be defined by any linear, time-varying stochastic differential equation driven by white noise; this allows the possibility of smoothing our trajectory estimates using a variety of vehicle dynamics models (e.g., ‘constant-velocity’). We show that this class of prior results in an inverse kernel matrix (i.e., covariance matrix between all pairs of measurement times) that is exactly sparse (block-tridiagonal) and that this can be exploited to carry out GP regression (and interpolation) very efficiently. Though the prior is continuous, we consider measurements to occur at discrete times. When the measurement model is also linear, this GP approach is equivalent to classical, discrete-time smoothing (at the measurement times). When the measurement model is nonlinear, we iterate over the whole trajectory (as is common in vision and robotics) to maximize accuracy. We test the approach experimentally on a simultaneous trajectory estimation and mapping problem using a mobile robot dataset.
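The sparsity the authors exploit comes from the Markov structure of SDE-driven priors. A scalar toy illustration is a Wiener-process prior, for which the kernel is min(t_i, t_j) and the precision is exactly tridiagonal; the paper's vector-valued priors (e.g. constant velocity) give the block-tridiagonal analogue:

```python
import numpy as np

# Wiener-process prior on a handful of measurement times: the covariance is
# K(t_i, t_j) = min(t_i, t_j), and because the process is Markov, its inverse
# (the precision matrix) is tridiagonal, which is what makes the batch
# GP-regression view of smoothing efficient.
t = np.linspace(0.1, 5.0, 8)
K = np.minimum.outer(t, t)
P = np.linalg.inv(K)
print(np.round(P, 3))   # only the tridiagonal entries are (numerically) non-zero
```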

Posted Content
TL;DR: This work presents a procedure for efficient variational Bayesian learning of nonlinear state-space models based on sparse Gaussian processes and offers the possibility to straightforwardly trade off model capacity and computational cost whilst avoiding overfitting.
Abstract: State-space models have been successfully used for more than fifty years in different areas of science and engineering. We present a procedure for efficient variational Bayesian learning of nonlinear state-space models based on sparse Gaussian processes. The result of learning is a tractable posterior over nonlinear dynamical systems. In comparison to conventional parametric models, we offer the possibility to straightforwardly trade off model capacity and computational cost whilst avoiding overfitting. Our main algorithm uses a hybrid inference approach combining variational Bayes and sequential Monte Carlo. We also present stochastic variational inference and online learning approaches for fast learning with long time series.

Posted Content
TL;DR: The Manifold GP proposed by the authors jointly learns a transformation of the data into a feature space and a GP regression from the feature space to the observed space; it is a full GP and allows learning data representations that are useful for the overall regression task.
Abstract: Off-the-shelf Gaussian Process (GP) covariance functions encode smoothness assumptions on the structure of the function to be modeled. To model complex and non-differentiable functions, these smoothness assumptions are often too restrictive. One way to alleviate this limitation is to find a different representation of the data by introducing a feature space. This feature space is often learned in an unsupervised way, which might lead to data representations that are not useful for the overall regression task. In this paper, we propose Manifold Gaussian Processes, a novel supervised method that jointly learns a transformation of the data into a feature space and a GP regression from the feature space to observed space. The Manifold GP is a full GP and allows to learn data representations, which are useful for the overall regression task. As a proof-of-concept, we evaluate our approach on complex non-smooth functions where standard GPs perform poorly, such as step functions and robotics tasks with contacts.
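The construction amounts to composing a deterministic feature map with a standard covariance function and learning both jointly. A sketch of the resulting kernel, using a single tanh layer as a stand-in feature map (the paper's architecture and joint training loop are omitted), is:

```python
import numpy as np

def manifold_gp_kernel(X, Z, W, ell=1.0):
    """Kernel of a manifold-GP-style model (sketch): push inputs through a
    deterministic feature map (here one tanh layer with weights W, which would
    be learned jointly with the GP hyperparameters) and apply an RBF kernel
    in the feature space."""
    phi = lambda A: np.tanh(A @ W)
    FX, FZ = phi(X), phi(Z)
    d2 = ((FX[:, None, :] - FZ[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

# Toy usage with random inputs and random (untrained) feature weights.
X = np.random.default_rng(7).normal(size=(30, 4))
W = np.random.default_rng(8).normal(size=(4, 3))
K = manifold_gp_kernel(X, X, W)
```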

Journal ArticleDOI
TL;DR: A nonparametric Bayesian dynamic model is proposed, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes, to obtain a flexible and computationally tractable formulation.
Abstract: Symmetric binary matrices representing relations are collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being on inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the link probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Polya-gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide theoretical results on flexibility of the model, and illustrate its performance via simulation experiments. We also consider an application to co-movements in world financial markets.

ReportDOI
TL;DR: In this paper, an anti-concentration property of the supremum of a Gaussian process is derived from an inequality for separable Gaussian processes, leading to a generalized SBR condition.
Abstract: Modern construction of uniform confidence bands typically relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl (2010). This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value. We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Furthermore, our approach is asymptotically honest at a polynomial rate: the error in coverage level converges to zero at a fast, polynomial speed (with respect to the sample size). In sharp contrast, the approach based on extreme value theory is asymptotically honest only at a logarithmic rate: the error converges to zero at a slow, logarithmic speed. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method.
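The Gaussian multiplier bootstrap at the heart of the construction is straightforward to sketch for a kernel density estimator on a fixed grid. The sketch ignores bias correction, the data-driven choice of resolution level, and the paper's adaptive Lepski step; the bandwidth, grid and bootstrap size below are arbitrary:

```python
import numpy as np

def multiplier_bootstrap_band(X, grid, h, n_boot=1000, level=0.95, rng=None):
    """Gaussian multiplier bootstrap for a uniform confidence band around a
    kernel density estimate (a sketch of the general recipe only)."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    # Kernel evaluations g_i(x) = K_h(x - X_i) on the grid, shape (n, m).
    G = np.exp(-0.5 * ((grid[None, :] - X[:, None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    fhat = G.mean(axis=0)
    sigma = G.std(axis=0, ddof=1)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        xi = rng.normal(size=n)                       # Gaussian multipliers
        sups[b] = np.max(np.abs((xi[:, None] * (G - fhat)).sum(0)) / (np.sqrt(n) * sigma))
    c = np.quantile(sups, level)                      # bootstrap critical value
    halfwidth = c * sigma / np.sqrt(n)
    return fhat - halfwidth, fhat + halfwidth

X = np.random.default_rng(1).normal(size=500)
lo, hi = multiplier_bootstrap_band(X, grid=np.linspace(-3, 3, 101), h=0.3)
```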

Journal ArticleDOI
TL;DR: In this paper, a convex programming approach is used to disentangle signal and corruption, and conditions for exact signal recovery from structured corruption and stable signal recovery with added unstructured noise are provided.
Abstract: We study the problem of corrupted sensing, a generalization of compressed sensing in which one aims to recover a signal from a collection of corrupted or unreliable measurements. While an arbitrary signal cannot be recovered in the face of arbitrary corruption, tractable recovery is possible when both signal and corruption are suitably structured. We quantify the relationship between signal recovery and two geometric measures of structure, the Gaussian complexity of a tangent cone, and the Gaussian distance to a subdifferential. We take a convex programming approach to disentangling signal and corruption, analyzing both penalized programs that trade off between signal and corruption complexity, and constrained programs that bound the complexity of signal or corruption when prior information is available. In each case, we provide conditions for exact signal recovery from structured corruption and stable signal recovery from structured corruption with added unstructured noise. Our simulations demonstrate close agreement between our theoretical recovery bounds and the sharp phase transitions observed in practice. In addition, we provide new interpretable bounds for the Gaussian complexity of sparse vectors, block-sparse vectors, and low-rank matrices, which lead to sharper guarantees of recovery when combined with our results and those in the literature.
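A penalized convex program of the kind analyzed above can be prototyped directly. The sketch below assumes a sparse signal and sparse corruption and uses cvxpy, with the penalty weight and noise bound set to arbitrary illustrative values rather than the paper's tuned choices:

```python
import numpy as np
import cvxpy as cp

# Sketch of a penalized corrupted-sensing program:
#   minimize ||x||_1 + lam * ||v||_1  subject to  ||y - Phi x - v||_2 <= eps
rng = np.random.default_rng(6)
m, n, k, s = 150, 300, 10, 5
Phi = rng.normal(size=(m, n)) / np.sqrt(m)            # Gaussian sensing matrix
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
v_true = np.zeros(m); v_true[rng.choice(m, s, replace=False)] = rng.normal(size=s)
y = Phi @ x_true + v_true + 0.01 * rng.normal(size=m)  # corrupted, noisy measurements

x, v = cp.Variable(n), cp.Variable(m)
lam, eps = 1.0, 0.02 * np.sqrt(m)
prob = cp.Problem(cp.Minimize(cp.norm1(x) + lam * cp.norm1(v)),
                  [cp.norm(y - Phi @ x - v, 2) <= eps])
prob.solve()
```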