
Showing papers in "Statistics and Computing in 2021"


Journal ArticleDOI
TL;DR: In this article, the authors review and advocate against the use of permute-and-predict (PaP) methods for interpreting black box functions, providing further demonstrations of their drawbacks along with a detailed explanation as to why they occur.
Abstract: This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of our work here is to (i) review this growing body of literature, (ii) provide further demonstrations of these drawbacks along with a detailed explanation as to why they occur, and (iii) advocate for alternative measures that involve additional modeling. In particular, we describe how breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects across various model setups and find support for previous claims in the literature that PaP metrics can vastly over-emphasize correlated features in both variable importance measures and partial dependence plots. As an alternative, we discuss and recommend more direct approaches that involve measuring the change in model performance after muting the effects of the features under investigation.
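To make the contrast concrete, here is a minimal sketch (not from the paper; the synthetic data, random forest model and scikit-learn calls are illustrative) comparing permute-and-predict importance against a remove-and-refit importance that measures the change in model performance after muting a feature, in the presence of two strongly correlated features:

```python
# Sketch contrasting permute-and-predict importance with a remove-and-refit
# alternative; dataset, model choice and settings are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
full = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Permute-and-predict (PaP) importance: breaks the x1-x2 dependence in hold-out data.
pap = permutation_importance(full, X_te, y_te, n_repeats=10, random_state=0)
print("PaP importances:", pap.importances_mean)

# Remove-and-refit importance: drop in test R^2 after muting each feature.
base = full.score(X_te, y_te)
for j in range(X.shape[1]):
    keep = [k for k in range(X.shape[1]) if k != j]
    reduced = RandomForestRegressor(random_state=0).fit(X_tr[:, keep], y_tr)
    print(f"drop in R^2 without feature {j}: {base - reduced.score(X_te[:, keep], y_te):.3f}")
```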

29 citations


Journal ArticleDOI
TL;DR: In this paper, a single-pass algorithm for computing the gradient and Fisher information of Vecchia's Gaussian process loglikelihood approximation is proposed, which provides a computationally efficient means for applying the Fisher scoring algorithm for maximizing the log likelihood.
Abstract: We derive a single-pass algorithm for computing the gradient and Fisher information of Vecchia’s Gaussian process loglikelihood approximation, which provides a computationally efficient means for applying the Fisher scoring algorithm for maximizing the loglikelihood. The advantages of the optimization techniques are demonstrated in numerical examples and in an application to Argo ocean temperature data. The new methods find the maximum likelihood estimates much faster and more reliably than an optimization method that uses only function evaluations, especially when the covariance function has many parameters. This allows practitioners to fit nonstationary models to large spatial and spatial–temporal datasets.
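As background, a generic Fisher scoring loop of the kind being accelerated looks roughly as follows; loglik, grad_loglik and fisher_info are placeholders standing in for the paper's single-pass Vecchia-based computations:

```python
# Generic Fisher scoring loop: theta <- theta + I(theta)^{-1} grad(theta).
# loglik, grad_loglik and fisher_info are placeholders for the paper's
# single-pass Vecchia computations; step halving guards against overshooting.
import numpy as np

def fisher_scoring(theta0, loglik, grad_loglik, fisher_info, tol=1e-6, max_iter=100):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad_loglik(theta)
        I = fisher_info(theta)
        step = np.linalg.solve(I, g)
        # simple step-halving so that the loglikelihood does not decrease
        alpha, ll_old = 1.0, loglik(theta)
        while loglik(theta + alpha * step) < ll_old and alpha > 1e-4:
            alpha *= 0.5
        theta = theta + alpha * step
        if np.linalg.norm(alpha * step) < tol:
            break
    return theta
```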

19 citations


Journal ArticleDOI
TL;DR: In this paper, the authors analyzed the continuous version of EKI, a coupled SDE system, and proved the mean-field limit of this SDE system for both the linear and the weakly nonlinear case.
Abstract: Ensemble Kalman inversion (EKI) has been a very popular algorithm used in Bayesian inverse problems (Iglesias et al. in Inverse Probl 29: 045001, 2013). It samples particles from a prior distribution and introduces a motion to move the particles around in pseudo-time. As the pseudo-time goes to infinity, the method finds the minimizer of the objective function, and when the pseudo-time stops at 1, the ensemble distribution of the particles resembles, in some sense, the posterior distribution in the linear setting. The ideas trace back further to the ensemble Kalman filter and the associated analysis (Evensen in J Geophys Res: Oceans 99: 10143–10162, 1994; Reich in BIT Numer Math 51: 235–249, 2011), but to this day it is still largely unknown why EKI works when viewed as a sampling method, and in what sense and at what rate the method converges. In this paper, we analyze the continuous version of EKI, a coupled SDE system, and prove the mean-field limit of this SDE system. In particular, we show that 1. as the number of particles goes to infinity, the empirical measure of particles following the SDE converges to the solution to a Fokker–Planck equation in Wasserstein 2-distance with an optimal rate, for both the linear and weakly nonlinear cases; 2. the solution to the Fokker–Planck equation reconstructs the target distribution in finite time in the linear case, as suggested in Iglesias et al. (Inverse Probl 29: 045001, 2013).
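A minimal discrete-time EKI sketch for a linear forward map (synthetic operator, data and ensemble size; not the paper's continuous SDE analysis) illustrates the particle system whose mean-field limit is studied:

```python
# Minimal ensemble Kalman inversion (EKI) sketch for a linear forward map G(u) = A u.
# Forward map, noise level and true parameter are synthetic; perturbed observations
# are used at each pseudo-time step.
import numpy as np

rng = np.random.default_rng(1)
d, m, J = 5, 10, 200                      # parameter dim, data dim, ensemble size
A = rng.normal(size=(m, d))
u_true = rng.normal(size=d)
Gamma = 0.1 * np.eye(m)                   # observational noise covariance
y = A @ u_true + rng.multivariate_normal(np.zeros(m), Gamma)

U = rng.normal(size=(J, d))               # prior ensemble
for _ in range(20):                       # pseudo-time steps
    W = U @ A.T                           # forward-mapped ensemble G(u_j)
    Ubar, Wbar = U.mean(axis=0), W.mean(axis=0)
    Cuw = (U - Ubar).T @ (W - Wbar) / J   # cross-covariance
    Cww = (W - Wbar).T @ (W - Wbar) / J   # covariance of the mapped ensemble
    Y = y + rng.multivariate_normal(np.zeros(m), Gamma, size=J)  # perturbed data
    K = Cuw @ np.linalg.inv(Cww + Gamma)  # Kalman-type gain
    U = U + (Y - W) @ K.T

print("ensemble mean:", U.mean(axis=0))
print("true u       :", u_true)
```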

17 citations


Journal ArticleDOI
TL;DR: In this paper, the maximum a posteriori estimate of a Gauss-Markov prior with an iterated extended Kalman smoother was studied under mild conditions on the vector field and convergence rates were obtained via nonlinear analysis and scattered data approximation.
Abstract: There is a growing interest in probabilistic numerical solutions to ordinary differential equations. In this paper, the maximum a posteriori estimate is studied under the class of $$\nu $$ times differentiable linear time-invariant Gauss–Markov priors, which can be computed with an iterated extended Kalman smoother. The maximum a posteriori estimate corresponds to an optimal interpolant in the reproducing kernel Hilbert space associated with the prior, which in the present case is equivalent to a Sobolev space of smoothness $$\nu + 1$$. Subject to mild conditions on the vector field, convergence rates of the maximum a posteriori estimate are then obtained via methods from nonlinear analysis and scattered data approximation. These results closely resemble classical convergence results in the sense that a $$\nu $$ times differentiable prior process obtains a global order of $$\nu $$, which is demonstrated in numerical examples.

14 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of estimating a signal from its scaled, cyclically shifted and noisy observations, and derive a procedure which is proved to consistently estimate the signal.
Abstract: In recent years, there has been a growing need for processing methods aimed at extracting useful information from large datasets. In many cases, the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $$1/\text {SNR}^4$$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal’s length. Finally, we present numerical experiments which demonstrate the performance of our algorithms and corroborate our theoretical findings.

13 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop a new methodology to unbiasedly estimate the gradient of the log-likelihood with respect to the unknown parameter, i.e. the expectation of the estimate has no discretization bias.
Abstract: We consider the problem of estimating a parameter $$\theta \in \Theta \subseteq {\mathbb {R}}^{d_{\theta }}$$ associated with a Bayesian inverse problem. Typically one must resort to a numerical approximation of the gradient of the log-likelihood and also adopt a discretization of the problem in space and/or time. We develop a new methodology to unbiasedly estimate the gradient of the log-likelihood with respect to the unknown parameter, i.e. the expectation of the estimate has no discretization bias. Such a property is not only useful for estimation in terms of the original stochastic model of interest, but can be used in stochastic gradient algorithms which benefit from unbiased estimates. Under appropriate assumptions, we prove that our estimator is not only unbiased but of finite variance. In addition, when implemented on a single processor, we show that the cost to achieve a given level of error is comparable to multilevel Monte Carlo methods, both practically and theoretically. However, the new algorithm is highly amenable to parallel computation.

11 citations


Journal ArticleDOI
TL;DR: In this article, the proposed methodology hybridizes global inducing point and data subset-based local GP approximation; a cascade of strategies for planning the selection of local inducing points is provided, and comparisons are drawn to related methodology with emphasis on computer surrogate modeling applications.
Abstract: Gaussian processes (GPs) serve as flexible surrogates for complex surfaces, but buckle under the cubic cost of matrix decompositions with big training data sizes. Geospatial and machine learning communities suggest pseudo-inputs, or inducing points, as one strategy to obtain an approximation easing that computational burden. However, we show how placement of inducing points and their multitude can be thwarted by pathologies, especially in large-scale dynamic response surface modeling tasks. As remedy, we suggest porting the inducing point idea, which is usually applied globally, over to a more local context where selection is both easier and faster. In this way, our proposed methodology hybridizes global inducing point and data subset-based local GP approximation. A cascade of strategies for planning the selection of local inducing points is provided, and comparisons are drawn to related methodology with emphasis on computer surrogate modeling applications. We show that local inducing points extend their global and data subset component parts on the accuracy–computational efficiency frontier. Illustrative examples are provided on benchmark data and a large-scale real-simulation satellite drag interpolation problem.

10 citations


Journal ArticleDOI
TL;DR: The Ensemble Slice Sampling (ESS) algorithm as discussed by the authors adapts to the characteristics of the target distribution with minimal hand-tuning by using an ensemble of parallel walkers in order to handle strong correlations between parameters.
Abstract: Slice sampling has emerged as a powerful Markov Chain Monte Carlo algorithm that adapts to the characteristics of the target distribution with minimal hand-tuning. However, Slice Sampling’s performance is highly sensitive to the user-specified initial length scale hyperparameter and the method generally struggles with poorly scaled or strongly correlated distributions. This paper introduces Ensemble Slice Sampling (ESS), a new class of algorithms that bypasses such difficulties by adaptively tuning the initial length scale and utilising an ensemble of parallel walkers in order to efficiently handle strong correlations between parameters. These affine-invariant algorithms are trivial to construct, require no hand-tuning, and can easily be implemented in parallel computing environments. Empirical tests show that Ensemble Slice Sampling can improve efficiency by more than an order of magnitude compared to conventional MCMC methods on a broad range of highly correlated target distributions. In cases of strongly multimodal target distributions, Ensemble Slice Sampling can sample efficiently even in high dimensions. We argue that the parallel, black-box and gradient-free nature of the method renders it ideal for use in scientific fields such as physics, astrophysics and cosmology which are dominated by a wide variety of computationally expensive and non-differentiable models.
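For orientation, the basic univariate slice sampler with stepping-out and shrinkage, whose sensitivity to the initial length scale motivates the ensemble scheme, can be sketched as follows (standard normal target and all settings illustrative):

```python
# Minimal univariate slice sampler (stepping-out and shrinkage), the basic
# building block that Ensemble Slice Sampling adapts; target and initial
# length scale w are illustrative.
import numpy as np

def slice_sample(logp, x0, w=1.0, n_samples=5000, rng=None):
    rng = rng or np.random.default_rng()
    x, out = x0, []
    for _ in range(n_samples):
        log_y = logp(x) + np.log(rng.uniform())      # slice level
        # stepping out
        L = x - w * rng.uniform()
        R = L + w
        while logp(L) > log_y:
            L -= w
        while logp(R) > log_y:
            R += w
        # shrinkage
        while True:
            x_new = rng.uniform(L, R)
            if logp(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                L = x_new
            else:
                R = x_new
        out.append(x)
    return np.array(out)

samples = slice_sample(lambda x: -0.5 * x**2, x0=0.0)
print(samples.mean(), samples.std())  # roughly 0 and 1 for a standard normal target
```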

10 citations


Journal ArticleDOI
TL;DR: In this article, the unadjusted Langevin algorithm is used to construct the stochastic approximation, which leads to a highly efficient algorithm with favorable convergence properties that can be quantified explicitly and easily checked.
Abstract: Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide range of problems in science and industry. However, this strategy scales poorly to large problems because of methodological and theoretical difficulties related to using high-dimensional Markov chain Monte Carlo algorithms within a stochastic approximation scheme. This paper proposes to address these difficulties by using unadjusted Langevin algorithms to construct the stochastic approximation. This leads to a highly efficient stochastic optimisation methodology with favourable convergence properties that can be quantified explicitly and easily checked. The proposed methodology is demonstrated with three experiments, including a challenging application to statistical audio analysis and a sparse Bayesian logistic regression with random effects problem.
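A schematic sketch of the general idea on a toy model (latent Gaussian variables with unknown prior mean; the model and step sizes are illustrative, and this is not the paper's algorithm verbatim): an unadjusted Langevin chain on the latent variables is interleaved with a stochastic-approximation update of the parameter via Fisher's identity.

```python
# Schematic Langevin-within-stochastic-approximation scheme on a toy model:
# latent x_i ~ N(theta, 1), observation y_i ~ N(x_i, sigma^2).  An unadjusted
# Langevin step on the latents is interleaved with a stochastic-approximation
# step on theta; the exact marginal MLE here is mean(y), which the scheme
# should approach.
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 0.5
n = 50
theta_true = 2.0
x_true = theta_true + rng.normal(size=n)
y = x_true + np.sqrt(sigma2) * rng.normal(size=n)

theta, x = 0.0, np.zeros(n)
gamma, delta = 0.05, 0.01                    # Langevin and stochastic-approximation step sizes
for k in range(5000):
    # unadjusted Langevin step targeting p(x | y, theta)
    grad_x = -(x - theta) - (x - y) / sigma2
    x = x + gamma * grad_x + np.sqrt(2 * gamma) * rng.normal(size=n)
    # stochastic approximation step on theta using grad_theta log p(y, x | theta)
    theta = theta + delta * np.mean(x - theta)

print("estimated theta:", theta, "  marginal MLE:", y.mean())
```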

10 citations


Journal ArticleDOI
TL;DR: The Zig-Zag sampler as mentioned in this paper is a rejection-free sampling scheme based on a non-reversible continuous piecewise deterministic Markov process, which expands the diffusion path in a truncated Faber-Schauder basis.
Abstract: We introduce the use of the Zig-Zag sampler to the problem of sampling conditional diffusion processes (diffusion bridges). The Zig-Zag sampler is a rejection-free sampling scheme based on a non-reversible continuous piecewise deterministic Markov process. Similar to the Lévy–Ciesielski construction of a Brownian motion, we expand the diffusion path in a truncated Faber–Schauder basis. The coefficients within the basis are sampled using a Zig-Zag sampler. A key innovation is the use of the fully local algorithm for the Zig-Zag sampler, which makes it possible to exploit the sparsity structure implied by the dependency graph of the coefficients and by the subsampling technique to reduce the complexity of the algorithm. We illustrate the performance of the proposed methods in a number of examples.
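To illustrate the rejection-free switching mechanism in its simplest form, here is a one-dimensional Zig-Zag sampler for a standard normal target (illustrative only; the paper applies the fully local version to the Faber–Schauder coefficients of diffusion bridges):

```python
# 1D Zig-Zag sampler for a standard normal target (potential U(x) = x^2/2).
# The velocity v flips at event times drawn exactly by inverting the integrated
# switching rate max(0, v x); positions along the piecewise-linear path are
# sub-sampled on a regular time grid.  Purely illustrative of the mechanism.
import numpy as np

rng = np.random.default_rng(3)
x, v, t = 0.0, 1.0, 0.0
T, dt = 20000.0, 0.5
grid, samples = dt, []

while t < T:
    a = v * x
    E = rng.exponential()
    tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * E)   # exact next event time
    # record positions at grid times falling inside this linear segment
    while grid <= t + tau and grid <= T:
        samples.append(x + v * (grid - t))
        grid += dt
    x, t = x + v * tau, t + tau
    v = -v                                            # velocity flip at the event

samples = np.array(samples)
print(samples.mean(), samples.std())  # roughly 0 and 1
```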

9 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that the filtering, predictive and smoothing distributions in dynamic probit models with Gaussian state variables are, in fact, available and belong to a class of unified skew-normals (sun) whose parameters can be updated recursively in time via analytical expressions.
Abstract: Non-Gaussian state-space models arise in several applications, and within this framework the binary time series setting provides a relevant example. However, unlike for Gaussian state-space models — where filtering, predictive and smoothing distributions are available in closed form — binary state-space models require approximations or sequential Monte Carlo strategies for inference and prediction. This is due to the apparent absence of conjugacy between the Gaussian states and the likelihood induced by the observation equation for the binary data. In this article we prove that the filtering, predictive and smoothing distributions in dynamic probit models with Gaussian state variables are, in fact, available and belong to a class of unified skew-normals (sun) whose parameters can be updated recursively in time via analytical expressions. Also the key functionals of these distributions are, in principle, available, but their calculation requires the evaluation of multivariate Gaussian cumulative distribution functions. Leveraging sun properties, we address this issue via novel Monte Carlo methods based on independent samples from the smoothing distribution, that can easily be adapted to the filtering and predictive case, thus improving state-of-the-art approximate and sequential Monte Carlo inference in small-to-moderate dimensional studies. Novel sequential Monte Carlo procedures that exploit the sun properties are also developed to deal with online inference in high dimensions. Performance gains over competitors are outlined in a financial application.

Journal ArticleDOI
TL;DR: The hpHawkes package, as discussed by the authors, provides a high-performance computing framework for Bayesian analysis of big gunshot data generated in Washington D.C. between 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations.
Abstract: The Hawkes process and its extensions effectively model self-excitatory phenomena including earthquakes, viral pandemics, financial transactions, neural spike trains and the spread of memes through social networks. The usefulness of these stochastic process models within a host of economic sectors and scientific disciplines is undercut by the processes’ computational burden: complexity of likelihood evaluations grows quadratically in the number of observations for both the temporal and spatiotemporal Hawkes processes. We show that, with care, one may parallelize these calculations using both central and graphics processing unit implementations to achieve over 100-fold speedups over single-core processing. Using a simple adaptive Metropolis–Hastings scheme, we apply our high-performance computing framework to a Bayesian analysis of big gunshot data generated in Washington D.C. between the years of 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations. To encourage widespread use, we provide hpHawkes, an open-source R package, and discuss high-level implementation and program design for leveraging aspects of computational hardware that become necessary in a big data setting.
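The quadratic-cost computation being parallelized can be seen in a naive univariate temporal Hawkes log-likelihood with exponential kernel (synthetic event times and parameter values; the exponential kernel is an illustrative choice, not necessarily the paper's):

```python
# Naive O(n^2) log-likelihood of a univariate temporal Hawkes process with
# excitation kernel alpha * beta * exp(-beta * dt); this per-evaluation cost is
# what CPU/GPU parallelization targets.  Event times and parameters are synthetic.
import numpy as np

def hawkes_loglik(times, T, mu, alpha, beta):
    times = np.asarray(times)
    ll = 0.0
    for i, t in enumerate(times):
        past = times[:i]
        lam = mu + np.sum(alpha * beta * np.exp(-beta * (t - past)))  # intensity at t
        ll += np.log(lam)
    # compensator: integral of the intensity over [0, T]
    ll -= mu * T + np.sum(alpha * (1.0 - np.exp(-beta * (T - times))))
    return ll

rng = np.random.default_rng(4)
toy_times = np.sort(rng.uniform(0, 100, size=500))
print(hawkes_loglik(toy_times, T=100.0, mu=1.0, alpha=0.5, beta=2.0))
```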

Journal ArticleDOI
TL;DR: In this paper, the adaptively scaled independence sampler is applied to logistic regression and accelerated failure time models, in combination with data augmentation, Laplace approximation and the correlated pseudo-marginal method.
Abstract: Bayesian variable selection is an important method for discovering variables which are most useful for explaining the variation in a response. The widespread use of this method has been restricted by the challenging computational problem of sampling from the corresponding posterior distribution. Recently, the use of adaptive Monte Carlo methods has been shown to lead to performance improvement over traditionally used algorithms in linear regression models. This paper looks at applying one of these algorithms (the adaptively scaled independence sampler) to logistic regression and accelerated failure time models. We investigate the use of this algorithm with data augmentation, Laplace approximation and the correlated pseudo-marginal method. The performance of the algorithms is compared on several genomic data sets.

Journal ArticleDOI
TL;DR: MOTR-BART as mentioned in this paper is an extension of BART that considers piecewise linear functions at node level instead of piecewise constants: rather than having a unique value at node level for the prediction, a linear predictor is estimated using the covariates that have been used as split variables in the corresponding tree.
Abstract: Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper, we introduce an extension of BART, called model trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART .

Journal ArticleDOI
TL;DR: In this paper, a delayed-acceptance kernel for Markov chain Monte Carlo (MCMC) is proposed to reduce the number of expensive likelihood evaluations required to approximate a posterior expectation.
Abstract: Delayed-acceptance is a technique for reducing computational effort for Bayesian models with expensive likelihoods. Using a delayed-acceptance kernel for Markov chain Monte Carlo can reduce the number of expensive likelihood evaluations required to approximate a posterior expectation. Delayed-acceptance uses a surrogate, or approximate, likelihood to avoid evaluation of the expensive likelihood when possible. Within the sequential Monte Carlo framework, we utilise the history of the sampler to adaptively tune the surrogate likelihood to yield better approximations of the expensive likelihood and use a surrogate-first annealing schedule to further increase computational efficiency. Moreover, we propose a framework for optimising computation time whilst avoiding particle degeneracy, which encapsulates existing strategies in the literature. Overall, we develop a novel algorithm for computationally efficient SMC with expensive likelihood functions. The method is applied to static Bayesian models, which we demonstrate on toy and real examples.
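A minimal delayed-acceptance Metropolis–Hastings step of the kind embedded in such samplers (in the style of Christen and Fox; the flat prior, surrogate and exact log-likelihoods, and random-walk proposal below are placeholders, not the paper's models) looks as follows:

```python
# Minimal delayed-acceptance Metropolis-Hastings step: a cheap surrogate
# log-likelihood screens proposals before the expensive log-likelihood is
# evaluated.  log_prior, log_lik_cheap, log_lik_expensive and the random-walk
# scale are placeholders.
import numpy as np

def da_mh_step(theta, scale, log_prior, log_lik_cheap, log_lik_expensive, rng):
    prop = theta + scale * rng.normal(size=theta.shape)   # symmetric proposal
    # Stage 1: accept/reject using the surrogate posterior only
    log_a1 = (log_prior(prop) + log_lik_cheap(prop)) - (log_prior(theta) + log_lik_cheap(theta))
    if np.log(rng.uniform()) >= log_a1:
        return theta, False                               # rejected cheaply
    # Stage 2: correct with the expensive likelihood; the surrogate ratio is divided back out
    log_a2 = (log_lik_expensive(prop) - log_lik_expensive(theta)) \
             - (log_lik_cheap(prop) - log_lik_cheap(theta))
    if np.log(rng.uniform()) < log_a2:
        return prop, True
    return theta, False

# toy demo: exact target N(0, 1) with flat prior, surrogate deliberately mis-scaled
rng = np.random.default_rng(11)
theta, chain = np.zeros(1), []
for _ in range(5000):
    theta, _ = da_mh_step(theta, 1.0,
                          log_prior=lambda t: 0.0,
                          log_lik_cheap=lambda t: -0.5 * float(t @ t) / 1.5,
                          log_lik_expensive=lambda t: -0.5 * float(t @ t),
                          rng=rng)
    chain.append(theta[0])
print(np.mean(chain), np.std(chain))  # roughly 0 and 1
```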

Journal ArticleDOI
TL;DR: In this article, a novel method for data imputation in multivariate non-stationary time series, based on the so-called locally stationary wavelet modelling paradigm, is introduced.
Abstract: Many multivariate time series observed in practice are second order nonstationary, i.e. their covariance properties vary over time. In addition, missing observations in such data are encountered in many applications of interest, due to recording failures or sensor dropout, hindering successful analysis. This article introduces a novel method for data imputation in multivariate nonstationary time series, based on the so-called locally stationary wavelet modelling paradigm. Our methodology is shown to perform well across a range of simulation scenarios, with a variety of missingness structures, as well as being competitive in the stationary time series setting. We also demonstrate our technique on data arising in a health monitoring application.

Journal ArticleDOI
TL;DR: In this paper, the optimised adaptive importance samplers (OAIS) are investigated and nonasymptotic error bounds for the mean squared errors (MSEs) of these adaptive Monte Carlo algorithms are proved, depending explicitly on the number of iterations and the number of samples.
Abstract: Adaptive importance samplers are adaptive Monte Carlo algorithms to estimate expectations with respect to some target distribution which adapt themselves to obtain better estimators over a sequence of iterations. Although it is straightforward to show that they have the same $$\mathcal {O}(1/\sqrt{N})$$ convergence rate as standard importance samplers, where N is the number of Monte Carlo samples, the behaviour of adaptive importance samplers over the number of iterations has been left relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive importance samplers termed optimised adaptive importance samplers (OAIS). These samplers rely on the iterative minimisation of the $$\chi ^2$$ -divergence between an exponential family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers which minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which explicitly depend on the number of iterations and the number of samples together. The non-asymptotic bounds derived in this paper imply that when the target belongs to the exponential family, the $$L_2$$ errors of the optimised samplers converge to the optimal rate of $$\mathcal {O}(1/\sqrt{N})$$ and the rate of convergence in the number of iterations are explicitly provided. When the target does not belong to the exponential family, the rate of convergence is the same but the asymptotic $$L_2$$ error increases by a factor $$\sqrt{\rho ^\star } > 1$$ , where $$\rho ^\star - 1$$ is the minimum $$\chi ^2$$ -divergence between the target and an exponential family proposal.

Journal ArticleDOI
TL;DR: In this paper, the authors propose energetic variational inference (EVI), a new variational inference framework that minimizes the VI objective function based on a prescribed energy-dissipation law, and use it to derive new particle-based schemes.
Abstract: We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing particle-based variational inference (ParVI) methods, including the popular Stein variational gradient descent (SVGD). More importantly, many new ParVI schemes can be created under this framework. For illustration, we propose a new particle-based EVI scheme, which performs the particle-based approximation of the density first and then uses the approximated density in the variational procedure, or “Approximation-then-Variation” for short. Thanks to this order of approximation and variation, the new scheme can maintain the variational structure at the particle level, and can significantly decrease the KL-divergence in each iteration. Numerical experiments show the proposed method outperforms some existing ParVI methods in terms of fidelity to the target distribution.
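For reference, Stein variational gradient descent, cited above as one ParVI scheme recoverable from the EVI framework, can be sketched as follows (RBF kernel with median-heuristic bandwidth and a standard normal target, both illustrative):

```python
# Minimal Stein variational gradient descent (SVGD) sketch, one of the ParVI
# schemes the EVI framework recovers; RBF kernel with the median heuristic and
# a standard normal target, both purely illustrative.
import numpy as np

def svgd_step(X, grad_logp, stepsize=0.1):
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]                 # pairwise differences x_j - x_i
    sq = np.sum(diffs ** 2, axis=-1)
    h = np.median(sq) / np.log(n + 1) + 1e-12             # median-heuristic bandwidth
    K = np.exp(-sq / h)                                   # RBF kernel matrix
    grad_K = -2.0 / h * diffs * K[:, :, None]             # d k(x_j, x_i) / d x_j
    phi = (K @ grad_logp(X) + grad_K.sum(axis=0)) / n     # SVGD update direction
    return X + stepsize * phi

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2)) + 5.0                       # particles start far from the target
for _ in range(500):
    X = svgd_step(X, lambda Z: -Z)                        # target: standard normal
print(X.mean(axis=0), X.std(axis=0))
```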

Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of a class of block-adaptive particle filters (PFs) that can automatically tune their computational complexity by evaluating online certain predictive statistics which are invariant for a broad class of state-space models.
Abstract: We investigate the performance of a class of particle filters (PFs) that can automatically tune their computational complexity by evaluating online certain predictive statistics which are invariant for a broad class of state-space models. To be specific, we propose a family of block-adaptive PFs based on the methodology of Elvira et al. (IEEE Trans Signal Process 65(7):1781–1794, 2017). In this class of algorithms, the number of Monte Carlo samples (known as particles) is adjusted periodically, and we prove that the theoretical error bounds of the PF actually adapt to the updates in the number of particles. The evaluation of the predictive statistics that lies at the core of the methodology is done by generating fictitious observations, i.e., particles in the observation space. We study, both analytically and numerically, the impact of the number K of these particles on the performance of the algorithm. In particular, we prove that if the predictive statistics with K fictitious observations converged exactly, then the particle approximation of the filtering distribution would match the first K elements in a series of moments of the true filter. This result can be understood as a converse to some convergence theorems for PFs. From this analysis, we deduce an alternative predictive statistic that can be computed (for some models) without sampling any fictitious observations at all. Finally, we conduct an extensive simulation study that illustrates the theoretical results and provides further insights into the complexity, performance and behavior of the new class of algorithms.
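As a baseline for the block-adaptive variant, a basic bootstrap particle filter with a fixed number of particles on an illustrative linear-Gaussian state-space model can be sketched as follows:

```python
# Basic bootstrap particle filter with a fixed number of particles N, the
# non-adaptive baseline for block-adaptive schemes.  The linear-Gaussian
# state-space model below is purely illustrative.
import numpy as np

rng = np.random.default_rng(6)
T, N = 100, 500
a, sx, sy = 0.9, 1.0, 1.0

# simulate data from x_t = a x_{t-1} + sx * noise, y_t = x_t + sy * noise
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + sx * rng.normal()
y = x + sy * rng.normal(size=T)

particles = rng.normal(size=N)
means = []
for t in range(T):
    particles = a * particles + sx * rng.normal(size=N)        # propagate through the transition kernel
    logw = -0.5 * (y[t] - particles) ** 2 / sy**2              # Gaussian observation weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    means.append(np.sum(w * particles))                        # filtering mean estimate
    idx = rng.choice(N, size=N, p=w)                           # multinomial resampling
    particles = particles[idx]

print("filtering mean estimates at the last 5 steps:", np.round(means[-5:], 3))
```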

Journal ArticleDOI
TL;DR: In this article, a tile-low-rank representation of covariance matrices with a block-reordering scheme for efficient quasi-Monte Carlo simulation is presented. But the method is not suitable for high-dimensional skew-normal random fields.
Abstract: We present a preconditioned Monte Carlo method for computing high-dimensional multivariate normal and Student-t probabilities arising in spatial statistics. The approach combines a tile-low-rank representation of covariance matrices with a block-reordering scheme for efficient quasi-Monte Carlo simulation. The tile-low-rank representation decomposes the high-dimensional problem into many diagonal-block-size problems and low-rank connections. The block-reordering scheme reorders between and within the diagonal blocks to reduce the impact of integration variables from right to left, thus improving the Monte Carlo convergence rate. Simulations up to dimension 65,536 suggest that the new method can improve the run time by an order of magnitude compared with the hierarchical quasi-Monte Carlo method and two orders of magnitude compared with the dense quasi-Monte Carlo method. Our method also forms a strong substitute for the approximate conditioning methods as a more robust estimation with error guarantees. An application study to wind stochastic generators is provided to illustrate that the new computational method makes the maximum likelihood estimation feasible for high-dimensional skew-normal random fields.

Journal ArticleDOI
TL;DR: In this paper, the authors recast Fast Incremental Expectation Maximization (FIEM) and other incremental EM type algorithms in the Stochastic Approximation within EM framework and provided nonasymptotic bounds for the convergence in expectation as a function of the number of examples n and of the maximal number of iterations.
Abstract: Fast incremental expectation maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM type algorithms in the Stochastic Approximation within EM framework. Then, we provide nonasymptotic bounds for the convergence in expectation as a function of the number of examples n and of the maximal number of iterations $$K_\mathrm {max}$$ . We propose two strategies for achieving an $$\epsilon $$ -approximate stationary point, respectively with $$K_\mathrm {max}= O(n^{2/3}/\epsilon )$$ and $$K_\mathrm {max}= O(\sqrt{n}/\epsilon ^{3/2})$$ , both strategies relying on a random termination rule before $$K_\mathrm {max}$$ and on a constant step size in the Stochastic Approximation step. Our bounds provide some improvements on the literature. First, they allow $$K_\mathrm {max}$$ to scale as $$\sqrt{n}$$ which is better than $$n^{2/3}$$ which was the best rate obtained so far; it is at the cost of a larger dependence upon the tolerance $$\epsilon $$ , thus making this control relevant for small to medium accuracy with respect to the number of examples n. Second, for the $$n^{2/3}$$ -rate, the numerical illustrations show that thanks to an optimized choice of the step size and of the bounds in terms of quantities characterizing the optimization problem at hand, our results design a less conservative choice of the step size and provide a better control of the convergence in expectation.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of learning the level set for which a noisy black-box function exceeds a given threshold, and investigate Gaussian process (GP) metamodels.
Abstract: We consider the problem of learning the level set for which a noisy black-box function exceeds a given threshold. To efficiently reconstruct the level set, we investigate Gaussian process (GP) metamodels. Our focus is on strongly stochastic samplers, in particular with heavy-tailed simulation noise and low signal-to-noise ratio. To guard against noise misspecification, we assess the performance of three variants: (i) GPs with Student-t observations; (ii) Student-t processes (TPs); and (iii) classification GPs modeling the sign of the response. In conjunction with these metamodels, we analyze several acquisition functions for guiding the sequential experimental designs, extending existing stepwise uncertainty reduction criteria to the stochastic contour-finding context. This also motivates our development of (approximate) updating formulas to efficiently compute such acquisition functions. Our schemes are benchmarked by using a variety of synthetic experiments in 1–6 dimensions. We also consider an application of level set estimation for determining the optimal exercise policy of Bermudan options in finance.

Journal ArticleDOI
TL;DR: Parsimonious Online Gaussian Processes (POG) as discussed by the authors is the first online Gaussian process approximation that preserves convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating its intractable complexity growth with the sample size.
Abstract: Gaussian processes provide a framework for nonlinear nonparametric Bayesian inference widely applicable across science and engineering. Unfortunately, their computational burden scales cubically with the training sample size, which in the case that samples arrive in perpetuity, approaches infinity. This issue necessitates approximations for use with streaming data, which to date mostly lack convergence guarantees. Thus, we develop the first online Gaussian process approximation that preserves convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating its intractable complexity growth with the sample size. We propose an online compression scheme that, following each a posteriori update, fixes an error neighborhood with respect to the Hellinger metric centered at the current posterior, and greedily tosses out past kernel dictionary elements until its boundary is hit. We call the resulting method Parsimonious Online Gaussian Processes (POG). For diminishing error radius, asymptotic statistical stationarity is achieved (Theorem 1ii) at the cost of unbounded memory in the limit. On the other hand, for constant error radius, POG converges to a neighborhood of stationarity (Theorem 1ii) but with finite memory at worst determined by the metric entropy of the feature space (Theorem 2). Here stationarity refers to the distributional distance between sequential marginal posteriors approaching null with the time index. Experimental results are presented on several nonlinear regression problems which illuminate the merits of this approach as compared with alternatives that fix the subspace dimension defining the history of past points.

Journal ArticleDOI
TL;DR: This paper presents and derives new expressions for the extension of an algorithm classically used for single-factor LMM parameter estimation, Fisher Scoring, to multiple, crossed-factor designs, and provides a new method for LMM Satterthwaite degrees of freedom estimation based on analytical results, which does not require iterative gradient estimation.
Abstract: The analysis of longitudinal, heterogeneous or unbalanced clustered data is of primary importance to a wide range of applications. The linear mixed model (LMM) is a popular and flexible extension of the linear model specifically designed for such purposes. Historically, a large proportion of material published on the LMM concerns the application of popular numerical optimization algorithms, such as Newton–Raphson, Fisher Scoring and expectation maximization to single-factor LMMs (i.e. LMMs that only contain one “factor” by which observations are grouped). However, in recent years, the focus of the LMM literature has moved towards the development of estimation and inference methods for more complex, multi-factored designs. In this paper, we present and derive new expressions for the extension of an algorithm classically used for single-factor LMM parameter estimation, Fisher Scoring, to multiple, crossed-factor designs. Through simulation and real data examples, we compare five variants of the Fisher Scoring algorithm with one another, as well as against a baseline established by the R package lme4, and find evidence of correctness and strong computational efficiency for four of the five proposed approaches. Additionally, we provide a new method for LMM Satterthwaite degrees of freedom estimation based on analytical results, which does not require iterative gradient estimation. Via simulation, we find that this approach produces estimates with both lower bias and lower variance than the existing methods.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a general approach to achieve robustness in fitting generalized additive models for location, scale and shape (GAMLSS) by limiting the contribution of observations with low log-likelihood values.
Abstract: The validity of estimation and smoothing parameter selection for the wide class of generalized additive models for location, scale and shape (GAMLSS) relies on the correct specification of a likelihood function. Deviations from such assumption are known to mislead any likelihood-based inference and can hinder penalization schemes meant to ensure some degree of smoothness for nonlinear effects. We propose a general approach to achieve robustness in fitting GAMLSSs by limiting the contribution of observations with low log-likelihood values. Robust selection of the smoothing parameters can be carried out either by minimizing information criteria that naturally arise from the robustified likelihood or via an extended Fellner–Schall method. The latter allows for automatic smoothing parameter selection and is particularly advantageous in applications with multiple smoothing parameters. We also address the challenge of tuning robust estimators for models with nonlinear effects by proposing a novel median downweighting proportion criterion. This enables a fair comparison with existing robust estimators for the special case of generalized additive models, where our estimator competes favorably. The overall good performance of our proposal is illustrated by further simulations in the GAMLSS setting and by an application to functional magnetic resonance brain imaging using bivariate smoothing splines.

Journal ArticleDOI
TL;DR: In this paper, a state-space approach to deep Gaussian process regression is proposed, which is based on a non-linear hierarchical system of linear stochastic differential equations, where each SDE corresponds to a conditional GP.
Abstract: This paper is concerned with a state-space approach to deep Gaussian process (DGP) regression. We construct the DGP by hierarchically putting transformed Gaussian process (GP) priors on the length scales and magnitudes of the next level of Gaussian processes in the hierarchy. The idea of the state-space approach is to represent the DGP as a non-linear hierarchical system of linear stochastic differential equations (SDEs), where each SDE corresponds to a conditional GP. The DGP regression problem then becomes a state estimation problem, and we can estimate the state efficiently with sequential methods by using the Markov property of the state-space DGP. The computational complexity scales linearly with respect to the number of measurements. Based on this, we formulate state-space MAP as well as Bayesian filtering and smoothing solutions to the DGP regression problem. We demonstrate the performance of the proposed models and methods on synthetic non-stationary signals and apply the state-space DGP to detection of the gravitational waves from LIGO measurements.
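The single-layer building block can be illustrated as follows: GP regression with a Matérn-1/2 (exponential) kernel is exactly an Ornstein–Uhlenbeck state-space model, so its posterior mean can be computed by a Kalman filter in linear time (grid, kernel parameters and noise level below are illustrative; the paper stacks such layers hierarchically):

```python
# Kalman-filter solution of GP regression with a Matern-1/2 (exponential)
# kernel, i.e. the stationary Ornstein-Uhlenbeck process, so filtering is O(n).
# Grid, length-scale, magnitude and noise variance are illustrative.
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0, 10, 200)
f_true = np.sin(t)
y = f_true + 0.3 * rng.normal(size=t.size)

ell, sigma2, R = 1.0, 1.0, 0.3**2          # length-scale, GP variance, noise variance
m, P = 0.0, sigma2                          # stationary prior at the first input
means = []
for k in range(t.size):
    if k > 0:
        dt = t[k] - t[k - 1]
        A = np.exp(-dt / ell)               # OU transition
        Q = sigma2 * (1 - np.exp(-2 * dt / ell))
        m, P = A * m, A * P * A + Q         # predict
    S = P + R                               # innovation variance
    K = P / S                               # Kalman gain
    m, P = m + K * (y[k] - m), (1 - K) * P  # update
    means.append(m)

print("RMSE of the filtered mean:", np.sqrt(np.mean((np.array(means) - f_true) ** 2)))
```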

Journal ArticleDOI
TL;DR: In this paper, a probabilistic uncertainty quantification for the unknown solution of a non-linear partial differential equation (PDE) is proposed, which is based on discretisation of the nonlinear differential operator.
Abstract: The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied. However, nonlinear partial differential equations (PDEs) pose substantial challenges from an inferential perspective, most notably the absence of explicit conditioning formulae. This paper extends earlier work on linear PDEs to a general class of initial value problems specified by nonlinear PDEs, motivated by problems for which evaluations of the right-hand-side, initial conditions, or boundary conditions of the PDE have a high computational cost. The proposed method can be viewed as exact Bayesian inference under an approximate likelihood, which is based on discretisation of the nonlinear differential operator. Proof-of-concept experimental results demonstrate that meaningful probabilistic uncertainty quantification for the unknown solution of the PDE can be performed, while controlling the number of times the right-hand-side, initial and boundary conditions are evaluated. A suitable prior model for the solution of PDEs is identified using novel theoretical analysis of the sample path properties of Matérn processes, which may be of independent interest.

Journal ArticleDOI
TL;DR: In this article, a discrete method of constructing Gaussian Random Fields based on a combination of modified spectral representations, Fourier and Blob is proposed for Direct Numerical Simulations of the V-Langevin equations.
Abstract: We propose a novel discrete method of constructing Gaussian Random Fields based on a combination of modified spectral representations, Fourier and Blob. The method is intended for Direct Numerical Simulations of the V-Langevin equations. The latter are stereotypical descriptions of anomalous stochastic transport in various physical systems. From an Eulerian perspective, our method is designed to exhibit improved convergence rates. From a Lagrangian perspective, our method offers a pertinent description of particle trajectories in turbulent velocity fields: the exact Lagrangian invariant laws are well reproduced. From a computational perspective, the computing time is reduced by a factor of two in comparison with Fourier-like or Blob-like methods and an order of magnitude in comparison with FFT algorithms.
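For orientation, the classic randomized Fourier (spectral) representation that such methods build on approximates a homogeneous Gaussian random field by a sum of cosine modes with wavenumbers drawn from the spectral density; the sketch below uses a 2D field with squared-exponential covariance purely for illustration (it is not the paper's Fourier/Blob hybrid):

```python
# Classic randomized Fourier (spectral) approximation of a homogeneous Gaussian
# random field with covariance sigma^2 * exp(-|r|^2 / (2 ell^2)), whose spectral
# measure is Gaussian.  Illustrative only.
import numpy as np

rng = np.random.default_rng(8)
N_modes, sigma, ell = 512, 1.0, 0.5

omega = rng.normal(scale=1.0 / ell, size=(N_modes, 2))   # wavenumbers drawn from the spectral density
phi = rng.uniform(0, 2 * np.pi, size=N_modes)            # random phases

def grf(points):
    """Evaluate one realization of the field at an (n, 2) array of points."""
    return sigma * np.sqrt(2.0 / N_modes) * np.cos(points @ omega.T + phi).sum(axis=1)

# evaluate on a grid and check that the variance is roughly sigma^2
g = np.stack(np.meshgrid(np.linspace(0, 4, 64), np.linspace(0, 4, 64)), axis=-1).reshape(-1, 2)
field = grf(g)
print("empirical variance:", field.var(), " target:", sigma**2)
```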

Journal ArticleDOI
TL;DR: In this paper, Monte Carlo integration with variance reduction by means of control variates can be implemented by the ordinary least squares estimator for the intercept in a multiple linear regression model with the integrand as response and the control variates as covariates.
Abstract: Monte Carlo integration with variance reduction by means of control variates can be implemented by the ordinary least squares estimator for the intercept in a multiple linear regression model with the integrand as response and the control variates as covariates. Even without special knowledge on the integrand, significant efficiency gains can be obtained if the control variate space is sufficiently large. Incorporating a large number of control variates in the ordinary least squares procedure may however result in (i) a certain instability of the ordinary least squares estimator and (ii) a possibly prohibitive computation time. Regularizing the ordinary least squares estimator by preselecting appropriate control variates via the Lasso turns out to increase the accuracy without additional computational cost. The findings in the numerical experiment are confirmed by concentration inequalities for the integration error.
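The estimator described above is easy to sketch: the control-variate estimate of the integral is the OLS intercept from regressing the integrand on zero-mean control variates (the integrand exp(x) on [0,1] and the polynomial control variates below are illustrative; the Lasso preselection step is not shown):

```python
# Monte Carlo integration with control variates via OLS: the intercept of a
# regression of the integrand on zero-mean control variates is the
# control-variate estimator.  Integrand exp(x) on [0,1] and polynomial controls
# x^k - 1/(k+1) are illustrative.
import numpy as np

rng = np.random.default_rng(9)
n, K = 2000, 5
X = rng.uniform(size=n)
f = np.exp(X)                                        # integrand; true integral is e - 1

H = np.column_stack([X**k - 1.0 / (k + 1) for k in range(1, K + 1)])  # zero-mean controls
design = np.column_stack([np.ones(n), H])
beta, *_ = np.linalg.lstsq(design, f, rcond=None)

print("plain Monte Carlo  :", f.mean())
print("OLS control variate:", beta[0])
print("true value         :", np.e - 1)
```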

Journal ArticleDOI
TL;DR: This paper developed a deep version of mixtures of unigrams for unsupervised classification of very short documents with a large number of terms, by allowing for models with additional, deeper latent layers; the proposal is derived in a Bayesian framework.
Abstract: Mixtures of unigrams are one of the simplest and most efficient tools for clustering textual data, as they assume that documents related to the same topic have similar distributions of terms, naturally described by multinomials. When the classification task is particularly challenging, such as when the document-term matrix is high-dimensional and extremely sparse, a more composite representation can provide better insight into the grouping structure. In this work, we developed a deep version of mixtures of unigrams for the unsupervised classification of very short documents with a large number of terms, by allowing for models with additional, deeper latent layers; the proposal is derived in a Bayesian framework. The behavior of the deep mixtures of unigrams is empirically compared with that of other traditional and state-of-the-art methods, namely k-means with cosine distance, k-means with Euclidean distance on data transformed according to semantic analysis, partition around medoids, mixture of Gaussians on semantic-based transformed data, hierarchical clustering according to Ward’s method with cosine dissimilarity, latent Dirichlet allocation, mixtures of unigrams estimated via the EM algorithm, spectral clustering and affinity propagation clustering. The performance is evaluated in terms of both correct classification rate and Adjusted Rand Index. Simulation studies and real data analysis prove that going deep in clustering such data highly improves the classification accuracy.
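For context, the shallow model being extended, a mixture of unigrams (multinomial mixture) fitted by EM on a synthetic document-term matrix, can be sketched as follows (vocabulary size, number of clusters and data are illustrative):

```python
# EM for a (shallow) mixture of unigrams, i.e. a multinomial mixture over the
# document-term matrix, the base model that the deep Bayesian version extends.
# Vocabulary size, number of clusters and the synthetic counts are illustrative.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(10)
D, V, K = 300, 50, 3                                     # documents, terms, clusters
true_theta = rng.dirichlet(0.2 * np.ones(V), size=K)
z = rng.integers(K, size=D)
X = np.vstack([rng.multinomial(40, true_theta[k]) for k in z])   # document-term counts

pi = np.full(K, 1.0 / K)
theta = rng.dirichlet(np.ones(V), size=K)
for _ in range(100):
    # E-step: responsibilities proportional to pi_k * prod_v theta_kv^{x_dv}
    log_r = np.log(pi)[None, :] + X @ np.log(theta).T
    log_r -= logsumexp(log_r, axis=1, keepdims=True)
    r = np.exp(log_r)
    # M-step: update mixing weights and term distributions (with small smoothing)
    pi = r.mean(axis=0)
    theta = r.T @ X + 1e-6
    theta /= theta.sum(axis=1, keepdims=True)

print("estimated mixing weights:", np.round(np.sort(pi), 3))
```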