scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Computation in 2020"


Posted Content
TL;DR: The cutpointr package offers robust methods for estimating optimal cutpoints and the out-of-sample performance of binary classification algorithms, and provides mechanisms to utilize user-defined metrics and estimation methods.
Abstract: 'Optimal cutpoints' for binary classification tasks are often established by testing which cutpoint yields the best discrimination, for example the Youden index, in a specific sample. This results in 'optimal' cutpoints that are highly variable and systematically overestimate the out-of-sample performance. To address these concerns, the cutpointr package offers robust methods for estimating optimal cutpoints and the out-of-sample performance. The robust methods include bootstrapping and smoothing based on kernel estimation, generalized additive models, smoothing splines, and local regression. These methods can be applied to a wide range of binary-classification and cost-based metrics. cutpointr also provides mechanisms to utilize user-defined metrics and estimation methods. The package has capabilities for parallelization of the bootstrapping, including reproducible random number generation. Furthermore, it is pipe-friendly, for example for compatibility with functions from tidyverse. Various functions for plotting receiver operating characteristic curves, precision recall graphs, bootstrap results and other representations of the data are included. The package contains example data from a study on psychological characteristics and suicide attempts suitable for applying binary classification algorithms.

115 citations


Posted Content
TL;DR: EP is revisited as a prototype for scalable algorithms that partition big datasets into many parts and analyze each part in parallel to perform inference of shared parameters to be particularly efficient for hierarchical models.
Abstract: We revisit expectation propagation (EP) as a prototype for scalable algorithms that partition big datasets into many parts and analyze each part in parallel to perform inference of shared parameters. The algorithm should be particularly efficient for hierarchical models, for which the EP algorithm works on the shared parameters (hyperparameters) of the model. The central idea of EP is to work at each step with a "tilted distribution" that combines the likelihood for a part of the data with the "cavity distribution," which is the approximate model for the prior and all other parts of the data. EP iteratively approximates the moments of the tilted distributions and incorporates those approximations into a global posterior approximation. As such, EP can be used to divide the computation for large models into manageable sizes. The computation for each partition can be made parallel with occasional exchanging of information between processes through the global posterior approximation. Moments of multivariate tilted distributions can be approximated in various ways, including, MCMC, Laplace approximations, and importance sampling.

62 citations


Posted Content
TL;DR: In this article, optimal subsampling for quantile regression is investigated and algorithms based on the optimal sampling probabilities are proposed to obtain asymptotic distributions and optimality of the resulting estimators.
Abstract: We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator and the other minimizes that of the original parameter estimator. The former does not depend on the densities of the responses given covariates and is easy to implement. Algorithms based on optimal subsampling probabilities are proposed and asymptotic distributions and asymptotic optimality of the resulting estimators are established. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities in the linearly transformed parameter estimation which has great scalability to utilize available computational resources. In addition, this procedure yields standard errors for parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.

60 citations


Posted Content
TL;DR: In this article, the authors provide an up-to-date introduction to marginal likelihood computation for model selection and hypothesis testing, highlighting limitations, benefits, connections and differences among the different techniques.
Abstract: This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.

41 citations


Posted Content
TL;DR: A new exact MIP framework for $\ell_0\ell_2$-regularized regression that can scale to $p \sim 10^7$, achieving over $3600x speed-ups compared to the fastest exact methods, and a novel method for obtaining dual bounds from primal solutions, which certifiably works in high dimensions.
Abstract: We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and squared $\ell_{2}$ penalty functions (a.k.a. $\ell_0 \ell_2$ regularization). Recent work shows that the resulting estimators are of key importance in many high-dimensional statistical settings. However, exact computation of these estimators remains a major challenge. Indeed, modern exact methods, based on mixed integer programming (MIP), face difficulties when the number of features $p \sim 10^4$. In this work, we present a new exact MIP framework for $\ell_0\ell_2$-regularized regression that can scale to $p \sim 10^7$, achieving speedups of at least $5000$x, compared to state-of-the-art exact methods. Unlike recent work, which relies on modern commercial MIP solvers, we design a specialized nonlinear branch-and-bound (BnB) framework, by critically exploiting the problem structure. A key distinguishing component in our framework lies in efficiently solving the node relaxations using a specialized first-order method, based on coordinate descent (CD). Our CD-based method effectively leverages information across the BnB nodes, through using warm starts, active sets, and gradient screening. In addition, we design a novel method for obtaining dual bounds from primal CD solutions, which certifiably works in high dimensions. Experiments on synthetic and real high-dimensional datasets demonstrate that our framework is not only significantly faster than the state of the art, but can also deliver certifiably optimal solutions to statistically challenging instances that cannot be handled with existing methods. We open source the implementation through our toolkit L0BnB.

37 citations


Journal ArticleDOI
TL;DR: This article provides two examples of the consequences of discrepancy when calibrating models at the ion channel and action potential scales, and attempts to account for this discrepancy when calibration and validating an ion channel model using different methods.
Abstract: Uncertainty quantification (UQ) is a vital step in using mathematical models and simulations to take decisions. The field of cardiac simulation has begun to explore and adopt UQ methods to characterise uncertainty in model inputs and how that propagates through to outputs or predictions. In this perspective piece we draw attention to an important and under-addressed source of uncertainty in our predictions -- that of uncertainty in the model structure or the equations themselves. The difference between imperfect models and reality is termed model discrepancy, and we are often uncertain as to the size and consequences of this discrepancy. Here we provide two examples of the consequences of discrepancy when calibrating models at the ion channel and action potential scales. Furthermore, we attempt to account for this discrepancy when calibrating and validating an ion channel model using different methods, based on modelling the discrepancy using Gaussian processes (GPs) and autoregressive-moving-average (ARMA) models, then highlight the advantages and shortcomings of each approach. Finally, suggestions and lines of enquiry for future work are provided.

35 citations


Posted Content
TL;DR: A novel approach for low-rank approximate Bayesian Gaussian processes, based on a basis function approximation via Laplace eigenfunctions for stationary covariance functions, which exhibits an attractive computational complexity due to its linear structure and is easy to implement in probabilistic programming frameworks.
Abstract: Gaussian processes are powerful non-parametric probabilistic models for stochastic functions. However they entail a complexity that is computationally intractable when the number of observations is large, especially when estimated with fully Bayesian methods such as Markov chain Monte Carlo. In this paper, we focus on a novel approach for low-rank approximate Bayesian Gaussian processes, based on a basis function approximation via Laplace eigenfunctions for stationary covariance functions. The main contribution of this paper is a detailed analysis of the performance and practical implementation of the method in relation to key factors such as the number of basis functions, domain of the prediction space, and smoothness of the latent function. We provide intuitive visualizations and recommendations for choosing the values of these factors, which make it easier for users to improve approximation accuracy and computational performance. We also propose diagnostics for checking that the number of basis functions and the domain of the prediction space are adequate given the data. The proposed approach is simple and exhibits an attractive computational complexity due to its linear structure, and it is easy to implement in probabilistic programming frameworks. Several illustrative examples of the performance and applicability of the method in the probabilistic programming language Stan are presented together with the underlying Stan model code.

34 citations


Posted Content
TL;DR: The rstanarm R package can be used to fit a wide range of Bayesian survival models, including standard parametric (exponential, Weibull, Gompertz) and flexibleparametric (spline-based) hazard models, as well asstandard parametric accelerated failure time (AFT) models.
Abstract: Survival data is encountered in a range of disciplines, most notably health and medical research. Although Bayesian approaches to the analysis of survival data can provide a number of benefits, they are less widely used than classical (e.g. likelihood-based) approaches. This may be in part due to a relative absence of user-friendly implementations of Bayesian survival models. In this article we describe how the rstanarm R package can be used to fit a wide range of Bayesian survival models. The rstanarm package facilitates Bayesian regression modelling by providing a user-friendly interface (users specify their model using customary R formula syntax and data frames) and using the Stan software (a C++ library for Bayesian inference) for the back-end estimation. The suite of models that can be estimated using rstanarm is broad and includes generalised linear models (GLMs), generalised linear mixed models (GLMMs), generalised additive models (GAMs) and more. In this article we focus only on the survival modelling functionality. This includes standard parametric (exponential, Weibull, Gompertz) and flexible parametric (spline-based) hazard models, as well as standard parametric accelerated failure time (AFT) models. All types of censoring (left, right, interval) are allowed, as is delayed entry (left truncation), time-varying covariates, time-varying effects, and frailty effects. We demonstrate the functionality through worked examples. We anticipate these implementations will increase the uptake of Bayesian survival analysis in applied research.

28 citations


Posted Content
TL;DR: This tutorial shows how to build, fit, and criticize disease transmission models in Stan, and should be useful to researchers interested in modeling the severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) pandemic and other infectious diseases in a Bayesian framework.
Abstract: This tutorial shows how to build, fit, and criticize disease transmission models in Stan, and should be useful to researchers interested in modeling the SARS-CoV-2 pandemic and other infectious diseases in a Bayesian framework. Bayesian modeling provides a principled way to quantify uncertainty and incorporate both data and prior knowledge into the model estimates. Stan is an expressive probabilistic programming language that abstracts the inference and allows users to focus on the modeling. As a result, Stan code is readable and easily extensible, which makes the modeler's work more transparent. Furthermore, Stan's main inference engine, Hamiltonian Monte Carlo sampling, is amiable to diagnostics, which means the user can verify whether the obtained inference is reliable. In this tutorial, we demonstrate how to formulate, fit, and diagnose a compartmental transmission model in Stan, first with a simple Susceptible-Infected-Recovered (SIR) model, then with a more elaborate transmission model used during the SARS-CoV-2 pandemic. We also cover advanced topics which can further help practitioners fit sophisticated models; notably, how to use simulations to probe the model and priors, and computational techniques to scale-up models based on ordinary differential equations.

26 citations


Posted Content
TL;DR: This article presents this class of methods and a number of recent advances, with the goal of helping statisticians assess the applicability and usefulness of these methods for their purposes.
Abstract: Sequential Monte Carlo samplers provide consistent approximations of sequences of probability distributions and of their normalizing constants, via particles obtained with a combination of importance weights and Markov transitions. This article presents this class of methods and a number of recent advances, with the goal of helping statisticians assess the applicability and usefulness of these methods for their purposes. Our presentation emphasizes the role of bridging distributions for computational and statistical purposes. Numerical experiments are provided on simple settings such as multivariate Normals, logistic regression and a basic susceptible-infected-recovered model, illustrating the impact of the dimension, the ability to perform inference sequentially and the estimation of normalizing constants.

26 citations


Posted Content
TL;DR: The proposed methodology for the fitting of SDE models is demonstrated, first in a simulation study with a noisy Lorenz ’63 model, and then in other applications, including dimension reduction in deterministic chaotic systems arising in the atmospheric sciences, large-scale pattern modeling in climate dynamics and simplified models for key observables arising in molecular dynamics.
Abstract: Although the governing equations of many systems, when derived from first principles, may be viewed as known, it is often too expensive to numerically simulate all the interactions within the first principles description. Therefore researchers often seek simpler descriptions that describe complex phenomena without numerically resolving all the interacting components. Stochastic differential equations (SDEs) arise naturally as models in this context. The growth in data acquisition provides an opportunity for the systematic derivation of SDE models in many disciplines. However, inconsistencies between SDEs and real data at small time scales often cause problems, when standard statistical methodology is applied to parameter estimation. The incompatibility between SDEs and real data can be addressed by deriving sufficient statistics from the time-series data and learning parameters of SDEs based on these. Following this approach, we formulate the fitting of SDEs to sufficient statistics from real data as an inverse problem and demonstrate that this inverse problem can be solved by using ensemble Kalman inversion (EKI). Furthermore, we create a framework for non-parametric learning of drift and diffusion terms by introducing hierarchical, refineable parameterizations of unknown functions, using Gaussian process regression. We demonstrate the proposed methodology for the fitting of SDE models, first in a simulation study with a noisy Lorenz 63 model, and then in other applications, including dimension reduction starting from various deterministic chaotic systems arising in the atmospheric sciences, large-scale pattern modeling in climate dynamics, and simplified models for key observables arising in molecular dynamics. The results confirm that the proposed methodology provides a robust and systematic approach to fitting SDE models to real data.

Posted Content
TL;DR: Using macroeconomic data for the US, it is found that regression models that combine time-varying parameters with the information in many predictors have the potential to improve forecasts of price inflation over a number of alternative forecasting models.
Abstract: This paper proposes a variational Bayes algorithm for computationally efficient posterior and predictive inference in time-varying parameter (TVP) models. Within this context we specify a new dynamic variable/model selection strategy for TVP dynamic regression models in the presence of a large number of predictors. This strategy allows for assessing in individual time periods which predictors are relevant (or not) for forecasting the dependent variable. The new algorithm is evaluated numerically using synthetic data and its computational advantages are established. Using macroeconomic data for the US we find that regression models that combine time-varying parameters with the information in many predictors have the potential to improve forecasts of price inflation over a number of alternative forecasting models.

Posted Content
TL;DR: The R package evgam as mentioned in this paper provides functions for fitting extreme value distributions, including the generalized extreme value and generalized Pareto distributions, through quantile regression via the asymmetric Laplace distribution, which is useful for estimating high thresholds, sometimes used to discriminate between extreme and non-extreme values.
Abstract: This article introduces the R package evgam. The package provides functions for fitting extreme value distributions. These include the generalized extreme value and generalized Pareto distributions. The former can also be fitted through a point process representation. evgam supports quantile regression via the asymmetric Laplace distribution, which can be useful for estimating high thresholds, sometimes used to discriminate between extreme and non-extreme values. The main addition of evgam is to let extreme value distribution parameters have generalized additive model forms, which can be objectively estimated using Laplace's method. Illustrative examples fitting various distributions with various specifications are given. These include daily precipitation accumulations for part of Colorado, US, used to illustrate spatial models, and daily maximum temperatures for Fort Collins, Colorado, US, used to illustrate temporal models.

Posted Content
TL;DR: This paper takes the reader on a chronological tour of Bayesian computation over the past two and a half centuries, and place all computational problems into a common framework, and describe all computational methods using a common notation.
Abstract: The Bayesian statistical paradigm uses the language of probability to express uncertainty about the phenomena that generate observed data. Probability distributions thus characterize Bayesian inference, with the rules of probability used to transform prior probability distributions for all unknowns - models, parameters, latent variables - into posterior distributions, subsequent to the observation of data. Conducting Bayesian inference requires the evaluation of integrals in which these probability distributions appear. Bayesian computation is all about evaluating such integrals in the typical case where no analytical solution exists. This paper takes the reader on a chronological tour of Bayesian computation over the past two and a half centuries. Beginning with the one-dimensional integral first confronted by Bayes in 1763, through to recent problems in which the unknowns number in the millions, we place all computational problems into a common framework, and describe all computational methods using a common notation. The aim is to help new researchers in particular - and more generally those interested in adopting a Bayesian approach to empirical work - make sense of the plethora of computational techniques that are now on offer; understand when and why different methods are useful; and see the links that do exist, between them all.

Posted Content
Cheng Meng1, Xinlian Zhang1, Jingyi Zhang1, Wenxuan Zhong1, Ping Ma1 
TL;DR: By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity, which leads to a smaller prediction error than other basis selection methods.
Abstract: We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number of predictors $d\geq 2$. Such a sizable computational cost hinders the broad applicability of smoothing splines. In practice, the full sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly-selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the identical rate when $q$ is of the order $O\{n^{2/(pr+1)}\}$, where $p\in [1,2]$ depends on the true function $\eta$, and $r > 1$ depends on the type of spline. Such $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting the ones corresponding to roughly equal-spaced observations, the proposed method chooses a set of basis functions with a large diversity. The asymptotic analysis shows our proposed smoothing spline estimator can decrease $q$ to roughly $O\{n^{1/(pr+1)}\}$, when $d\leq pr+1$. Applications on synthetic and real-world datasets show the proposed method leads to a smaller prediction error compared with other basis selection methods.

Posted Content
TL;DR: This work exploits a connection between rare event simulation and Bayesian inverse problems to adapt dimension reduction techniques from Bayesian inference to construct new, effectively low-dimensional, biasing distributions within the cross-entropy method.
Abstract: The estimation of rare event or failure probabilities in high dimensions is of interest in many areas of science and technology. We consider problems where the rare event is expressed in terms of a computationally costly numerical model. Importance sampling with the cross-entropy method offers an efficient way to address such problems provided that a suitable parametric family of biasing densities is employed. Although some existing parametric distribution families are designed to perform efficiently in high dimensions, their applicability within the cross-entropy method is limited to problems with dimension of O(1e2). In this work, rather than directly building sampling densities in high dimensions, we focus on identifying the intrinsic low-dimensional structure of the rare event simulation problem. To this end, we exploit a connection between rare event simulation and Bayesian inverse problems. This allows us to adapt dimension reduction techniques from Bayesian inference to construct new, effectively low-dimensional, biasing distributions within the cross-entropy method. In particular, we employ the approach in [47], as it enables control of the error in the approximation of the optimal biasing distribution. We illustrate our method using two standard high-dimensional reliability benchmark problems and one structural mechanics application involving random fields.

Posted Content
TL;DR: A new algorithm is presented that offers major improvements for the computation of Shapley effects, reducing computational burden by several orders of magnitude and makes it possible to estimate all generalized (Shapley-Owen) effects for interactions.
Abstract: Shapley effects are attracting increasing attention as sensitivity measures. When the value function is the conditional variance, they account for the individual and higher order effects of a model input. They are also well defined under model input dependence. However, one of the issues associated with their use is computational cost. We present a new algorithm that offers major improvements for the computation of Shapley effects, reducing computational burden by several orders of magnitude (from $k!\cdot k$ to $2^k$, where $k$ is the number of inputs) with respect to currently available implementations. The algorithm works in the presence of input dependencies. The algorithm also makes it possible to estimate all generalized (Shapley-Owen) effects for interactions.

Posted Content
TL;DR: This paper proposes a unifying Gaussian simulation framework by deriving a stochastic counterpart of the celebrated proximal point algorithm in optimization and offers a novel and unifying revisit of most of the existing MCMC approaches while extending them.
Abstract: Efficient sampling from a high-dimensional Gaussian distribution is an old but high-stake issue. Vanilla Cholesky samplers imply a computational cost and memory requirements which can rapidly become prohibitive in high dimension. To tackle these issues, multiple methods have been proposed from different communities ranging from iterative numerical linear algebra to Markov chain Monte Carlo (MCMC) approaches. Surprisingly, no complete review and comparison of these methods have been conducted. This paper aims at reviewing all these approaches by pointing out their differences, close relations, benefits and limitations. In addition to this state of the art, this paper proposes a unifying Gaussian simulation framework by deriving a stochastic counterpart of the celebrated proximal point algorithm in optimization. This framework offers a novel and unifying revisit of most of the existing MCMC approaches while extending them. Guidelines to choose the appropriate Gaussian simulation method for a given sampling problem in high dimension are proposed and illustrated with numerical examples.

Posted Content
TL;DR: A general, efficient approach to Bayesian SEM estimation in Stan is described, contrasting it with previous implementations in R package blavaan (Merkle & Rosseel, 2018); a practical comparison shows that the new approach is clearly better.
Abstract: Structural equation models comprise a large class of popular statistical models, including factor analysis models, certain mixed models, and extensions thereof Model estimation is complicated by the fact that we typically have multiple interdependent response variables and multiple latent variables (which may also be called random effects or hidden variables), often leading to slow and inefficient MCMC samples In this paper, we describe and illustrate a general, efficient approach to Bayesian SEM estimation in Stan, contrasting it with previous implementations in R package blavaan (Merkle & Rosseel, 2018) After describing the approaches in detail, we conduct a practical comparison under multiple scenarios The comparisons show that the new approach is clearly better We also discuss ways that the approach may be extended to other models that are of interest to psychometricians

Posted Content
TL;DR: The TensorFlow Probability MCMC toolkit is introduced, and some of the considerations that motivated its design are discussed, including whether asymptotic convergence, stability, and estimator-variance bounds are guaranteed using only unnormalized probability functions.
Abstract: Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations that motivated its design.

Posted Content
TL;DR: It is pointed out that well-known MCMC convergence results often imply that these "subsampling" MCMC algorithms cannot greatly improve performance, and generic results are applied to realistic statistical problems and proposed algorithms.
Abstract: It is widely known that the performance of Markov chain Monte Carlo (MCMC) can degrade quickly when targeting computationally expensive posterior distributions, such as when the sample size is large. This has motivated the search for MCMC variants that scale well to large datasets. One general approach has been to look at only a subsample of the data at every step. In this note, we point out that well-known MCMC convergence results often imply that these "subsampling" MCMC algorithms cannot greatly improve performance. We apply these generic results to realistic statistical problems and proposed algorithms, and also discuss some design principles suggested by the results.

Posted Content
TL;DR: The development of a multi-purpose software for Bayesian statistical inference, BAT.jl, written in the Julia language is described, together with a test suite that ensures the proper functioning of the algorithms.
Abstract: We describe the development of a multi-purpose software for Bayesian statistical inference, BAT.jl, written in the Julia language. The major design considerations and implemented algorithms are summarized here, together with a test suite that ensures the proper functioning of the algorithms. We also give an extended example from the realm of physics that demonstrates the functionalities of BAT.jl.

Posted ContentDOI
TL;DR: The volesti as discussed by the authors package is a C++ package with an R interface that provides efficient, scalable algorithms for volume estimation, uniform and Gaussian sampling from convex polytopes.
Abstract: Sampling from high dimensional distributions and volume approximation of convex bodies are fundamental operations that appear in optimization, finance, engineering and machine learning. In this paper we present volesti, a C++ package with an R interface that provides efficient, scalable algorithms for volume estimation, uniform and Gaussian sampling from convex polytopes. volesti scales to hundreds of dimensions, handles efficiently three different types of polyhedra and provides non existing sampling routines to R. We demonstrate the power of volesti by solving several challenging problems using the R language.

Journal ArticleDOI
TL;DR: A wide range of approaches to teaching Git are presented, aiming to serve as a resource for statistics and data science instructors teaching courses at any level within an undergraduate or graduate curriculum.
Abstract: A version control system records changes to a file or set of files over time so that changes can be tracked and specific versions of a file can be recalled later. As such, it is an essential element of a reproducible workflow that deserves due consideration among the learning objectives of statistics and data science courses. This paper describes experiences and implementation decisions of four contributing faculty who are teaching different courses at a variety of institutions. Each of these faculty have set version control as a learning objective and successfully integrated one such system (Git) into one or more statistics courses. The various approaches described in the paper span different implementation strategies to suit student background, course type, software choices, and assessment practices. By presenting a wide range of approaches to teaching Git, the paper aims to serve as a resource for statistics and data science instructors teaching courses at any level within an undergraduate or graduate curriculum.

Posted Content
TL;DR: An unbiased Monte Carlo estimator is introduced for the gradient of the expected information gain with finite expected squared $\ell_2$-norm and finite expected computational cost per sample.
Abstract: In this paper we propose an efficient stochastic optimization algorithm to search for Bayesian experimental designs such that the expected information gain is maximized. The gradient of the expected information gain with respect to experimental design parameters is given by a nested expectation, for which the standard Monte Carlo method using a fixed number of inner samples yields a biased estimator. In this paper, applying the idea of randomized multilevel Monte Carlo (MLMC) methods, we introduce an unbiased Monte Carlo estimator for the gradient of the expected information gain with finite expected squared $\ell_2$-norm and finite expected computational cost per sample. Our unbiased estimator can be combined well with stochastic gradient descent algorithms, which results in our proposal of an optimization algorithm to search for an optimal Bayesian experimental design. Numerical experiments confirm that our proposed algorithm works well not only for a simple test problem but also for a more realistic pharmacokinetic problem.

Posted Content
TL;DR: This paper introduces the Boomerang Sampler as a novel class of continuous-time non-reversible Markov chain Monte Carlo algorithms and demonstrates theoretically and empirically that it can out-perform existing benchmark piecewise deterministic Markov processes such as the bouncy particle sampler and the Zig-Zag.
Abstract: This paper introduces the Boomerang Sampler as a novel class of continuous-time non-reversible Markov chain Monte Carlo algorithms The methodology begins by representing the target density as a density, $e^{-U}$, with respect to a prescribed (usually) Gaussian measure and constructs a continuous trajectory consisting of a piecewise elliptical path The method moves from one elliptical orbit to another according to a rate function which can be written in terms of $U$ We demonstrate that the method is easy to implement and demonstrate empirically that it can out-perform existing benchmark piecewise deterministic Markov processes such as the bouncy particle sampler and the Zig-Zag In the Bayesian statistics context, these competitor algorithms are of substantial interest in the large data context due to the fact that they can adopt data subsampling techniques which are exact (ie induce no error in the stationary distribution) We demonstrate theoretically and empirically that we can also construct a control-variate subsampling boomerang sampler which is also exact, and which possesses remarkable scaling properties in the large data limit We furthermore illustrate a factorised version on the simulation of diffusion bridges

Posted Content
TL;DR: A hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver, and which is shown to be optimal and independent of the underlying dimension.
Abstract: We propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Nonasymptotic bounds on the expected error of the objective value as well as the barycenters themselves allow to calibrate computational cost and statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on a myriad of simulated datasets, as well as a real-data example from cell microscopy which are out of reach for state of the art algorithms for computing Wasserstein barycenters.

Posted Content
TL;DR: A hierarchical Vecchia approximation is proposed, whose conditional-independence assumptions imply sparsity in the Cholesky factors of both the precision and the covariance matrix, crucial for applications to high-dimensional spatiotemporal filtering.
Abstract: Spatial statistics often involves Cholesky decomposition of covariance matrices To ensure scalability to high dimensions, several recent approximations have assumed a sparse Cholesky factor of the precision matrix We propose a hierarchical Vecchia approximation, whose conditional-independence assumptions imply sparsity in the Cholesky factors of both the precision and the covariance matrix This remarkable property is crucial for applications to high-dimensional spatio-temporal filtering We present a fast and simple algorithm to compute our hierarchical Vecchia approximation, and we provide extensions to non-linear data assimilation with non-Gaussian data based on the Laplace approximation In several numerical comparisons, our methods strongly outperformed alternative approaches

Posted Content
TL;DR: To evaluate novel scalable solutions based on tile-low-rank Monte Carlo methods for computing multivariate Gaussian probabilities and on variational approximations of multivariate truncated normals, the proposed methods scale to dimensions where state-of-the-art solutions are impractical.
Abstract: Predictive models for binary data are fundamental in various fields, and the growing complexity of modern applications has motivated several flexible specifications for modeling the relationship between the observed predictors and the binary responses. A widely-implemented solution is to express the probability parameter via a probit mapping of a Gaussian process indexed by predictors. However, unlike for continuous settings, there is a lack of closed-form results for predictive distributions in binary models with Gaussian process priors. Markov chain Monte Carlo methods and approximation strategies provide common solutions to this problem, but state-of-the-art algorithms are either computationally intractable or inaccurate in moderate-to-high dimensions. In this article, we aim to cover this gap by deriving closed-form expressions for the predictive probabilities in probit Gaussian processes that rely either on cumulative distribution functions of multivariate Gaussians or on functionals of multivariate truncated normals. To evaluate these quantities we develop novel scalable solutions based on tile-low-rank Monte Carlo methods for computing multivariate Gaussian probabilities, and on mean-field variational approximations of multivariate truncated normals. Closed-form expressions for the marginal likelihood and for the posterior distribution of the Gaussian process are also discussed. As shown in simulated and real-world empirical studies, the proposed methods scale to dimensions where state-of-the-art solutions are impractical.

Posted Content
TL;DR: In this paper, a control variate technique is proposed for post-processing of Markov chain Monte Carlo output, based both on Stein's method and an approach to numerical integration due to Sard.
Abstract: The numerical approximation of posterior expected quantities of interest is considered. A novel control variate technique is proposed for post-processing of Markov chain Monte Carlo output, based both on Stein's method and an approach to numerical integration due to Sard. The resulting estimators are proven to be polynomially exact in the Gaussian context, while empirical results suggest the estimators approximate a Gaussian cubature method near the Bernstein-von-Mises limit. The main theoretical result establishes a bias-correction property in settings where the Markov chain does not leave the posterior invariant. Empirical results are presented across a selection of Bayesian inference tasks. All methods used in this paper are available in the R package ZVCV.