Showing papers in "arXiv: Computation in 2015"

Journal Article•DOI•

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Aki Vehtari¹, Andrew Gelman², Jonah Gabry²•Institutions (2)

Helsinki Institute for Information Technology¹, Columbia University²

16 Jul 2015-arXiv: Computation

TL;DR: In this article, leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are used to estimate pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.

Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparing predictive errors between two models. We implement the computations in an R package called 'loo' and demonstrate using models fit with the Bayesian inference package Stan.

2,455 citations
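
The WAIC computation described in the abstract above needs only a matrix of pointwise log-likelihood values evaluated at the posterior draws. The following is a minimal Python sketch of that idea, not the 'loo' package itself; the function names `log_mean_exp` and `waic` and the dictionary keys are illustrative assumptions, and the PSIS-LOO step (which requires a generalized Pareto fit) is deliberately left out.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.mean(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def waic(log_lik):
    """WAIC on the elpd scale from log_lik of shape (S draws, N observations)."""
    # Log pointwise predictive density: average the likelihood over draws.
    lppd_i = log_mean_exp(log_lik, axis=0)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    elpd_i = lppd_i - p_waic_i
    n = log_lik.shape[1]
    return {
        "elpd_waic": elpd_i.sum(),
        "p_waic": p_waic_i.sum(),
        # Approximate standard error from the spread of the pointwise terms.
        "se": np.sqrt(n * np.var(elpd_i, ddof=1)),
        "pointwise": elpd_i,
    }
```

Because the pointwise terms are returned, two models fit to the same data can be compared by differencing their `pointwise` vectors and computing a standard error of the summed difference in the same way.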

Journal Article•DOI•

TMB: Automatic Differentiation and Laplace Approximation

Kasper Kristensen, Anders Nielsen, Casper Willestofte Berg, Hans J. Skaug¹, Brad Bell²•Institutions (2)

University of Bergen¹, University of Washington²

02 Sep 2015-arXiv: Computation

TL;DR: TMB is an open source R package that enables quick implementation of complex nonlinear random effect (latent variable) models in a manner similar to the established AD Model Builder package (ADMB, this http URL).

Abstract: TMB is an open source R package that enables quick implementation of complex nonlinear random effect (latent variable) models in a manner similar to the established AD Model Builder package (ADMB, this http URL). In addition, it offers easy access to parallel computations. The user defines the joint likelihood for the data and the random effects as a C++ template function, while all the other operations are done in R; e.g., reading in the data. The package evaluates and maximizes the Laplace approximation of the marginal likelihood where the random effects are automatically integrated out. This approximation, and its derivatives, are obtained using automatic differentiation (up to order three) of the joint likelihood. The computations are designed to be fast for problems with many random effects (~10^6) and parameters (~10^3). Computation times using ADMB and TMB are compared on a suite of examples ranging from simple models to large spatial models where the random effects are a Gaussian random field. Speedups ranging from 1.5 to about 100 are obtained with increasing gains for large problems. The package and examples are available at this http URL.

506 citations
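
The Laplace approximation of a marginal likelihood mentioned in the abstract above can be illustrated for a single scalar random effect. This is a hedged Python sketch, not TMB code: `laplace_log_marginal` and `toy_neg_log_joint` are hypothetical names, and central finite differences stand in for the automatic differentiation that TMB uses. The toy model (one observation y given u with unit variance, u with standard deviation s) is an assumption chosen because the approximation is exact there.

```python
import math

def laplace_log_marginal(neg_log_joint, u0=0.0, h=1e-4, iters=50):
    """Approximate log of the integral of exp(-neg_log_joint(u)) over scalar u."""
    u = u0
    for _ in range(iters):
        # Finite-difference gradient and curvature of the negative log joint.
        g = (neg_log_joint(u + h) - neg_log_joint(u - h)) / (2 * h)
        H = (neg_log_joint(u + h) - 2 * neg_log_joint(u) + neg_log_joint(u - h)) / h**2
        step = g / H
        u -= step  # Newton step towards the mode of the random effect
        if abs(step) < 1e-10:
            break
    H = (neg_log_joint(u + h) - 2 * neg_log_joint(u) + neg_log_joint(u - h)) / h**2
    # Laplace: -f(u_hat) + 0.5 * log(2 * pi) - 0.5 * log(f''(u_hat))
    return -neg_log_joint(u) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(H)

def toy_neg_log_joint(u, y=1.3, s=2.0):
    """Toy model: y | u ~ N(u, 1) and u ~ N(0, s^2)."""
    nll_y = 0.5 * (y - u) ** 2 + 0.5 * math.log(2 * math.pi)
    nll_u = 0.5 * u**2 / s**2 + 0.5 * math.log(2 * math.pi * s**2)
    return nll_y + nll_u
```

For this Gaussian toy model the marginal of y is N(0, 1 + s^2), so the returned value can be checked against the closed form; for non-Gaussian models the result is only an approximation.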

Journal Article•DOI•

Programming with models: writing statistical algorithms for general model structures with NIMBLE

Perry de Valpine¹, Daniel Turek¹, Christopher J. Paciorek¹, Clifford Anderson-Bergman¹, Duncan Temple Lang², Rastislav Bodik¹•Institutions (2)

University of California, Berkeley¹, University of California, Davis²

19 May 2015-arXiv: Computation

TL;DR: The NIMBLE language represents a compilable domain-specific language (DSL) embedded within R that extends the BUGS language and creates model objects, which can manipulate variables, calculate log probability values, generate simulations, and query the relationships among variables.

Abstract: We describe NIMBLE, a system for programming statistical algorithms for general model structures within R. NIMBLE is designed to meet three challenges: flexible model specification, a language for programming algorithms that can use different models, and a balance between high-level programmability and execution efficiency. For model specification, NIMBLE extends the BUGS language and creates model objects, which can manipulate variables, calculate log probability values, generate simulations, and query the relationships among variables. For algorithm programming, NIMBLE provides functions that operate with model objects using two stages of evaluation. The first stage allows specialization of a function to a particular model and/or nodes, such as creating a Metropolis-Hastings sampler for a particular block of nodes. The second stage allows repeated execution of computations using the results of the first stage. To achieve efficient second-stage computation, NIMBLE compiles models and functions via C++, using the Eigen library for linear algebra, and provides the user with an interface to compiled objects. The NIMBLE language represents a compilable domain-specific language (DSL) embedded within R. This paper provides an overview of the design and rationale for NIMBLE along with illustrative examples including importance sampling, Markov chain Monte Carlo (MCMC) and Monte Carlo expectation maximization (MCEM).

252 citations

Journal Article•DOI•

Estimation of extended mixed models using latent classes and latent processes: the R package lcmm

Cécile Proust-Lima¹, Viviane Philipps, Benoit Liquet•Institutions (1)

University of Bordeaux¹

03 Mar 2015-arXiv: Computation

TL;DR: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory, including the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes.

Abstract: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event that can be possibly left-truncated, right-censored and defined in a competing setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the parameters and likelihood stability, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.

229 citations

Posted Content•

Pareto Smoothed Importance Sampling

Aki Vehtari, Andrew Gelman, Jonah Gabry

09 Jul 2015-arXiv: Computation

TL;DR: In this paper, importance weights are stabilized using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios; the method includes stabilized effective sample size estimates, Monte Carlo error estimates and convergence diagnostics.

Abstract: Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution, but the resulting estimate can be noisy when the importance ratios have a heavy right tail. This routinely occurs when there are aspects of the target distribution that are not well captured by the approximating distribution, in which case more stable estimates can be obtained by modifying extreme importance ratios. We present a new method for stabilizing importance weights using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios. The method, which empirically performs better than existing methods for stabilizing importance sampling estimates, includes stabilized effective sample size estimates, Monte Carlo error estimates and convergence diagnostics.

165 citations

Journal Article•DOI•

blavaan: Bayesian structural equation models via parameter expansion

Edgar C. Merkle, Yves Rosseel

17 Nov 2015-arXiv: Computation

TL;DR: This article describes blavaan, an R package for estimating Bayesian structural equation models (SEMs) via JAGS and for summarizing the results, along with a novel parameter expansion approach for estimating specific types of models with residual covariances, which facilitates estimation of these models in JAGS.

Abstract: This article describes blavaan, an R package for estimating Bayesian structural equation models (SEMs) via JAGS and for summarizing the results. It also describes a novel parameter expansion approach for estimating specific types of models with residual covariances, which facilitates estimation of these models in JAGS. The methodology and software are intended to provide users with a general means of estimating Bayesian SEMs, both classical and novel, in a straightforward fashion. Users can estimate Bayesian versions of classical SEMs with lavaan syntax, they can obtain state-of-the-art Bayesian fit measures associated with the models, and they can export JAGS code to modify the SEMs as desired. These features and more are illustrated by example, and the parameter expansion approach is explained in detail.

161 citations

Posted Content•

A fixed-point approach to barycenters in Wasserstein space

Pedro C. Alvarez-Esteban¹, E. del Barrio¹, Juan A. Cuesta-Albertos², Carlos Matrán¹•Institutions (2)

University of Valladolid¹, University of Cantabria²

17 Nov 2015-arXiv: Computation

TL;DR: An operator related to optimal transport maps is introduced; under very general conditions it is proved that the barycenter must be a fixed point of this operator, and an iterative procedure is introduced which consistently approximates the barycenter.

Abstract: Let $\mathcal{P}_{2,ac}$ be the set of Borel probabilities on $\mathbb{R}^d$ with finite second moment and absolutely continuous with respect to Lebesgue measure. We consider the problem of finding the barycenter (or Fréchet mean) of a finite set of probabilities $\nu_1,\ldots,\nu_k \in \mathcal{P}_{2,ac}$ with respect to the $L_2$-Wasserstein metric. For this task we introduce an operator on $\mathcal{P}_{2,ac}$ related to the optimal transport maps pushing forward any $\mu \in \mathcal{P}_{2,ac}$ to $\nu_1,\ldots,\nu_k$. Under very general conditions we prove that the barycenter must be a fixed point for this operator and introduce an iterative procedure which consistently approximates the barycenter. The procedure allows effective computation of barycenters in any location-scatter family, including the Gaussian case. In such cases the barycenter must belong to the family, thus it is characterized by its mean and covariance matrix. While its mean is just the weighted mean of the means of the probabilities, the covariance matrix is characterized in terms of their covariance matrices $\Sigma_1,\dots,\Sigma_k$ through a nonlinear matrix equation. The performance of the iterative procedure in this case is illustrated through numerical simulations, which show fast convergence towards the barycenter.

101 citations
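
For the Gaussian case mentioned in the abstract above, a fixed-point iteration on the covariance matrix can be sketched in a few lines of Python. The abstract does not state the nonlinear matrix equation, so this sketch assumes the commonly cited form $S = \sum_i \lambda_i (S^{1/2} \Sigma_i S^{1/2})^{1/2}$ and the update $S \leftarrow S^{-1/2} \big(\sum_i \lambda_i (S^{1/2} \Sigma_i S^{1/2})^{1/2}\big)^2 S^{-1/2}$; the function names `sqrtm_psd` and `gaussian_barycenter_cov` are hypothetical.

```python
import numpy as np

def sqrtm_psd(a):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_barycenter_cov(covs, weights, iters=100, tol=1e-12):
    """Iterate the assumed fixed-point map for the barycenter covariance."""
    covs = [np.asarray(c, dtype=float) for c in covs]
    s = np.eye(covs[0].shape[0])  # starting guess: identity covariance
    for _ in range(iters):
        root = sqrtm_psd(s)
        inv_root = np.linalg.inv(root)
        # Weighted average of the square roots of S^{1/2} Sigma_i S^{1/2}.
        m = sum(w * sqrtm_psd(root @ c @ root) for w, c in zip(weights, covs))
        s_new = inv_root @ m @ m @ inv_root
        if np.linalg.norm(s_new - s) < tol:
            return s_new
        s = s_new
    return s
```

When the covariance matrices commute (for example, when all are diagonal), the result reduces to the square of the weighted average of their square roots, which gives a simple sanity check. The barycenter mean is the weighted mean of the means, as the abstract notes.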

Journal Article•DOI•

A simple sampler for the horseshoe estimator

Enes Makalic¹, Daniel F. Schmidt¹•Institutions (1)

University of Melbourne¹

17 Aug 2015-arXiv: Computation

TL;DR: In this article, a simple Bayesian sampler for linear regression with the horseshoe hierarchy is presented, and extensions to logistic regression and alternative hierarchies such as horseshoe$+$ are discussed.

Abstract: In this note we derive a simple Bayesian sampler for linear regression with the horseshoe hierarchy. A new interpretation of the horseshoe model is presented, and extensions to logistic regression and alternative hierarchies, such as horseshoe$+$, are discussed. Due to the conjugacy of the proposed hierarchy, Chib's algorithm may be used to easily compute the marginal likelihood of the model.

74 citations
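
A conjugate Gibbs sampler for horseshoe regression can be sketched as follows. The abstract does not spell out the hierarchy, so this Python sketch rests on an assumption: each half-Cauchy scale is written as a mixture of inverse-gamma variables with auxiliary variables (`nu`, `xi`), which makes every conditional either Gaussian or inverse-gamma. The function name `horseshoe_gibbs`, the prior on the noise variance (proportional to 1/sigma2), and all defaults are illustrative choices rather than the authors' stated algorithm.

```python
import numpy as np

def horseshoe_gibbs(X, y, n_iter=1500, burn=500, seed=0):
    """Gibbs sampler sketch for y = X beta + noise with horseshoe shrinkage."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2, tau2, xi = 1.0, 1.0, 1.0
    lam2, nu = np.ones(p), np.ones(p)
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1 / (tau2 * lam2))
        A = XtX + np.diag(1.0 / (tau2 * lam2))
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # An inverse-gamma(a, b) draw is b divided by a Gamma(a, 1) draw.
        resid = y - X @ beta
        scale = 0.5 * (resid @ resid) + 0.5 * np.sum(beta**2 / (tau2 * lam2))
        sigma2 = scale / rng.gamma(0.5 * (n + p))
        # Local scales, global scale, then their auxiliary variables.
        lam2 = (1.0 / nu + beta**2 / (2.0 * tau2 * sigma2)) / rng.gamma(1.0, size=p)
        tau2 = (1.0 / xi + np.sum(beta**2 / lam2) / (2.0 * sigma2)) / rng.gamma(0.5 * (p + 1))
        nu = (1.0 + 1.0 / lam2) / rng.gamma(1.0, size=p)
        xi = (1.0 + 1.0 / tau2) / rng.gamma(1.0)
        if it >= burn:
            draws.append(beta)
    return np.array(draws)
```

The returned array has one row per retained iteration, so `horseshoe_gibbs(X, y).mean(axis=0)` gives posterior mean coefficients. The sketch has no intercept; centering `y` and the columns of `X` beforehand is one way to handle that.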

Posted Content•

Importance Sampling: Intrinsic Dimension and Computational Cost

Sergios Agapiou, Omiros Papaspiliopoulos, Daniel Sanz-Alonso, Andrew M. Stuart

19 Nov 2015-arXiv: Computation

TL;DR: A general theory of importance sampling is presented, addressing how many samples are required to guarantee accurate approximations, with a focus on Bayesian inverse problems and filtering.

Abstract: The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee accurate approximations. Intuitively, some notion of distance between the target and the proposal should determine the computational cost of the method. A major challenge is to quantify this distance in terms of parameters or statistics that are pertinent for the practitioner. The subject has attracted substantial interest from within a variety of communities. The objective of this paper is to overview and unify the resulting literature by creating an overarching framework. A general theory is presented, with a focus on the use of importance sampling in Bayesian inverse problems and filtering.

71 citations
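
The basic idea in the abstract above, approximating target expectations with weighted proposal samples, can be sketched in Python as self-normalized importance sampling. The effective sample size returned here is a common practical diagnostic of the mismatch between target and proposal; it is included as an illustration and is not claimed to be the quantity the paper analyzes. The function name and signature are assumptions.

```python
import numpy as np

def self_normalized_is(log_target, log_proposal, samples, f):
    """Estimate E_target[f] from proposal samples; densities may be unnormalized."""
    log_w = log_target(samples) - log_proposal(samples)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # self-normalize the weights
    estimate = np.sum(w * f(samples))
    ess = 1.0 / np.sum(w**2)         # effective sample size, between 1 and N
    return estimate, ess
```

When the proposal equals the target, all weights are equal and the effective sample size is the number of samples; it shrinks as the weights become more uneven.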

Journal Article•DOI•

Generalized Multiple Importance Sampling

Victor Elvira¹, Luca Martino, David Luengo, Monica F. Bugallo•Institutions (1)

University of Edinburgh¹

10 Nov 2015-arXiv: Computation

TL;DR: In this article, a general framework for sampling and weighting procedures when more than one proposal is available is established; the most relevant MIS schemes in the literature are encompassed within the new framework, and novel valid schemes appear naturally.

Abstract: Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework, and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.

66 citations
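
To make the point about different weighting interpretations concrete, here is a Python sketch contrasting two common choices when each sample comes from a known proposal: weighting by the proposal that generated the sample, and weighting by the equal-weight mixture of all proposals. Gaussian proposals with unit variance, the scheme labels `"standard"` and `"mixture"`, and the function names are assumptions made for illustration; the abstract does not list the specific schemes it covers.

```python
import numpy as np

def normal_pdf(x, mu, sd=1.0):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def mis_weights(x, target_pdf, proposal_means, scheme):
    """x has shape (N, M): row n holds M samples drawn from proposal n."""
    n_prop = len(proposal_means)
    # q[j, n, m] is the density of proposal j at sample x[n, m].
    q = np.stack([normal_pdf(x, mu) for mu in proposal_means])
    if scheme == "standard":
        # Each sample is weighted by the proposal that generated it.
        denom = q[np.arange(n_prop), np.arange(n_prop)]
    elif scheme == "mixture":
        # Each sample is weighted by the equal-weight mixture of all proposals.
        denom = q.mean(axis=0)
    else:
        raise ValueError("scheme must be 'standard' or 'mixture'")
    return target_pdf(x) / denom
```

Averaging either set of weights estimates the normalizing constant of the target. The mixture denominator costs N density evaluations per sample instead of one, which is the kind of trade-off between variance and computation that motivates comparing schemes.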

Posted Content•

Probabilistic Programming in Python using PyMC

John Salvatier, Thomas V. Wiecki, Christopher Fonnesbeck

29 Jul 2015-arXiv: Computation

TL;DR: PyMC3 is a new, open-source PP framework with an intuitive and readable, yet powerful, syntax that is close to the natural syntax statisticians use to describe models.

Abstract: Probabilistic programming (PP) allows flexible specification of Bayesian statistical models in code. PyMC3 is a new, open-source PP framework with an intuitive and readable, yet powerful, syntax that is close to the natural syntax statisticians use to describe models. It features next-generation Markov chain Monte Carlo (MCMC) sampling algorithms such as the No-U-Turn Sampler (NUTS; Hoffman, 2014), a self-tuning variant of Hamiltonian Monte Carlo (HMC; Duane, 1987). Probabilistic programming in Python confers a number of advantages including multi-platform compatibility, an expressive yet clean and readable syntax, easy integration with other scientific libraries, and extensibility via C, C++, Fortran or Cython. These features make it relatively straightforward to write and use custom statistical distributions, samplers and transformation functions, as required by Bayesian analysis.

Journal Article•DOI•

label.switching: An R Package for Dealing with the Label Switching Problem in MCMC Outputs

Panagiotis Papastamoulis

08 Mar 2015-arXiv: Computation

TL;DR: The label.switching package is introduced, which contains one probabilistic and seven deterministic relabelling algorithms in order to post-process a given MCMC sample, provided by the user.

Abstract: Label switching is a well-known and fundamental problem in Bayesian estimation of mixture or hidden Markov models. If the prior distribution of the model parameters is the same for all states, then both the likelihood and posterior distribution are invariant to permutations of the parameters. This property makes Markov chain Monte Carlo (MCMC) samples simulated from the posterior distribution non-identifiable. In this paper, the label.switching package is introduced. It contains one probabilistic and seven deterministic relabelling algorithms in order to post-process a given MCMC sample, provided by the user. Each method returns a set of permutations that can be used to reorder the MCMC output. Then, any parametric function of interest can be inferred using the reordered MCMC sample. A set of user-defined permutations is also accepted, allowing the researcher to benchmark new relabelling methods against the available ones.
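
The workflow in the abstract, where a relabelling method returns one permutation per MCMC iteration and the permutations are then used to reorder the output, can be sketched in Python. The rule used here, sorting components by their means at each iteration, is a simple ordering constraint chosen for illustration; it is not claimed to match any particular algorithm in the R package, and both function names are hypothetical.

```python
import numpy as np

def ordering_permutations(means):
    """means: (iterations, K). Row t of the result sorts means[t] ascending."""
    return np.argsort(means, axis=1)

def apply_permutations(draws, perms):
    """Reorder the component axis (axis 1) of draws, iteration by iteration."""
    return np.take_along_axis(draws, perms, axis=1)
```

The same permutations must be applied to every component-specific parameter (means, variances, mixing weights) so that each relabelled component stays internally consistent.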

Posted Content•

Semismooth Newton Coordinate Descent Algorithm for Elastic-Net Penalized Huber Loss Regression and Quantile Regression

Congrui Yi¹, Jian Huang¹•Institutions (1)

University of Iowa¹

09 Sep 2015-arXiv: Computation

TL;DR: This paper proposes an algorithm, semismooth Newton coordinate descent (SNCD), for the elastic-net penalized Huber loss regression and quantile regression in high dimensional settings, together with an adaptive version of the “strong rule” for screening predictors to gain extra efficiency.

Abstract: We propose an algorithm, semismooth Newton coordinate descent (SNCD), for the elastic-net penalized Huber loss regression and quantile regression in high dimensional settings. Unlike existing coordinate descent type algorithms, the SNCD updates each regression coefficient and its corresponding subgradient simultaneously in each iteration. It combines the strengths of the coordinate descent and the semismooth Newton algorithm, and effectively solves the computational challenges posed by dimensionality and nonsmoothness. We establish the convergence properties of the algorithm. In addition, we present an adaptive version of the "strong rule" for screening predictors to gain extra efficiency. Through numerical experiments, we demonstrate that the proposed algorithm is very efficient and scalable to ultra-high dimensions. We illustrate the application via a real data example.
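
The objective that such an algorithm minimizes can be written down directly. The Python sketch below evaluates an elastic-net penalized Huber loss; it does not implement SNCD itself. The scaling of the Huber loss (quadratic part divided by `2 * gamma`) and the penalty parameterization with `lam` and `alpha` are assumptions, since the abstract gives no formulas.

```python
import numpy as np

def huber(t, gamma):
    """Huber loss: quadratic for |t| <= gamma, linear beyond that."""
    a = np.abs(t)
    return np.where(a <= gamma, t**2 / (2 * gamma), a - gamma / 2)

def objective(beta, X, y, lam, alpha, gamma):
    """Mean Huber loss of the residuals plus an elastic-net penalty on beta."""
    r = y - X @ beta
    l1 = np.sum(np.abs(beta))        # lasso part, not differentiable at zero
    l2 = 0.5 * np.sum(beta**2)       # ridge part
    return np.mean(huber(r, gamma)) + lam * (alpha * l1 + (1 - alpha) * l2)
```

The two pieces of the Huber loss agree at `|t| = gamma`, so the loss is continuously differentiable; the nonsmoothness the abstract refers to comes from the absolute-value penalty and, for quantile regression, from the check loss.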

Posted Content•

Polynomial-Chaos-based Kriging

R. Schoebi, Bruno Sudret, Joe Wiart

13 Feb 2015-arXiv: Computation

TL;DR: PC-Kriging is a meta-modeling approach combining Polynomial Chaos Expansions (PCE) and Kriging: a sparse set of orthonormal polynomials approximates the global behavior of the computational model, whereas Kriging manages the local variability of the model output.

Abstract: Computer simulation has become the standard tool in many engineering fields for designing and optimizing systems, as well as for assessing their reliability. To cope with demanding analysis such as optimization and reliability, surrogate models (a.k.a. meta-models) have been increasingly investigated in the last decade. Polynomial Chaos Expansions (PCE) and Kriging are two popular non-intrusive meta-modelling techniques. PCE surrogates the computational model with a series of orthonormal polynomials in the input variables where polynomials are chosen in coherency with the probability distributions of those input variables. On the other hand, Kriging assumes that the computer model behaves as a realization of a Gaussian random process whose parameters are estimated from the available computer runs, i.e. input vectors and response values. These two techniques have been developed more or less in parallel so far with little interaction between the researchers in the two fields. In this paper, PC-Kriging is derived as a new non-intrusive meta-modeling approach combining PCE and Kriging. A sparse set of orthonormal polynomials (PCE) approximates the global behavior of the computational model whereas Kriging manages the local variability of the model output. An adaptive algorithm similar to the least angle regression algorithm determines the optimal sparse set of polynomials. PC-Kriging is validated on various benchmark analytical functions which are easy to sample for reference results. From the numerical investigations it is concluded that PC-Kriging performs better than or at least as well as the two distinct meta-modeling techniques. A larger gain in accuracy is obtained when the experimental design has a limited size, which is an asset when dealing with demanding computational models.

Journal Article•DOI•

Efficient Multiple Importance Sampling Estimators

Victor Elvira¹, Luca Martino², David Luengo³, Monica F. Bugallo⁴•Institutions (4)

Charles III University of Madrid¹, University of Helsinki², Technical University of Madrid³, Stony Brook University⁴

20 May 2015-arXiv: Computation

TL;DR: In this article, a new method that achieves an efficient compromise between variance reduction and computational complexity of the different approaches (classical vs. deterministic mixture) available for the weight calculation is introduced.

Abstract: Multiple importance sampling (MIS) methods use a set of proposal distributions from which samples are drawn. Each sample is then assigned an importance weight that can be obtained according to different strategies. This work is motivated by the trade-off between variance reduction and computational complexity of the different approaches (classical vs. deterministic mixture) available for the weight calculation. A new method that achieves an efficient compromise between both factors is introduced in this paper. It is based on forming a partition of the set of proposal distributions and computing the weights accordingly. Computer simulations show the excellent performance of the associated partial deterministic mixture MIS estimator.

Journal Article•DOI•

Three discussions of the paper "sequential quasi-Monte Carlo sampling", by M. Gerber and N. Chopin

Julyan Arbel¹, Igor Prünster¹, Christian P. Robert², Robin J. Ryder²•Institutions (2)

University of Turin¹, Paris Dauphine University²

24 May 2015-arXiv: Computation

TL;DR: This article collects written discussions of the paper "sequential quasi-Monte Carlo sampling" by M. Gerber and N. Chopin, following the presentation given before the Royal Statistical Society in London on December 10th, 2014.

Abstract: This is a written discussion of the paper "sequential quasi-Monte Carlo sampling" by M. Gerber and N. Chopin, following the presentation given before the Royal Statistical Society in London on December 10th, 2014.

Journal Article•DOI•

Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation

Fernando V. Bonassi, Mike West

26 Mar 2015-arXiv: Computation

TL;DR: An ABC SMC method that uses data-based adaptive weights can very substantially improve acceptance rates, as is demonstrated in a series of examples with simulated and real data sets, including a currently topical example from dynamic modelling in systems biology applications.

Abstract: Methods of approximate Bayesian computation (ABC) are increasingly used for analysis of complex models. A major challenge for ABC is overcoming the often inherent problem of high rejection rates in the accept/reject methods based on prior:predictive sampling. A number of recent developments aim to address this with extensions based on sequential Monte Carlo (SMC) strategies. We build on this here, introducing an ABC SMC method that uses data-based adaptive weights. This easily implemented and computationally trivial extension of ABC SMC can very substantially improve acceptance rates, as is demonstrated in a series of examples with simulated and real data sets, including a currently topical example from dynamic modelling in systems biology applications.

Journal Article•DOI•

On a generalization of the preconditioned Crank-Nicolson Metropolis algorithm

Daniel Rudolf¹, Björn Sprungk²•Institutions (2)

University of Göttingen¹, Chemnitz University of Technology²

14 Apr 2015-arXiv: Computation

TL;DR: In this article, a generalization of the preconditioned Crank-Nicolson (pCN) proposal is introduced, which is able to incorporate information of the measure of interest, and a numerical simulation of a Bayesian inverse problem indicates that a Metropolis algorithm with such a proposal performs independently of the state space dimension and the variance of the observational noise.

Abstract: Metropolis algorithms for approximate sampling of probability measures on infinite dimensional Hilbert spaces are considered and a generalization of the preconditioned Crank-Nicolson (pCN) proposal is introduced. The new proposal is able to incorporate information of the measure of interest. A numerical simulation of a Bayesian inverse problem indicates that a Metropolis algorithm with such a proposal performs independently of the state space dimension and the variance of the observational noise. Moreover, a qualitative convergence result is provided by a comparison argument for spectral gaps. In particular, it is shown that the generalization inherits geometric ergodicity from the Metropolis algorithm with pCN proposal.
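
As background for the abstract above, the plain pCN Metropolis algorithm (the baseline being generalized, not the new proposal) can be sketched in finite dimensions in Python. The setup assumed here is a Gaussian reference measure N(0, C) and a target with density proportional to exp(-phi(x)) relative to it; the function name, the argument `chol_c` (a Cholesky factor of C), and the step parameter `beta` in (0, 1] are illustrative.

```python
import numpy as np

def pcn_metropolis(phi, chol_c, x0, beta, n_steps, seed=0):
    """Plain pCN sampler; returns the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    phi_x = phi(x)
    chain, accepted = [x.copy()], 0
    for _ in range(n_steps):
        xi = chol_c @ rng.standard_normal(x.size)        # draw from N(0, C)
        prop = np.sqrt(1.0 - beta**2) * x + beta * xi     # pCN proposal
        phi_prop = phi(prop)
        # The proposal is reversible for N(0, C), so only phi enters the ratio.
        if np.log(rng.uniform()) < phi_x - phi_prop:
            x, phi_x = prop, phi_prop
            accepted += 1
        chain.append(x.copy())
    return np.array(chain), accepted / n_steps
```

With `phi` identically zero the target is the reference measure and every proposal is accepted, which is a quick check that the acceptance rule is wired correctly.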

Journal Article•DOI•

An Empirical Comparison of Multiple Imputation Methods for Categorical Data

Olanrewaju Akande¹, Fan Li¹, Jerome P. Reiter¹•Institutions (1)

Duke University¹

24 Aug 2015-arXiv: Computation

TL;DR: In this article, simulation studies based on categorical data from the American Community Survey (ACS) compare three default multiple imputation methods: chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet Process mixture models.

Abstract: Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet Process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. A supplementary material for this article is available online.

Posted Content•

Bayesian Additive Regression Trees using Bayesian Model Averaging

Belinda Hernández¹, Adrian E. Raftery², Stephen R. Pennington¹, Andrew C. Parnell¹•Institutions (2)

University College Dublin¹, University of Washington²

01 Jul 2015-arXiv: Computation

TL;DR: This work proposes an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p.

Abstract: Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$) the algorithm can become prohibitively expensive, computationally. Another method which is popular for high dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, as it is not a statistical model, it cannot produce probabilistic estimates or predictions. We propose an alternative algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to produce a model which is much more efficient than BART for datasets with large $p$. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small $n$ large $p$" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments; one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: this https URL

Journal Article•DOI•

A Bayesian approach to constrained single- and multi-objective optimization

Paul Feliot¹, Julien Bect¹, Emmanuel Vazquez¹•Institutions (1)

Université Paris-Saclay¹

02 Oct 2015-arXiv: Computation

TL;DR: In this paper, an extended domination rule is used to handle objectives and constraints in a unified way, and a corresponding expected hyper-volume improvement sampling criterion is proposed, which is naturally adapted to the search of a feasible point when none is available, and reduces to existing Bayesian sampling criteria.

Abstract: This article addresses the problem of derivative-free (single- or multi-objective) optimization subject to multiple inequality constraints. Both the objective and constraint functions are assumed to be smooth, non-linear and expensive to evaluate. As a consequence, the number of evaluations that can be used to carry out the optimization is very limited, as in complex industrial design optimization problems. The method we propose to overcome this difficulty has its roots in both the Bayesian and the multi-objective optimization literatures. More specifically, an extended domination rule is used to handle objectives and constraints in a unified way, and a corresponding expected hyper-volume improvement sampling criterion is proposed. This new criterion is naturally adapted to the search of a feasible point when none is available, and reduces to existing Bayesian sampling criteria---the classical Expected Improvement (EI) criterion and some of its constrained/multi-objective extensions---as soon as at least one feasible point is available. The calculation and optimization of the criterion are performed using Sequential Monte Carlo techniques. In particular, an algorithm similar to the subset simulation method, which is well known in the field of structural reliability, is used to estimate the criterion. The method, which we call BMOO (for Bayesian Multi-Objective Optimization), is compared to state-of-the-art algorithms for single- and multi-objective constrained optimization.
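
As a point of reference for the classical Expected Improvement (EI) criterion mentioned in the abstract, the following is a minimal sketch of EI for unconstrained, single-objective minimization under a Gaussian predictive distribution. It is not the BMOO criterion itself; the function name and the arguments `mu`, `sigma` and `best` are illustrative assumptions.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` (minimization) for a N(mu, sigma^2) prediction.

    `mu` and `sigma` are the predictive mean and standard deviation at a
    candidate point; `best` is the smallest objective value observed so far.
    All three names are illustrative, not taken from the paper.
    """
    if sigma <= 0.0:
        # Degenerate prediction: the improvement is known exactly.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf
```

In a Bayesian optimization loop, the next evaluation point would typically be chosen by maximizing this quantity over candidate points.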

Journal Article•DOI•

Improving the INLA approach for approximate Bayesian inference for latent Gaussian models

Egil Ferkingstad, Håvard Rue

25 Mar 2015-arXiv: Computation

TL;DR: A copula-based correction for generalized linear mixed models within the integrated nested Laplace approximation (INLA) approach for approximate Bayesian inference for latent Gaussian models is introduced.

Abstract: We introduce a new copula-based correction for generalized linear mixed models (GLMMs) within the integrated nested Laplace approximation (INLA) approach for approximate Bayesian inference for latent Gaussian models. While INLA is usually very accurate, some (rather extreme) cases of GLMMs with e.g. binomial or Poisson data have been seen to be problematic. Inaccuracies can occur when there is a very low degree of smoothing or "borrowing strength" within the model, and we have therefore developed a correction aiming to push the boundaries of the applicability of INLA. Our new correction has been implemented as part of the R-INLA package, and adds only negligible computational cost. Empirical evaluations on both real and simulated data indicate that the method works well.

Journal Article•DOI•

Online Updating of Statistical Inference in the Big Data Setting

Elizabeth D. Schifano¹, Jing Wu¹, Chun Wang¹, Jun Yan¹, Ming-Hui Chen¹•Institutions (1)

University of Connecticut¹

23 May 2015-arXiv: Computation

TL;DR: In this article, the authors present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data.

Abstract: We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
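
The basic idea of updating a linear model as data arrive, without keeping the historical data, can be illustrated with running sufficient statistics. The sketch below handles only a simple regression $y = a + bx$ and is a generic cumulative least-squares update, not the estimator or the rank-deficiency handling developed in the paper; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnlineSimpleOLS:
    """Running sums sufficient to fit y = a + b*x by least squares."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Fold one block of data into the running sums; the block can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) computed from the accumulated sums."""
        det = self.n * self.sxx - self.sx ** 2
        if det == 0:
            # This sketch does not treat rank deficiency; it only reports it.
            raise ValueError("design is rank deficient so far")
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope
```

Calling `update` once per incoming block and `coefficients` at any time gives the same fit as a batch regression on all data seen so far, while storing only five numbers.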

Journal Article•DOI•

Setting the stage for data science: integration of data management skills in introductory and second courses in statistics.

Nicholas J. Horton, Benjamin S. Baumer, Hadley Wickham

01 Feb 2015-arXiv: Computation

TL;DR: The authors argue that introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science, and applying these to real-world scenarios, prepares them to think statistically, and that it is imperative that students develop data-related capacities, beginning with the introductory course.

Abstract: Many have argued that statistics students need additional facility to express statistical computations. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically. In an era of increasingly big data, it is imperative that students develop data-related capacities, beginning with the introductory course. We believe that the integration of these precursors to data science into our curricula, early and often, will help statisticians be part of the dialogue regarding "Big Data" and "Big Questions".

Posted Content•

Importance Sampling: Computational Complexity and Intrinsic Dimension

Sergios Agapiou, Omiros Papaspiliopoulos, Daniel Sanz-Alonso, Andrew M. Stuart

19 Nov 2015-arXiv: Computation

TL;DR: The objective of this paper is to overview and unify the literature on importance sampling by creating an overarching framework, with attention to useful quantities that measure the difference between the sampled measure and the target measure in terms of parameters pertinent to the practitioner.

Abstract: The basic idea of importance sampling is to use independent samples from one measure in order to approximate expectations with respect to another measure. Understanding how many samples are needed is key to understanding the computational complexity of the method, and hence to understanding when it will be effective and when it will not. It is intuitive that the size of the difference between the measure which is sampled, and the measure against which expectations are to be computed, is key to the computational complexity. An implicit challenge in many of the published works in this area is to find useful quantities which measure this difference in terms of parameters which are pertinent for the practitioner. The subject has attracted substantial interest recently from within a variety of communities. The objective of this paper is to overview and unify the resulting literature in the area by creating an overarching framework. The general setting is studied in some detail, followed by deeper development in the context of Bayesian inverse problems and filtering.
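
The basic idea described above can be sketched with a self-normalized importance sampling estimator. The example assumes a target $N(1, 1)$ and a proposal $N(0, 2^2)$, both chosen purely for illustration, and reports the effective sample size, a standard diagnostic of how far the proposal is from the target; it is not claimed to be the quantity studied in the paper.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def self_normalized_is(n, seed=0):
    """Estimate E[X] under the target N(1, 1) using n draws from the proposal N(0, 4).

    Returns (estimate, effective_sample_size). The target, the proposal and
    the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    log_w = [log_normal_pdf(x, 1.0, 1.0) - log_normal_pdf(x, 0.0, 2.0) for x in xs]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * xi for wi, xi in zip(w, xs)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimate, ess
```

An effective sample size far below `n` indicates that a few draws dominate the weights, which is the practical symptom of a proposal that is too different from the target.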

Journal Article•DOI•

The Marginalized $\delta$-GLMB Filter

C. Fantacci, Ba-Tuong Vo, F. Papi, Ba Ngu Vo

05 Jan 2015-arXiv: Computation

TL;DR: In this paper, the authors proposed an efficient approximation to the multi-target Bayes filter which preserves both the PHD and cardinality distribution of the labeled posterior, which facilitates efficient multi-sensor tracking with detection-based measurements.

Abstract: The multi-target Bayes filter proposed by Mahler is a principled solution to recursive Bayesian tracking based on RFS or FISST. The $\delta$-GLMB filter is an exact closed form solution to the multi-target Bayes recursion which yields joint state and label or trajectory estimates in the presence of clutter, missed detections and association uncertainty. Due to the presence of explicit data associations in the $\delta$-GLMB filter, the number of components in the posterior grows without bound in time. In this work we propose an efficient approximation to the $\delta$-GLMB filter which preserves both the PHD and cardinality distribution of the labeled posterior. This approximation also facilitates efficient multi-sensor tracking with detection-based measurements. Simulation results are presented to verify the proposed approach.

Journal Article•DOI•

R Markdown

Dana Udwin, Ben Baumer

07 Jan 2015-arXiv: Computation

TL;DR: R Markdown as discussed by the authors is an authoring syntax that combines the ease of Markdown with the statistical programming language R. An R Markdown document or presentation interweaves computation, output and written analysis to the effect of transparency, clarity and an inherent invitation to reproduce (especially as sharing data is now as easy as the click of a button).

Abstract: Reproducibility is increasingly important to statistical research, but many details are often omitted from the published version of complex statistical analyses. A reader's comprehension is limited to what the author concludes, without exposure to the computational process. Often, the industrious reader cannot expand upon or validate the author's results. Even the author may struggle to reproduce their own results upon revisiting them. R Markdown is an authoring syntax that combines the ease of Markdown with the statistical programming language R. An R Markdown document or presentation interweaves computation, output and written analysis to the effect of transparency, clarity and an inherent invitation to reproduce (especially as sharing data is now as easy as the click of a button). It is an open-source tool that can be used either on its own or through the RStudio integrated development environment (IDE). In addition to facilitating reproducible research, R Markdown is a boon to collaboratively-minded data analysts, whose workflow can be streamlined by sharing only one master document that contains both code and content. Statistics educators may also find that R Markdown is helpful as a homework template, for both ease-of-use and in discouraging students from copy-and-pasting results from classmates. Training students in R Markdown will introduce to the workforce a new class of data analysts with an ingrained, foundational inclination toward reproducible research.

Posted Content•

Accelerating pseudo-marginal Metropolis-Hastings by correlating auxiliary variables

Johan Dahlin¹, Fredrik Lindsten, Joel Kronander, Thomas B. Schön•Institutions (1)

Linköping University¹

17 Nov 2015-arXiv: Computation

TL;DR: A modification to the pmMH algorithm is proposed in which a Crank-Nicolson (CN) proposal is used instead, which introduces a positive correlation in the auxiliary variables.

Abstract: Pseudo-marginal Metropolis-Hastings (pmMH) is a powerful method for Bayesian inference in models where the posterior distribution is analytically intractable or computationally costly to evaluate directly. It operates by introducing additional auxiliary variables into the model and forming an extended target distribution, which can then be evaluated point-wise. In many cases, the standard Metropolis-Hastings algorithm is then applied to sample from the extended target, and the sought posterior can be obtained by marginalisation. However, in some implementations this approach suffers from poor mixing as the auxiliary variables are sampled from an independent proposal. We propose a modification to the pmMH algorithm in which a Crank-Nicolson (CN) proposal is used instead. This introduces a positive correlation in the auxiliary variables. We investigate how to tune the CN proposal and its impact on the mixing of the resulting pmMH sampler. The conclusion is that the proposed modification
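
A Crank-Nicolson proposal for the auxiliary variables can be sketched as follows, under the assumption (not stated in the abstract) that the auxiliary variables are independent standard normal. The parameter name `step` is illustrative, and the sketch covers only the proposal step, not the full pmMH sampler.

```python
import math
import random


def crank_nicolson_proposal(u, step, rng):
    """Propose u' = sqrt(1 - step^2) * u + step * eps, with eps ~ N(0, I).

    If u is standard normal, u' is also standard normal and has correlation
    sqrt(1 - step^2) with u. Setting step = 1 gives an independent proposal.
    `u` is a list of floats and `rng` is a random.Random instance.
    """
    if not 0.0 < step <= 1.0:
        raise ValueError("step must lie in (0, 1]")
    keep = math.sqrt(1.0 - step * step)
    return [keep * ui + step * rng.gauss(0.0, 1.0) for ui in u]
```

Smaller values of `step` keep the proposed auxiliary variables closer to the current ones, which is the source of the positive correlation described above.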

Posted Content•

Nested Sequential Monte Carlo Methods

Christian A. Naesseth¹, Fredrik Lindsten², Thomas B. Schön³•Institutions (3)

Linköping University¹, University of Cambridge², Uppsala University³

09 Feb 2015-arXiv: Computation

TL;DR: Nested sequential Monte Carlo (NSMC) as discussed by the authors generalizes the SMC framework by requiring only approximate, properly weighted, samples from the proposal distribution, while still resulting in a correct SMC algorithm.

Abstract: We propose nested sequential Monte Carlo (NSMC), a methodology to sample from sequences of probability distributions, even where the random variables are high-dimensional. NSMC generalises the SMC framework by requiring only approximate, properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. Furthermore, NSMC can in itself be used to produce such properly weighted samples. Consequently, one NSMC sampler can be used to construct an efficient high-dimensional proposal distribution for another NSMC sampler, and this nesting of the algorithm can be done to an arbitrary degree. This allows us to consider complex and high-dimensional models using SMC. We show results that motivate the efficacy of our approach on several filtering problems with dimensions in the order of 100 to 1 000.

Posted Content•

Gibbs Flow for Approximate Transport with Applications to Bayesian Computation

Jeremy Heng¹, Arnaud Doucet², Yvo Pokern³•Institutions (3)

ESSEC Business School¹, University of Oxford², University College London³

29 Sep 2015-arXiv: Computation

TL;DR: The resulting distribution of mapped samples can be efficiently evaluated and used as a proposal within sequential Monte Carlo samplers, giving significant gains over state-of-the-art sequential Monte Carlo samplers at a fixed computational complexity on a variety of applications.

Abstract: Let $\pi_{0}$ and $\pi_{1}$ be two distributions on the Borel space $(\mathbb{R}^{d},\mathcal{B}(\mathbb{R}^{d}))$. Any measurable function $T:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ such that $Y=T(X)\sim\pi_{1}$ if $X\sim\pi_{0}$ is called a transport map from $\pi_{0}$ to $\pi_{1}$. For any $\pi_{0}$ and $\pi_{1}$, if one could obtain an analytical expression for a transport map from $\pi_{0}$ to $\pi_{1}$, then this could be straightforwardly applied to sample from any distribution. One would map draws from an easy-to-sample distribution $\pi_{0}$ to the target distribution $\pi_{1}$ using this transport map. Although it is usually impossible to obtain an explicit transport map for complex target distributions, we show here how to build a tractable approximation of a novel transport map. This is achieved by moving samples from $\pi_{0}$ using an ordinary differential equation with a velocity field that depends on the full conditional distributions of the target. Even when this ordinary differential equation is time-discretized and the full conditional distributions are numerically approximated, the resulting distribution of mapped samples can be efficiently evaluated and used as a proposal within sequential Monte Carlo samplers. We demonstrate significant gains over state-of-the-art sequential Monte Carlo samplers at a fixed computational complexity on a variety of applications.
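
The definition of a transport map can be illustrated in one dimension, where an explicit map is available. The sketch below takes $\pi_{0}$ to be Uniform(0, 1) and $\pi_{1}$ to be an exponential distribution, both chosen only for illustration, and uses the inverse CDF as the map $T$. It is not the Gibbs flow construction of the paper.

```python
import math
import random


def exponential_transport_map(x, rate):
    """Map x in [0, 1) to -log(1 - x) / rate, the inverse CDF of Exponential(rate).

    If X ~ Uniform(0, 1), then T(X) ~ Exponential(rate), so T is a transport
    map from the uniform distribution to the exponential one.
    """
    return -math.log(1.0 - x) / rate


def sample_via_transport(n, rate, seed=0):
    """Draw n samples from Exponential(rate) by mapping uniform draws through T."""
    rng = random.Random(seed)
    return [exponential_transport_map(rng.random(), rate) for _ in range(n)]
```

For complex, high-dimensional targets no such closed-form map is usually available, which is the situation the approximate construction in the abstract addresses.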
