Showing papers in &quot;Statistics and Computing in 2015&quot;

Shape constrained additive models

TL;DR: Algorithms for fitting nonconvex penalties such as SCAD and MCP stably and efficiently and real data examples comparing and contrasting the statistical properties of these methods are presented.

...read moreread less

Abstract: Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.

...read moreread less

238 citations

Journal Article•DOI•

[...]

Natalya Pya¹, Simon N. Wood¹•Institutions (1)

University of Bath¹

01 May 2015-Statistics and Computing

TL;DR: A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM by mildly non-linear extensions of P-splines that facilitates efficient estimation of smoothing parameters as an integral part of model estimation.

...read moreread less

Abstract: A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM. We represent shape constrained model components by mildly non-linear extensions of P-splines. Models can contain multiple shape constrained and unconstrained terms as well as shape constrained multi-dimensional smooths. The constraints considered are on the sign of the first or/and the second derivatives of the smooth terms. A key advantage of the approach is that it facilitates efficient estimation of smoothing parameters as an integral part of model estimation, via GCV or AIC, and numerically robust algorithms for this are presented. We also derive simulation free approximate Bayesian confidence intervals for the smooth components, which are shown to achieve close to nominal coverage probabilities. Applications are presented using real data examples including the risk of disease in relation to proximity to municipal incinerators and the association between air pollution and health.

...read moreread less

211 citations

Journal Article•DOI•

Bayesian computation: a summary of the current state, and samples backwards and forwards

[...]

Peter H.R. Green¹, Krzysztof Latuszynski², Marcelo Pereyra¹, Christian P. Robert²•Institutions (2)

University of Bristol¹, University of Warwick²

A fast unified algorithm for solving group-lasso penalize learning problems

TL;DR: The difficulties of modelling and then handling ever more complex datasets most likely call for a new type of tool for computational inference that dramatically reduces the dimension and size of the raw data while capturing its essential aspects.

...read moreread less

Abstract: Recent decades have seen enormous improvements in computational inference for statistical models; there have been competitive continual enhancements in a wide range of computational tools. In Bayesian inference, first and foremost, MCMC techniques have continued to evolve, moving from random walk proposals to Langevin drift, to Hamiltonian Monte Carlo, and so on, with both theoretical and algorithmic innovations opening new opportunities to practitioners. However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the datasets to be addressed. The difficulties of modelling and then handling ever more complex datasets most likely call for a new type of tool for computational inference that dramatically reduces the dimension and size of the raw data while capturing its essential aspects. Approximate models and algorithms may thus be at the core of the next computational revolution.

...read moreread less

202 citations

Journal Article•DOI•

[...]

Yi Yang¹, Hui Zou¹•Institutions (1)

University of Minnesota¹

Statistics and computing: the genesis of data science

TL;DR: A unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problem and allows for general design matrices, without requiring the predictors to be group-wise orthonormal.

...read moreread less

Abstract: This paper concerns a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss function satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problem. GMD allows for general design matrices, without requiring the predictors to be group-wise orthonormal. As illustration examples, we develop concrete algorithms for solving the group-lasso penalized least squares and several group-lasso penalized large margin classifiers. These group-lasso models have been implemented in an R package gglasso publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/gglasso. On simulated and real data, gglasso consistently outperforms the existing software for computing the group-lasso that implements either the classical groupwise descent algorithm or Nesterov's method.

...read moreread less

184 citations

Journal Article•DOI•

[...]

David J. Hand¹•Institutions (1)

Imperial College London¹

Multiobjective optimization using Gaussian process emulators via stepwise uncertainty reduction

TL;DR: The two disciplines of statistics and computing are together the core technologies of data science and the journal Statistics and Computing has been instrumental in enhancing the interaction between them over the past quarter century.

...read moreread less

Abstract: The two disciplines of statistics and computing are together the core technologies of data science. The journal Statistics and Computing has been instrumental in enhancing the interaction between them over the past quarter century. This has been a period of dramatic change in each of the disciplines, where huge progress has been made, in both fundamental theory and in practice and applications. But it has also been a period of dramatic change in scientific publishing. The evolution of Statistics and Computing has reflected both changes, putting it at the cutting edge of progress. But these changes have not reached an end. We can confidently expect even more startling progress in the disciplines and change in the practice of scientific publishing in future years. It is vital that Statistics and Computing keeps pace.

...read moreread less

178 citations

Journal Article•DOI•

[...]

Victor Picheny¹•Institutions (1)

Institut national de la recherche agronomique¹

Estimation and selection for the latent block model on categorical data

TL;DR: The method is tested on several numerical examples and on an agronomy problem, showing that it provides an efficient trade-off between exploration and intensification.

...read moreread less

Abstract: Optimization of expensive computer models with the help of Gaussian process emulators is now commonplace. However, when several (competing) objectives are considered, choosing an appropriate sampling strategy remains an open question. We present here a new algorithm based on stepwise uncertainty reduction principles. Optimization is seen as a sequential reduction of the volume of the excursion sets below the current best solutions (Pareto set), and our sampling strategy chooses the points that give the highest expected reduction. The method is tested on several numerical examples and on an agronomy problem, showing that it provides an efficient trade-off between exploration and intensification.

...read moreread less

115 citations

Journal Article•DOI•

[...]

Christine Keribin¹, Vincent Brault¹, Gilles Celeux¹, Gérard Govaert•Institutions (1)

University of Paris-Sud¹

On parallel implementation of sequential Monte Carlo methods: the island particle model

TL;DR: Estimation procedures and model selection criteria derived for binary data are generalised and an exact expression of the integrated completed likelihood criterion requiring no asymptotic approximation is derived.

...read moreread less

Abstract: This paper deals with estimation and model selection in the Latent Block Model (LBM) for categorical data. First, after providing sufficient conditions ensuring the identifiability of this model, we generalise estimation procedures and model selection criteria derived for binary data. Secondly, we develop Bayesian inference through Gibbs sampling and with a well calibrated non informative prior distribution, in order to get the MAP estimator: this is proved to avoid the traps encountered by the LBM with the maximum likelihood methodology. Then model selection criteria are presented. In particular an exact expression of the integrated completed likelihood criterion requiring no asymptotic approximation is derived. Finally numerical experiments on both simulated and real data sets highlight the appeal of the proposed estimation and model selection procedures.

...read moreread less

109 citations

Journal Article•DOI•

[...]

Christelle Vergé¹, Cyrille Dubarry², Pierre Del Moral³, Eric Moulines⁴•Institutions (4)

École Polytechnique¹, Telecom SudParis², French Institute for Research in Computer Science and Automation³, Télécom ParisTech⁴

Cuts in Bayesian graphical models

TL;DR: It is shown that the evolution of each island is also driven by a Feynman-Kac semigroup, whose transition and potential can be explicitly related to ones of the original problem.

...read moreread less

Abstract: The approximation of the Feynman-Kac semigroups by systems of interacting particles is a very active research field, with applications in many different areas. In this paper, we study the parallelization of such approximations. The total population of particles is divided into sub-populations, referred to as islands. The particles within each island follow the usual selection/mutation dynamics. We show that the evolution of each island is also driven by a Feynman-Kac semigroup, whose transition and potential can be explicitly related to ones of the original problem. Therefore, the same genetic type approximation of the Feynman-Kac semi-group may be used at the island level; each island might undergo selection/mutation algorithm. We investigate the impact of the population size within each island and the number of islands, and study different type of interactions. We find conditions under which introducing interactions between islands is beneficial. The theoretical results are supported by some Monte Carlo experiments.

...read moreread less

98 citations

Journal Article•DOI•

[...]

Martyn Plummer¹•Institutions (1)

International Agency for Research on Cancer¹

Delayed acceptance particle MCMC for exact inference in stochastic kinetic models

TL;DR: It is shown that the MCMC algorithm applied by OpenBUGS in the presence of a cut function does not converge to a well-defined limiting distribution, but it may be improved by using tempered transitions.

...read moreread less

Abstract: The cut function defined by the OpenBUGS software is described as a "valve" that prevents feedback in Bayesian graphical models. It is shown that the MCMC algorithm applied by OpenBUGS in the presence of a cut function does not converge to a well-defined limiting distribution. However, it may be improved by using tempered transitions. The cut algorithm is compared with multiple imputation as a gold standard in a simple example.

...read moreread less

96 citations

Journal Article•DOI•

[...]

Andrew Golightly¹, Daniel A. Henderson¹, Chris Sherlock²•Institutions (2)

Newcastle University¹, Lancaster University²

A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models

TL;DR: The method is illustrated by considering inference for parameters governing a Lotka–Volterra system, a model of gene expression and a simple epidemic process to avoid expensive calculations for proposals that are likely to be rejected.

...read moreread less

Abstract: Recently-proposed particle MCMC methods provide a flexible way of performing Bayesian inference for parameters governing stochastic kinetic models defined as Markov (jump) processes (MJPs). Each iteration of the scheme requires an estimate of the marginal likelihood calculated from the output of a sequential Monte Carlo scheme (also known as a particle filter). Consequently, the method can be extremely computationally intensive. We therefore aim to avoid most instances of the expensive likelihood calculation through use of a fast approximation. We consider two approximations: the chemical Langevin equation diffusion approximation (CLE) and the linear noise approximation (LNA). Either an estimate of the marginal likelihood under the CLE, or the tractable marginal likelihood under the LNA can be used to calculate a first step acceptance probability. Only if a proposal is accepted under the approximation do we then run a sequential Monte Carlo scheme to compute an estimate of the marginal likelihood under the true MJP and construct a second stage acceptance probability that permits exact (simulation based) inference for the MJP. We therefore avoid expensive calculations for proposals that are likely to be rejected. We illustrate the method by considering inference for parameters governing a Lotka---Volterra system, a model of gene expression and a simple epidemic process.

...read moreread less

87 citations

Journal Article•DOI•

[...]

Eugenia Koblents¹, Joaquín Míguez¹•Institutions (1)

Charles III University of Madrid¹

Sequential Monte Carlo methods for Bayesian elliptic inverse problems

TL;DR: A new method is proposed that performs a nonlinear transformation of the importance weights of the Monte Carlo approximation of posterior probability distributions that avoids degeneracy and increases the efficiency of the IS scheme, specially when drawing from proposal functions which are poorly adapted to the true posterior.

...read moreread less

Abstract: This paper addresses the Monte Carlo approximation of posterior probability distributions. In particular, we consider the population Monte Carlo (PMC) technique, which is based on an iterative importance sampling (IS) approach. An important drawback of this methodology is the degeneracy of the importance weights (IWs) when the dimension of either the observations or the variables of interest is high. To alleviate this difficulty, we propose a new method that performs a nonlinear transformation of the IWs. This operation reduces the weight variation, hence it avoids degeneracy and increases the efficiency of the IS scheme, specially when drawing from proposal functions which are poorly adapted to the true posterior. For the sake of illustration, we have applied the proposed algorithm to the estimation of the parameters of a Gaussian mixture model. This is a simple problem that enables us to discuss the main features of the proposed technique. As a practical application, we have also considered the challenging problem of estimating the rate parameters of a stochastic kinetic model (SKM). SKMs are multivariate systems that model molecular interactions in biological and chemical problems. We introduce a particularization of the proposed algorithm to SKMs and present numerical results.

...read moreread less

Journal Article•DOI•

[...]

Alexandros Beskos¹, Ajay Jasra², Ege A. Muzaffer², Andrew M. Stuart³•Institutions (3)

University College London¹, National University of Singapore², University of Warwick³

TL;DR: In this article, the authors consider a Bayesian inverse problem associated to elliptic partial differential equations in two and three dimensions and prove that a basic sequential Monte Carlo (SMC) method has a Monte Carlo rate of convergence with constants which are independent of the dimension of the discretization of the problem.

...read moreread less

Abstract: In this article, we consider a Bayesian inverse problem associated to elliptic partial differential equations in two and three dimensions. This class of inverse problems is important in applications such as hydrology, but the complexity of the link function between unknown field and measurements can make it difficult to draw inference from the associated posterior. We prove that for this inverse problem a basic sequential Monte Carlo (SMC) method has a Monte Carlo rate of convergence with constants which are independent of the dimension of the discretization of the problem; indeed convergence of the SMC method is established in a function space setting. We also develop an enhancement of the SMC methods for inverse problems which were introduced in Kantas et al. (SIAM/ASA J Uncertain Quantif 2:464---489, 2014); the enhancement is designed to deal with the additional complexity of this elliptic inverse problem. The efficacy of the methodology and its desirable theoretical properties, are demonstrated for numerical examples in both two and three dimensions.

...read moreread less

Journal Article•DOI•

EM for mixtures

[...]

Jean-Patrick Baudry¹, Gilles Celeux²•Institutions (2)

University of Paris¹, University of Paris-Sud²

Minimax optimal designs via particle swarm optimization methods

TL;DR: Improvements are introduced, first, using a penalized log-likelihood of Gaussian mixture models in a Bayesian regularization perspective and, second, choosing the best among several relevant initialisation strategies which prove helpful.

...read moreread less

Abstract: Maximum likelihood through the EM algorithm is widely used to estimate the parameters in hidden structure models such as Gaussian mixture models. But the EM algorithm has well-documented drawbacks: its solution could be highly dependent from its initial position and it may fail as a result of degeneracies. We stress the practical dangers of theses limitations and how carefully they should be dealt with. Our main conclusion is that no method enables to address them satisfactory in all situations. But improvements are introduced, first, using a penalized log-likelihood of Gaussian mixture models in a Bayesian regularization perspective and, second, choosing the best among several relevant initialisation strategies. In this perspective, we also propose new recursive initialization strategies which prove helpful. They are compared with standard initialization procedures through numerical experiments and their effects on model selection criteria are analyzed.

...read moreread less

Journal Article•DOI•

[...]

Ray-Bing Chen¹, Shin-Perng Chang², Weichung Wang³, Heng-Chih Tung¹, Weng Kee Wong⁴ - Show less +1 more•Institutions (4)

National Cheng Kung University¹, Toko University², National Taiwan University³, University of California, Los Angeles⁴

Path storage in the particle filter

TL;DR: This work modify PSO techniques to find minimax optimal designs, which have been notoriously challenging to find to date even for linear models, and shows that the PSO methods can readily generate a variety of minimx optimal designs in a novel and interesting way, including adapting the algorithm to generate standardized maximin optimal designs.

...read moreread less

Abstract: Particle swarm optimization (PSO) techniques are widely used in applied fields to solve challenging optimization problems but they do not seem to have made an impact in mainstream statistical applications hitherto. PSO methods are popular because they are easy to implement and use, and seem increasingly capable of solving complicated problems without requiring any assumption on the objective function to be optimized. We modify PSO techniques to find minimax optimal designs, which have been notoriously challenging to find to date even for linear models, and show that the PSO methods can readily generate a variety of minimax optimal designs in a novel and interesting way, including adapting the algorithm to generate standardized maximin optimal designs.

...read moreread less

Journal Article•DOI•

[...]

Pierre Jacob¹, Lawrence M. Murray², Sylvain Rubenthaler•Institutions (2)

National University of Singapore¹, Commonwealth Scientific and Industrial Research Organisation²

High-dimensional regression with gaussian mixtures and partially-latent response variables

TL;DR: A theoretical result is provided bounding the expected memory cost by T+CNlogN where T is the time horizon, N is the number of particles and C is a constant, as well as an efficient algorithm to realise this.

...read moreread less

Abstract: This article considers the problem of storing the paths generated by a particle filter and more generally by a sequential Monte Carlo algorithm. It provides a theoretical result bounding the expected memory cost by T+CNlogN where T is the time horizon, N is the number of particles and C is a constant, as well as an efficient algorithm to realise this. The theoretical result and the algorithm are illustrated with numerical experiments.

...read moreread less

Journal Article•DOI•

[...]

Antoine Deleforge¹, Florence Forbes¹, Radu Horaud¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

Sparse estimation via nonconcave penalized likelihood in factor analysis model

TL;DR: In this article, an inverse regression framework is proposed, which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable.

...read moreread less

Abstract: The problem of approximating high-dimensional data with a low-dimensional representation is addressed. The article makes the following contributions. An inverse regression framework is proposed, which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. A mixture of locally-linear probabilistic mapping model is introduced, that starts with estimating the parameters of the inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, a partially-latent paradigm is introduced, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. Expectation-maximization (EM) procedures are introduced, based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. Two augmentation schemes are proposed and the associated EM inference procedures are described in detail; they may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. Experimental evidence is provided that the method outperforms several existing regression techniques.

...read moreread less

Journal Article•DOI•

[...]

Kei Hirose¹, Michio Yamamoto²•Institutions (2)

Osaka University¹, Kyoto University²

Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations

TL;DR: In this article, a nonconvex penalty on the factor loadings is introduced to solve the problem of sparse estimation in a factor analysis model, which can be viewed as a generalization of the traditional two-step approach and can produce sparser solutions than the rotation technique.

...read moreread less

Abstract: We consider the problem of sparse estimation in a factor analysis model. A traditional estimation procedure in use is the following two-step approach: the model is estimated by maximum likelihood method and then a rotation technique is utilized to find sparse factor loadings. However, the maximum likelihood estimates cannot be obtained when the number of variables is much larger than the number of observations. Furthermore, even if the maximum likelihood estimates are available, the rotation technique does not often produce a sufficiently sparse solution. In order to handle these problems, this paper introduces a penalized likelihood procedure that imposes a nonconvex penalty on the factor loadings. We show that the penalized likelihood procedure can be viewed as a generalization of the traditional two-step approach, and the proposed methodology can produce sparser solutions than the rotation technique. A new algorithm via the EM algorithm along with coordinate descent is introduced to compute the entire solution path, which permits the application to a wide variety of convex and nonconvex penalties. Monte Carlo simulations are conducted to investigate the performance of our modeling strategy. A real data example is also given to illustrate our procedure.

...read moreread less

Journal Article•DOI•

[...]

David I. Hastie¹, Silvia Liverani¹, Sylvia Richardson•Institutions (1)

Imperial College London¹

Comparing composite likelihood methods based on pairs for spatial Gaussian random fields

TL;DR: In this article, the Gibbs sampling algorithm is used to combine the slice sampling approach and the retrospective sampling approach of Papaspiliopoulos and Roberts, and is implemented as efficient open source C++ software, available as an R package.

...read moreread less

Abstract: We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter $$\alpha $$?. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45---54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169---186, 2008). Our general algorithm is implemented as efficient open source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37---57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on $$\alpha $$?. We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484---498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.

...read moreread less

Journal Article•DOI•

[...]

Moreno Bevilacqua¹, Carlo Gaetan²•Institutions (2)

Valparaiso University¹, Ca' Foscari University of Venice²

Particle Metropolis---Hastings using gradient and Hessian information

TL;DR: In this article, three types of weighted composite likelihood functions based on pairs are compared and compared with the method of covariance tapering, and asymptotic properties of the three estimation methods are derived.

...read moreread less

Abstract: In the last years there has been a growing interest in proposing methods for estimating covariance functions for geostatistical data. Among these, maximum likelihood estimators have nice features when we deal with a Gaussian model. However maximum likelihood becomes impractical when the number of observations is very large. In this work we review some solutions and we contrast them in terms of loss of statistical efficiency and computational burden. Specifically we focus on three types of weighted composite likelihood functions based on pairs and we compare them with the method of covariance tapering. Asymptotic properties of the three estimation methods are derived. We illustrate the effectiveness of the methods through theoretical examples, simulation experiments and by analyzing a data set on yearly total precipitation anomalies at weather stations in the United States.

...read moreread less

Journal Article•DOI•

[...]

Johan Dahlin¹, Fredrik Lindsten², Thomas B. Schön³•Institutions (3)

Linköping University¹, University of Cambridge², Uppsala University³

A simulated annealing approach to approximate Bayes computations

TL;DR: This work proposes a number of alternative versions of PMH that incorporate gradient and Hessian information about the posterior into the proposal, and shows how to estimate the required information using a fixed-lag particle smoother, with a computational cost growing linearly in the number of particles.

...read moreread less

Abstract: Particle Metropolis---Hastings (PMH) allows for Bayesian parameter inference in nonlinear state space models by combining Markov chain Monte Carlo (MCMC) and particle filtering. The latter is used to estimate the intractable likelihood. In its original formulation, PMH makes use of a marginal MCMC proposal for the parameters, typically a Gaussian random walk. However, this can lead to a poor exploration of the parameter space and an inefficient use of the generated particles. We propose a number of alternative versions of PMH that incorporate gradient and Hessian information about the posterior into the proposal. This information is more or less obtained as a byproduct of the likelihood estimation. Indeed, we show how to estimate the required information using a fixed-lag particle smoother, with a computational cost growing linearly in the number of particles. We conclude that the proposed methods can: (i) decrease the length of the burn-in phase, (ii) increase the mixing of the Markov chain at the stationary phase, and (iii) make the proposal distribution scale invariant which simplifies tuning.

...read moreread less

Journal Article•DOI•

[...]

Carlo Albert¹, Hans R. Künsch², Andreas Scheidegger¹•Institutions (2)

Swiss Federal Institute of Aquatic Science and Technology¹, ETH Zurich²

Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm

TL;DR: This work presents a new class of particle algorithms for ABC, based on a sequence of Metropolis kernels, associated with a decreasing sequence of tolerances w.r.t. the data, which aims at converging as close as possible to the correct result with as few system updates as possible via minimizing the entropy production of the process.

...read moreread less

Abstract: Approximate Bayes computations (ABC) are used for parameter inference when the likelihood function of the model is expensive to evaluate but relatively cheap to sample from. In particle ABC, an ensemble of particles in the product space of model outputs and parameters is propagated in such a way that its output marginal approaches a delta function at the data and its parameter marginal approaches the posterior distribution. Inspired by Simulated Annealing, we present a new class of particle algorithms for ABC, based on a sequence of Metropolis kernels, associated with a decreasing sequence of tolerances w.r.t. the data. Unlike other algorithms, our class of algorithms is not based on importance sampling. Hence, it does not suffer from a loss of effective sample size due to re-sampling. We prove convergence under a condition on the speed at which the tolerance is decreased. Furthermore, we present a scheme that adapts the tolerance and the jump distribution in parameter space according to some mean-fields of the ensemble, which preserves the statistical independence of the particles, in the limit of infinite sample size. This adaptive scheme aims at converging as close as possible to the correct result with as few system updates as possible via minimizing the entropy production of the process. The performance of this new class of algorithms is compared against two other recent algorithms on two toy examples as well as on a real-world example from genetics.

...read moreread less

Journal Article•DOI•

[...]

María Xosé Rodríguez-Álvarez¹, Dae-Jin Lee², Thomas Kneib³, María Durbán⁴, Paul H. C. Eilers⁵ - Show less +1 more•Institutions (5)

University of Vigo¹, Basque Center for Applied Mathematics², University of Göttingen³, Charles III University of Madrid⁴, Erasmus University Rotterdam⁵

Scalable estimation strategies based on stochastic approximations: classical results and new insights

TL;DR: The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991)—for variance components estimation—to deal with non-standard structures of the covariance matrix of the random effects.

...read moreread less

Abstract: A new computational algorithm for estimating the smoothing parameters of a multidimensional penalized spline generalized linear model with anisotropic penalty is presented. This new proposal is based on the mixed model representation of a multidimensional P-spline, in which the smoothing parameter for each covariate is expressed in terms of variance components. On the basis of penalized quasi-likelihood methods, closed-form expressions for the estimates of the variance components are obtained. This formulation leads to an efficient implementation that considerably reduces the computational burden. The proposed algorithm can be seen as a generalization of the algorithm by Schall (1991)--for variance components estimation--to deal with non-standard structures of the covariance matrix of the random effects. The practical performance of the proposed algorithm is evaluated by means of simulations, and comparisons with alternative methods are made on the basis of the mean square error criterion and the computing time. Finally, we illustrate our proposal with the analysis of two real datasets: a two dimensional example of historical records of monthly precipitation data in USA and a three dimensional one of mortality data from respiratory disease according to the age at death, the year of death and the month of death.

...read moreread less

Journal Article•DOI•

[...]

Panos Toulis¹, Edoardo M. Airoldi¹•Institutions (1)

Harvard University¹

Pre-processing for approximate Bayesian computation in image analysis

TL;DR: Stochastic gradient methods are argued to be poised to become benchmark principled estimation procedures for large datasets, especially those in the family of stable proximal methods, such as implicit stochastic gradient descent.

...read moreread less

Abstract: Estimation with large amounts of data can be facilitated by stochastic gradient methods, in which model parameters are updated sequentially using small batches of data at each step. Here, we review early work and modern results that illustrate the statistical properties of these methods, including convergence rates, stability, and asymptotic bias and variance. We then overview modern applications where these methods are useful, ranging from an online version of the EM algorithm to deep learning. In light of these results, we argue that stochastic gradient methods are poised to become benchmark principled estimation procedures for large datasets, especially those in the family of stable proximal methods, such as implicit stochastic gradient descent.

...read moreread less

Journal Article•DOI•

[...]

Matthew T. Moores¹, Christopher C. Drovandi¹, Kerrie Mengersen¹, Christian P. Robert²•Institutions (2)

Queensland University of Technology¹, University of Warwick²

Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm

TL;DR: This work applies ABC with a synthetic likelihood to the hidden Potts model with additive Gaussian noise, and demonstrates that the precomputed binding function dramatically improves the scalability of ABC.

...read moreread less

Abstract: Most of the existing algorithms for approximate Bayesian computation (ABC) assume that it is feasible to simulate pseudo-data from the model at each iteration. However, the computational cost of these simulations can be prohibitive for high dimensional data. An important example is the Potts model, which is commonly used in image analysis. Images encountered in real world applications can have millions of pixels, therefore scalability is a major concern. We apply ABC with a synthetic likelihood to the hidden Potts model with additive Gaussian noise. Using a pre-processing step, we fit a binding function to model the relationship between the model parameters and the synthetic likelihood parameters. Our numerical experiments demonstrate that the precomputed binding function dramatically improves the scalability of ABC, reducing the average runtime required for model fitting from 71 h to only 7 min. We also illustrate the method by estimating the smoothing parameter for remotely sensed satellite imagery. Without precomputation, Bayesian inference is impractical for datasets of that scale.

...read moreread less

Journal Article•DOI•

[...]

Alain Durmus¹, Eric Moulines²•Institutions (2)

École normale supérieure de Cachan¹, Institut Mines-Télécom²

Avoiding spurious local maximizers in mixture modeling

TL;DR: The proposed rate of convergence leads to useful insights for the analysis of MCMC algorithms, and suggests ways to construct sampler with good mixing rate even if the dimension of the underlying sampling space is large.

...read moreread less

Abstract: In this paper, we establish explicit convergence rates for Markov chains in Wasserstein distance. Compared to the more classical total variation bounds, the proposed rate of convergence leads to useful insights for the analysis of MCMC algorithms, and suggests ways to construct sampler with good mixing rate even if the dimension of the underlying sampling space is large. We illustrate these results by analyzing the Exponential Integrator version of the Metropolis Adjusted Langevin Algorithm. We illustrate our findings using a Bayesian linear inverse problem.

...read moreread less

Journal Article•DOI•

[...]

Luis Angel García-Escudero¹, Alfonso Gordaliza¹, Carlos Matrán¹, Agustín Mayo-Iscar¹•Institutions (1)

University of Valladolid¹

01 May 2015-Statistics and Computing

TL;DR: This work presents a constrained mixture fitting approach that allows for monitoring solutions in terms of the constant involved in the restrictions, which yields a natural way to discard spurious solutions and a valuable tool for data analysts.

...read moreread less

Abstract: The maximum likelihood estimation in the finite mixture of distributions setting is an ill-posed problem that is treatable, in practice, through the EM algorithm. However, the existence of spurious solutions (singularities and non-interesting local maximizers) makes difficult to find sensible mixture fits for non-expert practitioners. In this work, a constrained mixture fitting approach is presented with the aim of overcoming the troubles introduced by spurious solutions. Sound mathematical support is provided and, which is more relevant in practice, a feasible algorithm is also given. This algorithm allows for monitoring solutions in terms of the constant involved in the restrictions, which yields a natural way to discard spurious solutions and a valuable tool for data analysts.

...read moreread less

Journal Article•DOI•

Regularised PCA to denoise and visualise data

[...]

Marie Verbanck¹, Julie Josse¹, François Husson¹•Institutions (1)

Agrocampus Ouest¹

Posterior inference on parameters of stochastic differential equations via non-linear Gaussian filtering and adaptive MCMC

TL;DR: In this paper, a regularised version of PCA is proposed, which selects a certain number of dimensions and shrinks the corresponding singular values, and each singular value is multiplied by a term which can be seen as the ratio of the signal variance over the total variance of the associated dimension.

...read moreread less

Abstract: Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed effect model in which data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we suggest a regularised version of PCA that essentially selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term which can be seen as the ratio of the signal variance over the total variance of the associated dimension. The regularised term is analytically derived using asymptotic results and can also be justified from a Bayesian treatment of the model. Regularised PCA provides promising results in terms of the recovery of the true signal and the graphical outputs in comparison with classical PCA and with a soft thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important in the case of very noisy data.

...read moreread less

Journal Article•DOI•

[...]

Simo Särkkä, Jouni Hartikainen, Isambi S. Mbalawata, Heikki Haario

Variable selection and estimation for partially linear single-index models with longitudinal data

TL;DR: The results indicate that the sigma-point based Gaussian approximations lead to better approxims of the parameter posterior distribution than the Taylor series, and the accuracy of the approxims is comparable to that of the computationally significantly heavier particle MCMC approximation.

...read moreread less

Abstract: This article is concerned with Bayesian estimation of parameters in non-linear multivariate stochastic differential equation (SDE) models occurring, for example, in physics, engineering, and financial applications. In particular, we study the use of adaptive Markov chain Monte Carlo (AMCMC) based numerical integration methods with non-linear Kalman-type approximate Gaussian filters for parameter estimation in non-linear SDEs. We study the accuracy and computational efficiency of gradient-free sigma-point approximations (Gaussian quadratures) in the context of parameter estimation, and compare them with Taylor series and particle MCMC approximations. The results indicate that the sigma-point based Gaussian approximations lead to better approximations of the parameter posterior distribution than the Taylor series, and the accuracy of the approximations is comparable to that of the computationally significantly heavier particle MCMC approximations.

...read moreread less

Journal Article•DOI•

[...]

Gaorong Li¹, Peng Lai², Heng Lian³•Institutions (3)

Beijing University of Technology¹, Nanjing University of Information Science and Technology², Nanyang Technological University³

01 May 2015-Statistics and Computing

TL;DR: In this article, a penalized procedure combined with two bias correction methods was proposed to deal with the variable selection problem in the partially linear single-index models with longitudinal data. But the bias correction method was not considered in this paper.

...read moreread less

Abstract: In this paper, we consider the partially linear single-index models with longitudinal data. To deal with the variable selection problem in this context, we propose a penalized procedure combined with two bias correction methods, resulting in the bias-corrected generalized estimating equation and the bias-corrected quadratic inference function, which can take into account the correlations. Asymptotic properties of these methods are demonstrated. We also evaluate the finite sample performance of the proposed methods via Monte Carlo simulation studies and a real data analysis.

...read moreread less

Journal Article•DOI•

Bivariate conditioning approximations for multivariate normal probabilities

[...]

Giang Trinh¹, Alan Genz¹•Institutions (1)

Washington State University¹