Open access Journal Article

# Large-sample asymptotics of the pseudo-marginal method

02 Mar 2021, Biometrika (Oxford University Press (OUP)), Vol. 108, Iss. 1, pp. 37-51
Abstract: The pseudo-marginal algorithm is a variant of the Metropolis–Hastings algorithm that samples asymptotically from a probability distribution when it is only possible to obtain an unbiased estimate of an unnormalized version of its density. In practice, one has to trade off the computational resources used to obtain this estimator against the asymptotic variances of the ergodic averages obtained by the pseudo-marginal algorithm. Recent works on optimizing this trade-off rely on strong assumptions, which can cast doubt on their practical relevance. In particular, they all assume that the distribution of the difference between the log-density and its estimate is independent of the parameter value at which it is evaluated. Under regularity conditions we show that, as the number of data points tends to infinity, a space-rescaled version of the pseudo-marginal chain converges weakly to another pseudo-marginal chain for which this assumption indeed holds. A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime. These findings complement and validate currently available results.
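The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and every name in it (`noisy_log_target`, `pseudo_marginal_rwm`, `step`, `sigma`) is hypothetical. It assumes a toy one-dimensional target, the unnormalized standard normal density, and an estimator equal to the true density times log-normal noise with mean one. The noise is therefore unbiased and its distribution does not depend on the parameter, which is the simplifying assumption the abstract discusses.

```python
import math
import random


def noisy_log_target(theta, sigma, rng):
    """Log of an unbiased estimate of the unnormalized N(0, 1) density.

    Toy assumption: estimate = exp(-theta^2 / 2) * W with
    W = exp(sigma * Z - sigma^2 / 2), Z ~ N(0, 1), so that E[W] = 1.
    """
    z = rng.gauss(0.0, 1.0)
    return -0.5 * theta * theta + sigma * z - 0.5 * sigma * sigma


def pseudo_marginal_rwm(n_iters, step, sigma, seed=0):
    """Pseudo-marginal Metropolis-Hastings with a normal random walk proposal."""
    rng = random.Random(seed)
    theta = 0.0
    log_est = noisy_log_target(theta, sigma, rng)
    samples = []
    accepted = 0
    for _ in range(n_iters):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        log_est_prop = noisy_log_target(proposal, sigma, rng)
        # Accept with probability min(1, estimate ratio). The estimate at the
        # current state is stored and reused, never recomputed.
        if math.log(1.0 - rng.random()) < log_est_prop - log_est:
            theta, log_est = proposal, log_est_prop
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_iters


if __name__ == "__main__":
    draws, rate = pseudo_marginal_rwm(20000, step=2.0, sigma=1.0, seed=1)
    print(f"acceptance rate: {rate:.3f}")
```

In this sketch `sigma` plays the role of the estimator noise level (which would shrink as more Monte Carlo samples are used) and `step` is the random walk scale; these are the two quantities the trade-off in the abstract concerns.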

##### Citations

15 results found

Open access Journal Article
Abstract: Hamiltonian Monte Carlo (HMC) samples efficiently from high-dimensional posterior distributions with proposed parameter draws obtained by iterating on a discretized version of the Hamiltonian dynam ...

Topics: Hybrid Monte Carlo (70%)

39 Citations

Open access Posted Content
08 Jul 2016-arXiv: Methodology
Abstract: Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables, or pseudo-marginal Metropolis–Hastings (MH) schemes. The latter mimic an MH algorithm targeting the marginal posterior of the parameters by approximating unbiasedly the intractable likelihood. However, in scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly and the pseudo-marginal MH algorithm, as any other MH scheme, will be inefficient for high-dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which combines the advantages of both HMC and pseudo-marginal schemes. Specifically, the pseudo-marginal HMC method is controlled by a precision parameter N, controlling the approximation of the likelihood and, for any N, it samples the marginal posterior of the parameters. Additionally, as N tends to infinity, its sample trajectories and acceptance probability converge to those of an ideal, but intractable, HMC algorithm which would have access to the marginal posterior of parameters and its gradient. We demonstrate through experiments that pseudo-marginal HMC can significantly outperform both standard HMC and pseudo-marginal MH schemes.

25 Citations

Open access Proceedings Article
11 Apr 2019-
Abstract: We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators. Unbiased estimators are appealing in the context of parallel computing, and facilitate the construction of confidence intervals. The proposed scheme only requires access to off-the-shelf Particle Filters (PF) and is thus easier to implement than recently proposed unbiased smoothers. The approach is demonstrated on a Levy-driven stochastic volatility model and a stochastic kinetic model.

16 Citations

Open access Journal Article
Abstract: Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for latent variable models and the exchange algorithm for a class of undirected graphical models. As with any MCMC algorithm, the resulting estimators are justified asymptotically in the limit of the number of iterations, but exhibit a bias for any fixed number of iterations due to the Markov chains starting outside of stationarity. This “burn-in” bias is known to complicate the use of parallel processors for MCMC computations. We show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms. We establish the theoretical validity of some of these procedures, by extending existing results to cover the case of polynomially ergodic Markov chains. The efficiency of the proposed estimators is compared with that of standard MCMC estimators, with theoretical arguments and numerical experiments including state space models and Ising models.

16 Citations

Open access Posted Content
02 Aug 2017-arXiv: Computation
Abstract: Hamiltonian Monte Carlo (HMC) samples efficiently from high-dimensional posterior distributions with proposed parameter draws obtained by iterating on a discretized version of the Hamiltonian dynamics. The iterations make HMC computationally costly, especially in problems with large datasets, since it is necessary to compute posterior densities and their derivatives with respect to the parameters. Naively computing the Hamiltonian dynamics on a subset of the data causes HMC to lose its key ability to generate distant parameter proposals with high acceptance probability. The key insight in our article is that efficient subsampling HMC for the parameters is possible if both the dynamics and the acceptance probability are computed from the same data subsample in each complete HMC iteration. We show that this is possible to do in a principled way in an HMC-within-Gibbs framework where the subsample is updated using a pseudo-marginal MH step and the parameters are then updated using an HMC step, based on the current subsample. We show that our subsampling methods are fast and compare favorably to two popular sampling algorithms that utilize gradient estimates from data subsampling. We also explore the current limitations of subsampling HMC algorithms by varying the quality of the variance-reducing control variates used in the estimators of the posterior density and its gradients.

Topics: Hybrid Monte Carlo (55%), Control variates (54%), Gibbs sampling (51%)

13 Citations

##### References

40 results found

Open access Journal Article
01 Jan 2014-MSOR connections
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

229,202 Citations

Journal Article
Abstract: There are two formalisms for mathematically describing the time behavior of a spatially homogeneous chemical system: The deterministic approach regards the time evolution as a continuous, wholly predictable process which is governed by a set of coupled, ordinary differential equations (the “reaction-rate equations”); the stochastic approach regards the time evolution as a kind of random-walk process which is governed by a single differential-difference equation (the “master equation”). Fairly simple kinetic theory arguments show that the stochastic formulation of chemical kinetics has a firmer physical basis than the deterministic formulation, but unfortunately the stochastic master equation is often mathematically intractable. There is, however, a way to make exact numerical calculations within the framework of the stochastic formulation without having to deal with the master equation directly. It is a relatively simple digital computer algorithm which uses a rigorously derived Monte Carlo procedure to numerically simulate the time evolution of the given chemical system. Like the master equation, this “stochastic simulation algorithm” correctly accounts for the inherent fluctuations and correlations that are necessarily ignored in the deterministic formulation. In addition, unlike most procedures for numerically solving the deterministic reaction-rate equations, this algorithm never approximates infinitesimal time increments dt by finite time steps Δt. The feasibility and utility of the simulation algorithm are demonstrated by applying it to several well-known model chemical systems, including the Lotka model, the Brusselator, and the Oregonator.

Topics: Tau-leaping (56%), Gillespie algorithm (51%)

9,512 Citations

Open access Book
01 Jan 1997-
Abstract: * Measure Theory-Basic Notions * Measure Theory-Key Results * Processes, Distributions, and Independence * Random Sequences, Series, and Averages * Characteristic Functions and Classical Limit Theorems * Conditioning and Disintegration * Martingales and Optional Times * Markov Processes and Discrete-Time Chains * Random Walks and Renewal Theory * Stationary Processes and Ergodic Theory * Special Notions of Symmetry and Invariance * Poisson and Pure Jump-Type Markov Processes * Gaussian Processes and Brownian Motion * Skorohod Embedding and Invariance Principles * Independent Increments and Infinite Divisibility * Convergence of Random Processes, Measures, and Sets * Stochastic Integrals and Quadratic Variation * Continuous Martingales and Brownian Motion * Feller Processes and Semigroups * Ergodic Properties of Markov Processes * Stochastic Differential Equations and Martingale Problems * Local Time, Excursions, and Additive Functionals * One-Dimensional SDEs and Diffusions * Connections with PDEs and Potential Theory * Predictability, Compensation, and Excessive Functions * Semimartingales and General Stochastic Integration * Large Deviations * Appendix 1: Advanced Measure Theory * Appendix 2: Some Special Spaces * Historical and Bibliographical Notes * Bibliography * Indices

4,248 Citations

Book Chapter
01 Jan 2011-
Abstract: The author's preface gives an outline: "This book is about weak-convergence methods in metric spaces, with applications sufficient to show their power and utility. The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space C[0, 1] of continuous functions on the unit interval and in Chapter 3 to the space D[0, 1] of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables." The book develops and expands on Donsker's 1951 and 1952 papers on the invariance principle and empirical distributions. The basic random variables remain real-valued although, of course, measures on C[0, 1] and D[0, 1] are vitally used. Within this framework, there are various possibilities for a different and apparently better treatment of the material. More of the general theory of weak convergence of probabilities on separable metric spaces would be useful. Metrizability of the convergence is not brought up until late in the Appendix. The close relation of the Prokhorov metric and a metric for convergence in probability is (hence) not mentioned (see V. Strassen, Ann. Math. Statist. 36 (1965), 423-439; the reviewer, ibid. 39 (1968), 1563-1572). This relation would illuminate and organize such results as Theorems 4.1, 4.2 and 4.4 which give isolated, ad hoc connections between weak convergence of measures and nearness in probability. In the middle of p. 16, it should be noted that C*(S) consists of signed measures which need only be finitely additive if S is not compact. On p. 239, where the author twice speaks of separable subsets having nonmeasurable cardinal, he means "discrete" rather than "separable."
Theorem 1.4 is Ulam's theorem that a Borel probability on a complete separable metric space is tight. Theorem 1 of Appendix 3 weakens completeness to topological completeness. After mentioning that probabilities on the rationals are tight, the author says it is an

3,119 Citations

Open access Journal Article
Abstract: This paper considers the problem of scaling the proposal distribution of a multidimensional random walk Metropolis algorithm in order to maximize the efficiency of the algorithm. The main result is a weak convergence result as the dimension of a sequence of target densities, n, converges to $\infty$. When the proposal variance is appropriately scaled according to n, the sequence of stochastic processes formed by the first component of each Markov chain converges to the appropriate limiting Langevin diffusion process. The limiting diffusion approximation admits a straightforward efficiency maximization problem, and the resulting asymptotically optimal policy is related to the asymptotic acceptance rate of proposed moves for the algorithm. The asymptotically optimal acceptance rate is 0.234 under quite general conditions. The main result is proved in the case where the target density has a symmetric product form. Extensions of the result are discussed.

1,639 Citations

##### Performance
###### Metrics
No. of citations received by the Paper in previous years

| Year | Citations |
|------|-----------|
| 2021 | 4 |
| 2020 | 3 |
| 2019 | 4 |
| 2018 | 1 |
| 2017 | 1 |
| 2016 | 1 |

##### Network Information
###### Related Papers (5)
Large Sample Asymptotics of the Pseudo-Marginal Method. 26 Jun 2018, arXiv: Computation

Sebastian M. Schmon, George Deligiannidis +2 more

100% related
The pseudo-marginal approach for efficient Monte Carlo computations. 01 Apr 2009, Annals of Statistics

Christophe Andrieu, Gareth O. Roberts

75% related
Explicit and combined estimators for stable distributions parameters. 14 Nov 2018

Jacques Lévy Véhel, Anne Philippe +1 more

71% related
Consistency of the PLFit estimator for power-law data. 17 Feb 2020, arXiv: Statistics Theory

Ayan Bhattacharya, Bohan Chen +2 more

70% related