Posted Content

Pseudo-Marginal Hamiltonian Monte Carlo

TL;DR: An original MCMC algorithm, termed pseudo-marginal HMC, is proposed; it approximates an HMC algorithm targeting the marginal posterior of the parameters and can significantly outperform both standard HMC and pseudo-marginal MH schemes.
Abstract: Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables, or pseudo-marginal Metropolis--Hastings (MH) schemes. The latter mimic an MH algorithm targeting the marginal posterior of the parameters by approximating the intractable likelihood unbiasedly. However, in scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly, and the pseudo-marginal MH algorithm, like any other MH scheme, will be inefficient for high-dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which combines the advantages of both HMC and pseudo-marginal schemes. Specifically, the pseudo-marginal HMC method relies on a precision parameter N controlling the approximation of the likelihood and, for any N, it samples the marginal posterior of the parameters. Additionally, as N tends to infinity, its sample trajectories and acceptance probability converge to those of an ideal, but intractable, HMC algorithm which would have access to the marginal posterior of the parameters and its gradient. We demonstrate through experiments that pseudo-marginal HMC can significantly outperform both standard HMC and pseudo-marginal MH schemes.
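For concreteness, the pseudo-marginal principle the abstract builds on can be illustrated with a minimal sketch (not the proposed PM-HMC algorithm itself): a Metropolis-Hastings chain in which an unbiased importance-sampling estimate of the likelihood, built from N auxiliary draws, replaces the intractable likelihood in the acceptance ratio. The toy random-effects model, flat prior and tuning constants below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
y = rng.normal(0.5, np.sqrt(2.0), size=50)   # toy data: y_t = x_t + noise, with x_t ~ N(theta, 1)

def log_lik_hat(theta, N):
    """Unbiased estimate of p(y | theta), integrating out the latent x_t by importance sampling."""
    x = rng.normal(theta, 1.0, size=(N, y.size))             # N draws of x_t ~ N(theta, 1) per datapoint
    log_w = -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)    # log N(y_t; x_t, 1)
    return np.sum(logsumexp(log_w, axis=0) - np.log(N))      # log of the product of per-datapoint averages

def pm_mh(n_iter=5000, N=32, step=0.3):
    theta, ll = 0.0, log_lik_hat(0.0, N)
    chain = []
    for _ in range(n_iter):
        theta_prop = theta + step * rng.normal()              # symmetric random-walk proposal
        ll_prop = log_lik_hat(theta_prop, N)
        # flat prior on theta; the estimate, not the exact likelihood, enters the ratio,
        # yet the chain still targets the marginal posterior of theta
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = theta_prop, ll_prop
        chain.append(theta)
    return np.array(chain)

print(np.mean(pm_mh()[1000:]))
```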
Citations
Journal Article
TL;DR: The proposed methodology exploits the Riemann manifold structure of the parameter space and automatically adapts to the local structure of the target when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density; substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches.
Abstract: The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The methods provide fully automated adaptation mechanisms that circumvent the costly pilot runs that are required to tune proposal densities for Metropolis-Hastings or indeed Hamiltonian Monte Carlo and Metropolis adjusted Langevin algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The methodology proposed exploits the Riemann geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density. The performance of these Riemann manifold Monte Carlo methods is rigorously assessed by performing inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models and Bayesian estimation of dynamic systems described by non-linear differential equations. Substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches. MATLAB code that is available from http://www.ucl.ac.uk/statistics/research/rmhmc allows replication of all the results reported.
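As a rough illustration of the idea in this reference, the sketch below implements a simplified manifold MALA step on a toy Bayesian logistic regression: the Langevin proposal is preconditioned by a position-dependent metric (here the expected Fisher information plus the prior precision), with the curvature terms dropped for brevity. The data, prior and step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
beta_true = np.array([1.0, -2.0])
ybin = rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ beta_true))
alpha = 1.0                                            # Gaussian prior precision on beta

def grad_and_metric(beta):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (ybin - p) - alpha * beta                           # gradient of log posterior
    G = X.T @ (X * (p * (1 - p))[:, None]) + alpha * np.eye(2)       # Fisher information + prior precision
    return grad, G

def log_post(beta):
    eta = X @ beta
    return np.sum(ybin * eta - np.log1p(np.exp(eta))) - 0.5 * alpha * beta @ beta

def proposal_logpdf(to, frm, eps):
    grad, G = grad_and_metric(frm)
    Ginv = np.linalg.inv(G)
    mean = frm + 0.5 * eps**2 * Ginv @ grad
    cov = eps**2 * Ginv
    diff = to - mean
    return -0.5 * diff @ np.linalg.solve(cov, diff) - 0.5 * np.linalg.slogdet(2 * np.pi * cov)[1]

def mmala(n_iter=2000, eps=0.5):
    beta = np.zeros(2)
    chain = []
    for _ in range(n_iter):
        grad, G = grad_and_metric(beta)
        Ginv = np.linalg.inv(G)
        mean = beta + 0.5 * eps**2 * Ginv @ grad                     # metric-preconditioned drift
        prop = rng.multivariate_normal(mean, eps**2 * Ginv)
        log_a = (log_post(prop) - log_post(beta)
                 + proposal_logpdf(beta, prop, eps) - proposal_logpdf(prop, beta, eps))
        if np.log(rng.uniform()) < log_a:
            beta = prop
        chain.append(beta.copy())
    return np.array(chain)

print(mmala()[1000:].mean(axis=0))    # should be near beta_true
```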

1,031 citations

Posted Content
TL;DR: Sequential Monte Carlo (SMC), reviewed in this tutorial, is a random-sampling-based class of methods for approximate inference; variational inference can be used to learn its proposals, and its normalizing-constant estimate can be used for pseudo-marginal inference and inference evaluation.
Abstract: A core problem in statistics and probabilistic machine learning is to compute probability distributions and expectations. This is the fundamental problem of Bayesian statistics and machine learning, which frames all inference as expectations with respect to the posterior distribution. The key challenge is to approximate these intractable expectations. In this tutorial, we review sequential Monte Carlo (SMC), a random-sampling-based class of methods for approximate inference. First, we explain the basics of SMC, discuss practical issues, and review theoretical results. We then examine two of the main user design choices: the proposal distributions and the so called intermediate target distributions. We review recent results on how variational inference and amortization can be used to learn efficient proposals and target distributions. Next, we discuss the SMC estimate of the normalizing constant, how this can be used for pseudo-marginal inference and inference evaluation. Throughout the tutorial we illustrate the use of SMC on various models commonly used in machine learning, such as stochastic recurrent neural networks, probabilistic graphical models, and probabilistic programs.
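To make the SMC ingredients mentioned here concrete, below is a minimal bootstrap particle filter for a toy linear-Gaussian state-space model: the prior transitions serve as proposals, the filtering distributions as intermediate targets, and the running product of weight averages gives the unbiased likelihood (normalizing-constant) estimate that pseudo-marginal methods rely on. The model and its parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(T=100, phi=0.9, sigma=1.0, tau=1.0):
    x = np.zeros(T); y = np.zeros(T)
    for t in range(T):
        x[t] = phi * (x[t-1] if t else 0.0) + sigma * rng.normal()
        y[t] = x[t] + tau * rng.normal()
    return y

def bootstrap_pf(y, phi, sigma, tau, N=200):
    """Return an unbiased estimate of log p(y_{1:T} | phi, sigma, tau)."""
    x = sigma * rng.normal(size=N)                        # particles from the initial distribution
    log_Z = 0.0
    for t in range(len(y)):
        if t > 0:
            x = phi * x + sigma * rng.normal(size=N)      # propagate with the prior transition kernel
        log_w = -0.5 * ((y[t] - x) / tau) ** 2 - np.log(tau * np.sqrt(2 * np.pi))
        m = log_w.max()
        w = np.exp(log_w - m)
        log_Z += m + np.log(w.mean())                     # accumulate the likelihood estimate
        x = rng.choice(x, size=N, p=w / w.sum())          # multinomial resampling
    return log_Z

y = simulate()
print(bootstrap_pf(y, 0.9, 1.0, 1.0))
```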

43 citations

Journal Article
TL;DR: In this article, Hamiltonian Monte Carlo (HMC) is shown to sample efficiently from high-dimensional posterior distributions, with proposed parameter draws obtained by iterating on a discretized version of the Hamiltonian dynamics.
Abstract: Hamiltonian Monte Carlo (HMC) samples efficiently from high-dimensional posterior distributions with proposed parameter draws obtained by iterating on a discretized version of the Hamiltonian dynamics ...

39 citations

Posted Content
TL;DR: It is shown under regularity conditions that the parameters of this scheme can be selected such that the relative variance of the log-likelihood ratio estimator is controlled when N increases sublinearly with T, and that the efficiency of computations for Bayesian inference relative to the standard pseudo-marginal method empirically increases with T.
Abstract: The pseudo-marginal algorithm is a popular variant of the Metropolis--Hastings scheme which allows us to sample asymptotically from a target probability density $\pi$, when we are only able to estimate an unnormalized version of $\pi$ pointwise unbiasedly. It has found numerous applications in Bayesian statistics as there are many scenarios where the likelihood function is intractable but can be estimated unbiasedly using Monte Carlo samples. Using many samples will typically result in averages computed under this chain with lower asymptotic variances than the corresponding averages that use fewer samples. For a fixed computing time, it has been shown in several recent contributions that an efficient implementation of the pseudo-marginal method requires the variance of the log-likelihood ratio estimator appearing in the acceptance probability of the algorithm to be of order 1, which in turn usually requires scaling the number $N$ of Monte Carlo samples linearly with the number $T$ of data points. We propose a modification of the pseudo-marginal algorithm, termed the correlated pseudo-marginal algorithm, which is based on a novel log-likelihood ratio estimator computed using the difference of two positively correlated log-likelihood estimators. We show that the parameters of this scheme can be selected such that the variance of this estimator is order $1$ as $N,T\rightarrow\infty$ whenever $N/T\rightarrow 0$. By combining these results with the Bernstein-von Mises theorem, we provide an analysis of the performance of the correlated pseudo-marginal algorithm in the large $T$ regime. In our numerical examples, the efficiency of computations is increased relative to the standard pseudo-marginal algorithm by more than 20 fold for values of $T$ of a few hundreds to more than 100 fold for values of $T$ of around 10,000-20,000.
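A sketch of the correlation mechanism described in this abstract: the auxiliary standard normals u behind the likelihood estimator are kept as part of the chain state and refreshed with an autoregressive (Crank-Nicolson) move, so the log-likelihood estimates at the current and proposed parameters are positively correlated and their difference has small variance. The toy random-effects model and tuning values (rho, step size, N) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(3)
T, N = 200, 10
y = rng.normal(0.5, np.sqrt(2.0), size=T)        # toy data: y_t = x_t + noise, with x_t ~ N(theta, 1)

def log_lik_hat(theta, u):
    """Unbiased likelihood estimate driven by fixed auxiliary normals u of shape (N, T)."""
    x = theta + u                                             # x ~ N(theta, 1) by reparameterization
    log_w = -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)
    return np.sum(logsumexp(log_w, axis=0) - np.log(N))

def correlated_pm(n_iter=5000, step=0.2, rho=0.99):
    theta, u = 0.0, rng.normal(size=(N, T))
    ll = log_lik_hat(theta, u)
    chain = []
    for _ in range(n_iter):
        theta_p = theta + step * rng.normal()
        u_p = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=(N, T))   # keeps u ~ N(0, I) invariant
        ll_p = log_lik_hat(theta_p, u_p)
        # flat prior and reversible proposals, so only the estimator ratio remains
        if np.log(rng.uniform()) < ll_p - ll:
            theta, u, ll = theta_p, u_p, ll_p
        chain.append(theta)
    return np.array(chain)

print(np.mean(correlated_pm()[1000:]))
```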

31 citations

Posted Content
TL;DR: In this article, the authors consider how different choices of kinetic energy in Hamiltonian Monte Carlo algorithms affect algorithm performance and show that the standard choice which results in a Gaussian momentum distribution is not always optimal in terms of either robustness or efficiency.
Abstract: We consider how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance. To this end, we introduce two quantities which can be easily evaluated, the composite gradient and the implicit noise. Results are established on integrator stability and geometric convergence, and we show that choices of kinetic energy that result in heavy-tailed momentum distributions can exhibit an undesirable negligible moves property, which we define. A general efficiency-robustness trade off is outlined, and implementations which rely on approximate gradients are also discussed. Two numerical studies illustrate our theoretical findings, showing that the standard choice which results in a Gaussian momentum distribution is not always optimal in terms of either robustness or efficiency.
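The design question studied in this reference can be illustrated with a small HMC sampler in which the kinetic energy, its gradient and the matching momentum distribution are supplied by the user; the standard Gaussian choice and a Laplace alternative are compared on a toy Gaussian target. The target and tuning constants are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(4)
U = lambda q: 0.5 * q @ q            # toy target: standard normal, so U(q) = |q|^2 / 2
grad_U = lambda q: q

# each entry: (kinetic energy K, grad K, sampler for momenta distributed as exp(-K))
kinetic = {
    "gaussian": (lambda p: 0.5 * p @ p, lambda p: p, lambda d: rng.normal(size=d)),
    "laplace":  (lambda p: np.sum(np.abs(p)), np.sign, lambda d: rng.laplace(size=d)),
}

def hmc(K, grad_K, sample_p, d=10, n_iter=2000, eps=0.2, L=20):
    q = np.zeros(d)
    chain = []
    for _ in range(n_iter):
        p = sample_p(d)                               # momentum refreshed from exp(-K(p))
        q_new, p_new = q.copy(), p.copy()
        for _ in range(L):                            # leapfrog with the chosen kinetic energy
            p_new -= 0.5 * eps * grad_U(q_new)
            q_new += eps * grad_K(p_new)
            p_new -= 0.5 * eps * grad_U(q_new)
        dH = U(q) + K(p) - U(q_new) - K(p_new)
        if np.log(rng.uniform()) < dH:
            q = q_new
        chain.append(q.copy())
    return np.array(chain)

for name, (K, grad_K, sample_p) in kinetic.items():
    print(name, hmc(K, grad_K, sample_p).var())       # variance should be near 1 for the toy target
```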

28 citations

References
Proceedings Article
01 Jan 2014
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
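The reparameterization idea from this reference can be shown in one dimension: writing z = mu + s * eps with eps ~ N(0, 1) turns the variational lower bound into an expectation over eps, so single samples give unbiased gradients for the variational parameters. The toy model (z ~ N(0,1), x | z ~ N(z,1)), learning rate and iteration count are illustrative assumptions; the exact posterior is N(x/2, 1/2), which the fitted q should approach.

```python
import numpy as np

rng = np.random.default_rng(5)
x = 2.0                               # single observation
mu, s = 0.0, 1.0                      # variational parameters of q(z) = N(mu, s^2)
lr = 0.01

for _ in range(20000):
    eps = rng.normal()
    z = mu + s * eps                                   # reparameterized sample
    dlogp_dz = -z + (x - z)                            # d/dz log[p(z) p(x|z)]
    grad_mu = dlogp_dz                                 # chain rule: dz/dmu = 1
    grad_s = dlogp_dz * eps + 1.0 / s                  # dz/ds = eps, plus entropy gradient 1/s
    mu += lr * grad_mu                                 # stochastic gradient ascent on the ELBO
    s += lr * grad_s

print(mu, s)      # should approach 1.0 and sqrt(0.5) ~ 0.707
```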

20,769 citations


"Pseudo-Marginal Hamiltonian Monte C..." refers methods in this paper

  • ...This is closely related to the reparametrization trick commonly used in variational inference for unbiased gradient estimation [Kingma and Welling, 2014]....


Journal ArticleDOI
TL;DR: In this article, a hybrid (molecular dynamics/Langevin) algorithm is used to guide a Monte Carlo simulation of lattice field theory; the method is especially efficient for theories, such as quantum chromodynamics, that contain fermionic degrees of freedom.

3,377 citations


"Pseudo-Marginal Hamiltonian Monte C..." refers background in this paper

  • ...Hamiltonian Monte Carlo (HMC) methods (Duane et al., 1987) offer a possible remedy, but can also struggle in cases where there are strong non-linear dependencies between variables, or when the joint posterior is multimodal (Neal, 2011, Section 5.5.7)....


Journal ArticleDOI
TL;DR: Discrete Choice Methods with Simulation by Kenneth Train has been available in the second edition since 2009 and contains two additional chapters, one on endogenous regressors and one on the expectation–maximization (EM) algorithm.
Abstract: Discrete Choice Methods with Simulation by Kenneth Train has been available in the second edition since 2009. The book is published by Cambridge University Press and is also available for download ...

2,977 citations


"Pseudo-Marginal Hamiltonian Monte C..." refers background in this paper

  • ...For example, discrete choice models are a widely popular class of models in health economics, e-commerce, marketing and social sciences used to analyze choices made by consumers, individuals or businesses (Train, 2009)....


BookDOI
TL;DR: In this review, the author discusses theoretical and practical aspects of Hamiltonian Monte Carlo and presents some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
Abstract: Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of simple random-walk proposals. Though originating in physics, Hamiltonian dynamics can be applied to most problems with continuous state spaces by simply introducing fictitious "momentum" variables. A key to its usefulness is that Hamiltonian dynamics preserves volume, and its trajectories can thus be used to define complex mappings without the need to account for a hard-to-compute Jacobian factor - a property that can be exactly maintained even when the dynamics is approximated by discretizing time. In this review, I discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
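A minimal HMC sampler following the construction reviewed here: auxiliary Gaussian momentum, leapfrog integration of the Hamiltonian dynamics, and a Metropolis accept/reject step that removes the discretization bias. The strongly correlated two-dimensional Gaussian target and the tuning constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
Sigma = np.array([[1.0, 0.95], [0.95, 1.0]])       # strongly correlated 2-D Gaussian target
Prec = np.linalg.inv(Sigma)

U = lambda q: 0.5 * q @ Prec @ q                   # potential energy = -log target (up to a constant)
grad_U = lambda q: Prec @ q

def hmc(n_iter=5000, eps=0.15, L=25):
    q = np.zeros(2)
    chain = []
    for _ in range(n_iter):
        p = rng.normal(size=2)                     # refresh momentum ~ N(0, I)
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)         # leapfrog: initial half step in momentum
        for i in range(L):
            q_new += eps * p_new                   # full step in position
            if i < L - 1:
                p_new -= eps * grad_U(q_new)       # full step in momentum
        p_new -= 0.5 * eps * grad_U(q_new)         # final half step in momentum
        dH = U(q) + 0.5 * p @ p - U(q_new) - 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < dH:             # accept/reject corrects the discretization error
            q = q_new
        chain.append(q.copy())
    return np.array(chain)

samples = hmc()
print(np.cov(samples.T))                           # should be close to Sigma
```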

2,501 citations


"Pseudo-Marginal Hamiltonian Monte C..." refers background or methods in this paper

  • ...Typically, the Verlet method, also known as the Leapfrog method, is used due to its favourable properties in the context of HMC (Leimkuhler and Matthews, 2015, p. 60; Neal, 2011, Section 5.2.3.3)....


  • ...We refer to Neal (2011) for details and a more comprehensive introduction....


  • ...Hamiltonian Monte Carlo (HMC) methods (Duane et al., 1987) offer a possible remedy, but can also struggle in cases where there are strong non-linear dependencies between variables, or when the joint posterior is multimodal (Neal, 2011, Section 5.5.7)....


  • ...However, it is possible to circumvent this problem by making use of a splitting technique which exploits the structure of the extended target, see (Beskos et al., 2011; Leimkuhler and Matthews, 2015, Section 2.4.1; Neal, 2011, Section 5.5.1; Shahbaba et al., 2014)....


Proceedings Article
28 Jun 2011
TL;DR: This paper proposes a new framework for learning from large-scale datasets based on iterative learning from small mini-batches: by adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates are shown to converge to samples from the true posterior distribution as the stepsize is annealed.
Abstract: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an inbuilt protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
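The update this abstract describes, commonly called stochastic gradient Langevin dynamics, can be sketched on a toy conjugate model: a mini-batch gradient step plus injected Gaussian noise whose variance matches the step size, with the step size annealed over iterations. The model, batch size and annealing schedule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(1.5, 1.0, size=10_000)           # y_i ~ N(theta, 1), flat prior on theta
n, batch = data.size, 100

theta = 0.0
samples = []
for t in range(5000):
    eps_t = 1e-4 * (10.0 + t) ** (-0.55)                              # annealed step size
    idx = rng.integers(0, n, size=batch)
    grad_hat = (n / batch) * np.sum(data[idx] - theta)                # unbiased estimate of grad log-likelihood
    theta += 0.5 * eps_t * grad_hat + np.sqrt(eps_t) * rng.normal()   # SGLD update: drift plus injected noise
    samples.append(theta)

burned = np.array(samples[1000:])
print(burned.mean(), burned.var())   # posterior is roughly N(data.mean(), 1/n), i.e. variance ~ 1e-4
```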

2,080 citations


"Pseudo-Marginal Hamiltonian Monte C..." refers methods in this paper

  • ...Similarly to stochastic gradient MCMC (and in contrast with PM-HMC), this results in an approximate MCMC which does not preserve the distribution of interest....


  • ...Stochastic gradient MCMC (Welling and Teh, 2011; Chen et al., 2014; Ding et al., 2014; Leimkuhler and Shang, 2016)—including HMC-like methods—are a popular class of algorithms for approximate posterior sampling when an unbiased estimate of the log-likelihood gradient is available....


  • ...Even a disconnected marginal which is hard to explore for any MCMC method may, when extended in this way, be connected in the extended space and easier to explore....


  • ...In these scenarios, current MCMC methods will be inefficient....


  • ...However, the kernel-based approximation gives rise to a bias in the gradients which is difficult to control and there is no guarantee that the trajectories closely follow the ideal HMC. Kernel HMC requires the selection of a kernel and, furthermore, some appropriate approximation thereof, since the computational cost of a full kernel-based approximation grows cubically with the number of MCMC iterations; see Strathmann et al. (2015) for details....
