BookDOI

Handbook of Markov Chain Monte Carlo

TL;DR: An edited handbook covering the theory, methodology, and implementation of Markov chain Monte Carlo, with case studies ranging from functional MRI and statistical genetics to environmental epidemiology, educational research, and fisheries science.
Abstract: Contents:
• Foreword, Stephen P. Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng
• Introduction to MCMC, Charles J. Geyer
• A short history of Markov chain Monte Carlo: Subjective recollections from incomplete data, Christian Robert and George Casella
• Reversible jump Markov chain Monte Carlo, Yanan Fan and Scott A. Sisson
• Optimal proposal distributions and adaptive MCMC, Jeffrey S. Rosenthal
• MCMC using Hamiltonian dynamics, Radford M. Neal
• Inference and Monitoring Convergence, Andrew Gelman and Kenneth Shirley
• Implementing MCMC: Estimating with confidence, James M. Flegal and Galin L. Jones
• Perfection within reach: Exact MCMC sampling, Radu V. Craiu and Xiao-Li Meng
• Spatial point processes, Mark Huber
• The data augmentation algorithm: Theory and methodology, James P. Hobert
• Importance sampling, simulated tempering and umbrella sampling, Charles J. Geyer
• Likelihood-free Markov chain Monte Carlo, Scott A. Sisson and Yanan Fan
• MCMC in the analysis of genetic data on related individuals, Elizabeth Thompson
• A Markov chain Monte Carlo based analysis of a multilevel model for functional MRI data, Brian Caffo, DuBois Bowman, Lynn Eberly, and Susan Spear Bassett
• Partially collapsed Gibbs sampling & path-adaptive Metropolis-Hastings in high-energy astrophysics, David van Dyk and Taeyoung Park
• Posterior exploration for computationally intensive forward models, Dave Higdon, C. Shane Reese, J. David Moulton, Jasper A. Vrugt and Colin Fox
• Statistical ecology, Ruth King
• Gaussian random field models for spatial data, Murali Haran
• Modeling preference changes via a hidden Markov item response theory model, Jong Hee Park
• Parallel Bayesian MCMC imputation for multiple distributed lag models: A case study in environmental epidemiology, Brian Caffo, Roger Peng, Francesca Dominici, Thomas A. Louis, and Scott Zeger
• MCMC for state space models, Paul Fearnhead
• MCMC in educational research, Roy Levy, Robert J. Mislevy, and John T. Behrens
• Applications of MCMC in fisheries science, Russell B. Millar
• Model comparison and simulation for hierarchical models: Analyzing rural-urban migration in Thailand, Filiz Garip and Bruce Western
Citations
Journal ArticleDOI
TL;DR: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan, allowing users to fit linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models, all in a multilevel context.
Abstract: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan. A wide range of distributions and link functions are supported, allowing users to fit - among others - linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models all in a multilevel context. Further modeling options include autocorrelation of the response variable, user defined covariance structures, censored data, as well as meta-analytic standard errors. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with the Watanabe-Akaike information criterion and leave-one-out cross-validation.
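
Below is a minimal sketch (R) of what fitting such a multilevel model with brms can look like. The epilepsy data set ships with brms, but the formula, prior, and sampler settings here are illustrative choices, not taken from the paper.

```r
library(brms)

# Poisson multilevel model: population-level effects of age, baseline and
# treatment, plus a varying intercept per patient.
fit <- brm(
  count ~ zAge + zBase * Trt + (1 | patient),
  data   = epilepsy,
  family = poisson(),
  prior  = prior(normal(0, 5), class = "b"),  # weakly informative slopes
  chains = 4, iter = 2000
)
summary(fit)  # posterior summaries, Rhat, effective sample sizes
```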

4,353 citations


Cites background or methods from "Handbook of Markov Chain Monte Carlo"

  • ...Currently, these are the static Hamiltonian Monte-Carlo (HMC) Sampler sometimes also referred to as Hybrid Monte-Carlo (Neal 2011, 2003; Duane et al. 1987) and its extension the No-U-Turn Sampler (NUTS) by Hoffman and Gelman (2014)....


  • ...One of the main problems of these algorithms is their rather slow convergence for high-dimensional models with correlated parameters (Neal 2011; Hoffman and Gelman 2014; Gelman, Carlin, Stern, and Rubin 2014)....


  • ...In contrast, Stan implements Hamiltonian Monte Carlo (Duane, Kennedy, Pendleton, and Roweth 1987; Neal 2011) and its extension, the No-U-Turn Sampler (NUTS) (Hoffman and Gelman 2014)....


Proceedings Article
28 Jun 2011
TL;DR: This paper proposes a framework for learning from large-scale datasets based on iterative learning from small mini-batches: by adding the right amount of noise to a standard stochastic gradient optimization algorithm, the iterates converge to samples from the true posterior distribution as the stepsize is annealed.
Abstract: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an inbuilt protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
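
As a concrete illustration, here is a minimal sketch (R) of the stochastic gradient Langevin update described above, applied to a toy Gaussian-mean model; the prior, minibatch size, and polynomially decaying step-size schedule are assumptions of the sketch.

```r
# Toy model: x_i ~ N(theta, 1), prior theta ~ N(0, 10).
set.seed(1)
N <- 10000
x <- rnorm(N, mean = 2)            # synthetic data, true theta = 2

n_batch <- 100                     # minibatch size
n_iter  <- 5000
theta   <- 0
samples <- numeric(n_iter)

for (t in 1:n_iter) {
  eps <- 1e-4 * (1 + t)^(-0.55)    # annealed stepsize
  idx <- sample.int(N, n_batch)    # draw a minibatch
  grad_prior <- -theta / 10                           # d/dtheta log prior
  grad_lik   <- (N / n_batch) * sum(x[idx] - theta)   # rescaled minibatch gradient
  theta <- theta + (eps / 2) * (grad_prior + grad_lik) +
    rnorm(1, sd = sqrt(eps))       # injected Gaussian noise of variance eps
  samples[t] <- theta
}
mean(samples[-(1:1000)])           # posterior-mean estimate after burn-in
```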

2,080 citations

Journal Article
TL;DR: The No-U-Turn Sampler (NUTS) is introduced as an extension to HMC that eliminates the need to set the number of steps L, together with a method for adapting the step size parameter ε on the fly based on primal-dual averaging.
Abstract: Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as (and sometimes more efficiently than) a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all, making it suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" samplers.
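
To make the roles of ε and L concrete, here is a minimal sketch (R) of a single standard HMC transition using the leapfrog integrator (plain HMC, not NUTS itself, whose recursive tree-building removes the need to choose L); the standard normal target and the tuning values are illustrative assumptions.

```r
# One HMC transition; U is the negative log density of the target.
hmc_step <- function(theta, U, grad_U, eps, L) {
  p <- rnorm(length(theta))                          # resample momentum
  theta_prop <- theta
  p_prop <- p - (eps / 2) * grad_U(theta_prop)       # initial half step (momentum)
  for (l in 1:L) {
    theta_prop <- theta_prop + eps * p_prop          # full step (position)
    if (l < L) p_prop <- p_prop - eps * grad_U(theta_prop)
  }
  p_prop <- p_prop - (eps / 2) * grad_U(theta_prop)  # final half step (momentum)
  # accept/reject on the change in total energy H = U(theta) + |p|^2 / 2
  dH <- U(theta) + sum(p^2) / 2 - U(theta_prop) - sum(p_prop^2) / 2
  if (log(runif(1)) < dH) theta_prop else theta
}

# Example: 2000 draws from a standard normal target.
U      <- function(th) sum(th^2) / 2
grad_U <- function(th) th
draws  <- numeric(2000); th <- 0
for (i in seq_along(draws)) {
  th <- hmc_step(th, U, grad_U, eps = 0.1, L = 20)
  draws[i] <- th
}
```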

1,988 citations

Journal ArticleDOI
TL;DR: brms provides an intuitive and powerful formula syntax that extends the well-known formula syntax of lme4; the paper introduces this syntax in detail and demonstrates its usefulness with four examples, each showing different relevant aspects of the syntax.
Abstract: The brms package allows R users to easily specify a wide range of Bayesian single-level and multilevel models, which are fitted with the probabilistic programming language Stan behind the scenes. Several response distributions are supported, of which all parameters (e.g., location, scale, and shape) can be predicted at the same time thus allowing for distributional regression. Non-linear relationships may be specified using non-linear predictor terms or semi-parametric approaches such as splines or Gaussian processes. To make all of these modeling options possible in a multilevel framework, brms provides an intuitive and powerful formula syntax, which extends the well known formula syntax of lme4. The purpose of the present paper is to introduce this syntax in detail and to demonstrate its usefulness with four examples, each showing other relevant aspects of the syntax.
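
A few examples of the formula syntax being described, as a minimal sketch (R); the variables y, x, and group are hypothetical placeholders.

```r
library(brms)

# Distributional regression: predict both the mean and the residual sd.
f1 <- bf(y ~ x + (1 | group), sigma ~ x)

# Non-linear model with explicit non-linear parameters a and b.
f2 <- bf(y ~ a * exp(-b * x), a + b ~ 1, nl = TRUE)

# Semi-parametric smooth term via splines.
f3 <- bf(y ~ s(x) + (1 | group))
```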

1,463 citations


Cites methods from "Handbook of Markov Chain Monte Carlo"

  • ...Possibly the most powerful program for performing full Bayesian inference available to date is Stan (Stan Development Team, 2017c; Carpenter et al., 2017), which implements Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 2011; Betancourt et al., 2014) and its extension, the No-U-Turn (NUTS) Sampler (Hoffman and Gelman, 2014)....


  • ...An excellent non-mathematical introduction to Hamiltonian Monte Carlo can be found in Betancourt (2017)....



  • ...It implements Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 2011; Betancourt et al., 2014) and its extension, the No-U-Turn (NUTS) Sampler (Hoffman and Gelman, 2014)....


Journal ArticleDOI
28 May 2015 - Nature
TL;DR: This Review provides an introduction to the probabilistic modelling framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
Abstract: How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

1,457 citations

References
Journal ArticleDOI
TL;DR: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented, along with an exposition of the relevant theory, techniques of application, and methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral $J = \int f(x)\,p(x)\,dx = E_p(f)$, where $p(x)$ is a probability density function, instead of obtaining independent samples $x_1, \ldots, x_N$ from $p(x)$ and using the estimate $\hat{J}_1 = \sum f(x_i)/N$, we instead obtain the sample from a distribution with density $q(x)$ and use the estimate $\hat{J}_2 = \sum \{f(x_i)\,p(x_i)\}/\{q(x_i)\,N\}$. This may be advantageous if it is easier to sample from $q(x)$ than $p(x)$, but it is a difficult method to use in a large number of dimensions, since the values of the weights $w(x_i) = p(x_i)/q(x_i)$ for reasonable values of $N$ may all be extremely small, or a few may be extremely large. In estimating the probability of an event $A$, however, these difficulties may not be as serious since the only values of $w(x)$ which are important are those for which $x \in A$. Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from $p(x)$ or if $p(x)$ is unknown, sample from some distribution $q(y)$ and obtain the sample $x$ values as some function of the corresponding $y$ values. If we want samples from the conditional distribution…
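
A minimal sketch (R) of the two estimators in point (ii); the target $p = N(0, 1)$, test function $f(x) = x^2$, and heavier-tailed Student-t proposal $q$ are illustrative assumptions.

```r
set.seed(1)
f <- function(x) x^2
N <- 10000

# J1: independent samples drawn directly from p(x)
x1 <- rnorm(N)
J1 <- mean(f(x1))

# J2: samples from q(x), reweighted by w(x) = p(x) / q(x)
x2 <- rt(N, df = 3)
w  <- dnorm(x2) / dt(x2, df = 3)
J2 <- mean(f(x2) * w)

c(J1 = J1, J2 = J2)  # both approximate E_p(f) = 1
```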

14,965 citations

Journal ArticleDOI
TL;DR: The focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, and the results are derived as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations.
Abstract: The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.
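
A minimal sketch (R) of the resulting multiple-chain diagnostic, the potential scale reduction factor; the AR(1) demo chains and their overdispersed starting points are illustrative assumptions.

```r
psrf <- function(chains) {    # chains: n x m matrix, one column per chain
  n <- nrow(chains)
  B <- n * var(colMeans(chains))         # between-chain variance
  W <- mean(apply(chains, 2, var))       # within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_plus / W)                     # PSRF; values near 1 suggest convergence
}

# Demo: four AR(1) chains started from overdispersed points.
set.seed(1)
sim_chain <- function(n, start) {
  out <- numeric(n); out[1] <- start
  for (t in 2:n) out[t] <- 0.9 * out[t - 1] + rnorm(1)
  out
}
chains <- sapply(c(-10, -5, 5, 10), sim_chain, n = 2000)
psrf(chains)  # approaches 1 as the chains mix
```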

13,884 citations


"Handbook of Markov Chain Monte Carl..." refers methods in this paper

  • ...A two-way ANOVA decomposition of the variance of such a functional is formed over multiple chain replications, from which the potential scale reduction factor (PSRF) (Gelman and Rubin, 1992) can be constructed and monitored....


Journal ArticleDOI
TL;DR: In this article, the authors propose a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive.
Abstract: Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They have therefore not been available for application to Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed. This paper proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive. It should therefore have wide applicability in model determination problems. The methodology is illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.
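
As a toy illustration of jumping between parameter subspaces of differing dimensionality, here is a minimal sketch (R) of a reversible jump sampler moving between a fixed-mean model M1 (mu = 0) and a one-parameter model M2 whose mean has a N(0, 10) prior; the birth proposal and all constants are assumptions of the sketch, unrelated to the paper's change-point examples.

```r
set.seed(1)
y <- rnorm(50, mean = 0.4)                      # synthetic data, modest nonzero mean
loglik <- function(mu) sum(dnorm(y, mu, 1, log = TRUE))
q_mean <- mean(y); q_sd <- 1 / sqrt(length(y))  # birth proposal centred on ybar

n_iter <- 20000
k <- 1; mu <- 0                                 # start in the smaller model
k_trace <- integer(n_iter)
for (t in 1:n_iter) {
  if (k == 1) {                 # birth move M1 -> M2 (dimension matching via u)
    u <- rnorm(1, q_mean, q_sd)
    logA <- loglik(u) + dnorm(u, 0, sqrt(10), log = TRUE) -
            loglik(0) - dnorm(u, q_mean, q_sd, log = TRUE)
    if (log(runif(1)) < logA) { k <- 2; mu <- u }
  } else {                      # death move M2 -> M1 (reverse of birth)
    logA <- loglik(0) + dnorm(mu, q_mean, q_sd, log = TRUE) -
            loglik(mu) - dnorm(mu, 0, sqrt(10), log = TRUE)
    if (log(runif(1)) < logA) { k <- 1; mu <- 0 }
    else {                      # within-model random-walk update of mu
      mu_new <- mu + rnorm(1, sd = 0.2)
      logR <- loglik(mu_new) - loglik(mu) +
              dnorm(mu_new, 0, sqrt(10), log = TRUE) -
              dnorm(mu, 0, sqrt(10), log = TRUE)
      if (log(runif(1)) < logR) mu <- mu_new
    }
  }
  k_trace[t] <- k
}
mean(k_trace == 2)  # estimated posterior probability of the larger model
```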

6,188 citations


"Handbook of Markov Chain Monte Carl..." refers background or methods or result in this paper

  • ...Richardson and Green (1997) analyzed these data using a mixture of normal densities to identify subgroups of slow or fast metabolizers....


  • ...For example, Green (1995) analyzed mining disaster count data using a Poisson process with the rate parameter described as a step function with an unknown number and location of steps....


  • ...Green (2003) proposed a reversible jump analogue of the random-walk Metropolis sampler of Roberts (2003). Suppose that estimates of the first- and second-order moments of $\theta_k$ are available for each of a small number of models $k \in \mathcal{K}$, denoted by $\mu_k$ and $B_k B_k^\top$ respectively, where $B_k$ is an $n_k \times n_k$ matrix....


  • ...where $l_{j_1}$ and $l_{j_2}$ respectively denote the number of observations allocated to components $j_1$ and $j_2$, and where $\delta$, $\alpha$, $\beta$, $\xi$ and $\kappa$ are hyperparameters as defined by Richardson and Green (1997)....


  • ...Example: Dimension Matching. Consider the illustrative example given in Green (1995) and Brooks (1998)....


Journal ArticleDOI
TL;DR: The Gibbs sampler is used to sample indirectly from the multinomial posterior distribution over the set of possible subset choices, so that promising subsets can be identified by their more frequent appearance in the Gibbs sample.
Abstract: A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure entails embedding the regression setup in a hierarchical normal mixture model where latent variables are used to identify subset choices. In this framework the promising subsets of predictors can be identified as those with higher posterior probability. The computational burden is then alleviated by using the Gibbs sampler to indirectly sample from this multinomial posterior distribution on the set of possible subset choices. Those subsets with higher probability—the promising ones—can then be identified by their more frequent appearance in the Gibbs sample.
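
A minimal sketch (R) of this stochastic search: a Gibbs sampler alternating between the coefficients and the latent inclusion indicators under the normal spike-and-slab mixture prior; the error variance is fixed at 1 for simplicity, and tau and c are illustrative tuning choices.

```r
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 0.8 * X[, 3] + rnorm(n)   # only predictors 1 and 3 are active

tau <- 0.05; cc <- 10                   # spike sd = tau, slab sd = cc * tau
n_iter <- 5000
gamma <- rep(1, p)
incl  <- rep(0, p)                      # running inclusion counts
XtX <- crossprod(X); Xty <- crossprod(X, y)

for (it in 1:n_iter) {
  # beta | gamma: conjugate multivariate normal draw (sigma^2 = 1)
  d <- ifelse(gamma == 1, (cc * tau)^2, tau^2)
  R <- chol(XtX + diag(1 / d))
  beta <- drop(backsolve(R, backsolve(R, Xty, transpose = TRUE) + rnorm(p)))
  # gamma_j | beta_j: Bernoulli with odds given by slab vs spike densities
  p_slab  <- dnorm(beta, 0, cc * tau)
  p_spike <- dnorm(beta, 0, tau)
  gamma <- rbinom(p, 1, p_slab / (p_slab + p_spike))
  incl <- incl + gamma
}
incl / n_iter  # posterior inclusion frequencies; high for predictors 1 and 3
```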

2,780 citations


"Handbook of Markov Chain Monte Carl..." refers background in this paper

  • ...space (Berger and Pericchi, 2001; DiMatteo et al., 2001; George and McCulloch, 1993; Tadesse et al., 2005)....


  • ...For examples and algorithms in this setting and beyond, see, for example, George and McCulloch (1993), Smith and Kohn (1996), and Nott and Leonte (2004)....


  • ...A Gibbs sampler directly on model space is then available for γ (George and McCulloch, 1993; Nott and Green, 2004; Smith and Kohn, 1996)....


Journal ArticleDOI
TL;DR: An adaptive Metropolis (AM) algorithm is introduced, in which the Gaussian proposal distribution is updated along the process using the full information accumulated so far; although this makes the chain non-Markovian, the algorithm is shown to have the correct ergodic properties.
Abstract: A proper choice of a proposal distribution for Markov chain Monte Carlo methods, for example for the Metropolis-Hastings algorithm, is well known to be a crucial factor for the convergence of the algorithm. In this paper we introduce an adaptive Metropolis (AM) algorithm, where the Gaussian proposal distribution is updated along the process using the full information cumulated so far. Due to the adaptive nature of the process, the AM algorithm is non-Markovian, but we establish here that it has the correct ergodic properties. We also include the results of our numerical tests, which indicate that the AM algorithm competes well with traditional Metropolis-Hastings algorithms, and demonstrate that the AM algorithm is easy to use in practical computation.
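
A minimal sketch (R) of the AM recursion: a random-walk Metropolis sampler whose Gaussian proposal covariance is re-estimated from the full chain history after an initial non-adaptive period; the correlated bivariate normal target is an assumption of the sketch, while the scaling constants follow the paper's defaults.

```r
set.seed(1)
d <- 2
Sigma <- matrix(c(1, 0.9, 0.9, 1), 2, 2)  # correlated target covariance
Sigma_inv <- solve(Sigma)
log_target <- function(x) -0.5 * drop(t(x) %*% Sigma_inv %*% x)

n_iter <- 20000; t0 <- 200                # adaptation starts after t0 iterations
s_d <- 2.4^2 / d; eps <- 1e-6             # scaling and regularization constants
chain <- matrix(0, n_iter, d)
x <- c(5, -5)                             # deliberately poor starting point
for (t in 1:n_iter) {
  C <- if (t <= t0) 0.1 * diag(d) else
    s_d * (cov(chain[1:(t - 1), ]) + eps * diag(d))
  prop <- x + drop(rnorm(d) %*% chol(C))  # Gaussian proposal with covariance C
  if (log(runif(1)) < log_target(prop) - log_target(x)) x <- prop
  chain[t, ] <- x
}
cov(chain[-(1:5000), ])                   # empirical covariance approaches Sigma
```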

2,511 citations


"Handbook of Markov Chain Monte Carl..." refers methods in this paper

  • ...As with standard MCMC, any adaptations must be implemented with care—transition kernels dependent on the entire history of the Markov chain can only be used under diminishing adaptation conditions (Haario et al., 2001; Roberts and Rosenthal, 2009)....
