Author

Xiao-Li Meng

Bio: Xiao-Li Meng is an academic researcher from Harvard University. The author has contributed to research on topics including Markov chain Monte Carlo and the Monte Carlo method. The author has an h-index of 40 and has co-authored 153 publications receiving 20,328 citations. Previous affiliations of Xiao-Li Meng include University of Rouen and University of Chicago.


Papers
BookDOI
10 May 2011
TL;DR: An edited volume on Markov chain Monte Carlo, covering theory and methodology (reversible jump, adaptive MCMC, Hamiltonian dynamics, convergence monitoring, exact sampling) and applications including functional MRI data, environmental epidemiology, educational research, and fisheries science.
Abstract: Foreword (Stephen P. Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng); Introduction to MCMC (Charles J. Geyer); A short history of Markov chain Monte Carlo: Subjective recollections from incomplete data (Christian Robert and George Casella); Reversible jump Markov chain Monte Carlo (Yanan Fan and Scott A. Sisson); Optimal proposal distributions and adaptive MCMC (Jeffrey S. Rosenthal); MCMC using Hamiltonian dynamics (Radford M. Neal); Inference and Monitoring Convergence (Andrew Gelman and Kenneth Shirley); Implementing MCMC: Estimating with confidence (James M. Flegal and Galin L. Jones); Perfection within reach: Exact MCMC sampling (Radu V. Craiu and Xiao-Li Meng); Spatial point processes (Mark Huber); The data augmentation algorithm: Theory and methodology (James P. Hobert); Importance sampling, simulated tempering and umbrella sampling (Charles J. Geyer); Likelihood-free Markov chain Monte Carlo (Scott A. Sisson and Yanan Fan); MCMC in the analysis of genetic data on related individuals (Elizabeth Thompson); A Markov chain Monte Carlo based analysis of a multilevel model for functional MRI data (Brian Caffo, DuBois Bowman, Lynn Eberly, and Susan Spear Bassett); Partially collapsed Gibbs sampling & path-adaptive Metropolis-Hastings in high-energy astrophysics (David van Dyk and Taeyoung Park); Posterior exploration for computationally intensive forward models (Dave Higdon, C. Shane Reese, J. David Moulton, Jasper A. Vrugt and Colin Fox); Statistical ecology (Ruth King); Gaussian random field models for spatial data (Murali Haran); Modeling preference changes via a hidden Markov item response theory model (Jong Hee Park); Parallel Bayesian MCMC imputation for multiple distributed lag models: A case study in environmental epidemiology (Brian Caffo, Roger Peng, Francesca Dominici, Thomas A. Louis, and Scott Zeger); MCMC for state space models (Paul Fearnhead); MCMC in educational research (Roy Levy, Robert J. Mislevy, and John T. Behrens); Applications of MCMC in fisheries science (Russell B. Millar); Model comparison and simulation for hierarchical models: analyzing rural-urban migration in Thailand (Filiz Garip and Bruce Western).

2,415 citations

Journal ArticleDOI
TL;DR: The authors provide simple but accurate methods, based on the Fisher z transformation, for comparing correlation coefficients between a dependent variable and a set of independent variables: a test and confidence interval for comparing two correlated correlations, a test for heterogeneity, and a test and confidence interval for a contrast among k (>2) correlated correlations.
Abstract: The purpose of this article is to provide simple but accurate methods for comparing correlation coefficients between a dependent variable and a set of independent variables. The methods are simple extensions of Dunn & Clark's (1969) work using the Fisher z transformation and include a test and confidence interval for comparing two correlated correlations, a test for heterogeneity, and a test and confidence interval for a contrast among k (>2) correlated correlations. Also briefly discussed is why the traditional Hotelling's t test for comparing correlated correlations is generally not appropriate in practice.
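The two-correlation test described in this abstract can be sketched in a few lines. The abstract does not state the formula, so the statistic below is an assumption: it follows the form in which this Fisher z test for two correlated correlations is commonly stated, and the function and argument names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher z transformation of a correlation coefficient."""
    return math.atanh(r)

def compare_correlated_correlations(r1, r2, rx, n):
    """Z statistic and two-sided p-value for H0: rho1 == rho2.

    r1, r2: correlations of two predictors with the same dependent variable.
    rx:     correlation between the two predictors.
    n:      sample size.
    Assumed form: Z = (z1 - z2) * sqrt((n - 3) / (2 * (1 - rx) * h)).
    """
    r_sq_mean = (r1 ** 2 + r2 ** 2) / 2.0
    # f is capped at 1 in the commonly stated version of the test.
    f = min((1.0 - rx) / (2.0 * (1.0 - r_sq_mean)), 1.0)
    h = (1.0 - f * r_sq_mean) / (1.0 - r_sq_mean)
    z = (fisher_z(r1) - fisher_z(r2)) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided
```

For instance, `compare_correlated_correlations(0.6, 0.2, 0.3, 100)` compares correlations of 0.6 and 0.2 with a shared outcome when the two predictors correlate at 0.3.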

2,300 citations

01 Jan 1996
TL;DR: In this article, the authors consider Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data.
Abstract: This paper considers Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data. We focus on posterior predictive assessment, in a framework that also includes conditioning on auxiliary statistics. The Bayesian formulation facilitates the construction and calculation of a meaningful reference distribution not only for any (classical) statistic, but also for any parameter-dependent "statistic" or discrepancy. The latter allows us to propose the realized discrepancy assessment of model fitness, which directly measures the true discrepancy between data and the posited model, for any aspect of the model which we want to explore. The computation required for the realized discrepancy assessment is a straightforward byproduct of the posterior simulation used for the original Bayesian analysis. We illustrate with three applied examples. The first example, which serves mainly to motivate the work, illustrates the difficulty of classical tests in assessing the fitness of a Poisson model to a positron emission tomography image that is constrained to be nonnegative. The second and third examples illustrate the details of the posterior predictive approach in two problems: estimation in a model with inequality constraints on the parameters, and estimation in a mixture model. In all three examples, standard test statistics (either a χ² or a likelihood ratio) are not pivotal: the difficulty is not just how to compute the reference distribution for the test, but that in the classical framework no such distribution exists, independent of the unknown model parameters.
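A minimal sketch of a posterior predictive check with a parameter-dependent discrepancy, as described in this abstract. The model is an assumption chosen for illustration (normal data with known standard deviation and a flat prior on the mean), not one of the paper's three examples, and all names are hypothetical.

```python
import math
import random
from statistics import mean

def chi2_discrepancy(y, mu, sigma=1.0):
    """Parameter-dependent discrepancy: sum of squared standardized residuals."""
    return sum(((yi - mu) / sigma) ** 2 for yi in y)

def posterior_predictive_pvalue(y, n_draws=4000, sigma=1.0, seed=0):
    """Fraction of posterior draws where replicated data look at least as
    discrepant as the observed data, each evaluated at the same drawn mu."""
    rng = random.Random(seed)
    n = len(y)
    y_bar = mean(y)
    exceed = 0
    for _ in range(n_draws):
        # Posterior of mu under the assumed flat prior: N(y_bar, sigma^2 / n).
        mu = rng.gauss(y_bar, sigma / math.sqrt(n))
        # Replicated data set drawn from the model at this mu.
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]
        if chi2_discrepancy(y_rep, mu, sigma) >= chi2_discrepancy(y, mu, sigma):
            exceed += 1
    return exceed / n_draws
```

Because the replicated data are generated inside the same loop that draws the parameter, the check reuses the posterior simulation and needs no separate reference distribution. A p-value near 0 or 1 signals that the observed discrepancy is unusual under the fitted model.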

2,065 citations

Journal ArticleDOI
TL;DR: The EM algorithm is popular because its maximization step involves only complete-data maximum likelihood estimation and its convergence is stable, with each iteration increasing the likelihood. When that M-step is itself complicated, complete-data maximum likelihood estimation is often still relatively simple when carried out conditional on some function of the parameters being estimated.
Abstract: Two major reasons for the popularity of the EM algorithm are that its maximum step involves only complete-data maximum likelihood estimation, which is often computationally simple, and that its convergence is stable, with each iteration increasing the likelihood. When the associated complete-data maximum likelihood estimation itself is complicated, EM is less attractive because the M-step is computationally unattractive. In many cases, however, complete-data maximum likelihood estimation is relatively simple when conditional on some function of the parameters being estimated.

1,816 citations

Journal ArticleDOI
TL;DR: It is shown that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences.
Abstract: Computing (ratios of) normalizing constants of probability models is a fundamental computational problem for many statistical and scientific studies. Monte Carlo simulation is an effective technique, especially with complex and high-dimensional models. This paper aims to bring to the attention of general statistical audiences some effective methods originating from theoretical physics and at the same time to explore these methods from a more statistical perspective, through establishing theoretical connections and illustrating their uses with statistical problems. We show that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences. The former generalizes importance sampling through the use of a single "bridge" density and is thus a case of bridge sampling in the sense of Meng and Wong. Thermodynamic integration, which is also known in the numerical analysis literature as Ogata's method for high-dimensional integration, corresponds to the use of infinitely many and continuously connected bridges (and thus a "path"). Our path sampling formulation offers more flexibility and thus potential efficiency to thermodynamic integration, and the search of optimal paths turns out to have close connections with the Jeffreys prior density and the Rao and Hellinger distances between two densities. We provide an informative theoretical example as well as two empirical examples (involving 17- to 70-dimensional integrations) to illustrate the potential and implementation of path sampling. We also discuss some open problems.
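To make the "path" idea concrete, here is a small sketch of thermodynamic integration on a toy problem where the answer is known. The setup is an assumption for illustration, not one of the paper's examples: two unnormalized normal densities q0(x) = exp(-x²/2) and q1(x) = exp(-x²/(2s²)), joined by a geometric path q_t = q0^(1-t) q1^t, so that log(Z1/Z0) = log s. The identity used is log(Z1/Z0) = ∫₀¹ E_t[d/dt log q_t(x)] dt, approximated on a grid of t values with the trapezoid rule.

```python
import math
import random

def log_ratio_path_sampling(s, n_grid=21, n_draws=2000, seed=0):
    """Estimate log(Z1/Z0) for q0(x)=exp(-x^2/2) and q1(x)=exp(-x^2/(2 s^2))."""
    rng = random.Random(seed)
    delta = 1.0 / s ** 2 - 1.0  # precision of q1 minus precision of q0
    ts = [k / (n_grid - 1) for k in range(n_grid)]
    means = []
    for t in ts:
        # Along the geometric path, q_t is an unnormalized N(0, 1/precision).
        precision = 1.0 + t * delta
        sd = 1.0 / math.sqrt(precision)
        # d/dt log q_t(x) = log q1(x) - log q0(x) = -0.5 * delta * x^2
        u = [-0.5 * delta * rng.gauss(0.0, sd) ** 2 for _ in range(n_draws)]
        means.append(sum(u) / n_draws)
    # Trapezoid rule over the grid of t values.
    h = ts[1] - ts[0]
    return h * (sum(means) - 0.5 * (means[0] + means[-1]))
```

Calling `log_ratio_path_sampling(2.0)` should return a value close to `math.log(2.0)`. In this toy case each intermediate density can be sampled exactly; in realistic problems the draws at each t would come from MCMC.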

1,035 citations


Cited by
Proceedings Article
01 Jan 2014
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
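The reparameterization idea in this abstract can be illustrated on a much smaller problem than the variational lower bound itself. The sketch below only estimates the gradient of E[f(z)] for z ~ N(mu, sigma²) by writing z = mu + sigma * eps with eps ~ N(0, 1); it is not the paper's algorithm, and the function name and the requirement to pass the derivative `df` by hand are assumptions made to keep the example dependency-free.

```python
import random

def reparam_gradients(mu, sigma, df, n_draws=20000, seed=0):
    """Monte Carlo gradients of E[f(z)], z ~ N(mu, sigma^2), w.r.t. mu and sigma.

    df is the derivative of f. With z = mu + sigma * eps, the chain rule gives
    d f(z)/d mu = df(z) and d f(z)/d sigma = df(z) * eps.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_draws):
        eps = rng.gauss(0.0, 1.0)   # noise does not depend on the parameters
        z = mu + sigma * eps        # deterministic, differentiable transform
        slope = df(z)
        g_mu += slope               # dz/dmu = 1
        g_sigma += slope * eps      # dz/dsigma = eps
    return g_mu / n_draws, g_sigma / n_draws
```

With f(z) = z², E[f(z)] = mu² + sigma², so `reparam_gradients(1.5, 0.5, lambda z: 2 * z)` should return values near (3.0, 1.0). Moving the randomness into eps is what lets ordinary stochastic gradient methods be applied.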

20,769 citations

Journal ArticleDOI
TL;DR: The focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, and the results are derived as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations.
Abstract: The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.
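The multiple-sequence strategy in this abstract is often summarized by a potential scale reduction factor comparing between-chain and within-chain variance. The sketch below uses the simplified form of that factor that is common in practice; it omits the degrees-of-freedom correction and is an assumption for illustration rather than the paper's exact estimator. The function name is hypothetical.

```python
from statistics import mean, variance

def potential_scale_reduction(chains):
    """Simplified R-hat for one scalar estimand.

    chains: list of m equal-length lists of draws (m >= 2, length n >= 2).
    Values near 1 suggest the chains have mixed; larger values suggest that
    continuing the simulation could sharpen the distributional estimate.
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B and average within-chain variance W.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the target variance (overestimates before mixing).
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5
```

Chains started from overdispersed points that have not yet converged to a common distribution have very different means, which inflates the between-chain term and pushes the factor well above 1.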

13,884 citations

Journal ArticleDOI
TL;DR: The authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. They derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest; adding pD to the posterior mean deviance gives a deviance information criterion, which is related to other information criteria and has an approximate decision theoretic justification.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
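The definitions in this abstract translate directly into a few lines once posterior draws are available. The sketch below assumes a deliberately simple model (normal data with known standard deviation and one unknown mean under a flat prior) so that pD should come out close to 1; the model, helper names, and the exact posterior sampler standing in for MCMC output are all assumptions for illustration.

```python
import math
import random
from statistics import mean

def deviance(mu, y, sigma=1.0):
    """Deviance -2 * log-likelihood for y_i ~ N(mu, sigma^2)."""
    return sum(((yi - mu) / sigma) ** 2 + math.log(2.0 * math.pi * sigma ** 2)
               for yi in y)

def simulate_posterior_draws(y, n_draws=4000, sigma=1.0, seed=1):
    """Stand-in for MCMC output: exact posterior draws under a flat prior."""
    rng = random.Random(seed)
    y_bar = mean(y)
    return [rng.gauss(y_bar, sigma / math.sqrt(len(y))) for _ in range(n_draws)]

def dic(draws, y, sigma=1.0):
    """Return (DIC, pD) computed from posterior draws of mu."""
    d_bar = mean([deviance(mu, y, sigma) for mu in draws])  # posterior mean deviance
    d_at_mean = deviance(mean(draws), y, sigma)             # deviance at posterior mean
    p_d = d_bar - d_at_mean                                 # effective number of parameters
    return d_bar + p_d, p_d
```

For example, `dic(simulate_posterior_draws(y), y)` with any data list `y` returns the criterion and a pD near 1, matching the single free parameter in this model.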

11,691 citations

Journal ArticleDOI
TL;DR: Two general approaches to missing data are presented and highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer procedures are also discussed that may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations

Journal ArticleDOI
TL;DR: The R package mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations