Journal ArticleDOI

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

01 Sep 2017-Statistics and Computing (Springer US)-Vol. 27, Iss: 5, pp 1413-1432
TL;DR: In this paper, leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are used to estimate pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.
Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.
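
For readers who want to reproduce this workflow, the sketch below shows one minimal way to obtain PSIS-LOO and WAIC estimates from existing simulation draws with the loo package. It assumes a recent version of the loo interface (which post-dates this paper), a Stan program "model.stan" whose generated quantities block stores the pointwise log-likelihood in a vector named log_lik, and a data list standata; the file name, variable name, and data object are placeholders.

    library(rstan)
    library(loo)

    # Placeholder Stan fit; assumes generated quantities defines vector[N] log_lik.
    fit <- stan("model.stan", data = standata)

    # iterations x chains x N array of pointwise log-likelihood values
    log_lik <- extract_log_lik(fit, parameter_name = "log_lik",
                               merge_chains = FALSE)

    r_eff    <- relative_eff(exp(log_lik))    # adjust for MCMC autocorrelation
    psis_loo <- loo(log_lik, r_eff = r_eff)   # PSIS-LOO with Pareto k diagnostics
    waic_est <- waic(log_lik)                 # WAIC from the same draws

    print(psis_loo)   # elpd_loo, p_loo, looic, and Pareto k diagnostics
    # loo_compare(psis_loo, psis_loo_model2)  # compare two models (second fit not shown)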


Citations
Journal ArticleDOI
Rupert R A Bourne1, Seth Flaxman2, Tasanee Braithwaite1, Maria V Cicinelli, Aditi Das, Jost B. Jonas3, Jill E Keeffe4, John H Kempen5, Janet L Leasher6, Hans Limburg, Kovin Naidoo7, Kovin Naidoo8, Konrad Pesudovs9, Serge Resnikoff10, Serge Resnikoff8, Alexander J Silvester11, Gretchen A Stevens12, Nina Tahhan8, Nina Tahhan10, Tien Yin Wong13, Hugh R. Taylor14, Rupert R A Bourne1, Peter Ackland, Aries Arditi, Yaniv Barkana, Banu Bozkurt15, Alain M. Bron16, Donald L. Budenz17, Feng Cai, Robert J Casson18, Usha Chakravarthy19, Jaewan Choi, Maria Vittoria Cicinelli, Nathan Congdon19, Reza Dana20, Rakhi Dandona21, Lalit Dandona22, Iva Dekaris, Monte A. Del Monte23, Jenny deva24, Laura Dreer25, Leon B. Ellwein26, Marcela Frazier25, Kevin D. Frick27, David S. Friedman27, João M. Furtado28, H. Gao29, Gus Gazzard30, Ronnie George, Stephen Gichuhi31, Victor H. Gonzalez, Billy R. Hammond32, Mary Elizabeth Hartnett33, Minguang He14, James F. Hejtmancik26, Flavio E. Hirai34, John J Huang35, April D. Ingram36, Jonathan C. Javitt27, Jost B. Jonas3, Charlotte E. Joslin, John H. Kempen20, John H. Kempen37, Moncef Khairallah, Rohit C Khanna4, Judy E. Kim38, George N. Lambrou39, Van C. Lansingh, Paolo Lanzetta40, Jennifer I. Lim41, Kaweh Mansouri, Anu A. Mathew42, Alan R. Morse, Beatriz Munoz27, David C. Musch23, Vinay Nangia, Maria Palaiou20, Maurizio Battaglia Parodi, Fernando Yaacov Pena42, Tunde Peto19, Harry A. Quigley27, Murugesan Raju43, Pradeep Y. Ramulu27, Alan L. Robin27, Luca Rossetti44, Jinan B. Saaddine45, Mya Sandar46, Janet B. Serle47, Tueng T. Shen22, Rajesh K. Shetty48, Pamela C. Sieving26, Juan Carlos Silva49, Rita S. Sitorus50, Dwight Stambolian37, Gretchen Stevens12, Hugh Taylor14, Jaime Tejedor, James M. Tielsch27, Miltiadis K. Tsilimbaris51, Jan C. van Meurs52, Rohit Varma53, Gianni Virgili54, Jimmy Volmink55, Ya Xing Wang, Ningli Wang56, Sheila K. West27, Peter Wiedemann57, Tien Wong13, Richard Wormald58, Yingfeng Zheng46 
Anglia Ruskin University1, University of Oxford2, Heidelberg University3, L V Prasad Eye Institute4, Massachusetts Eye and Ear Infirmary5, Nova Southeastern University6, University of KwaZulu-Natal7, Brien Holden Vision Institute8, Flinders University9, University of New South Wales10, Royal Liverpool University Hospital11, World Health Organization12, National University of Singapore13, University of Melbourne14, Selçuk University15, University of Burgundy16, University of Miami17, University of Adelaide18, Queen's University Belfast19, Harvard University20, The George Institute for Global Health21, University of Washington22, University of Michigan23, Universiti Tunku Abdul Rahman24, University of Alabama25, National Institutes of Health26, Johns Hopkins University27, University of São Paulo28, Henry Ford Health System29, University College London30, University of Nairobi31, University of Georgia32, University of Utah33, Federal University of São Paulo34, Yale University35, Alberta Children's Hospital36, University of Pennsylvania37, Medical College of Wisconsin38, Novartis39, University of Udine40, University of Illinois at Urbana–Champaign41, Royal Children's Hospital42, University of Missouri43, University of Milan44, Centers for Disease Control and Prevention45, Singapore National Eye Center46, Icahn School of Medicine at Mount Sinai47, Mayo Clinic48, Pan American Health Organization49, University of Indonesia50, University of Crete51, Erasmus University Rotterdam52, University of Southern California53, University of Florence54, Stellenbosch University55, Capital Medical University56, Leipzig University57, Moorfields Eye Hospital58
TL;DR: There is an ongoing reduction in the age-standardised prevalence of blindness and visual impairment, yet the growth and ageing of the world's population are causing a substantial increase in the number of people affected, highlighting the need to scale up vision impairment alleviation efforts at all levels.

1,473 citations

Journal ArticleDOI
15 May 2020-Science
TL;DR: Combining epidemiological modeling with Bayesian inference and change-point analysis, the authors quantify how interventions against SARS-CoV-2 changed the effective growth rate of new infections over time, using Germany as an example.
Abstract: As COVID-19 is rapidly spreading across the globe, short-term modeling forecasts provide time-critical information for decisions on containment and mitigation strategies. A major challenge for short-term forecasts is the assessment of key epidemiological parameters and how they change when first interventions show an effect. By combining an established epidemiological model with Bayesian inference, we analyze the time dependence of the effective growth rate of new infections. Focusing on COVID-19 spread in Germany, we detect change points in the effective growth rate that correlate well with the times of publicly announced interventions. Thereby, we can quantify the effect of interventions, and we can incorporate the corresponding change points into forecasts of future scenarios and case numbers. Our code is freely available and can be readily adapted to any country or region.
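
The change-point idea in this abstract can be illustrated with a toy forward simulation. The sketch below is not the authors' model or code (theirs is freely available, as noted above); it is a discrete SIR-type simulation in which an assumed spreading rate drops at two hypothetical change points, with all parameter values chosen purely for illustration.

    # Illustrative sketch only, not the published model: a discrete SIR-type
    # simulation in which the spreading rate lambda drops at assumed change
    # points (here days 20 and 30); all numbers are hypothetical.
    simulate_cases <- function(days = 60, N = 83e6, I0 = 1000,
                               lambda = c(0.40, 0.25, 0.10),
                               change_points = c(20, 30), mu = 0.125) {
      S <- N - I0
      I <- I0
      new_cases <- numeric(days)
      for (t in seq_len(days)) {
        lam <- lambda[1 + sum(t > change_points)]  # piecewise-constant rate
        inf <- lam * S * I / N                     # new infections on day t
        rec <- mu * I                              # recoveries on day t
        S <- S - inf
        I <- I + inf - rec
        new_cases[t] <- inf
      }
      new_cases
    }

    plot(simulate_cases(), type = "l",
         xlab = "day", ylab = "simulated new infections")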

704 citations

Journal ArticleDOI
21 Jun 2018-PeerJ
TL;DR: Through MixSIAR, an inclusive, rich, and flexible Bayesian tracer mixing model framework implemented as an open-source R package, the disparate array of mixing model tools is consolidated into a single platform, the set of available parameterizations is diversified, and developers are given a platform upon which to continue improving mixing model analyses in the future.
Abstract: The ongoing evolution of tracer mixing models has resulted in a confusing array of software tools that differ in terms of data inputs, model assumptions, and associated analytic products. Here we introduce MixSIAR, an inclusive, rich, and flexible Bayesian tracer (e.g., stable isotope) mixing model framework implemented as an open-source R package. Using MixSIAR as a foundation, we provide guidance for the implementation of mixing model analyses. We begin by outlining the practical differences between mixture data error structure formulations and relate these error structures to common mixing model study designs in ecology. Because Bayesian mixing models afford the option to specify informative priors on source proportion contributions, we outline methods for establishing prior distributions and discuss the influence of prior specification on model outputs. We also discuss the options available for source data inputs (raw data versus summary statistics) and provide guidance for combining sources. We then describe a key advantage of MixSIAR over previous mixing model software: the ability to include fixed and random effects as covariates explaining variability in mixture proportions and calculate relative support for multiple models via information criteria. We present a case study of Alligator mississippiensis diet partitioning to demonstrate the power of this approach. Finally, we conclude with a discussion of limitations to mixing model applications. Through MixSIAR, we have consolidated the disparate array of mixing model tools into a single platform, diversified the set of available parameterizations, and provided developers a platform upon which to continue improving mixing model analyses in the future.
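
To make the mixing-model structure concrete, the sketch below illustrates the core idea only; it does not use the MixSIAR API. A consumer's expected tracer value is modeled as a proportion-weighted average of source means, with a Dirichlet prior keeping the proportions on the simplex; the tracer values, source signatures, and grid search are hypothetical stand-ins for real data and MCMC.

    # Minimal sketch of the core tracer-mixing idea, not the MixSIAR API.
    set.seed(1)
    source_means <- c(A = -24, B = -18, C = -12)  # hypothetical source signatures
    p_true <- c(0.5, 0.3, 0.2)                    # "true" diet proportions
    mix <- rnorm(30, mean = sum(p_true * source_means), sd = 0.5)

    log_post <- function(p, alpha = c(1, 1, 1)) { # Dirichlet(1,1,1) prior
      if (any(p <= 0)) return(-Inf)
      sum(dnorm(mix, mean = sum(p * source_means), sd = 0.5, log = TRUE)) +
        sum((alpha - 1) * log(p))
    }

    # coarse grid over the simplex as a stand-in for MCMC
    grid <- expand.grid(p1 = seq(0.01, 0.98, 0.01),
                        p2 = seq(0.01, 0.98, 0.01))
    grid <- grid[grid$p1 + grid$p2 < 1, ]
    lp <- apply(grid, 1, function(g) log_post(c(g[1], g[2], 1 - g[1] - g[2])))
    grid[which.max(lp), ]                         # crude posterior mode of (p1, p2)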

580 citations

Journal ArticleDOI
TL;DR: The authors propose the heads-and-hearts hypothesis, which holds that meaningful reductions in achievement gaps only occur when course designs combine deliberate practice with inclusive teaching; their results support calls to replace traditional lecturing with evidence-based, active-learning course designs across the STEM disciplines.
Abstract: We tested the hypothesis that underrepresented students in active-learning classrooms experience narrower achievement gaps than underrepresented students in traditional lecturing classrooms, averaged across all science, technology, engineering, and mathematics (STEM) fields and courses. We conducted a comprehensive search for both published and unpublished studies that compared the performance of underrepresented students to their overrepresented classmates in active-learning and traditional-lecturing treatments. This search resulted in data on student examination scores from 15 studies (9,238 total students) and data on student failure rates from 26 studies (44,606 total students). Bayesian regression analyses showed that on average, active learning reduced achievement gaps in examination scores by 33% and narrowed gaps in passing rates by 45%. The reported proportion of time that students spend on in-class activities was important, as only classes that implemented high-intensity active learning narrowed achievement gaps. Sensitivity analyses showed that the conclusions are robust to sampling bias and other issues. To explain the extensive variation in efficacy observed among studies, we propose the heads-and-hearts hypothesis, which holds that meaningful reductions in achievement gaps only occur when course designs combine deliberate practice with inclusive teaching. Our results support calls to replace traditional lecturing with evidence-based, active-learning course designs across the STEM disciplines and suggest that innovations in instructional strategies can increase equity in higher education.

478 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose an alternative definition of R2 for Bayesian fits that addresses a problem with the usual definition, namely that the numerator (the variance of the predicted values) can be larger than the denominator (the variance of the data).
Abstract: The usual definition of R2 (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an...
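
The problem stated in this abstract is easy to reproduce numerically. The sketch below is not the authors' code: it simulates a small regression data set together with hypothetical posterior slope draws pulled away from the least-squares fit (as a strong prior might do), and shows that the classical ratio var(predicted) / var(data) can exceed 1.

    # Illustrative sketch of the problem described above; all numbers are
    # hypothetical.
    set.seed(2)
    n <- 10
    x <- rnorm(n)
    y <- 0.2 * x + rnorm(n, sd = 0.3)

    # stand-in for posterior draws of the slope under a strong prior
    # centered far from the data
    b_draws <- rnorm(4000, mean = 2, sd = 0.3)

    r2_classical <- sapply(b_draws, function(b) var(b * x) / var(y))
    mean(r2_classical > 1)   # most draws give an "R2" above 1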

452 citations

References
Book
01 Jan 1995
TL;DR: Detailed notes on Bayesian computation, the basics of Markov chain simulation, regression models, and asymptotic theorems are provided.
Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear Models; Generalized Linear Models; Models for Robust Inference; Models for Missing Data. NONLINEAR AND NONPARAMETRIC MODELS: Parametric Nonlinear Models; Basis Function Models; Gaussian Process Models; Finite Mixture Models; Dirichlet Process Models. APPENDICES: A: Standard Probability Distributions; B: Outline of Proofs of Asymptotic Theorems; C: Computation in R and Stan. Bibliographic Notes and Exercises appear at the end of each chapter.

16,079 citations

Book ChapterDOI
01 Jan 1973
TL;DR: In this paper, it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion.
Abstract: In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.
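
The criterion that grew out of this work became known as AIC, computed as minus twice the maximized log-likelihood plus twice the number of estimated parameters. The sketch below is a minimal, hypothetical comparison of two regression models using R's built-in AIC function; the data are simulated for illustration only.

    # AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters).
    set.seed(3)
    x <- rnorm(100)
    y <- 1 + 0.5 * x + rnorm(100)
    m1 <- lm(y ~ 1)    # intercept-only model
    m2 <- lm(y ~ x)    # model including the predictor
    AIC(m1)
    AIC(m2)            # the smaller AIC indicates the preferred model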

15,424 citations

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined and derive a measure pD for the effective number of parameters in a model, defined as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest; adding pD to the posterior mean deviance gives a deviance information criterion that is related to other information criteria and has an approximate decision theoretic justification.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
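
The quantities defined in this abstract are straightforward to compute from posterior draws. The sketch below uses a hypothetical normal-mean model with known standard deviation, with simulated draws standing in for MCMC output; pD comes out close to 1, matching the single free parameter.

    # Minimal sketch of pD and DIC for a normal-mean model with known sd = 1;
    # the data y and the "posterior draws" are simulated stand-ins.
    set.seed(4)
    y <- rnorm(20, mean = 1)
    theta_draws <- rnorm(4000, mean = mean(y), sd = 1 / sqrt(length(y)))

    deviance_at <- function(theta) {
      -2 * sum(dnorm(y, mean = theta, sd = 1, log = TRUE))
    }
    Dbar <- mean(sapply(theta_draws, deviance_at))  # posterior mean deviance
    Dhat <- deviance_at(mean(theta_draws))          # deviance at posterior mean
    pD   <- Dbar - Dhat                             # effective number of parameters
    DIC  <- Dbar + pD
    c(pD = pD, DIC = DIC)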

11,691 citations

Book
01 Jan 2006
TL;DR: Data Analysis Using Regression and Multilevel/Hierarchical Models is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models.
Abstract: Data Analysis Using Regression and Multilevel/Hierarchical Models is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models. The book introduces a wide variety of models, whilst at the same time instructing the reader in how to fit these models using available software packages. The book illustrates the concepts by working through scores of real data examples that have arisen from the authors' own applied research, with programming codes provided for each one. Topics covered include causal inference, including regression, poststratification, matching, regression discontinuity, and instrumental variables, as well as multilevel logistic regression and missing-data imputation. Practical tips regarding building, fitting, and understanding these models are provided throughout.

9,098 citations

Journal ArticleDOI
TL;DR: The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.
Abstract: Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the ...
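
A small numerical check makes the definition of propriety concrete. The sketch below is a hypothetical example using the logarithmic score for a binary event: plotting the expected score as a function of the issued forecast shows that it is maximized exactly at the true probability, so the log score is (strictly) proper.

    # Propriety of the logarithmic score for a binary event; p_true is a
    # hypothetical true event probability.
    p_true <- 0.7
    log_score <- function(forecast, outcome) {
      outcome * log(forecast) + (1 - outcome) * log(1 - forecast)
    }
    expected_score <- function(forecast) {
      p_true * log_score(forecast, 1) + (1 - p_true) * log_score(forecast, 0)
    }
    curve(expected_score(x), from = 0.01, to = 0.99,
          xlab = "issued forecast", ylab = "expected log score")
    abline(v = p_true, lty = 2)   # maximum at the true probability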

4,644 citations