Journal ArticleDOI

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

01 Sep 2017-Statistics and Computing (Springer US)-Vol. 27, Iss: 5, pp 1413-1432
TL;DR: In this paper, leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are used to estimate pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.
Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.
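
For readers who want to reproduce this workflow, the sketch below shows one minimal way to obtain PSIS-LOO and WAIC estimates from existing simulation draws with the loo package. It assumes a recent version of the loo interface (which post-dates this paper), a Stan program "model.stan" whose generated quantities block stores the pointwise log-likelihood in a vector named log_lik, and a data list standata; the file name, variable name, and data object are placeholders.

    library(rstan)
    library(loo)

    # Placeholder Stan fit; assumes generated quantities defines vector[N] log_lik.
    fit <- stan("model.stan", data = standata)

    # iterations x chains x N array of pointwise log-likelihood values
    log_lik <- extract_log_lik(fit, parameter_name = "log_lik",
                               merge_chains = FALSE)

    r_eff    <- relative_eff(exp(log_lik))    # adjust for MCMC autocorrelation
    psis_loo <- loo(log_lik, r_eff = r_eff)   # PSIS-LOO with Pareto k diagnostics
    waic_est <- waic(log_lik)                 # WAIC from the same draws

    print(psis_loo)   # elpd_loo, p_loo, looic, and Pareto k diagnostics
    # loo_compare(psis_loo, psis_loo_model2)  # compare two models (second fit not shown)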


Citations
Journal ArticleDOI
Rupert R A Bourne1, Seth Flaxman2, Tasanee Braithwaite1, Maria V Cicinelli, Aditi Das, Jost B. Jonas3, Jill E Keeffe4, John H Kempen5, Janet L Leasher6, Hans Limburg, Kovin Naidoo7, Kovin Naidoo8, Konrad Pesudovs9, Serge Resnikoff10, Serge Resnikoff8, Alexander J Silvester11, Gretchen A Stevens12, Nina Tahhan8, Nina Tahhan10, Tien Yin Wong13, Hugh R. Taylor14, Rupert R A Bourne1, Peter Ackland, Aries Arditi, Yaniv Barkana, Banu Bozkurt15, Alain M. Bron16, Donald L. Budenz17, Feng Cai, Robert J Casson18, Usha Chakravarthy19, Jaewan Choi, Maria Vittoria Cicinelli, Nathan Congdon19, Reza Dana20, Rakhi Dandona21, Lalit Dandona22, Iva Dekaris, Monte A. Del Monte23, Jenny deva24, Laura Dreer25, Leon B. Ellwein26, Marcela Frazier25, Kevin D. Frick27, David S. Friedman27, João M. Furtado28, H. Gao29, Gus Gazzard30, Ronnie George, Stephen Gichuhi31, Victor H. Gonzalez, Billy R. Hammond32, Mary Elizabeth Hartnett33, Minguang He14, James F. Hejtmancik26, Flavio E. Hirai34, John J Huang35, April D. Ingram36, Jonathan C. Javitt27, Jost B. Jonas3, Charlotte E. Joslin, John H. Kempen20, John H. Kempen37, Moncef Khairallah, Rohit C Khanna4, Judy E. Kim38, George N. Lambrou39, Van C. Lansingh, Paolo Lanzetta40, Jennifer I. Lim41, Kaweh Mansouri, Anu A. Mathew42, Alan R. Morse, Beatriz Munoz27, David C. Musch23, Vinay Nangia, Maria Palaiou20, Maurizio Battaglia Parodi, Fernando Yaacov Pena42, Tunde Peto19, Harry A. Quigley27, Murugesan Raju43, Pradeep Y. Ramulu27, Alan L. Robin27, Luca Rossetti44, Jinan B. Saaddine45, Mya Sandar46, Janet B. Serle47, Tueng T. Shen22, Rajesh K. Shetty48, Pamela C. Sieving26, Juan Carlos Silva49, Rita S. Sitorus50, Dwight Stambolian37, Gretchen Stevens12, Hugh Taylor14, Jaime Tejedor, James M. Tielsch27, Miltiadis K. Tsilimbaris51, Jan C. van Meurs52, Rohit Varma53, Gianni Virgili54, Jimmy Volmink55, Ya Xing Wang, Ningli Wang56, Sheila K. West27, Peter Wiedemann57, Tien Wong13, Richard Wormald58, Yingfeng Zheng46 
Anglia Ruskin University1, University of Oxford2, Heidelberg University3, L V Prasad Eye Institute4, Massachusetts Eye and Ear Infirmary5, Nova Southeastern University6, University of KwaZulu-Natal7, Brien Holden Vision Institute8, Flinders University9, University of New South Wales10, Royal Liverpool University Hospital11, World Health Organization12, National University of Singapore13, University of Melbourne14, Selçuk University15, University of Burgundy16, University of Miami17, University of Adelaide18, Queen's University Belfast19, Harvard University20, The George Institute for Global Health21, University of Washington22, University of Michigan23, Universiti Tunku Abdul Rahman24, University of Alabama25, National Institutes of Health26, Johns Hopkins University27, University of São Paulo28, Henry Ford Health System29, University College London30, University of Nairobi31, University of Georgia32, University of Utah33, Federal University of São Paulo34, Yale University35, Alberta Children's Hospital36, University of Pennsylvania37, Medical College of Wisconsin38, Novartis39, University of Udine40, University of Illinois at Urbana–Champaign41, Royal Children's Hospital42, University of Missouri43, University of Milan44, Centers for Disease Control and Prevention45, Singapore National Eye Center46, Icahn School of Medicine at Mount Sinai47, Mayo Clinic48, Pan American Health Organization49, University of Indonesia50, University of Crete51, Erasmus University Rotterdam52, University of Southern California53, University of Florence54, Stellenbosch University55, Capital Medical University56, Leipzig University57, Moorfields Eye Hospital58
TL;DR: There is an ongoing reduction in the age-standardised prevalence of blindness and visual impairment, yet the growth and ageing of the world's population are causing a substantial increase in the number of people affected, highlighting the need to scale up vision impairment alleviation efforts at all levels.

1,473 citations

Journal ArticleDOI
15 May 2020-Science
TL;DR: Combining epidemiological modeling with Bayesian inference and change-point analysis, the authors quantify how interventions against SARS-CoV-2 changed the effective growth rate of new infections over time, using Germany as an example.
Abstract: As COVID-19 is rapidly spreading across the globe, short-term modeling forecasts provide time-critical information for decisions on containment and mitigation strategies. A major challenge for short-term forecasts is the assessment of key epidemiological parameters and how they change when first interventions show an effect. By combining an established epidemiological model with Bayesian inference, we analyze the time dependence of the effective growth rate of new infections. Focusing on COVID-19 spread in Germany, we detect change points in the effective growth rate that correlate well with the times of publicly announced interventions. Thereby, we can quantify the effect of interventions, and we can incorporate the corresponding change points into forecasts of future scenarios and case numbers. Our code is freely available and can be readily adapted to any country or region.
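
The change-point idea in this abstract can be illustrated with a toy forward simulation. The sketch below is not the authors' model or code (theirs is freely available, as noted above); it is a discrete SIR-type simulation in which an assumed spreading rate drops at two hypothetical change points, with all parameter values chosen purely for illustration.

    # Illustrative sketch only, not the published model: a discrete SIR-type
    # simulation in which the spreading rate lambda drops at assumed change
    # points (here days 20 and 30); all numbers are hypothetical.
    simulate_cases <- function(days = 60, N = 83e6, I0 = 1000,
                               lambda = c(0.40, 0.25, 0.10),
                               change_points = c(20, 30), mu = 0.125) {
      S <- N - I0
      I <- I0
      new_cases <- numeric(days)
      for (t in seq_len(days)) {
        lam <- lambda[1 + sum(t > change_points)]  # piecewise-constant rate
        inf <- lam * S * I / N                     # new infections on day t
        rec <- mu * I                              # recoveries on day t
        S <- S - inf
        I <- I + inf - rec
        new_cases[t] <- inf
      }
      new_cases
    }

    plot(simulate_cases(), type = "l",
         xlab = "day", ylab = "simulated new infections")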

704 citations

Journal ArticleDOI
21 Jun 2018-PeerJ
TL;DR: Through MixSIAR, an inclusive, rich, and flexible Bayesian tracer mixing model framework implemented as an open-source R package, the disparate array of mixing model tools is consolidated into a single platform, the set of available parameterizations is diversified, and developers are given a platform upon which to continue improving mixing model analyses in the future.
Abstract: The ongoing evolution of tracer mixing models has resulted in a confusing array of software tools that differ in terms of data inputs, model assumptions, and associated analytic products. Here we introduce MixSIAR, an inclusive, rich, and flexible Bayesian tracer (e.g., stable isotope) mixing model framework implemented as an open-source R package. Using MixSIAR as a foundation, we provide guidance for the implementation of mixing model analyses. We begin by outlining the practical differences between mixture data error structure formulations and relate these error structures to common mixing model study designs in ecology. Because Bayesian mixing models afford the option to specify informative priors on source proportion contributions, we outline methods for establishing prior distributions and discuss the influence of prior specification on model outputs. We also discuss the options available for source data inputs (raw data versus summary statistics) and provide guidance for combining sources. We then describe a key advantage of MixSIAR over previous mixing model software: the ability to include fixed and random effects as covariates explaining variability in mixture proportions and calculate relative support for multiple models via information criteria. We present a case study of Alligator mississippiensis diet partitioning to demonstrate the power of this approach. Finally, we conclude with a discussion of limitations to mixing model applications. Through MixSIAR, we have consolidated the disparate array of mixing model tools into a single platform, diversified the set of available parameterizations, and provided developers a platform upon which to continue improving mixing model analyses in the future.
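
To make the mixing-model structure concrete, the sketch below illustrates the core idea only; it does not use the MixSIAR API. A consumer's expected tracer value is modeled as a proportion-weighted average of source means, with a Dirichlet prior keeping the proportions on the simplex; the tracer values, source signatures, and grid search are hypothetical stand-ins for real data and MCMC.

    # Minimal sketch of the core tracer-mixing idea, not the MixSIAR API.
    set.seed(1)
    source_means <- c(A = -24, B = -18, C = -12)  # hypothetical source signatures
    p_true <- c(0.5, 0.3, 0.2)                    # "true" diet proportions
    mix <- rnorm(30, mean = sum(p_true * source_means), sd = 0.5)

    log_post <- function(p, alpha = c(1, 1, 1)) { # Dirichlet(1,1,1) prior
      if (any(p <= 0)) return(-Inf)
      sum(dnorm(mix, mean = sum(p * source_means), sd = 0.5, log = TRUE)) +
        sum((alpha - 1) * log(p))
    }

    # coarse grid over the simplex as a stand-in for MCMC
    grid <- expand.grid(p1 = seq(0.01, 0.98, 0.01),
                        p2 = seq(0.01, 0.98, 0.01))
    grid <- grid[grid$p1 + grid$p2 < 1, ]
    lp <- apply(grid, 1, function(g) log_post(c(g[1], g[2], 1 - g[1] - g[2])))
    grid[which.max(lp), ]                         # crude posterior mode of (p1, p2)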

580 citations

Journal ArticleDOI
TL;DR: The authors propose the heads-and-hearts hypothesis, which holds that meaningful reductions in achievement gaps only occur when course designs combine deliberate practice with inclusive teaching; their results support calls to replace traditional lecturing with evidence-based, active-learning course designs across the STEM disciplines.
Abstract: We tested the hypothesis that underrepresented students in active-learning classrooms experience narrower achievement gaps than underrepresented students in traditional lecturing classrooms, averaged across all science, technology, engineering, and mathematics (STEM) fields and courses. We conducted a comprehensive search for both published and unpublished studies that compared the performance of underrepresented students to their overrepresented classmates in active-learning and traditional-lecturing treatments. This search resulted in data on student examination scores from 15 studies (9,238 total students) and data on student failure rates from 26 studies (44,606 total students). Bayesian regression analyses showed that on average, active learning reduced achievement gaps in examination scores by 33% and narrowed gaps in passing rates by 45%. The reported proportion of time that students spend on in-class activities was important, as only classes that implemented high-intensity active learning narrowed achievement gaps. Sensitivity analyses showed that the conclusions are robust to sampling bias and other issues. To explain the extensive variation in efficacy observed among studies, we propose the heads-and-hearts hypothesis, which holds that meaningful reductions in achievement gaps only occur when course designs combine deliberate practice with inclusive teaching. Our results support calls to replace traditional lecturing with evidence-based, active-learning course designs across the STEM disciplines and suggest that innovations in instructional strategies can increase equity in higher education.

478 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose an alternative definition of R2 for Bayesian fits that addresses a problem with the usual definition, namely that the numerator (the variance of the predicted values) can be larger than the denominator (the variance of the data).
Abstract: The usual definition of R2 (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an...
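
The problem stated in this abstract is easy to reproduce numerically. The sketch below is not the authors' code: it simulates a small regression data set together with hypothetical posterior slope draws pulled away from the least-squares fit (as a strong prior might do), and shows that the classical ratio var(predicted) / var(data) can exceed 1.

    # Illustrative sketch of the problem described above; all numbers are
    # hypothetical.
    set.seed(2)
    n <- 10
    x <- rnorm(n)
    y <- 0.2 * x + rnorm(n, sd = 0.3)

    # stand-in for posterior draws of the slope under a strong prior
    # centered far from the data
    b_draws <- rnorm(4000, mean = 2, sd = 0.3)

    r2_classical <- sapply(b_draws, function(b) var(b * x) / var(y))
    mean(r2_classical > 1)   # most draws give an "R2" above 1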

452 citations

References
Book
01 Jan 1995
TL;DR: Detailed notes on Bayesian computation, the basics of Markov chain simulation, regression models, and asymptotic theorems are provided.
Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear Models; Generalized Linear Models; Models for Robust Inference; Models for Missing Data. NONLINEAR AND NONPARAMETRIC MODELS: Parametric Nonlinear Models; Basis Function Models; Gaussian Process Models; Finite Mixture Models; Dirichlet Process Models. APPENDICES: A: Standard Probability Distributions; B: Outline of Proofs of Asymptotic Theorems; C: Computation in R and Stan. Bibliographic Notes and Exercises appear at the end of each chapter.

16,079 citations

Book ChapterDOI
01 Jan 1973
TL;DR: In this paper, it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion.
Abstract: In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.
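
The criterion that grew out of this work became known as AIC, computed as minus twice the maximized log-likelihood plus twice the number of estimated parameters. The sketch below is a minimal, hypothetical comparison of two regression models using R's built-in AIC function; the data are simulated for illustration only.

    # AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters).
    set.seed(3)
    x <- rnorm(100)
    y <- 1 + 0.5 * x + rnorm(100)
    m1 <- lm(y ~ 1)    # intercept-only model
    m2 <- lm(y ~ x)    # model including the predictor
    AIC(m1)
    AIC(m2)            # the smaller AIC indicates the preferred model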

15,424 citations

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined and derive a measure pD for the effective number of parameters in a model, defined as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest; adding pD to the posterior mean deviance gives a deviance information criterion that is related to other information criteria and has an approximate decision theoretic justification.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
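
The quantities defined in this abstract are straightforward to compute from posterior draws. The sketch below uses a hypothetical normal-mean model with known standard deviation, with simulated draws standing in for MCMC output; pD comes out close to 1, matching the single free parameter.

    # Minimal sketch of pD and DIC for a normal-mean model with known sd = 1;
    # the data y and the "posterior draws" are simulated stand-ins.
    set.seed(4)
    y <- rnorm(20, mean = 1)
    theta_draws <- rnorm(4000, mean = mean(y), sd = 1 / sqrt(length(y)))

    deviance_at <- function(theta) {
      -2 * sum(dnorm(y, mean = theta, sd = 1, log = TRUE))
    }
    Dbar <- mean(sapply(theta_draws, deviance_at))  # posterior mean deviance
    Dhat <- deviance_at(mean(theta_draws))          # deviance at posterior mean
    pD   <- Dbar - Dhat                             # effective number of parameters
    DIC  <- Dbar + pD
    c(pD = pD, DIC = DIC)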

11,691 citations

Book
01 Jan 2006
TL;DR: Data Analysis Using Regression and Multilevel/Hierarchical Models is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models.
Abstract: Data Analysis Using Regression and Multilevel/Hierarchical Models is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models. The book introduces a wide variety of models, whilst at the same time instructing the reader in how to fit these models using available software packages. The book illustrates the concepts by working through scores of real data examples that have arisen from the authors' own applied research, with programming codes provided for each one. Topics covered include causal inference, including regression, poststratification, matching, regression discontinuity, and instrumental variables, as well as multilevel logistic regression and missing-data imputation. Practical tips regarding building, fitting, and understanding these models are provided throughout.

9,098 citations

Journal ArticleDOI
TL;DR: The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.
Abstract: Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the ...
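
A small numerical check makes the definition of propriety concrete. The sketch below is a hypothetical example using the logarithmic score for a binary event: plotting the expected score as a function of the issued forecast shows that it is maximized exactly at the true probability, so the log score is (strictly) proper.

    # Propriety of the logarithmic score for a binary event; p_true is a
    # hypothetical true event probability.
    p_true <- 0.7
    log_score <- function(forecast, outcome) {
      outcome * log(forecast) + (1 - outcome) * log(1 - forecast)
    }
    expected_score <- function(forecast) {
      p_true * log_score(forecast, 1) + (1 - p_true) * log_score(forecast, 0)
    }
    curve(expected_score(x), from = 0.01, to = 0.99,
          xlab = "issued forecast", ylab = "expected log score")
    abline(v = p_true, lty = 2)   # maximum at the true probability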

4,644 citations