
Showing papers on "Bayesian probability published in 2020"


Journal ArticleDOI
TL;DR: Evidence relevant to Earth's equilibrium climate sensitivity per doubling of atmospheric CO2, characterized by an effective sensitivity S, is assessed, using a Bayesian approach to produce a probability density function for S given all the evidence, and promising avenues for further narrowing the range are identified.
Abstract: We assess evidence relevant to Earth's equilibrium climate sensitivity per doubling of atmospheric CO2, characterized by an effective sensitivity S. This evidence includes feedback process understanding, the historical climate record, and the paleoclimate record. An S value lower than 2 K is difficult to reconcile with any of the three lines of evidence. The amount of cooling during the Last Glacial Maximum provides strong evidence against values of S greater than 4.5 K. Other lines of evidence in combination also show that this is relatively unlikely. We use a Bayesian approach to produce a probability density function (PDF) for S given all the evidence, including tests of robustness to difficult-to-quantify uncertainties and different priors. The 66% range is 2.6-3.9 K for our Baseline calculation and remains within 2.3-4.5 K under the robustness tests; corresponding 5-95% ranges are 2.3-4.7 K, bounded by 2.0-5.7 K (although such high-confidence ranges should be regarded more cautiously). This indicates a stronger constraint on S than reported in past assessments, by lifting the low end of the range. This narrowing occurs because the three lines of evidence agree and are judged to be largely independent and because of greater confidence in understanding feedback processes and in combining evidence. We identify promising avenues for further narrowing the range in S, in particular using comprehensive models and process understanding to address limitations in the traditional forcing-feedback paradigm for interpreting past changes.
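The combination step described in the abstract can be written compactly. As a minimal sketch, assuming the three lines of evidence (process understanding, the historical record and the paleoclimate record) enter as independent likelihoods, which is the idealization whose robustness the assessment itself probes:

```latex
p(S \mid E_{\mathrm{proc}}, E_{\mathrm{hist}}, E_{\mathrm{paleo}})
  \;\propto\;
  p(S)\, p(E_{\mathrm{proc}} \mid S)\, p(E_{\mathrm{hist}} \mid S)\, p(E_{\mathrm{paleo}} \mid S)
```

The reported 66% and 5-95% ranges are then credible intervals of this posterior PDF under the Baseline prior and its robustness variants.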

480 citations


Posted Content
TL;DR: This article shows that deep ensembles provide an effective mechanism for approximate Bayesian marginalization and proposes a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Abstract: The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
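As a rough illustration of the marginalization view of ensembles described above, the sketch below (hypothetical arrays, not the authors' code) forms an approximate Bayesian model average by averaging the predictive distributions of independently trained members rather than averaging their weights:

```python
import numpy as np

def ensemble_predictive(member_probs):
    """Approximate Bayesian model average from a deep ensemble.

    member_probs: array of shape (n_members, n_examples, n_classes), where each
    slice is one member's softmax output. Treating each trained member as an
    approximate posterior sample, the predictive distribution is the average of
    the members' predictive distributions.
    """
    return np.asarray(member_probs).mean(axis=0)

# Toy usage: three hypothetical members, two test points, two classes.
probs = np.array([
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.6, 0.4], [0.1, 0.9]],
    [[0.9, 0.1], [0.3, 0.7]],
])
print(ensemble_predictive(probs))
```

Marginalizing within basins of attraction, as the paper proposes, additionally averages over several weight samples around each ensemble member before pooling.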

328 citations


Posted Content
TL;DR: This tutorial provides deep learning practitioners with an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian neural networks, i.e., stochastic artificial neural networks trained using Bayesian methods.
Abstract: Modern deep learning methods have equipped researchers and engineers with incredibly powerful tools to tackle problems that previously seemed impossible. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This paper provides a tutorial for researchers and scientists who are using machine learning, especially deep learning, with an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian neural networks.

266 citations


Proceedings Article
12 Jul 2020
TL;DR: This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are—as of early 2020—no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a “cold posterior” that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors. Code available on GitHub.
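The "cold posterior" referred to above is commonly written as a tempered version of the Bayes posterior; a minimal sketch consistent with the abstract's description, with U denoting the negative log joint density:

```latex
U(\theta) \;=\; -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) \;-\; \log p(\theta),
\qquad
p_T(\theta \mid \mathcal{D}) \;\propto\; \exp\!\big(-U(\theta)/T\big)
```

Setting T = 1 recovers the Bayes posterior, while T < 1 (a "cold" posterior) sharpens it and thereby overcounts the evidence.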

198 citations


Journal ArticleDOI
TL;DR: The findings indicate the promising performance of using LSTM-CNN to predict real-time crash risk on arterials and suggest that the proposed model outperforms others in terms of Area Under the Curve (AUC) value, sensitivity, and false alarm rate.

173 citations


Journal ArticleDOI
TL;DR: In many settings, the proposed Bayesian causal forest model achieves better bias reduction, more consistent 95% coverage probability, and shorter uncertainty intervals than vanilla BART, which itself performs well in a host of modern causal inference studies.
Abstract: This paper presents a novel nonlinear regression model for estimating heterogeneous treatment effects, geared specifically towards situations with small effect sizes, heterogeneous effects, and strong confounding by observables. Standard nonlinear regression models, which may work quite well for prediction, have two notable weaknesses when used to estimate heterogeneous treatment effects. First, they can yield badly biased estimates of treatment effects when fit to data with strong confounding. The Bayesian causal forest model presented in this paper avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. Second, standard approaches to response surface modeling do not provide adequate control over the strength of regularization over effect heterogeneity. The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively “shrink to homogeneity”. While we focus on observational data, our methods are equally useful for inferring heterogeneous treatment effects from randomized controlled experiments where careful regularization is somewhat less complicated but no less important. We illustrate these benefits via the reanalysis of an observational study assessing the causal effects of smoking on medical expenditures as well as extensive simulation studies.
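A compact way to see the two design choices described above (the propensity estimate entering the response surface, and separate regularization of effect heterogeneity) is through the model's decomposition of the outcome. A sketch consistent with the abstract, where z_i is the treatment indicator and \hat{\pi}(x_i) the estimated propensity function:

```latex
y_i \;=\; \mu\big(x_i, \hat{\pi}(x_i)\big) \;+\; \tau(x_i)\, z_i \;+\; \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

Independent regression-tree (BART-type) priors are placed on the prognostic function \mu and the treatment-effect function \tau, with the prior on \tau regularized more aggressively so that heterogeneity can be shrunk towards homogeneity.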

172 citations



Journal ArticleDOI
TL;DR: By providing coded examples using integrated nested Laplace approximations and Template Model Builder for Bayesian and frequentist analysis via the R packages R-INLA and glmmTMB, the authors aim to make efficient estimation of RSFs and SSFs with random effects accessible to anyone in the field.
Abstract: Popular frameworks for studying habitat selection include resource-selection functions (RSFs) and step-selection functions (SSFs), estimated using logistic and conditional logistic regression, respectively. Both frameworks compare environmental covariates associated with locations animals visit with environmental covariates at a set of locations assumed available to the animals. Conceptually, slopes that vary by individual, that is, random coefficient models, could be used to accommodate inter-individual heterogeneity with either approach. While fitting such models for RSFs is possible with standard software for generalized linear mixed-effects models (GLMMs), straightforward and efficient one-step procedures for fitting SSFs with random coefficients are currently lacking. To close this gap, we take advantage of the fact that the conditional logistic regression model (i.e. the SSF) is likelihood-equivalent to a Poisson model with stratum-specific fixed intercepts. By interpreting the intercepts as a random effect with a large (fixed) variance, inference for random-slope models becomes feasible with standard Bayesian techniques, or with frequentist methods that allow one to fix the variance of a random effect. We compare this approach to other commonly applied alternatives, including models without random slopes and mixed conditional regression models fit using a two-step algorithm. Using data from mountain goats (Oreamnos americanus) and Eurasian otters (Lutra lutra), we illustrate that our models lead to valid and feasible inference. In addition, we conduct a simulation study to compare different estimation approaches for SSFs and to demonstrate the importance of including individual-specific slopes when estimating individual- and population-level habitat-selection parameters. By providing coded examples using integrated nested Laplace approximations (INLA) and Template Model Builder (TMB) for Bayesian and frequentist analysis via the R packages R-INLA and glmmTMB, we hope to make efficient estimation of RSFs and SSFs with random effects accessible to anyone in the field. SSFs with individual-specific coefficients are particularly attractive since they can provide insights into movement and habitat-selection processes at fine-spatial and temporal scales, but these models had previously been very challenging to fit.
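The likelihood equivalence exploited above can be sketched as follows (notation ours): for stratum i with locations j = 1, ..., J_i, covariates x_{ij} and a single used location, the conditional logistic (SSF) contribution is recovered from a Poisson model with stratum-specific intercepts \alpha_i,

```latex
L_i(\beta) \;=\; \frac{\exp\!\big(x_{i,\mathrm{used}}^\top \beta\big)}{\sum_{j=1}^{J_i} \exp\!\big(x_{ij}^\top \beta\big)},
\qquad
y_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \quad \log \mu_{ij} = \alpha_i + x_{ij}^\top \beta
```

where y_{ij} = 1 for the used location and 0 otherwise. Treating the \alpha_i as a random effect with a very large fixed variance makes them uninformative for \beta, which is what allows standard GLMM machinery (R-INLA, glmmTMB) to fit the SSF in one step.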

147 citations


Proceedings Article
30 Apr 2020
TL;DR: This work develops Cyclical Stochastic Gradient MCMC (SG-MCMC) with a cyclical stepsize schedule, in which larger steps discover new modes and smaller steps characterize each mode, and proves non-asymptotic convergence of the proposed algorithm.
Abstract: The posteriors over neural network weights are high dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We prove non-asymptotic convergence of our proposed algorithm. Moreover, we provide extensive experimental results, including ImageNet, to demonstrate the effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks.
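A minimal numpy sketch of the idea, assuming a cosine-shaped cyclical stepsize of the kind described above and a generic stochastic-gradient Langevin update; the target density, hyperparameters and sampling threshold are illustrative stand-ins, not the paper's experimental settings:

```python
import numpy as np

def cyclical_stepsize(k, total_iters, n_cycles, alpha0):
    """Cosine cyclical stepsize: large early in each cycle (exploration),
    small late in each cycle (characterizing the current mode)."""
    cycle_len = int(np.ceil(total_iters / n_cycles))
    t = (k % cycle_len) / cycle_len            # position within the cycle, in [0, 1)
    return 0.5 * alpha0 * (np.cos(np.pi * t) + 1.0)

def sgld_step(theta, grad_log_post, alpha, rng):
    """One (stochastic-gradient) Langevin step with stepsize alpha."""
    noise = rng.normal(scale=np.sqrt(2.0 * alpha), size=theta.shape)
    return theta + alpha * grad_log_post(theta) + noise

# Toy multimodal target: equal mixture of N(-2, 1) and N(+2, 1).
def grad_log_post(theta):
    w = 1.0 / (1.0 + np.exp(-4.0 * theta))     # responsibility of the +2 component
    return -(theta - 2.0) * w - (theta + 2.0) * (1.0 - w)

rng = np.random.default_rng(0)
theta, samples = np.array([0.0]), []
for k in range(20000):
    alpha = cyclical_stepsize(k, total_iters=20000, n_cycles=4, alpha0=0.05)
    theta = sgld_step(theta, grad_log_post, alpha, rng)
    if alpha < 0.01:                           # collect samples only in the low-stepsize phase
        samples.append(theta.copy())
print(len(samples), np.mean(samples), np.std(samples))
```

In a full Bayesian neural network the gradient would be a minibatch estimate of the log-posterior gradient, and samples collected across several cycles would be pooled to cover distinct modes.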

129 citations


Proceedings Article
12 Jul 2020
TL;DR: It is shown that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian", which validates the use of last-layer Bayesian approximations and motivates a fidelity-cost trade-off.
Abstract: The point estimates of ReLU classification networks, arguably the most widely used neural network architecture, have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus neither calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, and thus cheap, Bayesian approximation also fixes these issues. This indicates that a sufficient condition for calibrated uncertainty on a ReLU network is "to be a bit Bayesian". These theoretical results validate the use of last-layer Bayesian approximations and motivate a range of fidelity-cost trade-offs. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
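A toy numpy sketch of the "a bit Bayesian" recipe under simplifying assumptions: treat the network up to its last layer as a fixed feature map, fit the last-layer weights by MAP, place a Laplace (Gaussian) approximation on them using the Hessian of the negative log posterior, and average the predictive over weight samples. The feature map, prior scale and data below are illustrative, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Illustrative "last-layer" problem: fixed features Phi and binary labels y.
Phi = rng.normal(size=(30, 3))
w_gen = np.array([2.0, -1.0, 0.5])                      # generative weights (toy)
y = (sigmoid(Phi @ w_gen) > rng.uniform(size=30)).astype(float)
prior_var = 10.0

# MAP estimate of the last-layer weights by plain gradient descent.
w = np.zeros(3)
for _ in range(5000):
    p = sigmoid(Phi @ w)
    grad = Phi.T @ (p - y) + w / prior_var              # gradient of the negative log posterior
    w -= 0.1 * grad / len(y)

# Laplace approximation: Gaussian with covariance = inverse Hessian at the MAP.
p = sigmoid(Phi @ w)
H = Phi.T @ (Phi * (p * (1 - p))[:, None]) + np.eye(3) / prior_var
cov = np.linalg.inv(H)

def laplace_predictive(phi_new, n_samples=2000):
    """Monte Carlo predictive: average the sigmoid over weight samples."""
    ws = rng.multivariate_normal(w, cov, size=n_samples)
    return sigmoid(ws @ phi_new).mean()

# Far from the training data the MAP prediction saturates, while the Laplace
# predictive accounts for weight uncertainty and is typically less extreme.
phi_far = np.array([20.0, 20.0, 20.0])
print("MAP:", sigmoid(w @ phi_far), " Laplace:", laplace_predictive(phi_far))
```

This is the last-layer variant; the point made above is that even such a cheap approximation is enough to curb the overconfidence of the pure point estimate.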

126 citations


Journal ArticleDOI
TL;DR: There is a natural Bayesian mechanics for any system that possesses a Markov blanket, which means that there is an explicit link between the inference performed by internal states and their energetics—as characterized by their stochastic thermodynamics.
Abstract: This paper considers the relationship between thermodynamics, information and inference. In particular, it explores the thermodynamic concomitants of belief updating, under a variational (free energy) principle for self-organization. In brief, any (weakly mixing) random dynamical system that possesses a Markov blanket (i.e. a separation of internal and external states) is equipped with an information geometry. This means that internal states parametrize a probability density over external states. Furthermore, at non-equilibrium steady-state, the flow of internal states can be construed as a gradient flow on a quantity known in statistics as Bayesian model evidence. In short, there is a natural Bayesian mechanics for any system that possesses a Markov blanket. Crucially, this means that there is an explicit link between the inference performed by internal states and their energetics, as characterized by their stochastic thermodynamics. This article is part of the theme issue 'Harmonizing energy-autonomous computing and intelligence'.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a guide for executing and interpreting a Bayesian ANOVA with JASP, an open-source statistical software program with a graphical user interface, using two empirical examples.
Abstract: Analysis of variance (ANOVA) is the standard procedure for statistical inference in factorial designs. Typically, ANOVAs are executed using frequentist statistics, where p-values determine statistical significance in an all-or-none fashion. In recent years, the Bayesian approach to statistics is increasingly viewed as a legitimate alternative to the p-value. However, the broad adoption of Bayesian statistics-and Bayesian ANOVA in particular-is frustrated by the fact that Bayesian concepts are rarely taught in applied statistics courses. Consequently, practitioners may be unsure how to conduct a Bayesian ANOVA and interpret the results. Here we provide a guide for executing and interpreting a Bayesian ANOVA with JASP, an open-source statistical software program with a graphical user interface. We explain the key concepts of the Bayesian ANOVA using two empirical examples.

Journal ArticleDOI
TL;DR: This systematic literature review is the first study aggregating information from numerous simulation studies to present an overview of the performance of Bayesian and frequentist estimation for structural equation models with small sample sizes, and recommends against naively using Bayesian estimation when samples are small.
Abstract: In small sample contexts, Bayesian estimation is often suggested as a viable alternative to frequentist estimation, such as maximum likelihood estimation. Our systematic literature review is the fi...

Journal ArticleDOI
TL;DR: A flexible t-prior for the standardized effect size is proposed that allows the Bayes factor to be computed by evaluating a single numerical integral, together with two measures for informed prior distributions that quantify the departure from the objective Bayes factor desiderata of predictive matching and information consistency.
Abstract: Across the empirical sciences, few statistical procedures rival the popularity of the frequentist t -test. In contrast, the Bayesian versions of the t -test have languished in obscurity. In recent ...
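The "single numerical integral" mentioned in the TL;DR can be illustrated for a one-sample t-test: under the alternative, the observed t statistic follows a noncentral t distribution with noncentrality sqrt(n) times the standardized effect size, so the Bayes factor is the prior-weighted average of that density divided by the central t density under the null. A sketch using scipy; the prior location, scale and degrees of freedom below are placeholders rather than the paper's recommended settings:

```python
import numpy as np
from scipy import stats, integrate

def bf10_one_sample(t_obs, n, prior_loc=0.0, prior_scale=0.707, prior_df=1):
    """Bayes factor BF10 for a one-sample t-test with a shifted, scaled
    t prior on the standardized effect size delta (prior values illustrative)."""
    df = n - 1

    def integrand(delta):
        # likelihood of t_obs given delta  x  prior density of delta
        like = stats.nct.pdf(t_obs, df, nc=np.sqrt(n) * delta)
        prior = stats.t.pdf((delta - prior_loc) / prior_scale, prior_df) / prior_scale
        return like * prior

    marginal_h1, _ = integrate.quad(integrand, -np.inf, np.inf)
    marginal_h0 = stats.t.pdf(t_obs, df)        # delta = 0 under the null
    return marginal_h1 / marginal_h0

print(bf10_one_sample(t_obs=2.3, n=30))
```

With prior_df = 1 this reduces to a Cauchy prior; informed priors shift and scale the t distribution to encode directional expectations.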

Journal ArticleDOI
TL;DR: Hierarchical regression and post-stratification models with code in Stan are demonstrated, and their application to a controversial recent study of SARS-CoV-2 antibodies in a sample of people from the Stanford University area is discussed.
Abstract: When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modelling capturing variation in these parameters across experiments. Another concern is the people in the sample not being representative of the general population. Statistical adjustment cannot without strong assumptions correct for selection bias in an opt-in sample, but multilevel regression and post-stratification can at least adjust for known differences between the sample and the population. We demonstrate hierarchical regression and post-stratification models with code in Stan and discuss their application to a controversial recent study of SARS-CoV-2 antibodies in a sample of people from the Stanford University area. Wide posterior intervals make it impossible to evaluate the quantitative claims of that study regarding the number of unreported infections. For future studies, the methods described here should facilitate more accurate estimates of disease prevalence from imperfect tests performed on non-representative samples.
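The core identity being propagated in the model above relates the probability of a positive test result to prevalence, sensitivity and specificity; a sketch in our notation, with π the prevalence, δ the sensitivity and γ the specificity:

```latex
\Pr(\text{positive}) \;=\; \pi\,\delta + (1-\pi)(1-\gamma),
\qquad
y \sim \mathrm{Binomial}\!\big(n,\ \pi\delta + (1-\pi)(1-\gamma)\big),
\qquad
y_{\delta} \sim \mathrm{Binomial}(n_{\delta}, \delta), \quad
y_{\gamma} \sim \mathrm{Binomial}(n_{\gamma}, \gamma)
```

Here y is the count of positive results in the field sample and y_δ, y_γ are counts from the separate calibration experiments; the hierarchical model lets δ and γ vary across calibration studies, and multilevel regression with post-stratification adjusts π across demographic cells.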

Book
17 Feb 2020
TL;DR: The integrated nested Laplace approximation (INLA) as mentioned in this paper is a recent computational method that can fit Bayesian models in a fraction of the time required by typical Markov chain Monte Carlo (MCMC) methods.
Abstract: The integrated nested Laplace approximation (INLA) is a recent computational method that can fit Bayesian models in a fraction of the time required by typical Markov chain Monte Carlo (MCMC) methods. INLA focuses on marginal inference on the model parameters of latent Gaussian Markov random fields models and exploits conditional independence properties in the model for computational speed. Bayesian Inference with INLA provides a description of INLA and its associated R package for model fitting. This book describes the underlying methodology as well as how to fit a wide range of models with R. Topics covered include generalized linear mixed-effects models, multilevel models, spatial and spatio-temporal models, smoothing methods, survival analysis, imputation of missing values, and mixture models. Advanced features of the INLA package and how to extend the number of priors and latent models available in the package are discussed. All examples in the book are fully reproducible and datasets and R code are available from the book website. This book will be helpful to researchers from different areas with some background in Bayesian inference that want to apply the INLA method in their work. The examples cover topics on biostatistics, econometrics, education, environmental science, epidemiology, public health, and the social sciences.

Journal ArticleDOI
TL;DR: This contribution explains in a straightforward manner how Bayesian inference can be used to identify material parameters of material models for solids in order to allow a one-to-one comparison between the true parameter values and the identified parameter distributions.
Abstract: The aim of this contribution is to explain in a straightforward manner how Bayesian inference can be used to identify material parameters of material models for solids. Bayesian approaches have already been used for this purpose, but most of the literature is not necessarily easy to understand for those new to the field. The reason for this is that most literature focuses either on complex statistical and machine learning concepts and/or on relatively complex mechanical models. In order to introduce the approach as gently as possible, we only focus on stress–strain measurements coming from uniaxial tensile tests and we only treat elastic and elastoplastic material models. Furthermore, the stress–strain measurements are created artificially in order to allow a one-to-one comparison between the true parameter values and the identified parameter distributions.
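In the spirit of the elastic example described above, here is a small numpy sketch: artificial uniaxial stress-strain data are generated from a known Young's modulus, and a grid posterior over the modulus is computed from a Gaussian prior and Gaussian measurement noise. All numbers are made up for illustration, and the paper itself treats elastic and elastoplastic models with more care:

```python
import numpy as np

rng = np.random.default_rng(1)

# Artificial uniaxial tensile test: sigma = E * eps + measurement noise (units: MPa).
E_true = 210e3                                   # "true" Young's modulus (illustrative)
noise_sd = 5.0                                   # stress measurement noise
eps = np.linspace(1e-4, 1e-3, 20)                # applied strains
sigma = E_true * eps + rng.normal(scale=noise_sd, size=eps.size)

# Grid posterior over E: Gaussian prior times Gaussian likelihood.
E_grid = np.linspace(150e3, 270e3, 2001)
log_prior = -0.5 * ((E_grid - 200e3) / 30e3) ** 2        # prior: E ~ N(200 GPa, 30 GPa)
resid = sigma[None, :] - E_grid[:, None] * eps[None, :]
log_like = -0.5 * np.sum((resid / noise_sd) ** 2, axis=1)
log_post = log_prior + log_like

weights = np.exp(log_post - log_post.max())
weights /= weights.sum()                         # normalized posterior weights on the grid
E_mean = np.sum(E_grid * weights)
E_sd = np.sqrt(np.sum((E_grid - E_mean) ** 2 * weights))
print(f"posterior E = {E_mean:.0f} +/- {E_sd:.0f} MPa (true value {E_true:.0f} MPa)")
```

The identified distribution can then be compared with the known E_true, which is exactly the kind of one-to-one check the artificially generated measurements are designed to allow.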

Journal ArticleDOI
TL;DR: Bayesian additive regression trees (BART) provides a flexible approach to fitting a variety of regression models while avoiding strong parametric assumptions.
Abstract: Bayesian additive regression trees (BART) provides a flexible approach to fitting a variety of regression models while avoiding strong parametric assumptions. The sum-of-trees model is embedded in ...

Journal ArticleDOI
TL;DR: Using deep-learning techniques to instantly produce the posterior p(θ|D) for the source parameters θ, given the detector data D, has broad relevance to gravitational-wave applications such as low-latency parameter estimation and characterizing the science returns of future experiments.
Abstract: We seek to achieve the holy grail of Bayesian inference for gravitational-wave astronomy: using deep-learning techniques to instantly produce the posterior p(θ|D) for the source parameters θ, given the detector data D. To do so, we train a deep neural network to take as input a signal + noise dataset (drawn from the astrophysical source-parameter prior and the sampling distribution of detector noise), and to output a parametrized approximation of the corresponding posterior. We rely on a compact representation of the data based on reduced-order modeling, which we generate efficiently using a separate neural-network waveform interpolant [A. J. K. Chua, C. R. Galley, and M. Vallisneri, Phys. Rev. Lett. 122, 211101 (2019), doi:10.1103/PhysRevLett.122.211101]. Our scheme has broad relevance to gravitational-wave applications such as low-latency parameter estimation and characterizing the science returns of future experiments.

Journal ArticleDOI
TL;DR: In this paper, the authors consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm.
Abstract: We consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by [10]. To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased.

Journal ArticleDOI
TL;DR: A novel intelligent system is designed that performs feature selection with a hybrid approach combining rough set theory and the Bayes theorem, which reduces the false alarm rate, computational complexity and training complexity while increasing the detection rate.

Journal ArticleDOI
Riko Kelter
TL;DR: A non-technical introduction to Bayesian hypothesis testing in JASP is provided by comparing traditional tests and statistical methods with their Bayesian counterparts, highlighting the strengths and limitations of JASP for frequentist NHST and Bayesian inference.
Abstract: Although null hypothesis significance testing (NHST) is the agreed gold standard in medical decision making and the most widespread inferential framework used in medical research, it has several drawbacks. Bayesian methods can complement or even replace frequentist NHST, but these methods have been underutilised mainly due to a lack of easy-to-use software. JASP is open-source software for common operating systems, which has recently been developed to make Bayesian inference more accessible to researchers, including the most common tests, an intuitive graphical user interface and publication-ready output plots. This article provides a non-technical introduction to Bayesian hypothesis testing in JASP by comparing traditional tests and statistical methods with their Bayesian counterparts. The comparison shows the strengths and limitations of JASP for frequentist NHST and Bayesian inference. Specifically, Bayesian hypothesis testing via Bayes factors can complement and even replace NHST in most situations in JASP. While p-values can only reject the null hypothesis, the Bayes factor can state evidence for both the null and the alternative hypothesis, making confirmation of hypotheses possible. Also, effect sizes can be precisely estimated in the Bayesian paradigm via JASP. Bayesian inference has not been widely used until now due to the dearth of accessible software. Medical decision making can be complemented by Bayesian hypothesis testing in JASP, providing richer information than single p-values and thus strengthening the credibility of an analysis. Through an easy point-and-click interface, researchers accustomed to other graphical statistical packages such as SPSS can seamlessly transition to JASP and benefit from the listed advantages with only a few limitations.

Posted Content
TL;DR: In this paper, the authors discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models, including techniques for learning with incomplete data.
Abstract: A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
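As a concrete illustration of encoding probabilistic relationships among variables, the sketch below defines a tiny hypothetical three-variable network (Rain -> Sprinkler, and Rain together with Sprinkler -> WetGrass) with explicit conditional probability tables, and answers a query by brute-force enumeration; the numbers are invented, and real toolkits replace the enumeration with efficient inference:

```python
# Tiny hypothetical Bayesian network: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},     # P(Sprinkler | Rain)
               False: {True: 0.40, False: 0.60}}
P_wet = {(True, True): 0.99, (True, False): 0.80,   # P(WetGrass=True | Rain, Sprinkler)
         (False, True): 0.90, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability factorized along the network structure."""
    p_w = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * (p_w if wet else 1.0 - p_w)

def prob_rain_given_wet():
    """P(Rain = True | WetGrass = True) by enumerating the hidden variable."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
    return num / den

print(prob_rain_given_wet())    # posterior belief in rain after observing wet grass
```

The same factorization is what lets Bayesian networks handle missing entries and support causal queries, as the abstract notes.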

Journal ArticleDOI
TL;DR: In an application that involves 20 macroeconomic variables, it is found that these BVARs with more flexible covariance structures outperform the standard variant with independent, homoscedastic Gaussian innovations in both in-sample model-fit and out-of-sample forecast performance.
Abstract: We introduce a class of large Bayesian vector autoregressions (BVARs) that allows for non-Gaussian, heteroscedastic, and serially dependent innovations. To make estimation computationally tractable...

Proceedings Article
03 Jun 2020
TL;DR: The authors propose a modification to the usual ensembling process that results in approximate Bayesian inference, regularising parameters about values drawn from a distribution which can be set equal to the prior.
Abstract: Understanding the uncertainty of a neural network's (NN) predictions is essential for many purposes. The Bayesian framework provides a principled approach to this; however, applying it to NNs is challenging due to the large numbers of parameters and data. Ensembling NNs provides an easily implementable, scalable method for uncertainty quantification; however, it has been criticised for not being Bayesian. This work proposes one modification to the usual process that we argue does result in approximate Bayesian inference: regularising parameters about values drawn from a distribution which can be set equal to the prior. A theoretical analysis of the procedure in a simplified setting suggests the recovered posterior is centred correctly but tends to have an underestimated marginal variance and an overestimated correlation. However, two conditions can lead to exact recovery. We argue that these conditions are partially present in NNs. Empirical evaluations demonstrate it has an advantage over standard ensembling and is competitive with variational methods.
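The behaviour described above (correct centre, underestimated marginal variance) can be checked directly in a linear-Gaussian toy problem where the exact posterior is available in closed form. In the sketch below each ensemble member is regularized towards its own anchor drawn from the prior; all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear-Gaussian toy problem: y = X w + noise, isotropic Gaussian prior on w.
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
noise_var, prior_var = 0.25, 1.0
y = X @ w_true + rng.normal(scale=np.sqrt(noise_var), size=n)

A = X.T @ X / noise_var + np.eye(d) / prior_var        # exact posterior precision

def anchored_member(anchor):
    """Minimizer of ||y - X w||^2 / noise_var + ||w - anchor||^2 / prior_var."""
    return np.linalg.solve(A, X.T @ y / noise_var + anchor / prior_var)

# Each member is regularized towards its own anchor drawn from the prior.
anchors = rng.normal(scale=np.sqrt(prior_var), size=(5000, d))
members = np.array([anchored_member(a) for a in anchors])

# Exact posterior for comparison.
post_cov = np.linalg.inv(A)
post_mean = post_cov @ (X.T @ y / noise_var)

print("centre error:       ", np.round(members.mean(axis=0) - post_mean, 3))
print("marginal sd (ens.): ", np.round(members.std(axis=0), 3))
print("marginal sd (exact):", np.round(np.sqrt(np.diag(post_cov)), 3))
```

The ensemble mean matches the exact posterior mean while the spread is too small, consistent with the underestimated marginal variance described in the abstract.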

Posted Content
TL;DR: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes rule; the structure of neural networks gives rise to a structured prior in function space that reflects the inductive biases which help them generalize.
Abstract: The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.

Journal ArticleDOI
TL;DR: This work proposes an innovative physics-constrained Bayesian deep learning approach to reconstruct flow fields from sparse, noisy velocity data, where equation-based constraints are imposed through the likelihood function and uncertainty of the reconstructed flow can be estimated.

Posted Content
TL;DR: A PyTorch-based package that implements SBI algorithms based on neural networks facilitates inference on black-box simulators for practising scientists and engineers by providing a unified interface to state-of-the-art algorithms together with documentation and tutorials.
Abstract: Scientists and engineers employ stochastic numerical simulators to model empirically observed phenomena. In contrast to purely statistical models, simulators express scientific principles that provide powerful inductive biases, improve generalization to new data or scenarios and allow for fewer, more interpretable and domain-relevant parameters. Despite these advantages, tuning a simulator's parameters so that its outputs match data is challenging. Simulation-based inference (SBI) seeks to identify parameter sets that a) are compatible with prior knowledge and b) match empirical observations. Importantly, SBI does not seek to recover a single 'best' data-compatible parameter set, but rather to identify all high probability regions of parameter space that explain observed data, and thereby to quantify parameter uncertainty. In Bayesian terminology, SBI aims to retrieve the posterior distribution over the parameters of interest. In contrast to conventional Bayesian inference, SBI is also applicable when one can run model simulations, but no formula or algorithm exists for evaluating the probability of data given parameters, i.e. the likelihood. We present $\texttt{sbi}$, a PyTorch-based package that implements SBI algorithms based on neural networks. $\texttt{sbi}$ facilitates inference on black-box simulators for practising scientists and engineers by providing a unified interface to state-of-the-art algorithms together with documentation and tutorials.
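A minimal usage sketch for the package on a toy simulator. The class and method names (SNPE, append_simulations, train, build_posterior) follow the sbi API as documented around the time of this paper, but names and defaults change between versions, so treat this as indicative rather than authoritative:

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Toy simulator: two parameters, noisy identity mapping (illustrative only).
def simulator(theta):
    return theta + 0.1 * torch.randn_like(theta)

prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))

# Simulate a training set of (theta, x) pairs drawn from the prior.
theta = prior.sample((2000,))
x = simulator(theta)

# Train a neural posterior estimator and build the amortized posterior.
inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

# Condition on an observation and draw posterior samples.
x_o = torch.tensor([0.5, -0.3])
samples = posterior.sample((1000,), x=x_o)
print(samples.mean(dim=0))    # should lie near x_o for this toy simulator
```

Because no likelihood is ever evaluated, only simulated, the same workflow applies to black-box simulators for which the likelihood is intractable.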

Journal ArticleDOI
Riko Kelter
TL;DR: A chapter-by-chapter recension and general comments about Richard McElreath’s second edition of Statistical Rethinking: A Bayesian Course with Examples in R and STAN highlight the flexibility and usefulness of considering the Bayesian approach to statistical modeling.
Abstract: In this book review, I offer a chapter-by-chapter recension and general comments about Richard McElreath’s second edition of Statistical Rethinking: A Bayesian Course with Examples in R and STAN. T...

Posted Content
TL;DR: This article shows that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.