
Showing papers on "Bayesian probability published in 2019"


Journal ArticleDOI
TL;DR: The Bayesian framework for statistics is quickly gaining in popularity among scientists, for reasons such as reliability and accuracy, the possibility of incorporating prior knowledge into the analysis, and the intuitive interpretation of results.
Abstract: The Bayesian framework for statistics is quickly gaining in popularity among scientists, for reasons such as reliability and accuracy (particularly in noisy data and small samples), the possibility of incorporating prior knowledge into the analysis, and the intuitive interpretation of results (Andrews & Baguley, 2013; Etz & Vandekerckhove, 2016; Kruschke, 2010; Kruschke, Aguinis, & Joo, 2012; Wagenmakers et al., 2017). Adopting the Bayesian framework is more of a shift in paradigm than a change in methodology; all the common statistical procedures (t-tests, correlations, ANOVAs, regressions, etc.) can also be achieved within the Bayesian framework. One of the core differences is that, in the frequentist view, effects are fixed (but unknown) and data are random. In the Bayesian view, instead of producing a single estimate of the “true effect”, the inference process computes the probability of different effects given the observed data, resulting in a distribution of possible values for the parameters, called the posterior distribution. The bayestestR package provides tools to describe these posterior distributions.

505 citations
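
The bayestestR package itself is written in R; purely as an illustration of what "describing a posterior" amounts to, here is a minimal Python sketch operating on raw posterior draws. The function name and defaults below are ours, not the package's API, and bayestestR reports an HDI by default rather than the equal-tailed interval used here:

```python
import numpy as np

# Toy posterior draws for a regression slope; in practice these would
# come from an MCMC sampler rather than be simulated directly.
rng = np.random.default_rng(0)
draws = rng.normal(0.4, 0.2, size=4000)

def describe_posterior(samples, ci=0.89):
    """Summarise posterior draws with a median, MAD, and an equal-tailed
    credible interval (a sketch of the idea, not bayestestR's API)."""
    lo, hi = np.quantile(samples, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    med = np.median(samples)
    return {"median": float(med),
            "mad": float(np.median(np.abs(samples - med))),
            "ci_89": (float(lo), float(hi))}

print(describe_posterior(draws))
```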


Journal ArticleDOI
TL;DR: In this article, the authors note that the usual definition of R2 is problematic for Bayesian fits, since the numerator can be larger than the denominator, and propose an alternative definition that avoids this.
Abstract: The usual definition of R2 (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition: the variance of the predicted values divided by the variance of the predicted values plus the expected variance of the errors.

452 citations
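
To make the proposal concrete, here is a small Python sketch of the per-draw computation described in the abstract; the data and the fake posterior draws are invented for illustration:

```python
import numpy as np

# Per-draw Bayesian R^2 in the spirit of the proposal: variance of the fit
# divided by variance of the fit plus variance of the residuals, computed
# for each posterior draw, so it can never exceed 1.
rng = np.random.default_rng(0)
n, n_draws = 100, 2000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Fake posterior draws for the slope (in practice: from a fitted model).
beta = rng.normal(0.5, 0.1, size=n_draws)
y_fit = beta[:, None] * x[None, :]              # (draws, n) fitted values

var_fit = y_fit.var(axis=1)
var_res = (y[None, :] - y_fit).var(axis=1)
bayes_r2 = var_fit / (var_fit + var_res)        # one R^2 per draw
print(bayes_r2.mean(), np.quantile(bayes_r2, [0.05, 0.95]))
```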


Journal ArticleDOI
TL;DR: Visualization is helpful in each of these stages of the Bayesian workflow and it is indispensable when drawing inferences from the types of modern, high dimensional models that are used by applied researchers.
Abstract: Bayesian data analysis is about more than just computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Practical Bayesian data analysis, like all data analysis, is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is helpful in each of these stages of the Bayesian workflow and it is indispensable when drawing inferences from the types of modern, high dimensional models that are used by applied researchers.

440 citations
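
As a concrete instance of the model-checking stage mentioned above, the following hand-rolled Python sketch draws a posterior predictive check; the toy data, the deliberately misspecified normal model, and all names are ours (libraries such as bayesplot or ArviZ automate plots like this):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
y = rng.gamma(2.0, 1.0, size=200)                # skewed "observed" data

# Crude posterior draws for the mean of a (misspecified) normal model.
mu_draws = rng.normal(y.mean(), y.std() / np.sqrt(len(y)), size=50)

for mu in mu_draws:                              # replicated datasets
    y_rep = rng.normal(mu, y.std(), size=len(y))
    plt.hist(y_rep, bins=30, histtype="step", alpha=0.2, color="C0")
plt.hist(y, bins=30, histtype="step", color="C1", linewidth=2)
plt.title("Posterior predictive check: the normal model misses the skew")
plt.show()
```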


Posted Content
TL;DR: TGAN, which uses a conditional generative adversarial network to address the challenges of modeling the probability distribution of rows in tabular data, outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods do not.
Abstract: Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, making the modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods do not.

354 citations


Journal ArticleDOI
TL;DR: This study describes and compares several Bayesian indices, provides intuitive visual representations of their "behavior" in relation to common sources of variance such as sample size, magnitude of effects, and frequentist significance, and contributes to the development of an intuitive understanding of the values that researchers report, which is critical for the standardization of scientific reporting.
Abstract: Turmoil has engulfed psychological science. Causes and consequences of the reproducibility crisis are in dispute. With the hope of addressing some of its aspects, Bayesian methods are gaining increasing attention in psychological science. Some of their advantages, as opposed to the frequentist framework, are the ability to describe parameters in probabilistic terms and to explicitly incorporate prior knowledge about them into the model. These issues are crucial, in particular regarding the current debate about statistical significance. Bayesian methods are not necessarily the only remedy against incorrect interpretations or wrong conclusions, but there is increasing agreement that they are one of the keys to avoiding such fallacies. Nevertheless, their flexible nature is both their power and their weakness, for there is no agreement about which indices of "significance" should be computed or reported. This lack of a consensual index or guidelines, comparable to the frequentist p-value, further contributes to the unnecessary opacity that many unfamiliar readers perceive in Bayesian statistics. Thus, this study describes and compares several Bayesian indices and provides an intuitive visual representation of their "behavior" in relation to common sources of variance such as sample size, magnitude of effects, and frequentist significance. The results contribute to the development of an intuitive understanding of the values that researchers report, allowing sensible recommendations to be drawn for the description of Bayesian statistics, which is critical for the standardization of scientific reporting.

327 citations
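
Two of the indices compared in this study, the probability of direction (pd) and the percentage of the posterior inside a region of practical equivalence (ROPE), reduce to simple operations on posterior draws. A minimal sketch with an invented toy posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
posterior = rng.normal(0.3, 0.15, size=10000)   # toy posterior for an effect

# Probability of direction (pd): share of the posterior with the same
# sign as the median; it closely tracks frequentist significance.
pd = max(np.mean(posterior > 0), np.mean(posterior < 0))

# ROPE: share of the posterior inside a region of practical equivalence,
# here the conventional [-0.1, 0.1] band for a standardized effect.
rope = np.mean((posterior > -0.1) & (posterior < 0.1))

print(f"pd = {pd:.3f}, % in ROPE = {100 * rope:.1f}")
```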


Journal ArticleDOI
TL;DR: Variational inference (VI) as mentioned in this paper approximates a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem, which has been successfully applied to various models and large-scale applications.
Abstract: Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully applied to various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.

319 citations
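
A minimal, self-contained example of the standard mean-field VI the review starts from: fitting a Gaussian q(z) to the posterior of a Gaussian mean by maximizing a Monte Carlo ELBO with reparameterized (pathwise) gradients. The toy conjugate model is ours, chosen so the answer can be checked in closed form:

```python
import math
import torch

torch.manual_seed(0)
x = torch.randn(20) + 1.5                       # data from N(1.5, 1)

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for _ in range(2000):
    opt.zero_grad()
    z = mu + log_sigma.exp() * torch.randn(64)  # reparameterized draws from q
    log_prior = -0.5 * z ** 2                   # z ~ N(0, 1) prior, up to constants
    log_lik = (-0.5 * (x[None, :] - z[:, None]) ** 2).sum(dim=1)
    entropy = log_sigma + 0.5 * math.log(2 * math.pi) + 0.5
    elbo = (log_prior + log_lik).mean() + entropy
    (-elbo).backward()
    opt.step()

# Exact posterior is N(n * xbar / (n + 1), 1 / (n + 1)); compare:
n = len(x)
print(float(mu), float(x.mean()) * n / (n + 1))
print(float(log_sigma.exp()), (1 / (n + 1)) ** 0.5)
```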


Journal ArticleDOI
01 Jul 2019-Oikos
TL;DR: Ecologists can and should debate the appropriate form of prior information, but should consider weakly informative priors as the new ‘default’ prior for any Bayesian model.
Abstract: Throughout the last two decades, Bayesian statistical methods have proliferated throughout ecology and evolution. Numerous previous references established both philosophical and computational guidelines for implementing Bayesian methods. However, protocols for incorporating prior information, the defining characteristic of Bayesian philosophy, are nearly nonexistent in the ecological literature. Here, I hope to encourage the use of weakly informative priors in ecology and evolution by providing a ‘consumer's guide’ to weakly informative priors. The first section outlines three reasons why ecologists should abandon noninformative priors: 1) common flat priors are not always noninformative, 2) noninformative priors provide the same result as simpler frequentist methods, and 3) noninformative priors suffer from the same high type I and type M error rates as frequentist methods. The second section provides a guide for implementing informative priors, wherein I detail convenient ‘reference’ prior distributions for common statistical models (i.e. regression, ANOVA, hierarchical models). I then use simulations to visually demonstrate how informative priors influence posterior parameter estimates. With the guidelines provided here, I hope to encourage the use of weakly informative priors for Bayesian analyses in ecology. Ecologists can and should debate the appropriate form of prior information, but should consider weakly informative priors as the new ‘default’ prior for any Bayesian model.

246 citations
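
The effect of a weakly informative prior is easy to see in a conjugate sketch. Below, a noisy small-study estimate is combined with either a flat prior or a weakly informative Normal(0, 1) prior on a standardized effect; the numbers are invented for illustration:

```python
# Conjugate normal-normal sketch: posterior for a standardized effect under
# a flat prior versus a weakly informative Normal(0, 1) prior. With small,
# noisy data, the weakly informative prior shrinks the estimate toward zero,
# guarding against the type M (magnitude) errors discussed in the paper.
def posterior(xbar, se, prior_sd=None):
    if prior_sd is None:                  # flat prior: posterior = likelihood
        return xbar, se
    prec = 1 / se**2 + 1 / prior_sd**2    # precisions add under conjugacy
    return (xbar / se**2) / prec, prec ** -0.5

xbar, se = 1.2, 0.8                       # noisy estimate from a small study
print("flat prior:         mean=%.2f sd=%.2f" % posterior(xbar, se))
print("weakly informative: mean=%.2f sd=%.2f" % posterior(xbar, se, 1.0))
```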


Journal ArticleDOI
27 Nov 2019
TL;DR: In this article, an alternative summation of the MultiNest draws, called importance nested sampling (INS), is presented, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than vanilla NS with no change in the way MultiNest explores the parameter space.
Abstract: Bayesian inference involves two main computational challenges. First, in estimating the parameters of some model for the data, the posterior distribution may well be highly multi-modal: a regime in which the convergence to stationarity of traditional Markov Chain Monte Carlo (MCMC) techniques becomes incredibly slow. Second, in selecting between a set of competing models the necessary estimation of the Bayesian evidence for each is, by definition, a (possibly high-dimensional) integration over the entire parameter space; again this can be a daunting computational task, although new Monte Carlo (MC) integration algorithms offer solutions of ever increasing efficiency. Nested sampling (NS) is one such contemporary MC strategy targeted at calculation of the Bayesian evidence, but which also enables posterior inference as a by-product, thereby allowing simultaneous parameter estimation and model selection. The widely-used MultiNest algorithm presents a particularly efficient implementation of the NS technique for multi-modal posteriors. In this paper we discuss importance nested sampling (INS), an alternative summation of the MultiNest draws, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than 'vanilla' NS with no change in the way MultiNest explores the parameter space. This is accomplished by treating as a (pseudo-)importance sample the totality of points collected by MultiNest, including those previously discarded under the constrained likelihood sampling of the NS algorithm. We apply this technique to several challenging test problems and compare the accuracy of Bayesian evidences obtained with INS against those from vanilla NS.

204 citations
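
For readers new to the technique, here is a toy implementation of the 'vanilla' nested sampling scheme that INS improves upon; the problem, constants, and rejection-based replacement step are illustrative simplifications (MultiNest replaces points via ellipsoidal decomposition, and INS additionally reuses the rejected draws):

```python
import numpy as np

# Toy nested sampling run for the evidence Z = integral of L(theta)*prior.
# Prior: uniform on the square [-5, 5]^2; likelihood: unit 2-D Gaussian,
# so Z is approximately 1/100 (nearly all Gaussian mass is inside the box).
rng = np.random.default_rng(1)

def loglike(theta):
    return -0.5 * np.sum(theta ** 2, axis=-1) - np.log(2 * np.pi)

n_live, n_iter = 200, 1600
live = rng.uniform(-5, 5, size=(n_live, 2))
live_logl = loglike(live)
logz = -np.inf
log_width = np.log(1 - np.exp(-1 / n_live))      # first prior-volume shell

for _ in range(n_iter):
    worst = np.argmin(live_logl)
    logz = np.logaddexp(logz, log_width + live_logl[worst])
    # Replace the worst point with a prior draw above the likelihood contour.
    while True:
        cand = rng.uniform(-5, 5, size=2)
        if loglike(cand) > live_logl[worst]:
            break
    live[worst], live_logl[worst] = cand, loglike(cand)
    log_width -= 1 / n_live                      # shells shrink geometrically

# Remaining live points share the leftover prior volume equally.
logz = np.logaddexp(logz, -n_iter / n_live - np.log(n_live)
                    + np.logaddexp.reduce(live_logl))
print("log Z estimate:", logz, "  analytic:", -np.log(100.0))
```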


Journal ArticleDOI
TL;DR: An updated NDV classification and nomenclature system that incorporates phylogenetic topology, genetic distances, branch support, and epidemiological independence was developed and will facilitate future studies of NDV evolution and epidemiology, and comparison of results obtained across the world.

192 citations


Proceedings Article
01 Jan 2019
TL;DR: This work enables practical deep learning while preserving benefits of Bayesian principles, and applies techniques such as batch normalisation, data augmentation, and distributed training to achieve similar performance in about the same number of epochs as the Adam optimiser.
Abstract: Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted. This work enables practical deep learning while preserving benefits of Bayesian principles. A PyTorch implementation is available as a plug-and-play optimiser.

167 citations


Journal ArticleDOI
TL;DR: This work proposes to conduct likelihood-free Bayesian inference about parameters with no prior selection of the relevant components of the summary statistics, bypassing the derivation of the associated tolerance level, by using the random forest methodology of Breiman (2001).
Abstract: Approximate Bayesian Computation (ABC) has grown into a standard methodology to handle Bayesian inference in models associated with intractable likelihood functions. Most ABC implementations require the selection of a summary statistic as the data itself is too large or too complex to be compared to simulated realisations from the assumed model. The dimension of this statistic is generally constrained to be close to the dimension of the model parameter for efficiency reasons. Furthermore, the tolerance level that governs the acceptance or rejection of parameter values needs to be calibrated and the range of calibration techniques available so far is mostly based on asymptotic arguments. We propose here to conduct Bayesian inference based on an arbitrarily large vector of summary statistics without imposing a selection of the relevant components and bypassing the derivation of a tolerance. The approach relies on the random forest methodology of Breiman (2001) when applied to regression. We advocate the derivation of a new random forest for each component of the parameter vector, a tool from which an approximation to the marginal posterior distribution can be derived. Correlations between parameter components are handled by separate random forests. This technology offers significant gains in terms of robustness to the choice of the summary statistics and of computing time, when compared with more standard ABC solutions.
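
A compact sketch of the idea using scikit-learn's random forest: simulate parameter and summary pairs from the prior predictive, regress the parameter on an arbitrarily large summary vector, and predict at the observed summaries, with no tolerance calibration. The toy model and summaries are ours, not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def summaries(x):
    # A deliberately oversized summary vector; only some entries are
    # informative, which the forest is meant to sort out on its own.
    return np.array([x.mean(), np.median(x), x.std(), x.min(), x.max(),
                     np.mean(x ** 3), np.mean(np.abs(x))])

# Toy model: normal mean theta with a N(0, 3^2) prior, 50 observations.
thetas = rng.normal(0, 3, size=20000)                       # prior draws
stats = np.array([summaries(rng.normal(t, 1, size=50)) for t in thetas])

x_obs = rng.normal(1.7, 1, size=50)                         # "observed" data
rf = RandomForestRegressor(n_estimators=200, n_jobs=-1).fit(stats, thetas)
print("posterior mean estimate:", rf.predict(summaries(x_obs)[None, :]))
```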

Journal ArticleDOI
TL;DR: The authors focus on two flexible approaches, Bayesian and frequentist, to determine overall effect sizes in network meta-analysis, making the material easy to understand for Korean researchers who did not major in statistics.
Abstract: The objective of this study is to describe the general approaches to network meta-analysis that are available for quantitative data synthesis using R software. We conducted a network meta-analysis using two approaches: Bayesian and frequentist methods. The corresponding R packages were "gemtc" for the Bayesian approach and "netmeta" for the frequentist approach. In estimating a network meta-analysis model using a Bayesian framework, the "rjags" package is a common tool. "rjags" implements Markov chain Monte Carlo simulation with a graphical output. The estimated overall effect sizes, test for heterogeneity, moderator effects, and publication bias were reported using R software. The authors focus on two flexible approaches, Bayesian and frequentist, to determine overall effect sizes in network meta-analysis. This study focused on the practical methods of network meta-analysis rather than theoretical concepts, making the material easy to understand for Korean researchers who did not major in statistics. The authors hope that this study will help many Korean researchers to perform network meta-analyses and conduct related research more easily with R software.

Journal ArticleDOI
TL;DR: This paper proposes a new Bayesian estimation procedure for (possibly very large) VARs featuring time-varying volatilities and general priors and shows that indeed empirically the new estimation procedure performs well in applications to both structural analysis and out-of-sample forecasting.

Journal ArticleDOI
TL;DR: Over forty years ago, average-case error was proposed in the applied mathematics literature as an alternative criterion to worst-case error with which to assess numerical methods.
Abstract: Over forty years ago average-case error was proposed in the applied mathematics literature as an alternative criterion with which to assess numerical methods. In contrast to worst-case error, this ...

Journal ArticleDOI
04 Sep 2019-Neuron
TL;DR: It is found that prior statistics warp neural representations in the frontal cortex, allowing the mapping of sensory inputs to motor outputs to incorporate prior statistics in accordance with Bayesian inference.

Journal ArticleDOI
TL;DR: A Bayesian Estimator of Abrupt change, Seasonal change, and Trend (BEAST) is reported, which offers a new analytical option for robust changepoint detection and nonlinear trend analysis and will help exploit environmental time-series data for probing patterns and drivers of ecosystem dynamics.

Journal ArticleDOI
TL;DR: Nested sampling is introduced to phylogenetics and its performance is analysed under different scenarios and compared to established methods to conclude that NS is a competitive and attractive algorithm for phylogenetic inference.
Abstract: Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require a high computational effort to yield reliable estimates. One of the major challenges in phylogenetics is the estimation of the marginal likelihood. This quantity is commonly used for comparing different evolutionary models, but its calculation, even for simple models, incurs high computational cost. Another interesting challenge relates to the estimation of the posterior distribution. Often, long Markov chains are required to get sufficient samples to carry out parameter inference, especially for tree distributions. In general, these problems are addressed separately by using different procedures. Nested sampling (NS) is a Bayesian computation algorithm which provides the means to estimate marginal likelihoods together with their uncertainties, and to sample from the posterior distribution at no extra cost. The methods currently used in phylogenetics for marginal likelihood estimation lack practicality due to their dependence on many tuning parameters and the inability of most implementations to provide a direct way to calculate the uncertainties associated with the estimates, unlike NS. In this article, we introduce NS to phylogenetics. Its performance is analysed under different scenarios and compared to established methods. We conclude that NS is a competitive and attractive algorithm for phylogenetic inference. An implementation is available as a package for BEAST 2 under the LGPL licence, accessible at https://github.com/BEAST2-Dev/nested-sampling.

Posted Content
TL;DR: This paper predicts how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically shows how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases.
Abstract: Artificial Neural Networks are connectionist systems that perform a given task by learning on examples without having prior knowledge about the task. This is done by finding an optimal point estimate for the weights in every node. Generally, a network using point estimates as weights performs well with large datasets, but fails to express uncertainty in regions with little or no data, leading to overconfident decisions. In this paper, a Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, which introduces a probability distribution over the weights. Furthermore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks. The results are compared to point-estimate-based architectures on the MNIST, CIFAR-10 and CIFAR-100 datasets for the Image Classification task, on the BSD300 dataset for the Image Super-Resolution task, and on the CIFAR-10 dataset again for the Generative Adversarial Network task. BayesCNN is based on Bayes by Backprop, which derives a variational approximation to the true posterior. We therefore introduce the idea of applying two convolutional operations, one for the mean and one for the variance. Our proposed method not only achieves performance equivalent to frequentist inference in identical architectures but also incorporates a measure of uncertainty and regularisation. It further eliminates the use of dropout in the model. Moreover, we predict how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically show how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases. Finally, we propose ways to prune the Bayesian architecture and to make it more computationally and time efficient.
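
The "two convolutional operations" idea can be sketched directly: with factorized Gaussian posteriors over the weights, the activation mean comes from convolving with the weight means and the activation variance from convolving the squared input with the weight variances (the local reparameterization trick). A hypothetical PyTorch layer with illustrative shapes and initializations, not the paper's code:

```python
import torch
import torch.nn.functional as F

class BayesConv2d(torch.nn.Module):
    """Bayesian conv layer sketch: one convolution for the activation
    mean, one for the activation variance, then a Gaussian sample."""
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.w_mu = torch.nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.05)
        self.w_rho = torch.nn.Parameter(torch.full((c_out, c_in, k, k), -5.0))

    def forward(self, x):
        w_var = F.softplus(self.w_rho) ** 2          # positive weight variances
        act_mu = F.conv2d(x, self.w_mu)              # convolution for the mean
        act_var = F.conv2d(x ** 2, w_var)            # convolution for the variance
        eps = torch.randn_like(act_mu)
        return act_mu + act_var.clamp_min(1e-8).sqrt() * eps

layer = BayesConv2d(3, 8, 3)
out = layer(torch.randn(2, 3, 32, 32))               # stochastic activations
print(out.shape)                                     # torch.Size([2, 8, 30, 30])
```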

Posted Content
TL;DR: This paper proposes a new target criterion for model confidence, corresponding to the True Class Probability (TCP), and shows that the TCP is better suited than the classic Maximum Class Probability (MCP) in the context of failure prediction.
Abstract: Assessing reliably the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show that the TCP is better suited than the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is inherently unknown at test time, we propose to learn the TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted to validate the relevance of the proposed approach. We study various network architectures and small- and large-scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
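
On toy softmax outputs, the distinction between MCP and TCP is a one-liner each; the logits and labels below are random, purely to show the bookkeeping (in the paper, TCP is unavailable at test time and is therefore predicted by a learned model):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))                 # 6 samples, 4 classes
labels = rng.integers(0, 4, size=6)

e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)         # softmax

mcp = probs.max(axis=1)                          # prob. of the predicted class
tcp = probs[np.arange(len(labels)), labels]      # prob. of the true class
errors = probs.argmax(axis=1) != labels          # misclassified => low TCP
for m, t, err in zip(mcp, tcp, errors):
    print(f"MCP={m:.2f}  TCP={t:.2f}  error={err}")
```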

Journal ArticleDOI
TL;DR: This is an introduction to Bayesian inference with a focus on hierarchical models and hyper-parameters, and includes extensive appendices discussing the creation of credible intervals, Gaussian noise, explicit marginalisation, posterior predictive distributions, and selection effects.
Abstract: This is an introduction to Bayesian inference with a focus on hierarchical models and hyper-parameters. We write primarily for an audience of Bayesian novices, but we hope to provide useful insights for seasoned veterans as well. Examples are drawn from gravitational-wave astronomy, though we endeavour for the presentation to be understandable to a broader audience. We begin with a review of the fundamentals: likelihoods, priors, and posteriors. Next, we discuss Bayesian evidence, Bayes factors, odds ratios, and model selection. From there, we describe how posteriors are estimated using samplers such as Markov Chain Monte Carlo algorithms and nested sampling. Finally, we generalise the formalism to discuss hyper-parameters and hierarchical models. We include extensive appendices discussing the creation of credible intervals, Gaussian noise, explicit marginalisation, posterior predictive distributions, and selection effects.
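
As a companion to the review of samplers, here is a minimal random-walk Metropolis sketch for a Gaussian-mean posterior; the model, prior, and tuning constants are illustrative choices, not the tutorial's gravitational-wave examples:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=30)

def log_post(mu):
    log_prior = -0.5 * (mu / 10.0) ** 2            # broad Normal(0, 10) prior
    log_lik = -0.5 * np.sum((data - mu) ** 2)      # Gaussian likelihood, sigma = 1
    return log_prior + log_lik

chain, mu = [], 0.0
for _ in range(10000):
    prop = mu + rng.normal(0, 0.5)                 # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                                  # Metropolis accept step
    chain.append(mu)

post = np.array(chain[2000:])                      # discard burn-in
print(post.mean(), np.quantile(post, [0.05, 0.95]))  # 90% credible interval
```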

Journal ArticleDOI
TL;DR: In this article, the authors consider alternate formulations of hierarchical nearest neighbor Gaussian process (NNGP) models for improved convergence, faster computing time, and more robust and reproducible Bayesian inference.
Abstract: We consider alternate formulations of recently proposed hierarchical nearest neighbor Gaussian process (NNGP) models for improved convergence, faster computing time, and more robust and reproducible Bayesian inference...

Journal ArticleDOI
17 Jul 2019
TL;DR: In this article, the authors adopt a Bayesian approach, viewing the observed graph as a realization from a parametric family of random graphs, and target inference of the joint posterior of the random graph parameters and the node (or graph) labels.
Abstract: Recently, techniques for applying convolutional neural networks to graph-structured data have emerged. Graph convolutional neural networks (GCNNs) have been used to address node and graph classification and matrix completion. Although the performance has been impressive, the current implementations have limited capability to incorporate uncertainty in the graph structure. Almost all GCNNs process a graph as though it is a ground-truth depiction of the relationship between nodes, but often the graphs employed in applications are themselves derived from noisy data or modelling assumptions. Spurious edges may be included; other edges may be missing between nodes that have very strong relationships. In this paper we adopt a Bayesian approach, viewing the observed graph as a realization from a parametric family of random graphs. We then target inference of the joint posterior of the random graph parameters and the node (or graph) labels. We present the Bayesian GCNN framework and develop an iterative learning procedure for the case of assortative mixed-membership stochastic block models. We present the results of experiments that demonstrate that the Bayesian formulation can provide better performance when there are very few labels available during the training process.

Journal ArticleDOI
TL;DR: Here, the authors use Bayesian causal modeling and measures of neural activity to show how the brain dynamically codes for and combines sensory signals to draw causal inferences.
Abstract: Transforming the barrage of sensory signals into a coherent multisensory percept relies on solving the binding problem - deciding whether signals come from a common cause and should be integrated or, instead, segregated. Human observers typically arbitrate between integration and segregation consistent with Bayesian Causal Inference, but the neural mechanisms remain poorly understood. Here, we presented people with audiovisual sequences that varied in the number of flashes and beeps, then combined Bayesian modelling and EEG representational similarity analyses. Our data suggest that the brain initially represents the number of flashes and beeps independently. Later, it computes their numbers by averaging the forced-fusion and segregation estimates weighted by the probabilities of common and independent cause models (i.e. model averaging). Crucially, prestimulus oscillatory alpha power and phase correlate with observers' prior beliefs about the world's causal structure that guide their arbitration between sensory integration and segregation.
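
The model-averaging computation at the heart of Bayesian Causal Inference can be written in a few lines: weight the forced-fusion and segregation estimates by the posterior probability of a common cause. The sketch below integrates the likelihoods on a grid; all noise levels and the cause prior are illustrative, not fitted values from the study:

```python
import numpy as np

s = np.linspace(-20, 20, 4001)                     # grid over the latent stimulus
ds = s[1] - s[0]

def normal(x, m, sd):
    return np.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

xa, xv = 3.0, 0.0                                  # auditory, visual measurements
sa, sv, sp, p_common = 2.0, 1.0, 10.0, 0.5         # noise SDs, prior SD, P(C=1)

prior = normal(s, 0.0, sp)
like_common = np.sum(normal(xa, s, sa) * normal(xv, s, sv) * prior) * ds
like_indep = (np.sum(normal(xa, s, sa) * prior) * ds *
              np.sum(normal(xv, s, sv) * prior) * ds)
post_c1 = (like_common * p_common /
           (like_common * p_common + like_indep * (1 - p_common)))

# Reliability-weighted fusion vs. audio-only segregation (prior mean = 0).
fused = (xa / sa**2 + xv / sv**2) / (1/sa**2 + 1/sv**2 + 1/sp**2)
seg_a = (xa / sa**2) / (1/sa**2 + 1/sp**2)
estimate_a = post_c1 * fused + (1 - post_c1) * seg_a   # model averaging
print(f"P(common cause) = {post_c1:.2f}, auditory estimate = {estimate_a:.2f}")
```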

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, the authors show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer goes to infinity, and derive the corresponding kernel.
Abstract: The deep image prior was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For “inference”, gradient descent is performed to adjust network parameters to make the output match observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin dynamics we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks.
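
Stochastic gradient Langevin dynamics, the posterior-inference tool used here, amounts to a gradient step on the log-posterior plus injected Gaussian noise scaled to the step size. A toy sketch on a standard-normal target; the step size and target are our own choices, not the paper's image models:

```python
import torch

def sgld_step(params, log_posterior, lr=1e-2):
    """One SGLD update: theta += (lr/2) * grad log p(theta) + N(0, lr)."""
    loss = -log_posterior(params)
    grad, = torch.autograd.grad(loss, params)
    with torch.no_grad():
        params -= 0.5 * lr * grad
        params += torch.randn_like(params) * lr ** 0.5   # injected noise
    return params

# Toy target: standard normal log-density; samples should look ~ N(0, 1).
theta = torch.zeros(1, requires_grad=True)
samples = []
for _ in range(20000):
    theta = sgld_step(theta, lambda p: -0.5 * (p ** 2).sum())
    samples.append(theta.item())
print(torch.tensor(samples[5000:]).std())   # close to 1
```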

Journal ArticleDOI
TL;DR: This work outlines how to fit evidence-accumulation models using the flexible, open-source, R-based Dynamic Models of Choice (DMC) software, and guides the reader through the practical details of a Bayesian hierarchical analysis.
Abstract: Parameter estimation in evidence-accumulation models of choice response times is demanding of both the data and the user. We outline how to fit evidence-accumulation models using the flexible, open-source, R-based Dynamic Models of Choice (DMC) software. DMC provides a hands-on introduction to the Bayesian implementation of two popular evidence-accumulation models: the diffusion decision model (DDM) and the linear ballistic accumulator (LBA). It enables individual and hierarchical estimation, as well as assessment of the quality of a model’s parameter estimates and descriptive accuracy. First, we introduce the basic concepts of Bayesian parameter estimation, guiding the reader through a simple DDM analysis. We then illustrate the challenges of fitting evidence-accumulation models using a set of LBA analyses. We emphasize best practices in modeling and discuss the importance of parameter- and model-recovery simulations, exploring the strengths and weaknesses of models in different experimental designs and parameter regions. We also demonstrate how DMC can be used to model complex cognitive processes, using as an example a race model of the stop-signal paradigm, which is used to measure inhibitory ability. We illustrate the flexibility of DMC by extending this model to account for mixtures of cognitive processes resulting from attention failures. We then guide the reader through the practical details of a Bayesian hierarchical analysis, from specifying priors to obtaining posterior distributions that encapsulate what has been learned from the data. Finally, we illustrate how the Bayesian approach leads to a quantitatively cumulative science, showing how to use posterior distributions to specify priors that can be used to inform the analysis of future experiments.
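
The DDM that DMC fits can be simulated by simple Euler stepping of the evidence-accumulation process, which is also how parameter-recovery studies generate synthetic data. A sketch with illustrative parameter values (DMC itself is R-based; this is not its API):

```python
import numpy as np

rng = np.random.default_rng(0)

def ddm_trial(v=1.0, a=1.0, z=0.5, t0=0.3, s=1.0, dt=1e-3):
    """Simulate one trial: drift v, bounds 0 and a, start point z*a,
    non-decision time t0, diffusion noise s. Returns (choice, RT)."""
    x, t = z * a, 0.0
    while 0.0 < x < a:
        x += v * dt + s * np.sqrt(dt) * rng.normal()
        t += dt
    return ("upper" if x >= a else "lower"), t + t0

trials = [ddm_trial() for _ in range(1000)]
rts = np.array([t for _, t in trials])
p_upper = np.mean([c == "upper" for c, _ in trials])
print(f"P(upper) = {p_upper:.2f}, mean RT = {rts.mean():.2f}s")
```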

Journal ArticleDOI
TL;DR: The focus is on meeting challenges that arise from system identification and damage assessment for the civil infrastructure but the presented theories also have a considerably broader applicability for inverse problems in science and technology.
Abstract: Bayesian inference provides a powerful approach to system identification and damage assessment for structures. The application of Bayesian methods is motivated by the fact that inverse problems in s...

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A framework for recognizing human actions from skeleton data is proposed by modeling the underlying dynamic process that generates the motion pattern and an adversarial prior is developed to regularize the model parameters to improve the generalization of the model.
Abstract: We propose a framework for recognizing human actions from skeleton data by modeling the underlying dynamic process that generates the motion pattern. We capture three major factors that contribute to the complexity of the motion pattern including spatial dependencies among body joints, temporal dependencies of body poses, and variation among subjects in action execution. We utilize graph convolution to extract structure-aware feature representation from pose data by exploiting the skeleton anatomy. Long short-term memory (LSTM) network is then used to capture the temporal dynamics of the data. Finally, the whole model is extended under the Bayesian framework to a probabilistic model in order to better capture the stochasticity and variation in the data. An adversarial prior is developed to regularize the model parameters to improve the generalization of the model. A Bayesian inference problem is formulated to solve the classification task. We demonstrate the benefit of this framework in several benchmark datasets with recognition under various generalization conditions.

Journal ArticleDOI
TL;DR: In this paper, Bayesian semi-supervised graph convolutional neural networks are used to estimate uncertainty in a statistically principled way through sampling from the posterior distribution, which disentangles representation learning and regression, keeping uncertainty estimates accurate in the low data limit.
Abstract: Predicting bioactivity and physical properties of small molecules is a central challenge in drug discovery. Deep learning is becoming the method of choice, but studies to date focus on mean accuracy as the main metric. However, to replace costly and mission-critical experiments with models, a high mean accuracy is not enough: outliers can derail a discovery campaign, so models need to reliably predict when they will fail, even when the training data is biased; experiments are expensive, so models need to be data-efficient and suggest informative training sets using active learning. We show that uncertainty quantification and active learning can be achieved by Bayesian semi-supervised graph convolutional neural networks. The Bayesian approach estimates uncertainty in a statistically principled way through sampling from the posterior distribution. Semi-supervised learning disentangles representation learning and regression, keeping uncertainty estimates accurate in the low data limit and allowing the model to start active learning from a small initial pool of training data. Our study highlights the promise of Bayesian deep learning for chemistry.

Journal ArticleDOI
TL;DR: A theoretical and conceptual comparison of nine different shrinkage priors is provided; where possible, the priors are parametrized as scale mixtures of normal distributions to facilitate comparison.
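
The scale-mixture representation mentioned in the TL;DR can be illustrated by sampling: a horseshoe draw, for example, is a normal whose scale is itself a half-Cauchy variate. The global scale tau below is an arbitrary illustrative choice:

```python
import numpy as np

# Horseshoe as a scale mixture of normals: lambda ~ half-Cauchy,
# beta | lambda ~ N(0, (tau * lambda)^2). Small tau concentrates mass
# at zero while the Cauchy tail lets large effects escape shrinkage.
rng = np.random.default_rng(0)
tau = 0.1
lam = np.abs(rng.standard_cauchy(100_000))       # local scales
beta = rng.normal(0.0, tau * lam)                # horseshoe-distributed draws
print("quantiles:", np.quantile(beta, [0.05, 0.5, 0.95]))
print("share with |beta| < 0.01:", np.mean(np.abs(beta) < 0.01))
```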

Journal ArticleDOI
TL;DR: This tutorial provides a practical introduction to Bayesian multilevel modeling by reanalyzing a phonetic data set containing formant values for 5 vowels of standard Indonesian, as spoken by 8 speakers, with several repetitions of each vowel.
Abstract: Purpose Bayesian multilevel models are increasingly used to overcome the limitations of frequentist approaches in the analysis of complex structured data. This tutorial introduces Bayesian multilevel modeling for the specific analysis of speech data, using the brms package developed in R. Method In this tutorial, we provide a practical introduction to Bayesian multilevel modeling by reanalyzing a phonetic data set containing formant (F1 and F2) values for 5 vowels of standard Indonesian (ISO 639-3:ind), as spoken by 8 speakers (4 females and 4 males), with several repetitions of each vowel. Results We first give an introductory overview of the Bayesian framework and multilevel modeling. We then show how Bayesian multilevel models can be fitted using the probabilistic programming language Stan and the R package brms, which provides an intuitive formula syntax. Conclusions Through this tutorial, we demonstrate some of the advantages of the Bayesian framework for statistical modeling and provide a detailed c...