
Showing papers on "Bayesian inference published in 2018"


Journal ArticleDOI
TL;DR: The software package Tracer is presented, for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference, which provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more.
Abstract: Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) plays a central role in understanding evolutionary history from molecular sequence data. Visualizing and analyzing the MCMC-generated samples from the posterior distribution is a key step in any non-trivial Bayesian inference. We present the software package Tracer (version 1.7) for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more. Tracer is open-source and available at http://beast.community/tracer.

5,492 citations
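
Tracer itself is a GUI application, but the summary it is best known for, the effective sample size (ESS) of a trace, is easy to illustrate. Below is a minimal NumPy sketch (not Tracer code) that estimates ESS from a chain's autocorrelation, using a synthetic AR(1) series as a stand-in for a real trace file.

```python
import numpy as np

def effective_sample_size(trace, max_lag=None):
    """Estimate the ESS of a 1-D MCMC trace from its autocorrelation function."""
    n = len(trace)
    x = trace - trace.mean()
    max_lag = max_lag or n // 3
    acf = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x) for k in range(max_lag)])
    # Truncate the sum at the first negative autocorrelation (simple rule).
    first_neg = np.argmax(acf < 0) if np.any(acf < 0) else max_lag
    tau = 1.0 + 2.0 * acf[1:first_neg].sum()   # integrated autocorrelation time
    return n / tau

# Toy "trace": an AR(1) chain mimicking correlated posterior samples.
rng = np.random.default_rng(0)
trace = np.zeros(10_000)
for t in range(1, len(trace)):
    trace[t] = 0.9 * trace[t - 1] + rng.normal()
print(f"ESS of {len(trace)} correlated samples: {effective_sample_size(trace):.0f}")
```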


Journal ArticleDOI
TL;DR: The BEAST software package unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration.
Abstract: The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesian phylogenetic and phylodynamic inference from genetic sequence data. BEAST unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration. A convenient, cross-platform, graphical user interface allows the flexible construction of complex evolutionary analyses.

2,184 citations


Journal ArticleDOI
TL;DR: Part II of this series introduces JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems.
Abstract: Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder’s BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.

1,031 citations
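
JASP's default t-tests build on the JZS Bayes factor of Rouder et al. (2009). The sketch below reimplements that quantity for a one-sample t-test by numerical integration; it is an independent illustration, not JASP or BayesFactor code, and the Cauchy prior scale r = 0.707 is an assumed default.

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n, r=0.707):
    """JZS Bayes factor BF10 for a one-sample t-test (Rouder et al., 2009).

    The effect-size prior is Cauchy(0, r), represented as a normal prior with
    variance g, where g ~ InverseGamma(1/2, r^2 / 2).
    """
    v = n - 1  # degrees of freedom
    null_marginal = (1 + t**2 / v) ** (-(v + 1) / 2)

    def integrand(g):
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
                * r / np.sqrt(2 * np.pi) * g ** -1.5
                * np.exp(-r**2 / (2 * g)))

    alt_marginal, _ = integrate.quad(integrand, 0, np.inf)
    return alt_marginal / null_marginal

print(f"BF10 for t = 2.2, n = 30: {jzs_bf10(2.2, 30):.2f}")
```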


Journal ArticleDOI
TL;DR: Ten prominent advantages of the Bayesian approach are outlined, and several objections to Bayesian hypothesis testing are countered.
Abstract: Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al. this issue).

940 citations


Proceedings Article
15 Feb 2018
TL;DR: The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.

757 citations
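
The covariance recursion behind this equivalence is short enough to sketch. The following NumPy code computes the NNGP kernel for a deep ReLU network using the arc-cosine expectation of Cho & Saul (2009), then uses it for GP regression on toy data; the weight and bias variances are illustrative choices, and the code is a simplified stand-in for the paper's pipeline.

```python
import numpy as np

def nngp_kernel(X, depth=3, sigma_w2=1.6, sigma_b2=0.1):
    """NNGP covariance of a deep ReLU network in the infinite-width limit."""
    # Base case: kernel after the first (linear) layer.
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        corr = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
        theta = np.arccos(corr)
        # E[relu(u) relu(v)] for jointly Gaussian (u, v): arc-cosine kernel.
        ev = np.outer(diag, diag) * (np.sin(theta)
                                     + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        K = sigma_b2 + sigma_w2 * ev
    return K

# GP regression with the NNGP kernel on toy data.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(20, 5)), rng.normal(size=20)
K = nngp_kernel(X)
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(X)), y)  # posterior-mean weights
print("NNGP posterior-mean fit residual:", np.linalg.norm(K @ alpha - y))
```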


Proceedings ArticleDOI
23 Apr 2018
TL;DR: In this article, a variational autoencoder (VAE) was extended to collaborative filtering for implicit feedback, and a generative model with multinomial likelihood and Bayesian inference for parameter estimation was proposed.
Abstract: We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.

637 citations
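
The heart of the method is its learning objective: a multinomial log-likelihood minus a KL term scaled by the paper's extra regularization parameter beta. The sketch below evaluates that objective in plain NumPy for one user's click vector; the encoder and decoder outputs are random placeholders, since only the objective itself is being illustrated.

```python
import numpy as np

def multvae_objective(x, logits, mu, logvar, beta):
    """Beta-annealed ELBO: multinomial log-likelihood minus beta * KL."""
    log_softmax = logits - np.log(np.exp(logits - logits.max()).sum()) - logits.max()
    log_lik = (x * log_softmax).sum()                         # multinomial term
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum()  # KL(q(z|x) || N(0, I))
    return log_lik - beta * kl

rng = np.random.default_rng(0)
n_items, latent_dim = 100, 8
x = rng.binomial(1, 0.05, n_items).astype(float)  # implicit-feedback click vector
logits = rng.normal(size=n_items)                 # decoder output (placeholder)
mu, logvar = rng.normal(size=latent_dim), rng.normal(size=latent_dim)

# Anneal beta upward during training, as the paper suggests.
for beta in (0.0, 0.2, 1.0):
    print(f"beta={beta}: objective = {multvae_objective(x, logits, mu, logvar, beta):.2f}")
```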


Journal ArticleDOI
TL;DR: In this paper, the authors compare Bayesian and frequentist approaches to hypothesis testing and estimation with confidence or credible intervals, and explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods.
Abstract: In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.

562 citations
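
To ground the estimation-oriented reading, here is a minimal example of Bayesian estimation with a credible interval, using a conjugate Beta-Binomial model chosen for brevity (a generic illustration, not an example from the article).

```python
from scipy import stats

# Data: 18 successes in 25 trials; prior: Beta(1, 1), i.e., uniform.
successes, trials = 18, 25
posterior = stats.beta(1 + successes, 1 + trials - successes)

# 95% equal-tailed credible interval: the parameter lies inside with
# posterior probability 0.95, the direct statement the New Statistics favors.
low, high = posterior.ppf([0.025, 0.975])
print(f"Posterior mean {posterior.mean():.3f}, 95% CrI ({low:.3f}, {high:.3f})")
```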


Journal ArticleDOI
TL;DR: This approach achieves state-of-the-art predictive accuracy and uncertainty quantification compared with other approaches to Bayesian neural networks, as well as with techniques that include Gaussian processes and ensemble methods, even when the training data set is relatively small.

522 citations


Journal ArticleDOI
TL;DR: Bilby, as discussed by the authors, is a user-friendly Bayesian inference library for gravitational-wave astronomy that provides expert-level parameter estimation infrastructure with straightforward syntax and tools that facilitate use by beginners.
Abstract: Bayesian parameter estimation is fast becoming the language of gravitational-wave astronomy. It is the method by which gravitational-wave data is used to infer the sources' astrophysical properties. We introduce a user-friendly Bayesian inference library for gravitational-wave astronomy, Bilby. This Python code provides expert-level parameter estimation infrastructure with straightforward syntax and tools that facilitate use by beginners. It allows users to perform accurate and reliable gravitational-wave parameter estimation on both real, freely-available data from LIGO/Virgo, and simulated data. We provide a suite of examples for the analysis of compact binary mergers and other types of signal model including supernovae and the remnants of binary neutron star mergers. These examples illustrate how to change the signal model, how to implement new likelihood functions, and how to add new detectors. Bilby has additional functionality to do population studies using hierarchical Bayesian modelling. We provide an example in which we infer the shape of the black hole mass distribution from an ensemble of observations of binary black hole mergers.

385 citations
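
The sketch below mirrors the style of Bilby's documented quick-start examples, fitting a linear signal model to simulated data; names such as GaussianLikelihood, Uniform, and run_sampler follow the documentation, but the exact signatures should be treated as assumptions and checked against the current docs.

```python
import numpy as np
import bilby

# Simulated data from a linear signal model with Gaussian noise.
def model(time, m, c):
    return m * time + c

time = np.linspace(0, 10, 100)
data = model(time, m=1.5, c=0.3) + np.random.normal(0, 0.5, time.size)

# Define the likelihood and priors; Bilby then drives the sampler for us.
likelihood = bilby.core.likelihood.GaussianLikelihood(time, data, model, sigma=0.5)
priors = dict(
    m=bilby.core.prior.Uniform(0, 5, name="m"),
    c=bilby.core.prior.Uniform(-2, 2, name="c"),
)
result = bilby.run_sampler(
    likelihood=likelihood, priors=priors, sampler="dynesty", nlive=250, label="linear"
)
result.plot_corner()
```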


Journal ArticleDOI
TL;DR: This article provides a very basic introduction to MCMC sampling, describing what MCMC is and what it can be used for, with simple illustrative examples.
Abstract: Markov chain Monte Carlo (MCMC) is an increasingly popular method for obtaining information about distributions, especially for estimating posterior distributions in Bayesian inference. This article provides a very basic introduction to MCMC sampling. It describes what MCMC is, and what it can be used for, with simple illustrative examples. Highlighted are some of the benefits and limitations of MCMC sampling, as well as different approaches to circumventing the limitations most likely to trouble cognitive scientists.

360 citations
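
In the spirit of the article's simple illustrative examples, here is a minimal random-walk Metropolis sampler targeting a standard normal density (a generic textbook sketch, not code from the article).

```python
import numpy as np

def log_target(x):
    return -0.5 * x**2  # unnormalized log density of N(0, 1)

rng = np.random.default_rng(42)
n_samples, step = 50_000, 1.0
chain = np.empty(n_samples)
x = 0.0
for i in range(n_samples):
    proposal = x + step * rng.normal()      # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal                        # accept
    chain[i] = x                            # on reject, the current state repeats

burned = chain[5_000:]                      # discard burn-in
print(f"posterior mean ~ {burned.mean():.3f}, sd ~ {burned.std():.3f}")
```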


Proceedings Article
01 Jan 2018
TL;DR: The proposed method combines scalable gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework and is capable of learning complex uncertainty structure beyond a point estimate or a simple Gaussian approximation during fast adaptation.
Abstract: Due to the inherent model uncertainty, learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines efficient gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. Unlike previous methods, during fast adaptation, the method is capable of learning complex uncertainty structure beyond a simple Gaussian approximation, and during meta-update, a novel Bayesian mechanism prevents meta-level overfitting. Remaining a gradient-based method, it is also the first Bayesian model-agnostic meta-learning method applicable to various tasks including reinforcement learning. Experimental results show the accuracy and robustness of the proposed method in sinusoidal regression, image classification, active learning, and reinforcement learning.
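
The nonparametric variational inference in this line of work builds on Stein variational gradient descent (SVGD); attributing it here is an inference from the method's description. Below is a generic SVGD update for a particle ensemble, illustrating the ingredient rather than the full meta-learning procedure.

```python
import numpy as np

def svgd_step(particles, grad_log_p, stepsize=0.1, h=1.0):
    """One SVGD update: move particles along the kernelized Stein direction."""
    diffs = particles[:, None, :] - particles[None, :, :]  # x_i - x_j, shape (n, n, d)
    k = np.exp(-(diffs ** 2).sum(-1) / (2 * h ** 2))       # RBF kernel matrix
    grads = np.stack([grad_log_p(p) for p in particles])   # scores at each particle
    # phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    drift = k @ grads / len(particles)                       # pulls toward high density
    repulsion = (k[..., None] * diffs / h ** 2).mean(axis=1)  # keeps particles apart
    return particles + stepsize * (drift + repulsion)

# The ensemble converges to an approximation of N(2, 1).
rng = np.random.default_rng(0)
particles = rng.normal(size=(50, 1))
for _ in range(500):
    particles = svgd_step(particles, grad_log_p=lambda x: -(x - 2.0))
print(f"particle mean {particles.mean():.2f}, sd {particles.std():.2f}")
```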

Journal ArticleDOI
TL;DR: RadVel as discussed by the authors is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries, which allows users to float or fix parameters, impose priors, and perform Bayesian model comparison.
Abstract: RadVel is an open-source Python package for modeling Keplerian orbits in radial velocity (RV) timeseries. RadVel provides a convenient framework to fit RVs using maximum a posteriori optimization and to compute robust confidence intervals by sampling the posterior probability density via Markov Chain Monte Carlo (MCMC). RadVel allows users to float or fix parameters, impose priors, and perform Bayesian model comparison. We have implemented real-time MCMC convergence tests to ensure adequate sampling of the posterior. RadVel can output a number of publication-quality plots and tables. Users may interface with RadVel through a convenient command-line interface or directly from Python. The code is object-oriented and thus naturally extensible. We encourage contributions from the community. Documentation is available at http://radvel.readthedocs.io.
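
Independent of RadVel's own API, the Keplerian signal it fits is compact enough to sketch: radial velocity from orbital elements, with Kepler's equation solved by Newton iteration. Parameter names below are generic, not RadVel's.

```python
import numpy as np

def radial_velocity(t, period, K, e, omega, tp):
    """Keplerian RV curve: v(t) = K [cos(nu + omega) + e cos(omega)]."""
    M = 2 * np.pi * (t - tp) / period      # mean anomaly
    E = M.copy()                           # solve Kepler's equation M = E - e sin E
    for _ in range(50):                    # Newton iteration
        E -= (E - e * np.sin(E) - M) / (1 - e * np.cos(E))
    nu = 2 * np.arctan2(np.sqrt(1 + e) * np.sin(E / 2),
                        np.sqrt(1 - e) * np.cos(E / 2))  # true anomaly
    return K * (np.cos(nu + omega) + e * np.cos(omega))

t = np.linspace(0, 30, 200)                # days
rv = radial_velocity(t, period=12.3, K=5.0, e=0.2, omega=0.8, tp=2.0)
print(f"recovered semi-amplitude: {(rv.max() - rv.min()) / 2:.2f} m/s")
```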

Journal ArticleDOI
TL;DR: This article provides an applied introduction to Bayesian inference with Bayes factors using JASP, which offers a straightforward means of performing reproducible Bayesian hypothesis tests in a graphical "point and click" environment that will be familiar to researchers conversant with other graphical statistical packages, such as SPSS.
Abstract: Despite its popularity as an inferential framework, classical null hypothesis significance testing (NHST) has several restrictions. Bayesian analysis can be used to complement NHST, however, this approach has been underutilized largely due to a dearth of accessible software options. JASP is a recently developed open-source statistical package that facilitates both Bayesian and NHST analysis using a graphical interface. This article provides an applied introduction to Bayesian inference with Bayes factors using JASP. We use JASP to compare and contrast Bayesian alternatives for several common classical null hypothesis significance tests: correlations, frequency distributions, t-tests, ANCOVAs, and ANOVAs. These examples are also used to illustrate the strengths and limitations of both NHST and Bayesian hypothesis testing. A comparison of NHST and Bayesian inferential frameworks demonstrates that Bayes factors can complement p-values by providing additional information for hypothesis testing. Namely, Bayes factors can quantify relative evidence for both alternative and null hypotheses. Moreover, the magnitude of this evidence can be presented as an easy-to-interpret odds ratio. While Bayesian analysis is by no means a new method, this type of statistical inference has been largely inaccessible for most psychiatry researchers. JASP provides a straightforward means of performing reproducible Bayesian hypothesis tests using a graphical “point and click” environment that will be familiar to researchers conversant with other graphical statistical packages, such as SPSS.
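
A small sketch of the odds-ratio reading described above: posterior odds equal the Bayes factor times the prior odds, so a BF10 can be turned into a posterior probability for H1 (numbers are illustrative).

```python
def posterior_prob_h1(bf10, prior_odds=1.0):
    """Convert BF10 and prior odds for H1 into a posterior probability of H1."""
    post_odds = bf10 * prior_odds
    return post_odds / (1.0 + post_odds)

# BF10 = 8 means the data are 8 times more likely under H1 than under H0.
for bf10 in (1 / 3, 1.0, 8.0):
    print(f"BF10 = {bf10:.2f} -> P(H1 | data) = {posterior_prob_h1(bf10):.2f}")
```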

Journal ArticleDOI
TL;DR: In this paper, a Bayesian asset-pricing test is developed that is easily computed in closed-form from the standard F-statistic, and this test can be adapted to permit an analysis of Bayesian model comparison, i.e., the computation of model probabilities for the collection of all possible pricing models that are based on subsets of the given factors.
Abstract: A Bayesian asset-pricing test is developed that is easily computed in closed-form from the standard F-statistic. Given a set of candidate traded factors, we show how this test can be adapted to permit an analysis of Bayesian model comparison, i.e., the computation of model probabilities for the collection of all possible pricing models that are based on subsets of the given factors. We find that the recent q-factor model is superior to the Fama-French three-factor model augmented by profitability and net investment factors, but both models are dominated by five- or six-factor models that include a momentum factor and value and profitability factors that are updated monthly. Thus, although the standard value factor is redundant, our tests show that a version that incorporates more timely price information is not.

Posted Content
TL;DR: This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other.
Abstract: This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
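
The equivalence the paper cites, kernel ridge regression matching the Gaussian process posterior mean, can be verified in a few lines: with regularization lambda = sigma^2 / n, the two formulas coincide exactly (a self-contained NumPy check, not code from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
Xs = rng.normal(size=(5, 2))                  # test points

def rbf(A, B, ell=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * ell ** 2))

K, Ks = rbf(X, X), rbf(Xs, X)
sigma2, n = 0.01, len(X)

# GP regression posterior mean: k(x*, X) (K + sigma^2 I)^{-1} y
gp_mean = Ks @ np.linalg.solve(K + sigma2 * np.eye(n), y)

# Kernel ridge regression, objective (1/n) sum (f(x_i) - y_i)^2 + lam ||f||^2,
# whose solution is alpha = (K + n * lam * I)^{-1} y.
lam = sigma2 / n
krr_mean = Ks @ np.linalg.solve(K + n * lam * np.eye(n), y)

print("max |GP mean - KRR prediction| =", np.abs(gp_mean - krr_mean).max())
```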

Journal ArticleDOI
TL;DR: PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates and summarizes the results of the various analyzes and generates phylogenetics networks in the extended Newick format that is readily viewable by existing visualization software.
Abstract: PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyzes and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

Journal ArticleDOI
TL;DR: In this article, the authors present an introduction to Bayesian inference with a focus on hierarchical models and hyper-parameters, and describe how posteriors are estimated using samplers such as Markov Chain Monte Carlo algorithms and nested sampling.
Abstract: This is an introduction to Bayesian inference with a focus on hierarchical models and hyper-parameters. We write primarily for an audience of Bayesian novices, but we hope to provide useful insights for seasoned veterans as well. Examples are drawn from gravitational-wave astronomy, though we endeavor for the presentation to be understandable to a broader audience. We begin with a review of the fundamentals: likelihoods, priors, and posteriors. Next, we discuss Bayesian evidence, Bayes factors, odds ratios, and model selection. From there, we describe how posteriors are estimated using samplers such as Markov Chain Monte Carlo algorithms and nested sampling. Finally, we generalize the formalism to discuss hyper-parameters and hierarchical models. We include extensive appendices discussing the creation of credible intervals, Gaussian noise, explicit marginalization, posterior predictive distributions, and selection effects.
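
A compact sketch of the fundamentals the review starts from: evaluating a posterior, an evidence, and a Bayes factor on a parameter grid for a toy Gaussian-mean problem (a generic illustration, not drawn from the paper).

```python
import numpy as np

# Data: noisy measurements of an unknown mean mu, with known sigma = 1.
rng = np.random.default_rng(3)
data = rng.normal(0.7, 1.0, size=20)

mu = np.linspace(-3, 3, 2001)                    # parameter grid
dmu = mu[1] - mu[0]
# Gaussian log-likelihood; constants shared by both models are dropped.
log_like = -0.5 * ((data[:, None] - mu) ** 2).sum(axis=0)
prior = np.full(mu.size, 1 / 6)                  # Uniform(-3, 3) prior density

shift = log_like.max()                           # avoid underflow
unnorm = np.exp(log_like - shift) * prior
evidence = unnorm.sum() * dmu * np.exp(shift)    # normalizing integral
posterior = unnorm / (unnorm.sum() * dmu)

# Bayes factor against the point-null model mu = 0.
bf = evidence / np.exp(-0.5 * (data ** 2).sum())
print(f"posterior mean = {(mu * posterior).sum() * dmu:.3f}, BF vs. mu = 0: {bf:.3g}")
```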

Posted Content
TL;DR: Sequential Neural Likelihood (SNL) as discussed by the authors trains an autoregressive flow on simulated data in order to learn a model of the likelihood in the region of high posterior density, which reduces simulation cost by orders of magnitude.
Abstract: We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. SNL trains an autoregressive flow on simulated data in order to learn a model of the likelihood in the region of high posterior density. A sequential training procedure guides simulations and reduces simulation cost by orders of magnitude. We show that SNL is more robust, more accurate and requires less tuning than related neural-based methods, and we discuss diagnostics for assessing calibration, convergence and goodness-of-fit.
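
The loop structure of SNL can be sketched on a toy simulator. Below, a fitted linear-Gaussian conditional model stands in for the autoregressive flow, which is a deliberate simplification; the real method trains a flow q(x|theta) on the accumulated simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
simulate = lambda th: th + rng.normal(size=th.shape)  # simulator: x | theta ~ N(theta, 1)
x_obs = 1.5
grid = np.linspace(-5, 5, 1001)                       # theta grid, prior Uniform(-5, 5)
dg = grid[1] - grid[0]
posterior = np.full(grid.size, 0.1)                   # start from the prior density

thetas, xs = np.empty(0), np.empty(0)
for _ in range(4):
    # 1) simulate at parameters drawn from the current posterior estimate
    new = rng.choice(grid, size=200, p=posterior / posterior.sum())
    thetas, xs = np.concatenate([thetas, new]), np.concatenate([xs, simulate(new)])
    # 2) fit the surrogate likelihood q(x | theta) on all simulations so far
    a, b = np.polyfit(thetas, xs, 1)
    s = (xs - (a * thetas + b)).std()
    # 3) posterior estimate proportional to q(x_obs | theta) * (flat) prior
    loglik = -0.5 * ((x_obs - (a * grid + b)) / s) ** 2
    posterior = np.exp(loglik - loglik.max())
    posterior /= posterior.sum() * dg

print(f"SNL-style posterior mean: {(grid * posterior).sum() * dg:.2f} "
      f"(analytic value ~ {x_obs:.2f} under the flat prior)")
```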

Posted Content
TL;DR: It is argued that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.
Abstract: Verifying the correctness of Bayesian computation is challenging. This is especially true for complex models that are common in practice, as these require sophisticated model implementations and algorithms. In this paper we introduce simulation-based calibration (SBC), a general procedure for validating inferences from Bayesian algorithms capable of generating posterior samples. This procedure not only identifies inaccurate computation and inconsistencies in model implementations but also provides graphical summaries that can indicate the nature of the problems that arise. We argue that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.
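
For a conjugate model with exact posterior draws, the whole SBC procedure fits in a few lines: draw a parameter from the prior, simulate data, draw posterior samples, and record the rank of the prior draw; under correct computation the ranks are uniform. The sketch below follows that recipe (an independent illustration, not the paper's code).

```python
import numpy as np

rng = np.random.default_rng(7)
n_reps, L, n_data = 1000, 99, 10
prior_mu, prior_sd, noise_sd = 0.0, 1.0, 1.0
ranks = np.empty(n_reps, dtype=int)

for i in range(n_reps):
    theta_tilde = rng.normal(prior_mu, prior_sd)              # 1) draw from prior
    y = rng.normal(theta_tilde, noise_sd, size=n_data)        # 2) simulate data
    # 3) exact conjugate posterior for the normal mean
    post_var = 1 / (1 / prior_sd**2 + n_data / noise_sd**2)
    post_mean = post_var * (prior_mu / prior_sd**2 + y.sum() / noise_sd**2)
    draws = rng.normal(post_mean, np.sqrt(post_var), size=L)  # 4) posterior sample
    ranks[i] = (draws < theta_tilde).sum()                    # 5) rank statistic

# Under a correct implementation the ranks are uniform on {0, ..., L}.
hist, _ = np.histogram(ranks, bins=10, range=(0, L + 1))
print("rank histogram (should be roughly flat):", hist)
```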

Journal ArticleDOI
TL;DR: In this paper, a Markov chain Monte Carlo (MCMC) method is proposed for high-dimensional models that are log-concave and nonsmooth, a class of models that is central in imaging sciences.
Abstract: Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimization, which scales very efficiently to high-dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyses, for example, image uncertainty quantification or model selection, it is necessary to use more computationally intensive Bayesian computation techniques such as Markov chain Monte Carlo methods. This paper presents a new and highly efficient Markov chain Monte Carlo methodology to perform Bayesian computation for high-dimensional models that are log-concave and nonsmooth, a class of models that is central in imaging sciences. The methodology is based on a regularized unadjusted Langevin algorithm that exploits tools from convex analysis, namely, Moreau--Yoshida envelopes and proximal operators, to construct Markov chains with favorable convergence properties. ...
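
A sketch of the core update on a denoising toy problem with an l1 prior: the nonsmooth term enters through the gradient of its Moreau-Yosida envelope, computed from the proximal operator (here, soft-thresholding). Step sizes and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x_true = np.zeros(50); x_true[[5, 20, 33]] = 3.0
y = x_true + 0.5 * rng.normal(size=50)             # noisy observation

sigma2, beta, lam, gamma = 0.25, 2.0, 0.1, 0.05    # noise var, l1 weight, MY and step sizes
soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0)  # prox of t * ||.||_1

# Target: pi(x) propto exp(-||y - x||^2 / (2 sigma2) - beta ||x||_1).
# The gradient of the Moreau-Yosida envelope of g = beta ||.||_1 is
# (x - prox_{lam * g}(x)) / lam, making the Langevin drift well defined.
x = np.zeros_like(y)
samples = []
for k in range(5000):
    grad_lik = (y - x) / sigma2                    # gradient of Gaussian log-likelihood
    grad_prior = -(x - soft(x, lam * beta)) / lam  # gradient of smoothed log-prior
    x = x + gamma * (grad_lik + grad_prior) + np.sqrt(2 * gamma) * rng.normal(size=x.size)
    if k >= 1000:                                  # discard burn-in
        samples.append(x.copy())

post_mean = np.mean(samples, axis=0)
print("relative posterior-mean error:",
      np.linalg.norm(post_mean - x_true) / np.linalg.norm(x_true))
```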

Journal ArticleDOI
TL;DR: Bayesian model averaging (BMA) provides a coherent and systematic mechanism for accounting for model uncertainty and, as discussed by the authors, can be regarded as a direct application of Bayesian inference to the problem of model selection, combined estimation, and prediction.
Abstract: Bayesian model averaging (BMA) provides a coherent and systematic mechanism for accounting for model uncertainty. It can be regarded as a direct application of Bayesian inference to the problem of model selection, combined estimation, and prediction. BMA produces a straightforward model choice criterion and less risky predictions. However, the application of BMA is not always straightforward, leading to diverse assumptions and situational choices on its different aspects. Despite the widespread application of BMA in the literature, there have been few accounts of these differences and trends beyond a handful of landmark reviews in the late 1990s and early 2000s, which therefore do not reflect advances made in recent decades. In this work, we present an account of these developments through a careful content analysis of 820 articles on BMA published between 1996 and 2016. We also develop a conceptual classification scheme to better describe this vast literature, understand its trends and future directions, and provide guidance for researchers interested in both the application and development of the methodology. The results of the classification scheme and content review are then used to discuss the present and future of the BMA literature.
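
The basic BMA mechanics the review surveys can be sketched with the standard BIC approximation to posterior model probabilities, used here to average predictions across nested regression models (a textbook approximation, not a method specific to this review).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, n)

models = [[0], [0, 1], [0, 1, 2]]          # candidate predictor subsets
x_new = np.array([1.0, -0.5, 0.2])
bics, preds = [], []

for cols in models:
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = ((y - A @ beta) ** 2).sum()
    k = A.shape[1] + 1                     # coefficients plus noise variance
    bics.append(n * np.log(rss / n) + k * np.log(n))
    preds.append(np.concatenate([[1.0], x_new[cols]]) @ beta)

# p(M | y) ~ exp(-BIC / 2) under equal prior model probabilities.
w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()
print("model weights:", np.round(w, 3))
print("BMA prediction:", np.dot(w, preds))
```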

Journal ArticleDOI
TL;DR: In this paper, the authors develop a Bayesian hierarchical model of the full distance ladder to estimate the Hubble constant from local distance-ladder data and compare the result with cosmic microwave background (CMB) constraints.
Abstract: Estimates of the Hubble constant, H0, from the local distance ladder and from the cosmic microwave background (CMB) are discrepant at the ∼3σ level, indicating a potential issue with the standard Λ cold dark matter (ΛCDM) cosmology. A probabilistic (i.e. Bayesian) interpretation of this tension requires a model comparison calculation, which in turn depends strongly on the tails of the H0 likelihoods. Evaluating the tails of the local H0 likelihood requires the use of non-Gaussian distributions to faithfully represent anchor likelihoods and outliers, and simultaneous fitting of the complete distance-ladder data set to ensure correct uncertainty propagation. We have hence developed a Bayesian hierarchical model of the full distance ladder that does not rely on Gaussian distributions and allows outliers to be modelled without arbitrary data cuts. Marginalizing over the full ∼3000-parameter joint posterior distribution, we find H0 = (72.72 ± 1.67) km s−1 Mpc−1 when applied to the outlier-cleaned Riess et al. data, and (73.15 ± 1.78) km s−1 Mpc−1 with supernova outliers reintroduced (the pre-cut Cepheid data set is not available). Using our precise evaluation of the tails of the H0 likelihood, we apply Bayesian model comparison to assess the evidence for deviation from ΛCDM given the distance-ladder and CMB data. The odds against ΛCDM are at worst ∼10:1 when considering the Planck 2015 XIII data, regardless of outlier treatment, considerably less dramatic than naively implied by the 2.8σ discrepancy. These odds become ∼60:1 when an approximation to the more-discrepant Planck Intermediate XLVI likelihood is included.

Journal ArticleDOI
TL;DR: This review synthesizes existing literature to guide ecologists through the many available options for Bayesian model checking and concludes that model checking is an essential component of scientific discovery and learning that should accompany most Bayesian analyses presented in the literature.
Abstract: Checking that models adequately represent data is an essential component of applied statistical inference. Ecologists increasingly use hierarchical Bayesian statistical models in their research. The appeal of this modeling paradigm is undeniable, as researchers can build and fit models that embody complex ecological processes while simultaneously controlling observation error. However, ecologists tend to be less focused on checking model assumptions and assessing potential lack-of-fit when applying Bayesian methods than when applying more traditional modes of inference such as maximum likelihood. There are also multiple ways of assessing the fit of Bayesian models, each of which has strengths and weaknesses. For instance, Bayesian p-values are relatively easy to compute, but are well known to be conservative, producing p-values biased toward 0.5. Alternatively, lesser-known approaches to model checking, such as prior predictive checks, cross-validation probability integral transforms, and pivot discrepancy measures may produce more accurate characterizations of goodness-of-fit but are not as well known to ecologists. In addition, a suite of visual and targeted diagnostics can be used to examine violations of different model assumptions and lack-of-fit at different levels of the modeling hierarchy, and to check for residual temporal or spatial autocorrelation. In this review, we synthesize existing literature to guide ecologists through the many available options for Bayesian model checking. We illustrate methods and procedures with several ecological case studies, including (i) analysis of simulated spatio-temporal count data, (ii) N-mixture models for estimating abundance and detection probability of sea otters from an aircraft, and (iii) hidden Markov modeling to describe attendance patterns of California sea lion mothers on a rookery. We find that commonly used procedures based on posterior predictive p-values detect extreme model inadequacy, but often do not detect more subtle cases of lack of fit. Tests based on cross-validation and pivot discrepancy measures (including the "sampled predictive p-value") appear to be better suited to model checking and to have better overall statistical performance. We conclude that model checking is an essential component of scientific discovery and learning that should accompany most Bayesian analyses presented in the literature.
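
As a concrete instance of the most common check discussed above, here is a posterior predictive p-value using the sample variance as the discrepancy, for a normal model deliberately misfit to heavy-tailed data (a generic sketch; the review's preferred alternatives, such as pivot discrepancy measures, go further).

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.standard_t(df=3, size=40)   # heavy-tailed data; the normal model is wrong

# Posterior for the mean under known variance 1 and a flat prior: N(ybar, 1/n).
n = y.size
post_draws = rng.normal(y.mean(), 1 / np.sqrt(n), size=4000)

# Discrepancy: sample variance of replicated data vs. the observed data.
T_obs = y.var()
T_rep = np.array([rng.normal(mu, 1.0, size=n).var() for mu in post_draws])
ppp = (T_rep >= T_obs).mean()
print(f"posterior predictive p-value: {ppp:.3f} (values near 0 or 1 flag misfit)")
```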

Journal ArticleDOI
TL;DR: This work presents a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data, and provides an extensible framework for Bayesian inference of reticulate evolution.
Abstract: Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.

Posted Content
TL;DR: This work reformulates the model-agnostic meta-learning algorithm (MAML) of Finn et al. (2017) as a method for probabilistic inference in a hierarchical Bayesian model and proposes an improvement to the MAML algorithm that makes use of techniques from approximate inference and curvature estimation.
Abstract: Meta-learning allows an intelligent agent to leverage prior learning episodes as a basis for quickly improving performance on a novel task. Bayesian hierarchical modeling provides a theoretical framework for formalizing meta-learning as inference for a set of parameters that are shared across tasks. Here, we reformulate the model-agnostic meta-learning algorithm (MAML) of Finn et al. (2017) as a method for probabilistic inference in a hierarchical Bayesian model. In contrast to prior methods for meta-learning via hierarchical Bayes, MAML is naturally applicable to complex function approximators through its use of a scalable gradient descent procedure for posterior inference. Furthermore, the identification of MAML as hierarchical Bayes provides a way to understand the algorithm's operation as a meta-learning procedure, as well as an opportunity to make use of computational strategies for efficient inference. We use this opportunity to propose an improvement to the MAML algorithm that makes use of techniques from approximate inference and curvature estimation.
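
The MAML update being reinterpreted here is short enough to sketch in its first-order form, for 1-D linear regression tasks with analytic gradients (this illustrates the base algorithm, not the paper's proposed improvement).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, meta_lr = 0.05, 0.01   # inner (adaptation) and outer (meta) step sizes
theta = 0.0                   # meta-learned initialization of the slope

def mse_grad(w, x, y):
    """Gradient of the mean squared error of the model y_hat = w * x."""
    return 2 * np.mean((w * x - y) * x)

for _ in range(2000):
    w_task = rng.uniform(0.5, 2.5)                # each task is y = w_task * x
    x_tr, x_val = rng.normal(size=10), rng.normal(size=10)
    y_tr, y_val = w_task * x_tr, w_task * x_val
    # Inner loop: one gradient step adapts the initialization to this task.
    w_adapted = theta - alpha * mse_grad(theta, x_tr, y_tr)
    # Outer loop: first-order MAML applies the adapted validation gradient
    # directly to the meta-parameter (ignoring second-order terms).
    theta -= meta_lr * mse_grad(w_adapted, x_val, y_val)

print(f"meta-learned initialization: {theta:.2f} (task slopes ~ U(0.5, 2.5))")
```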

Journal ArticleDOI
TL;DR: It is argued that the form of the generative models required for inference constrains the way in which brain regions connect to one another, and is illustrated in four different domains: perception, planning, attention, and movement.
Abstract: To infer the causes of its sensations, the brain must call on a generative (predictive) model. This necessitates passing local messages between populations of neurons to update beliefs about hidden variables in the world beyond its sensory samples. It also entails inferences about how we will act. Active inference is a principled framework that frames perception and action as approximate Bayesian inference. This has been successful in accounting for a wide range of physiological and behavioural phenomena. Recently, a process theory has emerged that attempts to relate inferences to their neurobiological substrates. In this paper, we review and develop the anatomical aspects of this process theory. We argue that the form of the generative models required for inference constrains the way in which brain regions connect to one another. Specifically, neuronal populations representing beliefs about a variable must receive input from populations representing the Markov blanket of that variable. We illustrate this idea in four different domains: perception, planning, attention, and movement. In doing so, we attempt to show how appealing to generative models enables us to account for anatomical brain architectures. Ultimately, committing to an anatomical theory of inference ensures we can form empirical hypotheses that can be tested using neuroimaging, neuropsychological, and electrophysiological experiments.

Journal ArticleDOI
TL;DR: The data-driven prediction of dynamics with error bars using discovered governing physical laws is more accurate and robust than classical polynomial regressions.
Abstract: Discovering governing physical laws from noisy data is a grand challenge in many science and engineering research areas. We present a new approach to data-driven discovery of ordinary differential equations (ODEs) and partial differential equations (PDEs), in explicit or implicit form. We demonstrate our approach on a wide range of problems, including shallow water equations and Navier–Stokes equations. The key idea is to select candidate terms for the underlying equations using dimensional analysis, and to approximate the weights of the terms with error bars using our threshold sparse Bayesian regression. This new algorithm employs Bayesian inference to tune the hyperparameters automatically. Our approach is effective, robust and able to quantify uncertainties by providing an error bar for each discovered candidate equation. The effectiveness of our algorithm is demonstrated through a collection of classical ODEs and PDEs. Numerical experiments demonstrate the robustness of our algorithm with respect to noisy data and its ability to discover various candidate equations with error bars that represent the quantified uncertainties. Detailed comparisons with the sequential threshold least-squares algorithm and the lasso algorithm are studied from noisy time-series measurements and indicate that the proposed method provides more robust and accurate results. In addition, the data-driven prediction of dynamics with error bars using discovered governing physical laws is more accurate and robust than classical polynomial regressions.
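
The selection step can be sketched on an ODE toy problem. Below, scikit-learn's ARD regression plus a hard threshold stands in for the paper's threshold sparse Bayesian regression; the candidate library and threshold value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Synthetic derivative measurements generated by x' = -2x + 0.5x^3, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
dxdt = -2.0 * x + 0.5 * x**3 + 0.05 * rng.normal(size=x.size)

# Candidate library of terms; dimensional analysis would prune this in the paper.
library = np.column_stack([np.ones_like(x), x, x**2, x**3, np.cos(x)])
names = ["1", "x", "x^2", "x^3", "cos(x)"]

# ARD (sparse Bayesian) regression, then a hard threshold on small weights.
model = ARDRegression(fit_intercept=False).fit(library, dxdt)
coef = np.where(np.abs(model.coef_) > 0.1, model.coef_, 0.0)
rhs = " + ".join(f"{c:.2f}*{n}" for c, n in zip(coef, names) if c != 0.0)
print("discovered equation: x' =", rhs)
```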

Journal ArticleDOI
TL;DR: The fundamental tenets of Bayesian inference are introduced, which derive from two basic laws of probability theory, and the interpretation of probabilities, discrete and continuous versions of Bayes’ rule, parameter estimation, and model comparison are covered.
Abstract: We introduce the fundamental tenets of Bayesian inference, which derive from two basic laws of probability theory. We cover the interpretation of probabilities, discrete and continuous versions of Bayes’ rule, parameter estimation, and model comparison. Using seven worked examples, we illustrate these principles and set up some of the technical background for the rest of this special issue of Psychonomic Bulletin & Review. Supplemental material is available via https://osf.io/wskex/.
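
In the spirit of the worked examples, here is a discrete Bayes' rule calculation: updating the probability of a condition after a positive diagnostic test (the numbers are illustrative).

```python
# P(condition | positive) = P(positive | condition) P(condition) / P(positive)
prior = 0.02            # base rate of the condition
sensitivity = 0.95      # P(positive | condition)
false_positive = 0.10   # P(positive | no condition)

evidence = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / evidence
print(f"P(condition | positive test) = {posterior:.3f}")  # ~0.162 despite the strong test
```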

Journal ArticleDOI
TL;DR: The correlated pseudomarginal method, as discussed by the authors, is a modification of the pseudomarginal method that uses a likelihood ratio estimator computed from two correlated likelihood estimators.
Abstract: The pseudomarginal algorithm is a Metropolis-Hastings-type scheme which samples asymptotically from a target probability density when we can only estimate unbiasedly an unnormalized version of it. In a Bayesian context, it is a state-of-the-art posterior simulation technique when the likelihood function is intractable but can be estimated unbiasedly by using Monte Carlo samples. However, for the performance of this scheme not to degrade as the number T of data points increases, it is typically necessary for the number N of Monte Carlo samples to be proportional to T to control the relative variance of the likelihood ratio estimator appearing in the acceptance probability of this algorithm. The correlated pseudomarginal method is a modification of the pseudomarginal method using a likelihood ratio estimator computed by using two correlated likelihood estimators. For random-effects models, we show under regularity conditions that the parameters of this scheme can be selected such that the relative variance of this likelihood ratio estimator is controlled when N increases sublinearly with T, and we provide guidelines on how to optimize the algorithm on the basis of a non-standard weak convergence analysis. The efficiency of computations for Bayesian inference relative to the pseudomarginal method empirically increases with T and exceeds two orders of magnitude in some examples.
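
The correlated pseudomarginal idea can be sketched for a toy random-effects model: the auxiliary Gaussian variables driving the likelihood estimator are refreshed with an autoregressive move u' = rho * u + sqrt(1 - rho^2) * eps, so successive estimates are positively correlated and the likelihood ratio in the acceptance probability has low variance (a generic illustration, not the authors' code).

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, rho = 50, 5, 0.99
y = rng.normal(1.0, np.sqrt(2.0), size=T)   # data: y_t ~ N(theta, 1) with z_t ~ N(theta, 1)

def log_lik_hat(theta, u):
    """Unbiased MC estimate of prod_t p(y_t | theta), driven by fixed normals u."""
    z = theta + u                            # random effects z_ti ~ N(theta, 1)
    dens = np.exp(-0.5 * (y[:, None] - z) ** 2) / np.sqrt(2 * np.pi)
    return np.log(dens.mean(axis=1)).sum()

theta, u = 0.0, rng.normal(size=(T, N))
ll = log_lik_hat(theta, u)
chain = []
for _ in range(20_000):
    theta_p = theta + 0.2 * rng.normal()
    # Correlated refresh of the auxiliary variables (rho = 1 would reuse them exactly).
    u_p = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=u.shape)
    ll_p = log_lik_hat(theta_p, u_p)
    if np.log(rng.uniform()) < ll_p - ll:    # flat prior on theta
        theta, u, ll = theta_p, u_p, ll_p
    chain.append(theta)

print(f"posterior mean of theta: {np.mean(chain[2000:]):.2f}")
```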