
Showing papers on "Bayesian inference published in 2013"


Journal ArticleDOI
TL;DR: Stochastic variational inference lets us apply complex Bayesian models to massive data sets, and it is shown that the Bayesian nonparametric topic model outperforms its parametric counterpart.
Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.

2,291 citations
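
The core of the method is a cheap, noisy natural-gradient step on the global variational parameters with a Robbins-Monro step size. The sketch below illustrates that update on a toy conjugate Beta-Bernoulli model rather than a topic model; the model, step-size schedule and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Stochastic natural-gradient update on a toy conjugate model (Beta-Bernoulli),
# standing in for the topic models of the paper: sample one data point, compute
# the "intermediate" global parameter as if the whole corpus looked like that
# point, and blend it into the current estimate with a Robbins-Monro step size.

rng = np.random.default_rng(0)

N = 100_000
x = rng.binomial(1, 0.3, size=N)      # synthetic observations of a coin with bias 0.3

prior = np.array([1.0, 1.0])          # Beta(1, 1) prior, in its natural parameterization
lam = prior.copy()                    # global variational parameter

for t in range(1, 5001):
    i = rng.integers(N)                                  # minibatch of size one
    lam_hat = prior + N * np.array([x[i], 1 - x[i]])     # intermediate global parameter
    rho = (t + 10.0) ** -0.7                             # step size: decays, but slowly enough
    lam = (1 - rho) * lam + rho * lam_hat                # noisy natural-gradient step

print("SVI posterior mean of the bias:", lam[0] / lam.sum())
print("exact posterior mean:          ", (1 + x.sum()) / (2 + N))
```

For conjugate exponential-family models the natural-gradient step reduces to exactly this convex combination of the current and "intermediate" parameters, which is what makes each iteration as cheap as processing a single minibatch.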


Posted Content
TL;DR: Expectation Propagation (EP) is a deterministic approximation technique in Bayesian networks that unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks.
Abstract: This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation", unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. All three algorithms try to recover an approximate distribution which is close in KL divergence to the true distribution. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining certain expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Expectation Propagation also extends belief propagation in the opposite direction - it can propagate richer belief states that incorporate correlations between nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers.

1,365 citations
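
As a concrete illustration of the cavity / moment-matching loop the abstract describes, here is a minimal EP pass for a one-dimensional posterior with Gaussian site approximations. The logistic factors, the quadrature grid and the fixed number of sweeps are assumptions made for the sketch, not part of the paper.

```python
import numpy as np

# EP with Gaussian site approximations for a 1-D posterior
#   p(theta | y) ∝ N(theta; 0, 1) * prod_i sigmoid(y_i * theta),   y_i in {-1, +1}.
# Each non-Gaussian factor is approximated by an unnormalized Gaussian "site";
# a site is refreshed by (1) forming the cavity, (2) multiplying in the exact
# factor, (3) matching mean and variance of that tilted distribution (here by
# brute-force quadrature), and (4) dividing the cavity back out.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([+1.0, +1.0, -1.0, +1.0, +1.0])   # toy observations (an assumption)
prior_tau, prior_nu = 1.0, 0.0                 # natural parameters: tau = 1/var, nu = mean/var
site_tau = np.zeros_like(y)                    # site precisions, start flat
site_nu = np.zeros_like(y)

grid = np.linspace(-10.0, 10.0, 4001)          # quadrature grid for moment matching

for _ in range(20):                            # a few sweeps suffice for this toy problem
    for i in range(len(y)):
        tau_cav = prior_tau + site_tau.sum() - site_tau[i]   # cavity = approximation / site i
        nu_cav = prior_nu + site_nu.sum() - site_nu[i]
        log_tilted = -0.5 * tau_cav * grid**2 + nu_cav * grid + np.log(sigmoid(y[i] * grid))
        w = np.exp(log_tilted - log_tilted.max())
        w /= w.sum()
        m = np.sum(grid * w)                                 # matched mean
        v = np.sum((grid - m) ** 2 * w)                      # matched variance
        site_tau[i] = 1.0 / v - tau_cav                      # new site = matched Gaussian / cavity
        site_nu[i] = m / v - nu_cav

post_tau = prior_tau + site_tau.sum()
print("EP posterior mean:", (prior_nu + site_nu.sum()) / post_tau)
print("EP posterior var: ", 1.0 / post_tau)
```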


Journal ArticleDOI
TL;DR: Bayesian estimation for 2 groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data.
Abstract: Bayesian estimation for 2 groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. The method handles outliers. The decision rule can accept the null value (unlike traditional t tests) when certainty in the estimate is high (unlike Bayesian model comparison using Bayes factors). The method also yields precise estimates of statistical power for various research goals. The software and programs are free and run on Macintosh, Windows, and Linux platforms.

1,214 citations
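
A minimal re-implementation of the idea (robust Student-t likelihoods for two groups, with a plain random-walk Metropolis sampler standing in for the paper's software) is sketched below; the priors, proposal scales and simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Two-group Bayesian estimation in the spirit of the paper: Student-t likelihoods
# (robust to outliers), vague priors on the group means and log-scales, an
# exponential prior on the shared normality parameter nu, and random-walk Metropolis.

rng = np.random.default_rng(1)
y1 = rng.normal(101.0, 15.0, 40)          # simulated group data, for illustration only
y2 = rng.normal(100.0, 15.0, 40)

def log_post(p):
    mu1, mu2, log_s1, log_s2, log_nu = p
    s1, s2, nu = np.exp(log_s1), np.exp(log_s2), np.exp(log_nu) + 1.0
    lp = stats.norm.logpdf([mu1, mu2], 100.0, 100.0).sum()        # vague priors on means
    lp += stats.norm.logpdf([log_s1, log_s2], 0.0, 10.0).sum()    # vague priors on log-scales
    lp += stats.expon.logpdf(nu - 1.0, scale=29.0)                # prior on normality parameter
    lp += stats.t.logpdf(y1, df=nu, loc=mu1, scale=s1).sum()
    lp += stats.t.logpdf(y2, df=nu, loc=mu2, scale=s2).sum()
    return lp

p = np.array([y1.mean(), y2.mean(), np.log(y1.std()), np.log(y2.std()), np.log(10.0)])
step = np.array([0.5, 0.5, 0.1, 0.1, 0.3])                        # per-parameter proposal sd
samples, lp = [], log_post(p)
for it in range(30000):
    prop = p + step * rng.normal(size=p.size)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:                      # Metropolis accept/reject
        p, lp = prop, lp_prop
    if it >= 5000:                                                # discard burn-in
        samples.append(p.copy())

samples = np.array(samples)
diff = samples[:, 0] - samples[:, 1]
print("posterior mean of mu1 - mu2:", diff.mean())
print("95% credible interval:", np.percentile(diff, [2.5, 97.5]))
```

Because the whole posterior of the difference is available, the decision rule described in the abstract (accepting the null when the credible interval sits inside a region of practical equivalence) falls out directly.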


Book
03 Feb 2013
TL;DR: In this book, the authors present Bayesian methods for health-care evaluation, covering basic probability and likelihood concepts, the Bayesian approach to inference, prior distributions, randomised controlled trials, observational studies, evidence synthesis, and cost-effectiveness analysis.
Abstract: Preface. List of examples. 1. Introduction. 1.1 What are Bayesian methods? 1.2 What do we mean by 'health-care evaluation'? 1.3 A Bayesian approach to evaluation. 1.4 The aim of this book and the intended audience. 1.5 Structure of the book. 2. Basic Concepts from Traditional Statistical Analysis. 2.1 Probability. 2.1.1 What is probability? 2.1.2 Odds and log-odds. 2.1.3 Bayes theorem for simple events. 2.2 Random variables, parameters and likelihood. 2.2.1 Random variables and their distributions. 2.2.2 Expectation, variance, covariance and correlation. 2.2.3 Parametric distributions and conditional independence. 2.2.4 Likelihoods. 2.3 The normal distribution. 2.4 Normal likelihoods. 2.4.1 Normal approximations for binary data. 2.4.2 Normal likelihoods for survival data. 2.4.3 Normal likelihoods for count responses. 2.4.4 Normal likelihoods for continuous responses. 2.5 Classical inference. 2.6 A catalogue of useful distributions. 2.6.1 Binomial and Bernoulli. 2.6.2 Poisson. 2.6.3 Beta. 2.6.4 Uniform. 2.6.5 Gamma. 2.6.6 Root-inverse-gamma. 2.6.7 Half-normal. 2.6.8 Log-normal. 2.6.9 Student's t. 2.6.10 Bivariate normal. 2.7 Key points. Exercises. 3. An Overview of the Bayesian Approach. 3.1 Subjectivity and context. 3.2 Bayes theorem for two hypotheses. 3.3 Comparing simple hypotheses: likelihood ratios and Bayes factors. 3.4 Exchangeability and parametric modelling. 3.5 Bayes theorem for general quantities. 3.6 Bayesian analysis with binary data. 3.6.1 Binary data with a discrete prior distribution. 3.6.2 Conjugate analysis for binary data. 3.7 Bayesian analysis with normal distributions. 3.8 Point estimation, interval estimation and interval hypotheses. 3.9 The prior distribution. 3.10 How to use Bayes theorem to interpret trial results. 3.11 The 'credibility' of significant trial results. 3.12 Sequential use of Bayes theorem. 3.13 Predictions. 3.13.1 Predictions in the Bayesian framework. 3.13.2 Predictions for binary data. 3.13.3 Predictions for normal data. 3.14 Decision-making. 3.15 Design. 3.16 Use of historical data. 3.17 Multiplicity, exchangeability and hierarchical models. 3.18 Dealing with nuisance parameters. 3.18.1 Alternative methods for eliminating nuisance parameters. 3.18.2 Profile likelihood in a hierarchical model. 3.19 Computational issues. 3.19.1 Monte Carlo methods. 3.19.2 Markov chain Monte Carlo methods. 3.19.3 WinBUGS. 3.20 Schools of Bayesians. 3.21 A Bayesian checklist. 3.22 Further reading. 3.23 Key points. Exercises. 4. Comparison of Alternative Approaches to Inference. 4.1 A structure for alternative approaches. 4.2 Conventional statistical methods used in health-care evaluation. 4.3 The likelihood principle, sequential analysis and types of error. 4.3.1 The likelihood principle. 4.3.2 Sequential analysis. 4.3.3 Type I and Type II error. 4.4 P-values and Bayes factors. 4.4.1 Criticism of P-values. 4.4.2 Bayes factors as an alternative to P-values: simple hypotheses. 4.4.3 Bayes factors as an alternative to P-values: composite hypotheses. 4.4.4 Bayes factors in preference studies. 4.4.5 Lindley's paradox. 4.5 Key points. Exercises. 5. Prior Distributions. 5.1 Introduction. 5.2 Elicitation of opinion: a brief review. 5.2.1 Background to elicitation. 5.2.2 Elicitation techniques. 5.2.3 Elicitation from multiple experts. 5.3 Critique of prior elicitation. 5.4 Summary of external evidence. 5.5 Default priors. 5.5.1 'Non-informative' or 'reference' priors. 5.5.2 'Sceptical' priors. 5.5.3 'Enthusiastic' priors. 5.5.4 Priors with a point mass at the null hypothesis ('lump-and-smear' priors). 5.6 Sensitivity analysis and 'robust' priors. 5.7 Hierarchical priors. 5.7.1 The judgement of exchangeability. 5.7.2 The form for the random-effects distribution. 5.7.3 The prior for the standard deviation of the random effects. 5.8 Empirical criticism of priors. 5.9 Key points. Exercises. 6. Randomised Controlled Trials. 6.1 Introduction. 6.2 Use of a loss function: is a clinical trial for inference or decision? 6.3 Specification of null hypotheses. 6.4 Ethics and randomisation: a brief review. 6.4.1 Is randomisation necessary? 6.4.2 When is it ethical to randomise? 6.5 Sample size of non-sequential trials. 6.5.1 Alternative approaches to sample-size assessment. 6.5.2 'Classical power': hybrid classical-Bayesian methods assuming normality. 6.5.3 'Bayesian power'. 6.5.4 Adjusting formulae for different hypotheses. 6.5.5 Predictive distribution of power and necessary sample size. 6.6 Monitoring of sequential trials. 6.6.1 Introduction. 6.6.2 Monitoring using the posterior distribution. 6.6.3 Monitoring using predictions: 'interim power'. 6.6.4 Monitoring using a formal loss function. 6.6.5 Frequentist properties of sequential Bayesian methods. 6.6.6 Bayesian methods and data monitoring committees. 6.7 The role of 'scepticism' in confirmatory studies. 6.8 Multiplicity in randomised trials. 6.8.1 Subset analysis. 6.8.2 Multi-centre analysis. 6.8.3 Cluster randomization. 6.8.4 Multiple endpoints and treatments. 6.9 Using historical controls. 6.10 Data-dependent allocation. 6.11 Trial designs other than two parallel groups. 6.12 Other aspects of drug development. 6.13 Further reading. 6.14 Key points. Exercises. 7. Observational Studies. 7.1 Introduction. 7.2 Alternative study designs. 7.3 Explicit modelling of biases. 7.4 Institutional comparisons. 7.5 Key points. Exercises. 8. Evidence Synthesis. 8.1 Introduction. 8.2 'Standard' meta-analysis. 8.2.1 A Bayesian perspective. 8.2.2 Some delicate issues in Bayesian meta-analysis. 8.2.3 The relationship between treatment effect and underlying risk. 8.3 Indirect comparison studies. 8.4 Generalised evidence synthesis. 8.5 Further reading. 8.6 Key points. Exercises. 9. Cost-effectiveness, Policy-Making and Regulation. 9.1 Introduction. 9.2 Contexts. 9.3 'Standard' cost-effectiveness analysis without uncertainty. 9.4 'Two-stage' and integrated approaches to uncertainty in cost-effectiveness modeling. 9.5 Probabilistic analysis of sensitivity to uncertainty about parameters: two-stage approach. 9.6 Cost-effectiveness analyses of a single study: integrated approach. 9.7 Levels of uncertainty in cost-effectiveness models. 9.8 Complex cost-effectiveness models. 9.8.1 Discrete-time, discrete-state Markov models. 9.8.2 Micro-simulation in cost-effectiveness models. 9.8.3 Micro-simulation and probabilistic sensitivity analysis. 9.8.4 Comprehensive decision modeling. 9.9 Simultaneous evidence synthesis and complex cost-effectiveness modeling. 9.9.1 Generalised meta-analysis of evidence. 9.9.2 Comparison of integrated Bayesian and two-stage approach. 9.10 Cost-effectiveness of carrying out research: payback models. 9.10.1 Research planning in the public sector. 9.10.2 Research planning in the pharmaceutical industry. 9.10.3 Value of information. 9.11 Decision theory in cost-effectiveness analysis, regulation and policy. 9.12 Regulation and health policy. 9.12.1 The regulatory context. 9.12.2 Regulation of pharmaceuticals. 9.12.3 Regulation of medical devices. 9.13 Conclusions. 9.14 Key points. Exercises. 10. Conclusions and Implications for Future Research. 10.1 Introduction. 10.2 General advantages and problems of a Bayesian approach. 10.3 Future research and development. Appendix: Websites and Software. A.1 The site for this book. A.2 Bayesian methods in health-care evaluation. A.3 Bayesian software. A.4 General Bayesian sites. References. Index.

1,038 citations


Journal ArticleDOI
TL;DR: A new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods is proposed, which appeals to a new class of Pólya–Gamma distributions, which are constructed in detail.
Abstract: We propose a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods. The approach appeals to a new class of Pólya–Gamma distributions, which are constructed in detail. A variety of examples are presented to show the versatility of the method, including logistic regression, negative binomial regression, nonlinear mixed-effect models, and spatial models for count data. In each case, our data-augmentation strategy leads to simple, effective methods for posterior inference that (1) circumvent the need for analytic approximations, numerical integration, or Metropolis–Hastings; and (2) outperform other known data-augmentation strategies, both in ease of use and in computational efficiency. All methods, including an efficient sampler for the Pólya–Gamma distribution, are implemented in the R package BayesLogit. Supplementary materials for this article are available online.

805 citations
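
For orientation, in the binary logistic-regression case the Gibbs sampler alternates between two conditionals (following the paper's setting as closely as this summary allows: X the design matrix, β the coefficients with Gaussian prior N(b, B), and Ω = diag(ω₁, …, ωₙ)):

```latex
\omega_i \mid \beta \;\sim\; \mathrm{PG}\!\left(1,\; x_i^{\top}\beta\right), \qquad
\beta \mid y, \omega \;\sim\; \mathcal{N}\!\left(m_\omega, V_\omega\right),
\\[4pt]
V_\omega = \left(X^{\top}\Omega X + B^{-1}\right)^{-1}, \qquad
m_\omega = V_\omega\!\left(X^{\top}\kappa + B^{-1} b\right), \qquad
\kappa_i = y_i - \tfrac{1}{2}.
```

The augmented model is conditionally Gaussian in β, which is what removes the need for Metropolis–Hastings steps noted in the abstract.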


Book
02 Dec 2013
TL;DR: Uncertainty Quantification: Theory, Implementation, and Applications provides readers with the basic concepts, theory, and algorithms necessary to quantify input and response uncertainties for simulation models arising in a broad range of disciplines.
Abstract: The field of uncertainty quantification is evolving rapidly because of increasing emphasis on models that require quantified uncertainties for large-scale applications, novel algorithm development, and new computational architectures that facilitate implementation of these algorithms. Uncertainty Quantification: Theory, Implementation, and Applications provides readers with the basic concepts, theory, and algorithms necessary to quantify input and response uncertainties for simulation models arising in a broad range of disciplines. The book begins with a detailed discussion of applications where uncertainty quantification is critical for both scientific understanding and policy. It then covers concepts from probability and statistics, parameter selection techniques, frequentist and Bayesian model calibration, propagation of uncertainties, quantification of model discrepancy, surrogate model construction, and local and global sensitivity analysis. The author maintains a complementary web page where readers can find data used in the exercises and other supplementary material. Uncertainty Quantification: Theory, Implementation, and Applications includes a large number of definitions and examples that use a suite of relatively simple models to illustrate concepts; numerous references to current and open research issues; and exercises that illustrate basic concepts and guide readers through the numerical implementation of algorithms for prototypical problems. It also features a wide range of applications, including weather and climate models, subsurface hydrology and geology models, nuclear power plant design, and models for biological phenomena, along with recent advances and topics that have appeared in the research literature within the last 15 years, including aspects of Bayesian model calibration, surrogate model development, parameter selection techniques, and global sensitivity analysis. Audience: The text is intended for advanced undergraduates, graduate students, and researchers in mathematics, statistics, operations research, computer science, biology, science, and engineering. It can be used as a textbook for one- or two-semester courses on uncertainty quantification or as a resource for researchers in a wide array of disciplines. A basic knowledge of probability, linear algebra, ordinary and partial differential equations, and introductory numerical analysis techniques is assumed. Contents: Chapter 1: Introduction; Chapter 2: Large-Scale Applications; Chapter 3: Prototypical Models; Chapter 4: Fundamentals of Probability, Random Processes, and Statistics; Chapter 5: Representation of Random Inputs; Chapter 6: Parameter Selection Techniques; Chapter 7: Frequentist Techniques for Parameter Estimation; Chapter 8: Bayesian Techniques for Parameter Estimation; Chapter 9: Uncertainty Propagation in Models; Chapter 10: Stochastic Spectral Methods; Chapter 11: Sparse Grid Quadrature and Interpolation Techniques; Chapter 12: Prediction in the Presence of Model Discrepancy; Chapter 13: Surrogate Models; Chapter 14: Local Sensitivity Analysis; Chapter 15: Global Sensitivity Analysis; Appendix A: Concepts from Functional Analysis; Bibliography; Index

782 citations


Journal ArticleDOI
TL;DR: In this article, an alternative summation of the MultiNest draws, called importance nested sampling (INS), is presented, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than vanilla NS with no change in the way MultiNest explores the parameter space.
Abstract: Bayesian inference involves two main computational challenges. First, in estimating the parameters of some model for the data, the posterior distribution may well be highly multi-modal: a regime in which the convergence to stationarity of traditional Markov Chain Monte Carlo (MCMC) techniques becomes incredibly slow. Second, in selecting between a set of competing models the necessary estimation of the Bayesian evidence for each is, by definition, a (possibly high-dimensional) integration over the entire parameter space; again this can be a daunting computational task, although new Monte Carlo (MC) integration algorithms offer solutions of ever increasing efficiency. Nested sampling (NS) is one such contemporary MC strategy targeted at calculation of the Bayesian evidence, but which also enables posterior inference as a by-product, thereby allowing simultaneous parameter estimation and model selection. The widely-used MultiNest algorithm presents a particularly efficient implementation of the NS technique for multi-modal posteriors. In this paper we discuss importance nested sampling (INS), an alternative summation of the MultiNest draws, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than `vanilla' NS with no change in the way MultiNest explores the parameter space. This is accomplished by treating as a (pseudo-)importance sample the totality of points collected by MultiNest, including those previously discarded under the constrained likelihood sampling of the NS algorithm. We apply this technique to several challenging test problems and compare the accuracy of Bayesian evidences obtained with INS against those from vanilla NS.

674 citations
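
To make the 'vanilla' NS baseline concrete, the sketch below runs a bare-bones nested-sampling loop on a toy two-dimensional problem. The uniform-box prior, the naive rejection step used to draw new live points (the step MultiNest replaces with ellipsoidal sampling, and whose discarded draws INS additionally recycles) and the stopping rule are all illustrative assumptions.

```python
import numpy as np

# Vanilla nested sampling on a toy problem: a 2-D standard-normal likelihood with
# a uniform prior on the box [-5, 5]^2, so the evidence is essentially
# 1 / (box area) = 0.01. The evidence is accumulated as sum_k L_k * (X_{k-1} - X_k)
# with the usual prior-volume shrinkage X_k ≈ exp(-k / nlive).

rng = np.random.default_rng(2)

def loglike(theta):
    return -0.5 * np.sum(theta**2, axis=-1) - np.log(2 * np.pi)

nlive = 200
live = rng.uniform(-5, 5, size=(nlive, 2))
live_logl = loglike(live)

logZ, X_prev = -np.inf, 1.0
for k in range(1, 100000):
    worst = np.argmin(live_logl)
    L_star = live_logl[worst]
    X_k = np.exp(-k / nlive)                          # expected remaining prior volume
    logZ = np.logaddexp(logZ, L_star + np.log(X_prev - X_k))
    X_prev = X_k
    # Replace the worst live point with a prior draw constrained to L > L_star.
    while True:
        cand = rng.uniform(-5, 5, size=(256, 2))
        ok = np.flatnonzero(loglike(cand) > L_star)
        if ok.size:
            live[worst] = cand[ok[0]]
            live_logl[worst] = loglike(cand[ok[0]])
            break
    if live_logl.max() + np.log(X_k) < logZ - 5:      # remaining contribution negligible
        break

print("nested sampling log-evidence:", round(logZ, 3))
print("analytic log-evidence:       ", round(np.log(1 / 100.0), 3))
```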


Journal ArticleDOI
TL;DR: An iterative algorithm is developed based on the off-grid model from a Bayesian perspective while joint sparsity among different snapshots is exploited by assuming a Laplace prior for signals at all snapshots.
Abstract: Direction of arrival (DOA) estimation is a classical problem in signal processing with many practical applications. Its research has recently been advanced owing to the development of methods based on sparse signal reconstruction. While these methods have shown advantages over conventional ones, there are still difficulties in practical situations where true DOAs are not on the discretized sampling grid. To deal with such an off-grid DOA estimation problem, this paper studies an off-grid model that takes into account effects of the off-grid DOAs and has a smaller modeling error. An iterative algorithm is developed based on the off-grid model from a Bayesian perspective while joint sparsity among different snapshots is exploited by assuming a Laplace prior for signals at all snapshots. The new approach applies to both single snapshot and multi-snapshot cases. Numerical simulations show that the proposed algorithm has improved accuracy in terms of mean squared estimation error. The algorithm can maintain high estimation accuracy even under a very coarse sampling grid.

623 citations
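
The key modelling idea is to correct the on-grid steering matrix with a first-order term, A(grid) + B diag(β), where β holds the bounded grid offsets estimated jointly with the sparse signal. The sketch below builds that model for a half-wavelength uniform linear array and compares the representation error with and without the off-grid correction; the array size, grid spacing and source direction are illustrative assumptions, and the paper's sparse Bayesian iteration for jointly estimating the signal and β is not reproduced.

```python
import numpy as np

# Off-grid DOA observation model: steering matrix on a coarse grid plus a
# first-order Taylor correction, A + B @ diag(beta). Here we only quantify how
# much the correction shrinks the modelling error for a source between grid points.

M = 16                                   # number of sensors, half-wavelength spacing
m = np.arange(M)

def steer(theta_deg):
    return np.exp(1j * m * np.pi * np.sin(np.deg2rad(theta_deg)))

def steer_deriv(theta_deg):              # d a(theta) / d theta, theta in radians
    th = np.deg2rad(theta_deg)
    return 1j * m * np.pi * np.cos(th) * steer(theta_deg)

grid = np.arange(-60.0, 60.5, 2.0)       # coarse 2-degree sampling grid
A = np.stack([steer(g) for g in grid], axis=1)
B = np.stack([steer_deriv(g) for g in grid], axis=1)

true_doa = 10.7                          # off-grid source between 10 and 12 degrees
a_true = steer(true_doa)

k = np.argmin(np.abs(grid - true_doa))   # nearest grid point
beta = np.deg2rad(true_doa - grid[k])    # grid offset in radians
err_on = np.linalg.norm(a_true - A[:, k])
err_off = np.linalg.norm(a_true - (A[:, k] + beta * B[:, k]))
print("on-grid model error:  ", err_on)
print("off-grid model error: ", err_off)
```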


Journal ArticleDOI
TL;DR: HDDM (hierarchical drift diffusion model) is a novel Python-based toolbox that allows fast and flexible estimation of the drift-diffusion model and the related linear ballistic accumulator model, and supports the estimation of how trial-by-trial measurements influence decision-making parameters.
Abstract: The diffusion model is a commonly used tool to infer latent psychological processes underlying decision making, and to link them to neural mechanisms based on reaction times. Although efficient open source software has been made available to quantitatively fit the model to data, current estimation methods require an abundance of reaction time measurements to recover meaningful parameters, and only provide point estimates of each parameter. In contrast, hierarchical Bayesian parameter estimation methods are useful for enhancing statistical power, allowing for simultaneous estimation of individual subject parameters and the group distribution that they are drawn from, while also providing measures of uncertainty in these parameters in the posterior distribution. Here, we present a novel Python-based toolbox called HDDM (hierarchical drift diffusion model), which allows fast and flexible estimation of the drift-diffusion model and the related linear ballistic accumulator model. HDDM requires fewer data per subject/condition than non-hierarchical methods, allows for full Bayesian data analysis, and can handle outliers in the data. Finally, HDDM supports the estimation of how trial-by-trial measurements (e.g. fMRI) influence decision-making parameters. This paper will first describe the theoretical background of the drift-diffusion model and Bayesian inference. We then illustrate usage of the toolbox on a real-world data set from our lab. Finally, parameter recovery studies show that HDDM beats alternative fitting methods like the chi-quantile method as well as maximum likelihood estimation. The software and documentation can be downloaded at: http://ski.clps.brown.edu/hddm_docs

595 citations
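
A minimal usage sketch of the workflow described in the paper is shown below. The input file name and its column names (rt, response, subj_idx, stim) are hypothetical, and exact argument names may differ across toolbox versions.

```python
import hddm

# Hierarchical drift-diffusion model fit following the workflow the paper describes.
# 'mydata.csv' and its columns (rt, response, subj_idx, stim) are assumptions about
# the input data, not part of the paper.
data = hddm.load_csv('mydata.csv')

# Let drift rate v vary by stimulus condition; other parameters are shared
# hierarchically across subjects.
model = hddm.HDDM(data, depends_on={'v': 'stim'})
model.find_starting_values()          # approximate starting values before sampling
model.sample(2000, burn=200)          # draw posterior samples via MCMC
model.print_stats()                   # posterior summaries for all parameters
```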


Journal ArticleDOI
TL;DR: The challenges that will emerge as researchers start focusing their efforts on real-life computations, with a focus on probabilistic learning, structural learning and approximate inference are discussed.
Abstract: There is strong behavioral and physiological evidence that the brain both represents probability distributions and performs probabilistic inference. Computational neuroscientists have started to shed light on how these probabilistic representations and computations might be implemented in neural circuits. One particularly appealing aspect of these theories is their generality: they can be used to model a wide range of tasks, from sensory processing to high-level cognition. To date, however, these theories have only been applied to very simple tasks. Here we discuss the challenges that will emerge as researchers start focusing their efforts on real-life computations, with a focus on probabilistic learning, structural learning and approximate inference.

586 citations


Journal ArticleDOI
TL;DR: A heuristic proof suggesting that life—or biological self-organization—is an inevitable and emergent property of any (ergodic) random dynamical system that possesses a Markov blanket is presented.
Abstract: This paper presents a heuristic proof (and simulations of a primordial soup) suggesting that life, or biological self-organization, is an inevitable and emergent property of any (ergodic) random dynamical system that possesses a Markov blanket. This conclusion is based on the following arguments: if the coupling among an ensemble of dynamical systems is mediated by short-range forces, then the states of remote systems must be conditionally independent. These independencies induce a Markov blanket that separates internal and external states in a statistical sense. The existence of a Markov blanket means that internal states will appear to minimize a free energy functional of the states of their Markov blanket. Crucially, this is the same quantity that is optimized in Bayesian inference. Therefore, the internal states (and their blanket) will appear to engage in active Bayesian inference. In other words, they will appear to model, and act on, their world to preserve their functional and structural integrity, leading to homoeostasis and a simple form of autopoiesis.

Journal ArticleDOI
TL;DR: The INLA approach for approximate Bayesian inference for latent Gaussian models has been shown to give fast and accurate estimates of posterior marginals and to be a valuable tool in practice via the R-package R-INLA.

Journal ArticleDOI
TL;DR: The authors argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism, and examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision.
Abstract: A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory.

Journal ArticleDOI
TL;DR: This work proposes a general mathematical framework and an algorithmic approach for optimal experimental design with nonlinear simulation-based models, and focuses on finding sets of experiments that provide the most information about targeted sets of parameters.

Journal ArticleDOI
01 Jul 2013-Genetics
TL;DR: It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
Abstract: Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
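
The influence of the prior when n ≪ p can be seen already in the simplest Gaussian-prior member of the alphabet, whose posterior mean is a ridge-type estimator. The sketch below (simulated data and prior variances chosen purely for illustration) shows how the prior variance changes the estimated marker effects even when the fit to y barely moves.

```python
import numpy as np

# Bayesian linear regression with a Gaussian prior beta ~ N(0, tau2 * I):
# the posterior mean is E[beta | y] = (X'X + (sigma2/tau2) I)^{-1} X'y, so with
# p >> n the choice of tau2 visibly changes the inferred effects even though the
# in-sample fit hardly changes -- the point the abstract makes about n << p.

rng = np.random.default_rng(3)
n, p = 50, 1000
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.0
y = X @ beta_true + rng.normal(scale=1.0, size=n)

def posterior_mean(tau2, sigma2=1.0):
    return np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(p), X.T @ y)

for tau2 in (0.01, 0.1, 1.0):
    b = posterior_mean(tau2)
    print(f"tau2={tau2:5.2f}  ||beta_hat||={np.linalg.norm(b):6.3f}  "
          f"fit corr={np.corrcoef(X @ b, y)[0, 1]:.3f}")
```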

Journal ArticleDOI
TL;DR: This work considers a hierarchical spatio-temporal model for particulate matter (PM) concentration in the North-Italian region Piemonte and proposes a strategy to represent a GF with Matérn covariance function as a Gaussian Markov Random Field (GMRF) through the SPDE approach.
Abstract: In this work, we consider a hierarchical spatio-temporal model for particulate matter (PM) concentration in the North-Italian region Piemonte. The model involves a Gaussian Field (GF), affected by a measurement error, and a state process characterized by a first order autoregressive dynamic model and spatially correlated innovations. This kind of model is well discussed and widely used in the air quality literature thanks to its flexibility in modelling the effect of relevant covariates (i.e. meteorological and geographical variables) as well as time and space dependence. However, Bayesian inference—through Markov chain Monte Carlo (MCMC) techniques—can be a challenge due to convergence problems and heavy computational loads. In particular, the computational issue refers to the infeasibility of linear algebra operations involving the big dense covariance matrices which occur when large spatio-temporal datasets are present. The main goal of this work is to present an effective estimating and spatial prediction strategy for the considered spatio-temporal model. This proposal consists in representing a GF with Matérn covariance function as a Gaussian Markov Random Field (GMRF) through the Stochastic Partial Differential Equations (SPDE) approach. The main advantage of moving from a GF to a GMRF stems from the good computational properties that the latter enjoys. In fact, GMRFs are defined by sparse matrices that allow for computationally effective numerical methods. Moreover, when dealing with Bayesian inference for GMRFs, it is possible to adopt the Integrated Nested Laplace Approximation (INLA) algorithm as an alternative to MCMC methods giving rise to additional computational advantages. The implementation of the SPDE approach through the R library INLA (www.r-inla.org) is illustrated with reference to the Piemonte PM data. In particular, providing the step-by-step R-code, we show how it is easy to get prediction and probability of exceedance maps in a reasonable computing time.

Journal ArticleDOI
TL;DR: The authors develop the formalism of quantum conditional states, which provides a unified description of experiments involving two systems at a single time and those involving a single system at two times; they also show that remote steering of quantum states can be described within this formalism as a mere updating of beliefs about one system given new information about another, and that retrodictive inferences can be expressed using the same belief propagation rule as is used for predictive inferences.
Abstract: Quantum theory can be viewed as a generalization of classical probability theory, but the analogy as it has been developed so far is not complete. Whereas the manner in which inferences are made in classical probability theory is independent of the causal relation that holds between the conditioned variable and the conditioning variable, in the conventional quantum formalism, there is a significant difference between how one treats experiments involving two systems at a single time and those involving a single system at two times. In this article, we develop the formalism of quantum conditional states, which provides a unified description of these two sorts of experiment. In addition, concepts that are distinct in the conventional formalism become unified: Channels, sets of states, and positive operator valued measures are all seen to be instances of conditional states; the action of a channel on a state, ensemble averaging, the Born rule, the composition of channels, and nonselective state-update rules are all seen to be instances of belief propagation. Using a quantum generalization of Bayes' theorem and the associated notion of Bayesian conditioning, we also show that the remote steering of quantum states can be described within our formalism as a mere updating of beliefs about one system given new information about another, and retrodictive inferences can be expressed using the same belief propagation rule as is used for predictive inferences. Finally, we show that previous arguments for interpreting the projection postulate as a quantum generalization of Bayesian conditioning are based on a misleading analogy and that it is best understood as a combination of belief propagation (corresponding to the nonselective state-update map) and conditioning on the measurement outcome.

Journal ArticleDOI
TL;DR: This work proposes a new parameterization of the spatial generalized linear mixed model that alleviates spatial confounding and speeds computation by greatly reducing the dimension of the spatial random effects.
Abstract: Summary. Non-Gaussian spatial data are very common in many disciplines. For instance, count data are common in disease mapping, and binary data are common in ecology. When fitting spatial regressions for such data, one needs to account for dependence to ensure reliable inference for the regression coefficients. The spatial generalized linear mixed model offers a very popular and flexible approach to modelling such data, but this model suffers from two major shortcomings: variance inflation due to spatial confounding and high dimensional spatial random effects that make fully Bayesian inference for such models computationally challenging. We propose a new parameterization of the spatial generalized linear mixed model that alleviates spatial confounding and speeds computation by greatly reducing the dimension of the spatial random effects. We illustrate the application of our approach to simulated binary, count and Gaussian spatial data sets, and to a large infant mortality data set.

Journal ArticleDOI
TL;DR: Bayesian model averaging is extended to wind speed, taking account of a skewed distribution and observations that are coarsely discretized, and this method provides calibrated and sharp probabilistic forecasts.
Abstract: The current weather forecasting paradigm is deterministic, based on numerical models. Multiple estimates of the current state of the atmosphere are used to generate an ensemble of deterministic predictions. Ensemble forecasts, while providing information on forecast uncertainty, are often uncalibrated. Bayesian model averaging (BMA) is a statistical ensemble postprocessing method that creates calibrated predictive probability density functions (PDFs). Probabilistic wind forecasting offers two challenges: a skewed distribution, and observations that are coarsely discretized. We extend BMA to wind speed, taking account of these challenges. This method provides calibrated and sharp probabilistic forecasts. Comparisons are made between several formulations.
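
Concretely, a BMA predictive density is a weighted mixture of component densities, one per ensemble member, each centred on that member's forecast. The sketch below evaluates such a mixture with gamma components (a skewed choice suitable for wind speed, used here as an illustrative assumption); the forecasts, weights and mean-variance link are made up for the example, whereas the paper estimates weights and component parameters from training data.

```python
import numpy as np
from scipy import stats

# A BMA predictive PDF as a weighted mixture of skewed component densities,
# one per ensemble member. Gamma components with a fixed variance are an
# illustrative assumption, not the paper's fitted specification.

member_forecasts = np.array([6.2, 7.5, 5.8, 8.1])   # ensemble forecasts (m/s), made up
weights = np.array([0.3, 0.3, 0.2, 0.2])            # BMA weights (sum to 1), made up
spread = 2.0                                        # assumed component variance

def bma_pdf(y):
    pdf = 0.0
    for w, f in zip(weights, member_forecasts):
        # Gamma parameterized so each component has mean f and variance 'spread'.
        shape, scale = f**2 / spread, spread / f
        pdf += w * stats.gamma.pdf(y, a=shape, scale=scale)
    return pdf

grid = np.linspace(0.1, 20, 400)
dens = bma_pdf(grid)
dy = grid[1] - grid[0]
print("BMA predictive mean (m/s):", round(np.sum(grid * dens) * dy, 2))
print("P(wind speed > 10 m/s):   ", round(np.sum(dens[grid > 10]) * dy, 3))
```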

Journal ArticleDOI
TL;DR: Variational Bayes is considered as an alternative scheme that provides formal constraints on the computational anatomy of inference and action—constraints that are remarkably consistent with neuroanatomy.
Abstract: This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behaviour. In particular, we consider prior beliefs that action minimises the Kullback-Leibler divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimises a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimising free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action – constraints that are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualises optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimisation, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution – that minimises free energy. This sensitivity corresponds to the precision of beliefs about behaviour, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behaviour entails a representation of confidence about outcomes that are under an agent's control.

Journal Article
TL;DR: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference, including various inference methods, sparse approximations and model assessment methods.
Abstract: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods.

Journal ArticleDOI
01 May 2013-Genetics
TL;DR: Modifications are introduced to the rjMCMC algorithms that remove the constraint on the new species divergence time when splitting and alter the gene trees to remove incompatibilities, and are found to improve mixing of the Markov chain for both simulated and empirical data sets.
Abstract: Several computational methods have recently been proposed for delimiting species using multilocus sequence data. Among them, the Bayesian method of Yang and Rannala uses the multispecies coalescent model in the likelihood framework to calculate the posterior probabilities for the different species-delimitation models. It has a sound statistical basis and is found to have nice statistical properties in simulation studies, such as low error rates of undersplitting and oversplitting. However, the method suffers from poor mixing of the reversible-jump Markov chain Monte Carlo (rjMCMC) algorithms. Here, we describe several modifications to the algorithms. We propose a flexible prior that allows the user to specify the probability that each node on the guide tree represents a true speciation event. We also introduce modifications to the rjMCMC algorithms that remove the constraint on the new species divergence time when splitting and alter the gene trees to remove incompatibilities. The new algorithms are found to improve mixing of the Markov chain for both simulated and empirical data sets.

Journal ArticleDOI
TL;DR: A Bayesian model based on automatic relevance determination (ARD) in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior is proposed.
Abstract: This paper addresses the estimation of the latent dimensionality in nonnegative matrix factorization (NMF) with the β-divergence. The β-divergence is a family of cost functions that includes the squared Euclidean distance, Kullback-Leibler (KL) and Itakura-Saito (IS) divergences as special cases. Learning the model order is important as it is necessary to strike the right balance between data fidelity and overfitting. We propose a Bayesian model based on automatic relevance determination (ARD) in which the columns of the dictionary matrix and the rows of the activation matrix are tied together through a common scale parameter in their prior. A family of majorization-minimization (MM) algorithms is proposed for maximum a posteriori (MAP) estimation. A subset of scale parameters is driven to a small lower bound in the course of inference, with the effect of pruning the corresponding spurious components. We demonstrate the efficacy and robustness of our algorithms by performing extensive experiments on synthetic data, the swimmer dataset, a music decomposition example, and a stock price prediction task.
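
For context, the familiar multiplicative updates for KL-NMF (the β = 1 case) are themselves MM updates; the sketch below shows that core on random data with a deliberately over-chosen number of components K. The paper's ARD prior, which ties columns of W and rows of H through shared scales so that surplus components are pruned, is described in the comments but not implemented.

```python
import numpy as np

# Majorization-minimization multiplicative updates for NMF under the
# Kullback-Leibler divergence (beta = 1 in the beta-divergence family).
# The paper adds an ARD prior whose per-component scale parameters shrink,
# pruning spurious columns of W and rows of H; that extra layer is omitted here.

rng = np.random.default_rng(4)
F, N, K = 20, 100, 5                      # data size and (over-)chosen model order
V = rng.gamma(2.0, 1.0, size=(F, N))      # nonnegative data matrix
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

for it in range(200):
    WH = W @ H
    H *= (W.T @ (V / WH)) / W.T.sum(axis=1, keepdims=True)   # KL update for H
    WH = W @ H
    W *= ((V / WH) @ H.T) / H.sum(axis=1)                    # KL update for W
    if it % 50 == 0:
        WH = W @ H
        kl = np.sum(V * np.log(V / WH) - V + WH)              # non-increasing under MM
        print(f"iteration {it:3d}  KL divergence {kl:.3f}")
```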

Journal ArticleDOI
TL;DR: In this article, the uncertainty in the numerical solution of linearized infinite-dimensional statistical inverse problems is estimated using the Bayesian inference formulation, where the prior probability distribution is chosen appropriately in order to guarantee well-posedness of the inverse problem and facilitate computation of the posterior.
Abstract: We present a computational framework for estimating the uncertainty in the numerical solution of linearized infinite-dimensional statistical inverse problems. We adopt the Bayesian inference formulation: given observational data and their uncertainty, the governing forward problem and its uncertainty, and a prior probability distribution describing uncertainty in the parameter field, find the posterior probability distribution over the parameter field. The prior must be chosen appropriately in order to guarantee well-posedness of the infinite-dimensional inverse problem and facilitate computation of the posterior. Furthermore, straightforward discretizations may not lead to convergent approximations of the infinite-dimensional problem. And finally, solution of the discretized inverse problem via explicit construction of the covariance matrix is prohibitive due to the need to solve the forward problem as many times as there are parameters. Our computational framework builds on the infinite-dimensional form...
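
In the linearized Gaussian setting the abstract refers to, with (linearized) parameter-to-observable map F, noise covariance Γ_noise, prior N(m_prior, Γ_prior) and data d, the posterior is Gaussian:

```latex
\Gamma_{\mathrm{post}} = \left(F^{*}\,\Gamma_{\mathrm{noise}}^{-1}F + \Gamma_{\mathrm{prior}}^{-1}\right)^{-1},
\qquad
m_{\mathrm{post}} = m_{\mathrm{prior}} + \Gamma_{\mathrm{post}}\,F^{*}\,\Gamma_{\mathrm{noise}}^{-1}\left(d - F\,m_{\mathrm{prior}}\right).
```

Forming Γ_post explicitly requires roughly one forward solve per parameter, which is the bottleneck the abstract points to for large discretized parameter fields.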

Journal ArticleDOI
TL;DR: It is shown that the essential components of a Bayesian framework are closely related to the clock, memory, and decision stages used by these models, and that such an integrated framework offers a new perspective on distortions in timing and time perception that are otherwise difficult to explain.

Proceedings ArticleDOI
Xudong Cao, David Wipf, Fang Wen, Genquan Duan, Jian Sun
01 Dec 2013
TL;DR: This work proposes a principled transfer learning approach for merging plentiful source-domain data with limited samples from some target domain of interest to create a classifier that ideally performs nearly as well as if rich target- domain data were present.
Abstract: Face verification involves determining whether a pair of facial images belongs to the same or different subjects. This problem can prove to be quite challenging in many important applications where labeled training data is scarce, e.g., family album photo organization software. Herein we propose a principled transfer learning approach for merging plentiful source-domain data with limited samples from some target domain of interest to create a classifier that ideally performs nearly as well as if rich target-domain data were present. Based upon a surprisingly simple generative Bayesian model, our approach combines a KL-divergence based regularizer/prior with a robust likelihood function leading to a scalable implementation via the EM algorithm. As justification for our design choices, we later use principles from convex analysis to recast our algorithm as an equivalent structured rank minimization problem leading to a number of interesting insights related to solution structure and feature-transform invariance. These insights help to both explain the effectiveness of our algorithm as well as elucidate a wide variety of related Bayesian approaches. Experimental testing with challenging datasets validates the utility of the proposed algorithm.

Journal ArticleDOI
TL;DR: In this paper, a family of prior distributions for covariance matrices is studied whose members possess the attractive property of all standard deviation and correlation parameters being marginally noninformative for particular hyper-parameter choices.
Abstract: A family of prior distributions for covariance matrices is studied. Members of the family possess the attractive property of all standard deviation and correlation parameters being marginally noninformative for particular hyper-parameter choices. Moreover, the family is quite simple and, for approximate Bayesian inference techniques such as Markov chain Monte Carlo and mean field variational Bayes, has tractability on par with the Inverse-Wishart conjugate family of prior distributions. A simulation study shows that the new prior distributions can lead to more accurate sparse covariance matrix estimation.

Journal ArticleDOI
TL;DR: CATMIP (cascading adaptive transitional Metropolis in parallel) combines the Metropolis algorithm with elements of simulated annealing and genetic algorithms to dynamically optimize the algorithm's efficiency as it runs; it works independently of the model design, a priori constraints and data under consideration, and so can be used for a wide variety of scientific problems.
Abstract: The estimation of finite fault earthquake source models is an inherently underdetermined problem: there is no unique solution to the inverse problem of determining the rupture history at depth as a function of time and space when our data are limited to observations at the Earth’s surface. Bayesian methods allow us to determine the set of all plausible source model parameters that are consistent with the observations, our a priori assumptions about the physics of the earthquake source and wave propagation, and models for the observation errors and the errors due to the limitations in our forward model. Because our inversion approach does not require inverting any matrices other than covariance matrices, we can restrict our ensemble of solutions to only those models that are physically defensible while avoiding the need to restrict our class of models based on considerations of numerical invertibility. We only use prior information that is consistent with the physics of the problem rather than some artifice (such as smoothing) needed to produce a unique optimal model estimate. Bayesian inference can also be used to estimate model-dependent and internally consistent effective errors due to shortcomings in the forward model or data interpretation, such as poor Green’s functions or extraneous signals recorded by our instruments. Until recently, Bayesian techniques have been of limited utility for earthquake source inversions because they are computationally intractable for problems with as many free parameters as typically used in kinematic finite fault models. Our algorithm, called cascading adaptive transitional Metropolis in parallel (CATMIP), allows sampling of high-dimensional problems in a parallel computing framework. CATMIP combines the Metropolis algorithm with elements of simulated annealing and genetic algorithms to dynamically optimize the algorithm’s efficiency as it runs. The algorithm is a generic Bayesian Markov Chain Monte Carlo sampler; it works independently of the model design, a priori constraints and data under consideration, and so can be used for a wide variety of scientific problems. We compare CATMIP’s efficiency relative to several existing sampling algorithms and then present synthetic performance tests of finite fault earthquake rupture models computed using CATMIP.

Journal ArticleDOI
TL;DR: The theoretical framework is developed and applied to a range of exemplary problems that highlight how to improve experimental investigations into the structure and dynamics of biological systems and their behavior.
Abstract: Our understanding of most biological systems is in its infancy. Learning their structure and intricacies is fraught with challenges, and often side-stepped in favour of studying the function of different gene products in isolation from their physiological context. Constructing and inferring global mathematical models from experimental data is, however, central to systems biology. Different experimental setups provide different insights into such systems. Here we show how we can combine concepts from Bayesian inference and information theory in order to identify experiments that maximize the information content of the resulting data. This approach allows us to incorporate preliminary information; it is global and not constrained to some local neighbourhood in parameter space and it readily yields information on parameter robustness and confidence. Here we develop the theoretical framework and apply it to a range of exemplary problems that highlight how we can improve experimental investigations into the structure and dynamics of biological systems and their behavior.
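
The quantity being maximized in this kind of Bayesian design is the expected information gain (the mutual information between parameters and data) for a candidate experiment. A nested Monte Carlo estimate of it on a deliberately simple linear-Gaussian toy model (an assumption for illustration, not one of the paper's biological systems) looks like this:

```python
import numpy as np

# Nested Monte Carlo estimate of the expected information gain of a design d for
# the toy model y = theta * d + noise, theta ~ N(0, 1): the design variable d
# scales how informative the measurement is, so larger |d| should score higher.

rng = np.random.default_rng(5)
sigma = 1.0                                            # observation noise std

def log_lik(y, theta, d):
    return -0.5 * ((y - theta * d) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def expected_information_gain(d, n_outer=2000, n_inner=2000):
    theta = rng.normal(size=n_outer)                   # prior draws
    y = theta * d + rng.normal(scale=sigma, size=n_outer)   # simulated data
    theta_inner = rng.normal(size=n_inner)             # fresh prior draws for the evidence
    ll_outer = log_lik(y, theta, d)
    ll_inner = log_lik(y[:, None], theta_inner[None, :], d)
    log_evidence = np.logaddexp.reduce(ll_inner, axis=1) - np.log(n_inner)
    # Average of log p(y|theta,d) - log p(y|d), i.e. a Monte Carlo mutual information.
    return np.mean(ll_outer - log_evidence)

for d in (0.5, 1.0, 2.0):
    print(f"design d={d}:  estimated information gain "
          f"{expected_information_gain(d):.3f} nats")
```

For this linear-Gaussian case the exact value is 0.5 * log(1 + d²/σ²), which provides a quick check on the estimator.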

Journal Article
TL;DR: This work introduces a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies between data sets but also decomposes the data into shared and data set-specific components.
Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment.