Showing papers on "Bayesian inference published in 2014"

PDF

Open Access

Posted Content•

Stochastic Backpropagation and Approximate Inference in Deep Generative Models

[...]

Danilo Jimenez Rezende¹, Shakir Mohamed¹, Daan Wierstra¹•Institutions (1)

16 Jan 2014-arXiv: Machine Learning

TL;DR: In this article, a generative and recognition model is proposed to represent approximate posterior distributions and act as a stochastic encoder of the data, which allows for joint optimisation of the parameters of both the generative model and the recognition model.

...read moreread less

Abstract: We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent approximate posterior distributions, and that acts as a stochastic encoder of the data. We develop stochastic back-propagation -- rules for back-propagation through stochastic variables -- and use this to develop an algorithm that allows for joint optimisation of the parameters of both the generative and recognition model. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data and is a useful tool for high-dimensional data visualisation.

...read moreread less

3,316 citations

Posted Content•

Semi-Supervised Learning with Deep Generative Models

[...]

Diederik P. Kingma, Danilo Jimenez Rezende, Shakir Mohamed, Max Welling

20 Jun 2014-arXiv: Learning

TL;DR: It is shown that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.

...read moreread less

Abstract: The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.

...read moreread less

2,194 citations

Journal Article•

The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo

[...]

Matthew D. Homan¹, Andrew Gelman²•Institutions (2)

Adobe Systems¹, Columbia University²

01 Jan 2014-Journal of Machine Learning Research

TL;DR: The No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L, and derives a method for adapting the step size parameter {\epsilon} on the fly based on primal-dual averaging.

...read moreread less

Abstract: Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size e and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as (and sometimes more effciently than) a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter e on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all, making it suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" samplers.

...read moreread less

1,988 citations

Proceedings Article•

Stochastic Backpropagation and Approximate Inference in Deep Generative Models

[...]

Danilo Jimenez Rezende¹, Shakir Mohamed¹, Daan Wierstra¹•Institutions (1)

Google¹

21 Jun 2014

TL;DR: This work marries ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning that introduces a recognition model to represent approximate posterior distributions and that acts as a stochastic encoder of the data.

...read moreread less

Abstract: We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation - rules for gradient backpropagation through stochastic variables - and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.

...read moreread less

1,954 citations

Book•

Bayesian Cognitive Modeling: A Practical Course

[...]

Michael D. Lee¹, Eric-Jan Wagenmakers²•Institutions (2)

University of California, Irvine¹, University of Amsterdam²

14 Apr 2014

TL;DR: In this article, the basics of Bayesian analysis are discussed, and a WinBUGS-based approach is presented to get started with WinBUGs, which is based on the SIMPLE model of memory.

...read moreread less

Abstract: Part I. Getting Started: 1. The basics of Bayesian analysis 2. Getting started with WinBUGS Part II. Parameter Estimation: 3. Inferences with binomials 4. Inferences with Gaussians 5. Some examples of data analysis 6. Latent mixture models Part III. Model Selection: 7. Bayesian model comparison 8. Comparing Gaussian means 9. Comparing binomial rates Part IV. Case Studies: 10. Memory retention 11. Signal detection theory 12. Psychophysical functions 13. Extrasensory perception 14. Multinomial processing trees 15. The SIMPLE model of memory 16. The BART model of risk taking 17. The GCM model of categorization 18. Heuristic decision-making 19. Number concept development.

...read moreread less

1,192 citations

Book•

Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan

[...]

John K. Kruschke

17 Nov 2014

TL;DR: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples.

...read moreread less

Abstract: There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis obtainable to a wide audience. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. Included are step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBugs. This book is intended for first-year graduate students or advanced undergraduates. It provides a bridge between undergraduate training and modern Bayesian methods for data analysis, which is becoming the accepted research standard. Knowledge of algebra and basic calculus is a prerequisite. New to this Edition (partial list): * There are all new programs in JAGS and Stan. The new programs are designed to be much easier to use than the scripts in the first edition. In particular, there are now compact high-level scripts that make it easy to run the programs on your own data sets. This new programming was a major undertaking by itself.* The introductory Chapter 2, regarding the basic ideas of how Bayesian inference re-allocates credibility across possibilities, is completely rewritten and greatly expanded.* There are completely new chapters on the programming languages R (Ch. 3), JAGS (Ch. 8), and Stan (Ch. 14). The lengthy new chapter on R includes explanations of data files and structures such as lists and data frames, along with several utility functions. (It also has a new poem that I am particularly pleased with.) The new chapter on JAGS includes explanation of the RunJAGS package which executes JAGS on parallel computer cores. The new chapter on Stan provides a novel explanation of the concepts of Hamiltonian Monte Carlo. The chapter on Stan also explains conceptual differences in program flow between it and JAGS.* Chapter 5 on Bayes' rule is greatly revised, with a new emphasis on how Bayes' rule re-allocates credibility across parameter values from prior to posterior. The material on model comparison has been removed from all the early chapters and integrated into a compact presentation in Chapter 10.* What were two separate chapters on the Metropolis algorithm and Gibbs sampling have been consolidated into a single chapter on MCMC methods (as Chapter 7). There is extensive new material on MCMC convergence diagnostics in Chapters 7 and 8. There are explanations of autocorrelation and effective sample size. There is also exploration of the stability of the estimates of the HDI limits. New computer programs display the diagnostics, as well.* Chapter 9 on hierarchical models includes extensive new and unique material on the crucial concept of shrinkage, along with new examples.* All the material on model comparison, which was spread across various chapters in the first edition, in now consolidated into a single focused chapter (Ch. 10) that emphasizes its conceptualization as a case of hierarchical modeling.* Chapter 11 on null hypothesis significance testing is extensively revised. It has new material for introducing the concept of sampling distribution. It has new illustrations of sampling distributions for various stopping rules, and for multiple tests.* Chapter 12, regarding Bayesian approaches to null value assessment, has new material about the region of practical equivalence (ROPE), new examples of accepting the null value by Bayes factors, and new explanation of the Bayes factor in terms of the Savage-Dickey method.* Chapter 13, regarding statistical power and sample size, has an extensive new section on sequential testing, and making the research goal be precision of estimation instead of rejecting or accepting a particular value.* Chapter 15, which introduces the generalized linear model, is fully revised, with more complete tables showing combinations of predicted and predictor variable types.* Chapter 16, regarding estimation of means, now includes extensive discussion of comparing two groups, along with explicit estimates of effect size.* Chapter 17, regarding regression on a single metric predictor, now includes extensive examples of robust regression in JAGS and Stan. New examples of hierarchical regression, including quadratic trend, graphically illustrate shrinkage in estimates of individual slopes and curvatures. The use of weighted data is also illustrated.* Chapter 18, on multiple linear regression, includes a new section on Bayesian variable selection, in which various candidate predictors are probabilistically included in the regression model.* Chapter 19, on one-factor ANOVA-like analysis, has all new examples, including a completely worked out example analogous to analysis of covariance (ANCOVA), and a new example involving heterogeneous variances.* Chapter 20, on multi-factor ANOVA-like analysis, has all new examples, including a completely worked out example of a split-plot design that involves a combination of a within-subjects factor and a between-subjects factor.* Chapter 21, on logistic regression, is expanded to include examples of robust logistic regression, and examples with nominal predictors.* There is a completely new chapter (Ch. 22) on multinomial logistic regression. This chapter fills in a case of the generalized linear model (namely, a nominal predicted variable) that was missing from the first edition.* Chapter 23, regarding ordinal data, is greatly expanded. New examples illustrate single-group and two-group analyses, and demonstrate how interpretations differ from treating ordinal data as if they were metric.* There is a new section (25.4) that explains how to model censored data in JAGS.* Many exercises are new or revised. * Accessible, including the basics of essential concepts of probability and random sampling* Examples with R programming language and JAGS software* Comprehensive coverage of all scenarios addressed by non-Bayesian textbooks: t-tests, analysis of variance (ANOVA) and comparisons in ANOVA, multiple regression, and chi-square (contingency table analysis)* Coverage of experiment planning* R and JAGS computer programming code on website* Exercises have explicit purposes and guidelines for accomplishment* Provides step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBugs

...read moreread less

1,190 citations

Journal Article•DOI•

What Are the Odds? A Practical Guide to Computing and Reporting Bayes Factors.

[...]

Andrew F. Jarosz¹, Jennifer Wiley•Institutions (1)

University of Illinois at Chicago¹

07 Nov 2014

TL;DR: This paper is to provide an easy template for the inclusion of the Bayes factor in reporting experimental results, particularly as a recommendation for articles in the Journal of Problem Solving.

...read moreread less

Abstract: The purpose of this paper is to provide an easy template for the inclusion of the Bayes factor in reporting experimental results, particularly as a recommendation for articles in the Journal of Problem Solving. The Bayes factor provides information with a similar purpose to the p-value – to allow the researcher to make statistical inferences from data provided by experiments. While the p-value is widely used, the Bayes factor provides several advantages, particularly in that it allows the researcher to make a statement about the alternative hypothesis, rather than just the null hypothesis. In addition, it provides a clearer estimate of the amount of evidence present in the data. Building on previous work by authors such as Wagenmakers (2007), Rouder et al. (2009), and Masson (2011), this article provides a short introduction to Bayes factors, before providing a practical guide to their computation using examples from published work on problem solving.

...read moreread less

844 citations

Proceedings Article•

Semi-supervised Learning with Deep Generative Models

[...]

Diederik P. Kingma¹, Shakir Mohamed², Danilo Jimenez Rezende², Max Welling¹•Institutions (2)

University of Amsterdam¹, Google²

08 Dec 2014

TL;DR: This paper revisited the approach to semi-supervised learning with generative models and developed new models that allow for effective generalisation from small labelled data sets to large unlabeled ones.

...read moreread less

805 citations

Journal Article•DOI•

Bayesian model selection for group studies - revisited.

[...]

Lionel Rigoux, Klaas E. Stephan¹, Klaas E. Stephan², Karl J. Friston¹, Jean Daunizeau - Show less +1 more•Institutions (2)

Wellcome Trust Centre for Neuroimaging¹, University of Zurich²

01 Jan 2014-NeuroImage

TL;DR: The material presented in this paper finesses the problems of group-level BMS in the analysis of neuroimaging and behavioural data by introducing the Bayesian omnibus risk (BOR) as a measure of the statistical risk incurred when performing group BMS, and highlighting the difference between random effects BMS and classical random effects analyses of parameter estimates.

...read moreread less

498 citations

Journal Article•DOI•

Non-stationary extreme value analysis in a changing climate

[...]

Linyin Cheng¹, Amir AghaKouchak¹, Eric Gilleland², Richard W. Katz²•Institutions (2)

University of California, Irvine¹, National Center for Atmospheric Research²

24 Sep 2014-Climatic Change

TL;DR: The software presents the results of non-stationary extreme value analysis using various exceedance probability methods, and shows that NEVA can reliably describe extremes and their return levels.

...read moreread less

Abstract: This paper introduces a framework for estimating stationary and non-stationary return levels, return periods, and risks of climatic extremes using Bayesian inference. This framework is implemented in the Non-stationary Extreme Value Analysis (NEVA) software package, explicitly designed to facilitate analysis of extremes in the geosciences. In a Bayesian approach, NEVA estimates the extreme value parameters with a Differential Evolution Markov Chain (DE-MC) approach for global optimization over the parameter space. NEVA includes posterior probability intervals (uncertainty bounds) of estimated return levels through Bayesian inference, with its inherent advantages in uncertainty quantification. The software presents the results of non-stationary extreme value analysis using various exceedance probability methods. We evaluate both stationary and non-stationary components of the package for a case study consisting of annual temperature maxima for a gridded global temperature dataset. The results show that NEVA can reliably describe extremes and their return levels.

...read moreread less

412 citations

Journal Article•DOI•

Species delimitation using genome-wide SNP data.

[...]

Adam D. Leaché¹, Matthew K. Fujita², Vladimir N. Minin¹, Remco R. Bouckaert³•Institutions (3)

University of Washington¹, University of Texas at Arlington², University of Auckland³

01 Jul 2014-Systematic Biology

TL;DR: A recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, are combined to provide a rigorous and computationally tractable technique for genome-wide species delimitation.

...read moreread less

Abstract: The multispecies coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computation burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the numbers of loci and the number of samples suggest that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested. [Bayes factor; model testing; phylogeography; RADseq; simulation; speciation.].

...read moreread less

Book•DOI•

Introduction to imprecise probabilities

[...]

Thomas Augustin, Frank P. A. Coolen, Gerd de Cooman, Matthias C. M. Troffaes

01 Jan 2014

TL;DR: This book discusses the construction of Sets of Desirable Gambles, a model for evaluating the relationship between self-consistency and freedom, and its applications in the context of self-confidence and self-regulation.

...read moreread less

Abstract: Preface Introduction Acknowledgements Outline of this Book and Guide to Readers Contributors 1 Desirability 1.1 Introduction 1.2 Reasoning about and with Sets of Desirable Gambles 1.2.1 Rationality Criteria 1.2.2 Assessments Avoiding Partial or Sure Loss 1.2.3 Coherent Sets of Desirable Gambles 1.2.4 Natural Extension 1.2.5 Desirability Relative to Subspaces with Arbitrary Vector Orderings 1.3 Deriving & Combining Sets of Desirable Gambles 1.3.1 Gamble Space Transformations 1.3.2 Derived Coherent Sets of Desirable Gambles 1.3.3 Conditional Sets of Desirable Gambles 1.3.4 Marginal Sets of Desirable Gambles 1.3.5 Combining Sets of Desirable Gambles 1.4 Partial Preference Orders 1.4.1 Strict Preference 1.4.2 Nonstrict Preference 1.4.3 Nonstrict Preferences Implied by Strict Ones 1.4.4 Strict Preferences Implied by Nonstrict Ones 1.5 Maximally Committal Sets of Strictly Desirable Gambles 1.6 Relationships with Other, Nonequivalent Models 1.6.1 Linear Previsions 1.6.2 Credal Sets 1.6.3 To Lower and Upper Previsions 1.6.4 Simplified Variants of Desirability 1.6.5 From Lower Previsions 1.6.6 Conditional Lower Previsions 1.7 Further Reading 2 Lower Previsions 2.1 Introduction 2.2 Coherent Lower Previsions 2.2.1 Avoiding Sure Loss and Coherence 2.2.2 Linear Previsions 2.2.3 Sets of Desirable Gambles 2.2.4 Natural Extension 2.3 Conditional Lower Previsions 2.3.1 Coherence of a Finite Number of Conditional Lower Previsions 2.3.2 Natural Extension of Conditional Lower Previsions 2.3.3 Coherence of an Unconditional and a Conditional Lower Prevision 2.3.4 Updating with the Regular Extension 2.4 Further Reading 2.4.1 The Work of Williams 2.4.2 The Work of Kuznetsov 2.4.3 The Work of Weichselberger 3 Structural Judgements 3.1 Introduction 3.2 Irrelevance and Independence 3.2.1 Epistemic Irrelevance 3.2.2 Epistemic Independence 3.2.3 Envelopes of Independent Precise Models 3.2.4 Strong Independence 3.2.5 The Formalist Approach to Independence 3.3 Invariance 3.3.1 Weak Invariance 3.3.2 Strong Invariance 3.4 Exchangeability. 3.4.1 Representation Theorem for Finite Sequences 3.4.2 Exchangeable Natural Extension 3.4.3 Exchangeable Sequences 3.5 Further Reading 3.5.1 Independence. 3.5.2 Invariance 3.5.3 Exchangeability 4 Special Cases 4.1 Introduction 4.2 Capacities and n-monotonicity 4.3 2-monotone Capacities 4.4 Probability Intervals on Singletons 4.5 1-monotone Capacities 4.5.1 Constructing 1-monotone Capacities 4.5.2 Simple Support Functions 4.5.3 Further Elements 4.6 Possibility Distributions, p-boxes, Clouds and Related Models. 4.6.1 Possibility Distributions 4.6.2 Fuzzy Intervals 4.6.3 Clouds 4.6.4 p-boxes. 4.7 Neighbourhood Models 4.7.1 Pari-mutuel 4.7.2 Odds-ratio 4.7.3 Linear-vacuous 4.7.4 Relations between Neighbourhood Models 4.8 Summary 5 Other Uncertainty Theories Based on Capacities 5.1 Imprecise Probability = Modal Logic + Probability 5.1.1 Boolean Possibility Theory and Modal Logic 5.1.2 A Unifying Framework for Capacity Based Uncertainty Theories 5.2 From Imprecise Probabilities to Belief Functions and Possibility Theory 5.2.1 Random Disjunctive Sets 5.2.2 Numerical Possibility Theory 5.2.3 Overall Picture 5.3 Discrepancies between Uncertainty Theories 5.3.1 Objectivist vs. Subjectivist Standpoints 5.3.2 Discrepancies in Conditioning 5.3.3 Discrepancies in Notions of Independence 5.3.4 Discrepancies in Fusion Operations 5.4 Further Reading 6 Game-Theoretic Probability 6.1 Introduction 6.2 A Law of Large Numbers 6.3 A General Forecasting Protocol 6.4 The Axiom of Continuity 6.5 Doob s Argument 6.6 Limit Theorems of Probability 6.7 Levy s Zero-One Law. 6.8 The Axiom of Continuity Revisited 6.9 Further Reading 7 Statistical Inference 7.1 Background and Introduction 7.1.1 What is Statistical Inference? 7.1.2 (Parametric) Statistical Models and i.i.d. Samples 7.1.3 Basic Tasks and Procedures of Statistical Inference 7.1.4 Some Methodological Distinctions 7.1.5 Examples: Multinomial and Normal Distribution 7.2 Imprecision in Statistics, some General Sources and Motives 7.2.1 Model and Data Imprecision Sensitivity Analysis and Ontological Views on Imprecision 7.2.2 The Robustness Shock, Sensitivity Analysis 7.2.3 Imprecision as a Modelling Tool to Express the Quality of Partial Knowledge 7.2.4 The Law of Decreasing Credibility 7.2.5 Imprecise Sampling Models: Typical Models and Motives 7.3 Some Basic Concepts of Statistical Models Relying on Imprecise Probabilities 7.3.1 Most Common Classes of Models and Notation 7.3.2 Imprecise Parametric Statistical Models and Corresponding i.i.d. Samples. 7.4 Generalized Bayesian Inference 7.4.1 Some Selected Results from Traditional Bayesian Statistics. 7.4.2 Sets of Precise Prior Distributions, Robust Bayesian Inference and the Generalized Bayes Rule 7.4.3 A Closer Exemplary Look at a Popular Class of Models: The IDM and Other Models Based on Sets of Conjugate Priors in Exponential Families. 7.4.4 Some Further Comments and a Brief Look at Other Models for Generalized Bayesian Inference 7.5 Frequentist Statistics with Imprecise Probabilities 7.5.1 The Non-robustness of Classical Frequentist Methods. 7.5.2 (Frequentist) Hypothesis Testing under Imprecise Probability: Huber-Strassen Theory and Extensions 7.5.3 Towards a Frequentist Estimation Theory under Imprecise Probabilities Some Basic Criteria and First Results 7.5.4 A Brief Outlook on Frequentist Methods 7.6 Nonparametric Predictive Inference (NPI) 7.6.1 Overview 7.6.2 Applications and Challenges 7.7 A Brief Sketch of Some Further Approaches and Aspects 7.8 Data Imprecision, Partial Identification 7.8.1 Data Imprecision 7.8.2 Cautious Data Completion 7.8.3 Partial Identification and Observationally Equivalent Models 7.8.4 A Brief Outlook on Some Further Aspects 7.9 Some General Further Reading 7.10 Some General Challenges 8 Decision Making 8.1 Non-Sequential Decision Problems 8.1.1 Choosing From a Set of Gambles 8.1.2 Choice Functions for Coherent Lower Previsions 8.2 Sequential Decision Problems 8.2.1 Static Sequential Solutions: Normal Form 8.2.2 Dynamic Sequential Solutions: Extensive Form 8.3 Examples and Applications 8.3.1 Ellsberg s Paradox 8.3.2 Robust Bayesian Statistics 9 Probabilistic Graphical Models 9.1 Introduction 9.2 Credal Sets 9.2.1 Definition and Relation with Lower Previsions 9.2.2 Marginalisation and Conditioning 9.2.3 Composition. 9.3 Independence 9.4 Credal Networks 9.4.1 Non-Separately Specified Credal Networks 9.5 Computing with Credal Networks 9.5.1 Credal Networks Updating 9.5.2 Modelling and Updating with Missing Data 9.5.3 Algorithms for Credal Networks Updating 9.5.4 Inference on Credal Networks as a Multilinear Programming Task 9.6 Further Reading 10 Classification 10.1 Introduction 10.2 Naive Bayes 10.3 Naive Credal Classifier (NCC) 10.4 Extensions and Developments of the Naive Credal Classifier 10.4.1 Lazy Naive Credal Classifier 10.4.2 Credal Model Averaging 10.4.3 Profile-likelihood Classifiers 10.4.4 Tree-Augmented Networks (TAN) 10.5 Tree-based Credal Classifiers 10.5.1 Uncertainty Measures on Credal Sets. The Maximum Entropy Function. 10.5.2 Obtaining Conditional Probability Intervals with the Imprecise Dirichlet Model 10.5.3 Classification Procedure 10.6 Metrics, Experiments and Software 10.6.1 Software. 10.6.2 Experiments. 11 Stochastic Processes 11.1 The Classical Characterization of Stochastic Processes 11.1.1 Basic Definitions 11.1.2 Precise Markov Chains 11.2 Event-driven Random Processes 11.3 Imprecise Markov Chains 11.3.1 From Precise to Imprecise Markov Chains 11.3.2 Imprecise Markov Models under Epistemic Irrelevance. 11.3.3 Imprecise Markov Models Under Strong Independence. 11.3.4 When Does the Interpretation of Independence (not) Matter? 11.4 Limit Behaviour of Imprecise Markov Chains 11.4.1 Metric Properties of Imprecise Probability Models 11.4.2 The Perron-Frobenius Theorem 11.4.3 Invariant Distributions 11.4.4 Coefficients of Ergodicity 11.4.5 Coefficients of Ergodicity for Imprecise Markov Chains. 11.5 Further Reading 12 Financial Risk Measurement 12.1 Introduction 12.2 Imprecise Previsions and Betting 12.3 Imprecise Previsions and Risk Measurement 12.3.1 Risk Measures as Imprecise Previsions 12.3.2 Coherent Risk Measures 12.3.3 Convex Risk Measures (and Previsions) 12.4 Further Reading 13 Engineering 13.1 Introduction 13.2 Probabilistic Dimensioning in a Simple Example 13.3 Random Set Modelling of the Output Variability 13.4 Sensitivity Analysis 13.5 Hybrid Models. 13.6 Reliability Analysis and Decision Making in Engineering 13.7 Further Reading 14 Reliability and Risk 14.1 Introduction 14.2 Stress-strength Reliability 14.3 Statistical Inference in Reliability and Risk 14.4 NPI in Reliablity and Risk 14.5 Discussion and Research Challenges 15 Elicitation 15.1 Methods and Issues 15.2 Evaluating Imprecise Probability Judgements 15.3 Factors Affecting Elicitation 15.4 Further Reading 16 Computation 16.1 Introduction 16.2 Natural Extension 16.2.1 Conditional Lower Previsions with Arbitrary Domains. 16.2.2 The Walley-Pelessoni-Vicig Algorithm 16.2.3 Choquet Integration 16.2.4 Mobius Inverse 16.2.5 Linear-Vacuous Mixture 16.3 Decision Making 16.3.1 Maximin, Maximax, and Hurwicz 16.3.2 Maximality 16.3.3 E-Admissibility 16.3.4 Interval Dominance References Author index Subject index

...read moreread less

Journal Article•DOI•

Uncertainty in perception and the Hierarchical Gaussian Filter

[...]

Christoph Mathys, Ekaterina I. Lomakina¹, Ekaterina I. Lomakina², Jean Daunizeau, Sandra Iglesias², Kay H. Brodersen², Karl J. Friston³, Klaas E. Stephan², Klaas E. Stephan³ - Show less +5 more•Institutions (3)

ETH Zurich¹, University of Zurich², Wellcome Trust Centre for Neuroimaging³

19 Nov 2014-Frontiers in Human Neuroscience

TL;DR: This paper explicitly formulate the extension of the HGF's hierarchy to any number of levels, and discusses how various forms of uncertainty are accommodated by the minimization of variational free energy as encoded in the update equations.

...read moreread less

Abstract: In its full sense, perception rests on an agent’s model of how its sensory input comes about and the inferences it draws based on this model. These inferences are necessarily uncertain. Here, we illustrate how the hierarchical Gaussian filter (HGF) offers a principled and generic way to deal with the several forms that uncertainty in perception takes. The HGF is a recent derivation of one-step update equations from Bayesian principles that rests on a hierarchical generative model of the environment and its (in)stability. It is computationally highly efficient, allows for online estimates of hidden states, and has found numerous applications to experimental data from human subjects. In this paper, we generalize previous descriptions of the HGF and its account of perceptual uncertainty. First, we explicitly formulate the extension of the HGF’s hierarchy to any number of levels; second, we discuss how various forms of uncertainty are accommodated by the minimization of variational free energy as encoded in the update equations; third, we combine the HGF with decision models and demonstrate the inversion of this combination; finally, we report a simulation study that compared four optimization methods for inverting the HGF/decision model combination at different noise levels. These four methods (Nelder-Mead simplex algorithm, Gaussian process-based global optimization, variational Bayes and Markov chain Monte Carlo sampling) all performed well even under considerable noise, with variational Bayes offering the best combination of efficiency and informativeness of inference. Our results demonstrate that the HGF provides a principled, flexible, and efficient - but at the same time intuitive - framework for the resolution of perceptual uncertainty in behaving agents.

...read moreread less

Journal Article•DOI•

Optional stopping: No problem for Bayesians

[...]

Jeffrey N. Rouder¹•Institutions (1)

University of Missouri¹

22 Mar 2014-Psychonomic Bulletin & Review

TL;DR: In this article, it is shown through simulation that the interpretation of Bayesian quantities does not depend on the stopping rule, and the proper interpretation ofBayesian quantities as measures of subjective belief on theoretical positions is emphasized.

...read moreread less

Abstract: Optional stopping refers to the practice of peeking at data and then, based on the results, deciding whether or not to continue an experiment. In the context of ordinary significance-testing analysis, optional stopping is discouraged, because it necessarily leads to increased type I error rates over nominal values. This article addresses whether optional stopping is problematic for Bayesian inference with Bayes factors. Statisticians who developed Bayesian methods thought not, but this wisdom has been challenged by recent simulation results of Yu, Sprenger, Thomas, and Dougherty (2013) and Sanborn and Hills (2013). In this article, I show through simulation that the interpretation of Bayesian quantities does not depend on the stopping rule. Researchers using Bayesian methods may employ optional stopping in their own research and may provide Bayesian analysis of secondary data regardless of the employed stopping rule. I emphasize here the proper interpretation of Bayesian quantities as measures of subjective belief on theoretical positions, the difference between frequentist and Bayesian interpretations, and the difficulty of using frequentist intuition to conceptualize the Bayesian approach.

...read moreread less

Dissertation•

Perception as Bayesian Inference

[...]

Adam Binch

01 Dec 2014

Journal Article•DOI•

Robust meta-analytic-predictive priors in clinical trials with historical control information

[...]

Heinz Schmidli¹, Sandro Gsteiger², Satrajit Roychoudhury¹, Anthony O'Hagan³, David Spiegelhalter⁴, Beat Neuenschwander¹ - Show less +2 more•Institutions (4)

Novartis¹, University of Bern², University of Sheffield³, University of Cambridge⁴

01 Dec 2014-Biometrics

TL;DR: This work derives a Bayesian meta-analytic-predictive prior from historical data, which is then combined with the new data, and proposes two- or three-component mixtures of standard priors, which allow for good approximations and, for the one-parameter exponential family, straightforward posterior calculations.

...read moreread less

Abstract: Historical information is always relevant for clinical trial design. Additionally, if incorporated in the analysis of a new trial, historical data allow to reduce the number of subjects. This decreases costs and trial duration, facilitates recruitment, and may be more ethical. Yet, under prior-data conflict, a too optimistic use of historical data may be inappropriate. We address this challenge by deriving a Bayesian meta-analytic-predictive prior from historical data, which is then combined with the new data. This prospective approach is equivalent to a meta-analytic-combined analysis of historical and new data if parameters are exchangeable across trials. The prospective Bayesian version requires a good approximation of the meta-analytic-predictive prior, which is not available analytically. We propose two- or three-component mixtures of standard priors, which allow for good approximations and, for the one-parameter exponential family, straightforward posterior calculations. Moreover, since one of the mixture components is usually vague, mixture priors will often be heavy-tailed and therefore robust. Further robustness and a more rapid reaction to prior-data conflicts can be achieved by adding an extra weakly-informative mixture component. Use of historical prior information is particularly attractive for adaptive trials, as the randomization ratio can then be changed in case of prior-data conflict. Both frequentist operating characteristics and posterior summaries for various data scenarios show that these designs have desirable properties. We illustrate the methodology for a phase II proof-of-concept trial with historical controls from four studies. Robust meta-analytic-predictive priors alleviate prior-data conflicts ' they should encourage better and more frequent use of historical data in clinical trials.

...read moreread less

Journal Article•

Amortized Inference in Probabilistic Reasoning

[...]

Samuel J. Gershman¹, Noah D. Goodman²•Institutions (2)

Massachusetts Institute of Technology¹, Stanford University²

01 Jan 2014-Cognitive Science

TL;DR: It is argued that the brain oper- ates in the setting of amortized inference, where numerous related queries must be answered (e.g., recognizing a scene from multiple viewpoints); in this setting, memoryless algo- rithms can be computationally wasteful.

...read moreread less

Journal Article•DOI•

Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models

[...]

Gregor Kastner¹, Sylvia Frühwirth-Schnatter¹•Institutions (1)

Vienna University of Economics and Business¹

01 Aug 2014-Computational Statistics & Data Analysis

TL;DR: It is demonstrated how an ancillarity-sufficiency interweaving strategy can be applied to stochastic volatility models in order to greatly improve sampling efficiency for all parameters and throughout the entire parameter range.

...read moreread less

Journal Article•DOI•

One and done? Optimal decisions from very few samples.

[...]

Edward Vul¹, Noah D. Goodman², Thomas L. Griffiths³, Joshua B. Tenenbaum⁴•Institutions (4)

University of California, San Diego¹, Stanford University², University of California, Berkeley³, Massachusetts Institute of Technology⁴

01 May 2014-Cognitive Science

TL;DR: Under reasonable assumptions about the time costs of sampling, making many quick but locally suboptimal decisions based on very few samples may be the globally optimal strategy over long periods.

...read moreread less

One and Done? Optimal Decisions From Very Few Samples

[...]

Edward Vul¹, Noah D. Goodman², Thomas L. Griffiths³, Joshua B. Tenenbaum⁴•Institutions (4)

University of California, San Diego¹, Stanford University², University of California, Berkeley³, Massachusetts Institute of Technology⁴

01 Jan 2014

TL;DR: In this article, it was shown that making many quick but locally suboptimal decisions based on very few samples may be the globally optimal strategy over long periods under reasonable assumptions about the time costs of sampling.

...read moreread less

Abstract: In many learning or inference tasks human behavior approximates that of a Bayesian ideal observer, suggesting that, at some level, cognition can be described as Bayesian inference However, a number of findings have highlighted an intriguing mismatch between human behavior and standard assumptions about optimality: People often appear to make decisions based on just one or a few samples from the appropriate posterior probability distribution, rather than using the full distribution Although sampling-based approximations are a common way to implement Bayesian inference, the very limited numbers of samples often used by humans seem insufficient to approximate the required probability distributions very accurately Here, we consider this discrepancy in the broader framework of statistical decision theory, and ask: If people are making decisions based on samples—but as samples are costly—how many samples should people use to optimize their total expected or worst-case reward over a large number of decisions? We find that under reasonable assumptions about the time costs of sampling, making many quick but locally suboptimal decisions based on very few samples may be the globally optimal strategy over long periods These results help to reconcile a large body of work showing sampling-based or probability matching behavior with the hypothesis that human cognition can be understood in Bayesian terms, and they suggest promising future directions for studies of resource-constrained cognition

...read moreread less

Journal Article•DOI•

A novel Bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data.

[...]

Josef C. Uyeda¹, Luke J. Harmon¹•Institutions (1)

University of Idaho¹

30 Jul 2014-Systematic Biology

TL;DR: It is argued that Bayesian model fitting of OU models to comparative data provides a framework for integrating of multiple sources of biological data-such as microevolutionary estimates of selection parameters and paleontological timeseries-allowing inference of adaptive landscape dynamics with explicit, process-based biological interpretations.

...read moreread less

Abstract: Our understanding of macroevolutionary patterns of adaptive evolution has greatly increased with the advent of large-scale phylogenetic comparative methods. Widely used Ornstein-Uhlenbeck (OU) models can describe an adaptive process of divergence and selection. However, inference of the dynamics of adaptive landscapes from comparative data is complicated by interpretational difficulties, lack of identifiability among parameter values and the common requirement that adaptive hypotheses must be assigned a priori. Here, we develop a reversible-jump Bayesian method of fitting multi-optima OU models to phylogenetic comparative data that estimates the placement and magnitude of adaptive shifts directly from the data. We show how biologically informed hypotheses can be tested against this inferred posterior of shift locations using Bayes Factors to establish whether our a priori models adequately describe the dynamics of adaptive peak shifts. Furthermore, we show how the inclusion of informative priors can be used to restrict models to biologically realistic parameter space and test particular biological interpretations of evolutionary models. We argue that Bayesian model fitting of OU models to comparative data provides a framework for integrating of multiple sources of biological data-such as microevolutionary estimates of selection parameters and paleontological timeseries-allowing inference of adaptive landscape dynamics with explicit, process-based biological interpretations.

...read moreread less

Journal Article•DOI•

Neural coding of uncertainty and probability.

[...]

Wei Ji Ma¹, Mehrdad Jazayeri²•Institutions (2)

Center for Neural Science¹, Massachusetts Institute of Technology²

17 Jul 2014-Annual Review of Neuroscience

TL;DR: This work formalizes this problem using Bayesian decision theory and review recent behavioral and neural evidence that the brain may use knowledge of uncertainty, confidence, and probability to make better decisions.

...read moreread less

Abstract: Organisms must act in the face of sensory, motor, and reward uncertainty stemming from a pandemonium of stochasticity and missing information. In many tasks, organisms can make better decisions if they have at their disposal a representation of the uncertainty associated with task-relevant variables. We formalize this problem using Bayesian decision theory and review recent behavioral and neural evidence that the brain may use knowledge of uncertainty, confidence, and probability.

...read moreread less

Journal Article•DOI•

The best inflationary models after Planck

[...]

Jérôme Martin¹, Christophe Ringeval², Roberto Trotta³, Vincent Vennin¹•Institutions (3)

Institut d'Astrophysique de Paris¹, Université catholique de Louvain², Imperial College London³

19 Mar 2014-Journal of Cosmology and Astroparticle Physics

TL;DR: In this article, the authors compute the Bayesian evidence and complexity of 193 slow-roll single-field models of inflation using the Planck 2013 Cosmic Microwave Background data, with the aim of establishing which models are favoured from a Bayesian perspective.

...read moreread less

Abstract: We compute the Bayesian evidence and complexity of 193 slow-roll single-field models of inflation using the Planck 2013 Cosmic Microwave Background data, with the aim of establishing which models are favoured from a Bayesian perspective. Our calculations employ a new numerical pipeline interfacing an inflationary effective likelihood with the slow-roll library ASPIC and the nested sampling algorithm MultiNest. The models considered represent a complete and systematic scan of the entire landscape of inflationary scenarios proposed so far. Our analysis singles out the most probable models (from an Occam's razor point of view) that are compatible with Planck data, while ruling out with very strong evidence 34% of the models considered. We identify 26% of the models that are favoured by the Bayesian evidence, corresponding to 15 different potential shapes. If the Bayesian complexity is included in the analysis, only 9% of the models are preferred, corresponding to only 9 different potential shapes. These shapes are all of the plateau type.

...read moreread less

Journal Article•DOI•

A Bayesian, spatially-varying calibration model for the TEX86 proxy

[...]

Jessica E. Tierney¹, Martin P. Tingley², Martin P. Tingley³•Institutions (3)

Woods Hole Oceanographic Institution¹, Pennsylvania State University², Harvard University³

15 Feb 2014-Geochimica et Cosmochimica Acta

TL;DR: A Bayesian regression approach is developed and applied to the TEX86–SST calibration that explicitly allows for model parameters to smoothly vary as a function of space, and considers uncertainties in the modern SSTs as well as in the T EX86-SST relationship.

...read moreread less

Journal Article•DOI•

The anatomy of choice: dopamine and decision-making.

[...]

Karl J. Friston¹, Philipp Schwartenbeck¹, Thomas H. B. FitzGerald¹, Michael Moutoussis¹, Timothy E.J. Behrens¹, Raymond J. Dolan¹ - Show less +2 more•Institutions (1)

Wellcome Trust Centre for Neuroimaging¹

05 Nov 2014-Philosophical Transactions of the Royal Society B

TL;DR: Variational Bayes is considered as a scheme that the brain might use for approximate Bayesian inference that optimizes a free energy bound on model evidence and changes in precision during variational updates are remarkably reminiscent of empirical dopaminergic responses.

...read moreread less

Abstract: This paper considers goal-directed decision-making in terms of embodied or active inference. We associate bounded rationality with approximate Bayesian inference that optimizes a free energy bound on model evidence. Several constructs such as expected utility, exploration or novelty bonuses, softmax choice rules and optimism bias emerge as natural consequences of free energy minimization. Previous accounts of active inference have focused on predictive coding. In this paper, we consider variational Bayes as a scheme that the brain might use for approximate Bayesian inference. This scheme provides formal constraints on the computational anatomy of inference and action, which appear to be remarkably consistent with neuroanatomy. Active inference contextualizes optimal decision theory within embodied inference, where goals become prior beliefs. For example, expected utility theory emerges as a special case of free energy minimization, where the sensitivity or inverse temperature (associated with softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution. Crucially, this sensitivity corresponds to the precision of beliefs about behaviour. The changes in precision during variational updates are remarkably reminiscent of empirical dopaminergic responses-and they may provide a new perspective on the role of dopamine in assimilating reward prediction errors to optimize decision-making.

...read moreread less

Journal Article•DOI•

A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation

[...]

Juliane Liepe¹, Paul D. W. Kirk¹, Sarah Filippi¹, Tina Toni¹, Chris P. Barnes², Michael P. H. Stumpf¹ - Show less +2 more•Institutions (2)

Imperial College London¹, University College London²

01 Feb 2014-Nature Protocols

TL;DR: An approximate Bayesian computation framework and software environment, ABC-SysBio, which is a Python package that runs on Linux and Mac OS X systems and that enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches is presented.

...read moreread less

Abstract: As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC) framework and software environment, ABC-SysBio, which is a Python package that runs on Linux and Mac OS X systems and that enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches. We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance as to how the important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages, ABC-SysBio is highly suited for investigating, in particular, the challenging problem of fitting stochastic models to data. In order to demonstrate the use of ABC-SysBio, in this protocol we postulate the existence of an imaginary reaction network composed of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified version of the protein), a network that is defined by two files containing 'observed' data that we provide as supplementary information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second part we use ABC-SysBio's relevant functionality to discriminate between two different reaction network models, one of them being the 'true' one. Although computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems.

...read moreread less

Journal Article•DOI•

A Computational Framework for Infinite-Dimensional Bayesian Inverse Problems, Part II: Stochastic Newton MCMC with Application to Ice Sheet Flow Inverse Problems

[...]

Noemi Petra¹, James Martin¹, Georg Stadler, Omar Ghattas¹•Institutions (1)

University of Texas at Austin¹

24 Jul 2014-SIAM Journal on Scientific Computing

TL;DR: Bui-Thanh et al. as mentioned in this paper considered the numerical solution of infinite-dimensional inverse problems in the framework of Bayesian inference and used a Markov chain Monte Carlo (MCMC) sampling method.

...read moreread less

Abstract: We address the numerical solution of infinite-dimensional inverse problems in the framework of Bayesian inference. In Part I of this paper [T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler, SIAM J. Sci. Comput., 35 (2013), pp. A2494--A2523] we considered the linearized infinite-dimensional inverse problem. In Part II, we relax the linearization assumption and consider the fully nonlinear infinite-dimensional inverse problem using a Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of sampling high-dimensional probability density functions (pdfs) arising upon discretization of Bayesian inverse problems governed by PDEs, we build upon the stochastic Newton MCMC method. This method exploits problem structure by taking as a proposal density a local Gaussian approximation of the posterior pdf, whose covariance operator is given by the inverse of the local Hessian of the negative log posterior pdf. The construction of the covariance is made tractable by invoking a low-rank approximat...

...read moreread less

Journal Article•DOI•

Bayesian Inference of Infectious Disease Transmission from Whole-Genome Sequence Data

[...]

Xavier Didelot¹, Jennifer L. Gardy², Caroline Colijn¹•Institutions (2)

Imperial College London¹, University of British Columbia²

01 Jul 2014-Molecular Biology and Evolution

TL;DR: It is found that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate, and that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.

...read moreread less

Abstract: Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered—how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely sampled outbreaks from genomic data while considering within-host diversity. We infer a time-labeled phylogeny using Bayesian evolutionary analysis by sampling trees (BEAST), and then infer a transmission network via a Monte Carlo Markov chain. We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.

...read moreread less

Journal Article•DOI•

Discrete Bayesian Network Classifiers: A Survey

[...]

Concha Bielza¹, Pedro Larrañaga¹•Institutions (1)

Technical University of Madrid¹

14 Jul 2014-ACM Computing Surveys

TL;DR: This article surveys the whole set of discrete Bayesian network classifiers devised to date, organized in increasing order of structure complexity: naive Bayes, selective naive Baye, seminaive Bayer, one-dependence Bayesian classifiers, k-dependency Bayesianclassifiers, Bayes network-augmented naiveBayes, Markov blanket-based Bayesian Classifier, unrestricted BayesianClassifiers, and Bayesian multinets.

...read moreread less

Abstract: We have had to wait over 30 years since the naive Bayes model was first introduced in 1960 for the so-called Bayesian network classifiers to resurge. Based on Bayesian networks, these classifiers have many strengths, like model interpretability, accommodation to complex data and classification problem settings, existence of efficient algorithms for learning and classification tasks, and successful applicability in real-world problems. In this article, we survey the whole set of discrete Bayesian network classifiers devised to date, organized in increasing order of structure complexity: naive Bayes, selective naive Bayes, seminaive Bayes, one-dependence Bayesian classifiers, k-dependence Bayesian classifiers, Bayesian network-augmented naive Bayes, Markov blanket-based Bayesian classifier, unrestricted Bayesian classifiers, and Bayesian multinets. Issues of feature subset selection and generative and discriminative structure and parameter learning are also covered.

...read moreread less

Proceedings Article•

Doubly Stochastic Variational Bayes for non-Conjugate Inference

[...]

Michalis K. Titsias¹, Miguel L zaro-gredilla²•Institutions (2)

Athens University of Economics and Business¹, Charles III University of Madrid²

21 Jun 2014

TL;DR: This article proposed a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces.

...read moreread less

Abstract: We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.

...read moreread less

Collapse