
Showing papers on "Bayesian probability published in 2012"


Journal ArticleDOI
TL;DR: The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly, and provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates.
Abstract: Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

18,718 citations


Book
22 Dec 2012
TL;DR: An overview of statistical decision theory and Bayesian analysis, emphasizing the use and application of its philosophical ideas and mathematical structure.
Abstract: 1. Basic concepts 2. Utility and loss 3. Prior information and subjective probability 4. Bayesian analysis 5. Minimax analysis 6. Invariance 7. Preposterior and sequential analysis 8. Complete and essentially complete classes Appendices.

5,573 citations


Journal ArticleDOI
TL;DR: This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis, which replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors.
Abstract: This article proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed Bayesian approach is particularly beneficial in applications where parameters are added to a conventional model such that a nonidentified model is obtained if maximum-likelihood estimation is applied. This approach is useful for measurement aspects of latent variable modeling, such as with confirmatory factor analysis, and the measurement part of structural equation modeling. Two application areas are studied, cross-loadings and residual correlations in confirmatory factor analysis. An example using a full structural equation model is also presented, showing an efficient way to find model misspecification. The approach encompasses 3 elements: model testing using posterior predictive checking, model estimation, and model modification. Monte Carlo simulations and real data are analyzed using Mplus. The real-data analyses use data from Holzinger and Swineford's (1939) classic mental abilities study, Big Five personality factor data from a British survey, and science achievement data from the National Educational Longitudinal Study of 1988.
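The core idea, replacing exact zeros with informative, small-variance priors, can be written down directly as a log-prior over the loading matrix. The sketch below is a minimal Python illustration under assumed values (a two-factor, six-indicator layout and a cross-loading prior variance of 0.01), not the authors' Mplus implementation.

```python
import numpy as np
from scipy.stats import norm

# Two-factor model with 6 indicators: items 0-2 load on factor 1,
# items 3-5 on factor 2.  In a conventional CFA the cross-loadings are
# fixed to exactly zero; in the BSEM approach they instead receive an
# informative prior with small variance, e.g. N(0, 0.01).
primary = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]  # (item, factor)

def log_prior(Lambda):
    """Log prior for a 6x2 loading matrix under the approximate-zero idea."""
    lp = 0.0
    for i in range(Lambda.shape[0]):
        for j in range(Lambda.shape[1]):
            if (i, j) in primary:
                # diffuse prior on the loadings the theory says are nonzero
                lp += norm.logpdf(Lambda[i, j], loc=0.0, scale=np.sqrt(10.0))
            else:
                # "approximate zero": informative, small-variance prior
                lp += norm.logpdf(Lambda[i, j], loc=0.0, scale=np.sqrt(0.01))
    return lp

Lambda = np.array([[0.8, 0.05], [0.7, 0.0], [0.6, 0.1],
                   [0.0, 0.9], [0.05, 0.8], [0.1, 0.7]])
print(log_prior(Lambda))
```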

1,045 citations


Book
02 Oct 2012
TL;DR: A practical introduction to Bayesian analysis with the BUGS software, covering probability and Monte Carlo simulation, Bayesian inference, Markov chain Monte Carlo methods, prior distributions, regression and categorical-data models, model checking and comparison, hierarchical models, specialised models, and the different implementations of BUGS.
Abstract (table of contents):
Introduction: Probability and Parameters - Probability; Probability distributions; Calculating properties of probability distributions; Monte Carlo integration
Monte Carlo Simulations Using BUGS - Introduction to BUGS; DoodleBUGS; Using BUGS to simulate from distributions; Transformations of random variables; Complex calculations using Monte Carlo; Multivariate Monte Carlo analysis; Predictions with unknown parameters
Introduction to Bayesian Inference - Bayesian learning; Posterior predictive distributions; Conjugate Bayesian inference; Inference about a discrete parameter; Combinations of conjugate analyses; Bayesian and classical methods
Introduction to Markov Chain Monte Carlo Methods - Bayesian computation; Initial values; Convergence; Efficiency and accuracy; Beyond MCMC
Prior Distributions - Different purposes of priors; Vague, 'objective' and 'reference' priors; Representation of informative priors; Mixture of prior distributions; Sensitivity analysis
Regression Models - Linear regression with normal errors; Linear regression with non-normal errors; Nonlinear regression with normal errors; Multivariate responses; Generalised linear regression models; Inference on functions of parameters; Further reading
Categorical Data - 2 x 2 tables; Multinomial models; Ordinal regression; Further reading
Model Checking and Comparison - Introduction; Deviance; Residuals; Predictive checks and Bayesian p-values; Model assessment by embedding in larger models; Model comparison using deviances; Bayes factors; Model uncertainty; Discussion on model comparison; Prior-data conflict
Issues in Modelling - Missing data; Prediction; Measurement error; Cutting feedback; New distributions; Censored, truncated and grouped observations; Constrained parameters; Bootstrapping; Ranking
Hierarchical Models - Exchangeability; Priors; Hierarchical regression models; Hierarchical models for variances; Redundant parameterisations; More general formulations; Checking of hierarchical models; Comparison of hierarchical models; Further resources
Specialised Models - Time-to-event data; Time series models; Spatial models; Evidence synthesis; Differential equation and pharmacokinetic models; Finite mixture and latent class models; Piecewise parametric models; Bayesian nonparametric models
Different Implementations of BUGS - Introduction; BUGS engines and interfaces; Expert systems and MCMC methods; Classic BUGS; WinBUGS; OpenBUGS; JAGS
Appendix A: BUGS Language Syntax - Introduction; Distributions; Deterministic functions; Repetition; Multivariate quantities; Indexing; Data transformations; Commenting
Appendix B: Functions in BUGS - Standard functions; Trigonometric functions; Matrix algebra; Distribution utilities and model checking; Functionals and differential equations; Miscellaneous
Appendix C: Distributions in BUGS - Continuous univariate, unrestricted range; Continuous univariate, restricted to be positive; Continuous univariate, restricted to a finite interval; Continuous multivariate distributions; Discrete univariate distributions; Discrete multivariate distributions
Bibliography. Index.
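Much of the book's early material (conjugate Bayesian inference, Monte Carlo simulation, and prediction with unknown parameters) can be condensed into a few lines. The sketch below uses Python rather than the BUGS language, with a made-up beta-binomial example, to show conjugate updating followed by a Monte Carlo posterior predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Conjugate beta-binomial updating: theta ~ Beta(a, b), y | theta ~ Binomial(n, theta)
a, b = 1.0, 1.0          # uniform prior (illustrative choice)
n, y = 20, 6             # observed 6 successes out of 20 trials (made-up data)
a_post, b_post = a + y, b + (n - y)
print("posterior mean:", a_post / (a_post + b_post))

# Monte Carlo posterior predictive for a future batch of m trials:
# integrate over parameter uncertainty by simulating theta first.
m = 10
theta = rng.beta(a_post, b_post, size=100_000)
y_new = rng.binomial(m, theta)
print("predictive mean of y_new:", y_new.mean())
print("P(y_new >= 5):", (y_new >= 5).mean())
```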

772 citations


Book
07 Nov 2012
TL;DR: Risk Assessment and Decision Analysis with Bayesian Networks explains how to incorporate knowledge with data to develop and use (Bayesian) causal models of risk that provide powerful insights and better decision making.
Abstract: Although many Bayesian Network (BN) applications are now in everyday use, BNs have not yet achieved mainstream penetration. Focusing on practical real-world problem solving and model building, as opposed to algorithms and theory, Risk Assessment and Decision Analysis with Bayesian Networks explains how to incorporate knowledge with data to develop and use (Bayesian) causal models of risk that provide powerful insights and better decision making. The book provides all the tools necessary to build and run realistic Bayesian network models; supplies extensive example models based on real risk assessment problems in a wide range of application domains, for example finance, safety, systems reliability, and law; and introduces all necessary mathematics, probability, and statistics as needed. The book first establishes the basics of probability, risk, and building and using BN models, then goes into the detailed applications. The underlying BN algorithms appear in appendices rather than the main text since there is no need to understand them to build and use BN models. Keeping the body of the text free of intimidating mathematics, the book provides pragmatic advice about model building to ensure models are built efficiently. A dedicated website, www.BayesianRisk.com, contains executable versions of all of the models described, exercises and worked solutions for all chapters, PowerPoint slides, numerous other resources, and a free downloadable copy of the AgenaRisk software.
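To make the notion of a causal model of risk concrete, the sketch below hand-codes a toy three-node Bayesian network in Python and computes a posterior by enumeration. The structure and probability tables are invented for illustration and are unrelated to the book's AgenaRisk models.

```python
import itertools

# Toy risk network: Flood -> Damage <- PoorMaintenance
p_flood = {True: 0.1, False: 0.9}
p_maint = {True: 0.3, False: 0.7}
# P(Damage = True | Flood, PoorMaintenance)
p_damage = {(True, True): 0.95, (True, False): 0.7,
            (False, True): 0.3, (False, False): 0.05}

def joint(f, m, d):
    """Joint probability P(Flood=f, PoorMaintenance=m, Damage=d)."""
    pd = p_damage[(f, m)]
    return p_flood[f] * p_maint[m] * (pd if d else 1.0 - pd)

# Posterior P(Flood | Damage = True) by summing out the other variable
num = sum(joint(True, m, True) for m in (True, False))
den = sum(joint(f, m, True) for f, m in itertools.product((True, False), repeat=2))
print("P(Flood | Damage) =", num / den)
```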

721 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian maximum a posteriori (MAP) approach is presented, where a subset of highly correlated and quiet stars is used to generate a cotrending basis vector set, which is in turn used to establish a range of "reasonable" robust fit parameters.
Abstract: With the unprecedented photometric precision of the Kepler spacecraft, significant systematic and stochastic errors on transit signal levels are observable in the Kepler photometric data. These errors, which include discontinuities, outliers, systematic trends, and other instrumental signatures, obscure astrophysical signals. The presearch data conditioning (PDC) module of the Kepler data analysis pipeline tries to remove these errors while preserving planet transits and other astrophysically interesting signals. The completely new noise and stellar variability regime observed in Kepler data poses a significant problem to standard cotrending methods. Variable stars are often of particular astrophysical interest, so the preservation of their signals is of significant importance to the astrophysical community. We present a Bayesian maximum a posteriori (MAP) approach, where a subset of highly correlated and quiet stars is used to generate a cotrending basis vector set, which is in turn used to establish a range of "reasonable" robust fit parameters. These robust fit parameters are then used to generate a Bayesian prior and a Bayesian posterior probability distribution function (PDF) which, when maximized, finds the best fit that simultaneously removes systematic effects while reducing the signal distortion and noise injection that commonly afflict simple least-squares (LS) fitting. A numerical and empirical approach is taken where the Bayesian prior PDFs are generated from fits to the light-curve distributions themselves.
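The contrast between an unconstrained least-squares fit of cotrending basis vectors and a MAP fit that shrinks coefficients toward an ensemble-derived prior can be sketched in a few lines of Python. This is only a schematic illustration of the idea, not the Kepler PDC-MAP pipeline; the basis vectors, the Gaussian prior, and all numerical values are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3

# Assumed cotrending basis vectors (columns of V) and a toy light curve
V = rng.standard_normal((n, k))
true_coeffs = np.array([0.8, -0.3, 0.1])
flux = (V @ true_coeffs
        + 0.05 * np.sin(np.linspace(0, 20, n))   # intrinsic variability
        + 0.1 * rng.standard_normal(n))          # noise

# Plain least-squares fit (can overfit and distort intrinsic variability)
ls_coeffs, *_ = np.linalg.lstsq(V, flux, rcond=None)

# MAP fit with a Gaussian prior N(mu0, sigma0^2 I) on the coefficients,
# standing in for the prior built from fits to quiet, highly correlated stars.
mu0 = np.array([0.7, -0.2, 0.0])     # assumed ensemble-based prior mean
sigma0, sigma_noise = 0.2, 0.1
A = V.T @ V / sigma_noise**2 + np.eye(k) / sigma0**2
b = V.T @ flux / sigma_noise**2 + mu0 / sigma0**2
map_coeffs = np.linalg.solve(A, b)

print("LS :", ls_coeffs)
print("MAP:", map_coeffs)
corrected = flux - V @ map_coeffs    # systematics-removed light curve
```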

721 citations


Book
19 Jun 2012
TL;DR: This text is a reprint of the seminal 1989 book Probabilistic Reasoning in Expert Systems: Theory and Algorithms, which helped to create the field now known as Bayesian networks, and it provides an insightful comparison of the two most prominent approaches to probability.
Abstract: This text is a reprint of the seminal 1989 book Probabilistic Reasoning in Expert Systems: Theory and Algorithms, which helped to create the field we now call Bayesian networks. It introduces the properties of Bayesian networks (called causal networks in the text), discusses algorithms for doing inference in Bayesian networks, covers abductive inference, and provides an introduction to decision analysis. Furthermore, it compares rule-based expert systems to ones based on Bayesian networks, and it introduces the frequentist and Bayesian approaches to probability. Finally, it provides a critique of the maximum entropy formalism. Probabilistic Reasoning in Expert Systems was written from the perspective of a mathematician, with the emphasis being on the development of theorems and algorithms. Every effort was made to make the material accessible. There are ample examples throughout the text. This text is important reading for anyone interested in both the fundamentals of Bayesian networks and the history of how they came to be. It also provides an insightful comparison of the two most prominent approaches to probability.

687 citations


Journal ArticleDOI
TL;DR: This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.

549 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian Maximum A Posteriori (MAP) approach is presented where a subset of highly correlated and quiet stars is used to generate a cotrending basis vector set which is in turn used to establish a range of "reasonable" robust fit parameters.
Abstract: With the unprecedented photometric precision of the Kepler Spacecraft, significant systematic and stochastic errors on transit signal levels are observable in the Kepler photometric data. These errors, which include discontinuities, outliers, systematic trends and other instrumental signatures, obscure astrophysical signals. The Presearch Data Conditioning (PDC) module of the Kepler data analysis pipeline tries to remove these errors while preserving planet transits and other astrophysically interesting signals. The completely new noise and stellar variability regime observed in Kepler data poses a significant problem to standard cotrending methods such as SYSREM and TFA. Variable stars are often of particular astrophysical interest so the preservation of their signals is of significant importance to the astrophysical community. We present a Bayesian Maximum A Posteriori (MAP) approach where a subset of highly correlated and quiet stars is used to generate a cotrending basis vector set which is in turn used to establish a range of "reasonable" robust fit parameters. These robust fit parameters are then used to generate a Bayesian Prior and a Bayesian Posterior Probability Distribution Function (PDF) which when maximized finds the best fit that simultaneously removes systematic effects while reducing the signal distortion and noise injection which commonly afflicts simple least-squares (LS) fitting. A numerical and empirical approach is taken where the Bayesian Prior PDFs are generated from fits to the light curve distributions themselves.

520 citations


Journal ArticleDOI
TL;DR: The method uses Bayesian transdimensional Markov Chain Monte Carlo and allows a wide range of possible thermal history models to be considered as general prior information on time, temperature (and temperature offset for multiple samples in a vertical profile).
Abstract: A new approach for inverse thermal history modeling is presented. The method uses Bayesian transdimensional Markov Chain Monte Carlo and allows us to specify a wide range of possible thermal history models to be considered as general prior information on time, temperature (and temperature offset for multiple samples in a vertical profile). We can also incorporate more focused geological constraints in terms of more specific priors. The Bayesian approach naturally prefers simpler thermal history models (which provide an adequate fit to the observations), and so reduces the problems associated with overinterpretation of inferred thermal histories. The output of the method is a collection or ensemble of thermal histories, which quantifies the range of accepted models in terms of a (posterior) probability distribution. Individual models, such as the best data fitting (maximum likelihood) model or the expected model (effectively the weighted mean from the posterior distribution) can be examined. Different data types (e.g., fission track, U-Th/He, 40Ar/39Ar) can be combined, requiring just a data-specific predictive forward model and data fit (likelihood) function. To demonstrate the main features and implementation of the approach, examples are presented using both synthetic and real data.

514 citations


Journal ArticleDOI
TL;DR: The conceptual and theoretical foundations for the Bayesian information criterion are reviewed, and its properties and applications are discussed.
Abstract: The Bayesian information criterion (BIC) is one of the most widely known and pervasively used tools in statistical model selection. Its popularity is derived from its computational simplicity and effective performance in many modeling frameworks, including Bayesian applications where prior distributions may be elusive. The criterion was derived by Schwarz (Ann. Stat. 1978, 6:461-464) to serve as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. This article reviews the conceptual and theoretical foundations for BIC, and also discusses its properties and applications. WIREs Comput Stat 2012, 4:199-203. doi: 10.1002/wics.199
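As a reminder of what the criterion computes, BIC = k ln(n) - 2 ln(L), where L is the maximized likelihood and k the number of free parameters, here is a minimal Python sketch applied to two nested Gaussian models on made-up data.

```python
import numpy as np
from scipy.stats import norm

def bic(log_likelihood, k, n):
    """Schwarz's Bayesian information criterion: lower is better."""
    return k * np.log(n) - 2.0 * log_likelihood

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=200)   # made-up data
n = x.size

# Model 1: mean fixed at 0, variance estimated (k = 1)
ll1 = norm.logpdf(x, loc=0.0, scale=np.sqrt(np.mean(x**2))).sum()
# Model 2: mean and variance both estimated (k = 2)
ll2 = norm.logpdf(x, loc=x.mean(), scale=x.std(ddof=0)).sum()

print("BIC model 1:", bic(ll1, k=1, n=n))
print("BIC model 2:", bic(ll2, k=2, n=n))
```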

Book ChapterDOI
07 Oct 2012
TL;DR: This paper revisits the classical Bayesian face recognition method by Baback Moghaddam et al. and proposes a new joint formulation that leads to an EM-like model learning at the training time and an efficient, closed-form computation at the test time.
Abstract: In this paper, we revisit the classical Bayesian face recognition method by Baback Moghaddam et al. and propose a new joint formulation. The classical Bayesian method models the appearance difference between two faces. We observe that this "difference" formulation may reduce the separability between classes. Instead, we model two faces jointly with an appropriate prior on the face representation. Our joint formulation leads to an EM-like model learning at the training time and an efficient, closed-form computation at the test time. On extensive experimental evaluations, our method is superior to the classical Bayesian face method and many other supervised approaches. Our method achieved 92.4% test accuracy on the challenging Labeled Faces in the Wild (LFW) dataset. Compared with the current best commercial system, we reduced the error rate by 10%.
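At test time the joint formulation compares the likelihood of a pair of feature vectors under a "same identity" joint Gaussian (coupled through the identity covariance) against a "different identities" block-diagonal Gaussian. The sketch below computes that log-likelihood ratio directly from multivariate normal densities in Python; the covariances S_mu and S_eps, which the paper learns with an EM-like procedure, are simply assumed here, so this illustrates the scoring rule rather than the paper's optimized closed form.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_ratio(x1, x2, S_mu, S_eps):
    """log P(x1, x2 | same identity) - log P(x1, x2 | different identities)."""
    d = x1.shape[0]
    S = S_mu + S_eps
    # Same identity: both faces share one latent identity variable,
    # so the joint covariance has S_mu in the off-diagonal blocks.
    cov_same = np.block([[S, S_mu], [S_mu, S]])
    # Different identities: independent faces, block-diagonal covariance.
    cov_diff = np.block([[S, np.zeros((d, d))], [np.zeros((d, d)), S]])
    z = np.concatenate([x1, x2])
    return (multivariate_normal.logpdf(z, mean=np.zeros(2 * d), cov=cov_same)
            - multivariate_normal.logpdf(z, mean=np.zeros(2 * d), cov=cov_diff))

# Toy example with assumed covariances (in practice these come from training)
d = 4
S_mu = 0.5 * np.eye(d)     # between-identity covariance (assumed)
S_eps = 0.2 * np.eye(d)    # within-identity (noise) covariance (assumed)
rng = np.random.default_rng(0)
identity = rng.multivariate_normal(np.zeros(d), S_mu)
x1 = identity + rng.multivariate_normal(np.zeros(d), S_eps)
x2 = identity + rng.multivariate_normal(np.zeros(d), S_eps)
print("same-person pair score:", log_likelihood_ratio(x1, x2, S_mu, S_eps))
```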

01 Jan 2012
TL;DR: This work presents a generalization of the GMRF model that allows for the analysis of multilocus sequence data and recovers an older and more reconcilable TMRCA for a classic ancient DNA data set.
Abstract: Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate the improved performance of our method to recover true population trajectories and the time to the most recent common ancestor (TMRCA). We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.

Posted Content
TL;DR: This work presents an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound and demonstrates the approach on two non-conjugate models: logistic regression and an approximation to the HDP.
Abstract: Mean-field variational inference is a method for approximate Bayesian posterior inference. It approximates a full posterior distribution with a factorized set of distributions by maximizing a lower bound on the marginal likelihood. This requires the ability to integrate a sum of terms in the log joint likelihood using this factorized distribution. Often not all integrals are in closed form, which is typically handled by using a lower bound. We present an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound. This method uses control variates to reduce the variance of the stochastic search gradient, in which existing lower bounds can play an important role. We demonstrate the approach on two non-conjugate models: logistic regression and an approximation to the HDP.
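The core of the method is a Monte Carlo estimate of the gradient of the variational lower bound through the score function, with a control variate subtracted to reduce variance. The sketch below illustrates that estimator in Python on a deliberately simple 1-D target, using a crude running-average baseline as the control variate rather than the paper's bound-based construction; the target density, variational family, and step sizes are all illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def log_p(theta):
    """Unnormalized log target density (an illustrative two-component mixture)."""
    return np.log(0.3 * norm.pdf(theta, -2.0, 1.0) + 0.7 * norm.pdf(theta, 2.0, 1.0))

def elbo_grad_estimate(m, log_s, baseline, n_samples=500):
    """Score-function Monte Carlo estimate of the lower-bound gradient for q = N(m, s^2)."""
    s = np.exp(log_s)
    theta = m + s * rng.standard_normal(n_samples)
    f = log_p(theta) - norm.logpdf(theta, m, s)     # integrand of the lower bound
    score_m = (theta - m) / s**2                    # d/dm log q(theta)
    score_log_s = (theta - m) ** 2 / s**2 - 1.0     # d/dlog_s log q(theta)
    # Control variate: subtracting a baseline leaves the estimator unbiased
    # (because E_q[score] = 0) but can sharply reduce its variance.
    w = f - baseline
    grad = np.array([np.mean(score_m * w), np.mean(score_log_s * w)])
    return grad, f.mean()

m, log_s, baseline, step = 0.0, 0.0, 0.0, 0.05
for _ in range(2000):
    grad, mean_f = elbo_grad_estimate(m, log_s, baseline)
    baseline = 0.9 * baseline + 0.1 * mean_f        # running-average baseline
    m, log_s = m + step * grad[0], log_s + step * grad[1]

print(f"fitted q: mean = {m:.2f}, sd = {np.exp(log_s):.2f}")
```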

Journal ArticleDOI
TL;DR: It is argued that many of the important constraints in Bayesian theories in psychology and neuroscience come from biological, evolutionary, and processing considerations that have no adaptive relevance to the problem per se.
Abstract: According to Bayesian theories in psychology and neuroscience, minds and brains are (near) optimal in solving a wide range of tasks. We challenge this view and argue that more traditional, non-Bayesian approaches are more promising. We make 3 main arguments. First, we show that the empirical evidence for Bayesian theories in psychology is weak. This weakness relates to the many arbitrary ways that priors, likelihoods, and utility functions can be altered in order to account for the data that are obtained, making the models unfalsifiable. It further relates to the fact that Bayesian theories are rarely better at predicting data compared with alternative (and simpler) non-Bayesian theories. Second, we show that the empirical evidence for Bayesian theories in neuroscience is weaker still. There are impressive mathematical analyses showing how populations of neurons could compute in a Bayesian manner but little or no evidence that they do. Third, we challenge the general scientific approach that characterizes Bayesian theorizing in cognitive science. A common premise is that theories in psychology should largely be constrained by a rational analysis of what the mind ought to do. We question this claim and argue that many of the important constraints come from biological, evolutionary, and processing (algorithmic) considerations that have no adaptive relevance to the problem per se. In our view, these factors have contributed to the development of many Bayesian “just so” stories in psychology and neuroscience; that is, mathematical analyses of cognition that can be used to explain almost any behavior as optimal.

Journal ArticleDOI
TL;DR: In this paper, the half-Cauchy distribution is proposed as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary.
Abstract: This paper argues that the half-Cauchy distribution should replace the inverse-gamma distribution as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary. Our arguments involve a blend of Bayesian and frequentist reasoning, and are intended to complement the original case made by Gelman (2006) in support of the folded-t family of priors. First, we generalize the half-Cauchy prior to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. We go on to prove a proposition that, together with the results for moments and marginals, allows us to characterize the frequentist risk of the Bayes estimators under all global-shrinkage priors in the class. These theoretical results, in turn, allow us to study the frequentist properties of the half-Cauchy prior versus a wide class of alternatives. The half-Cauchy occupies a sensible “middle ground” within this class: it performs very well near the origin, but does not lead to drastic compromises in other parts of the parameter space. This provides an alternative, classical justification for the repeated, routine use of this prior. We also consider situations where the underlying mean vector is sparse, where we argue that the usual conjugate choice of an inverse-gamma prior is particularly inappropriate, and can lead to highly distorted posterior inferences. Finally, we briefly summarize some open issues in the specification of default priors for scale terms in hierarchical models.
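A quick way to see the contrast with the conjugate inverse-gamma default is to draw a standard deviation from both priors and compare where the mass sits. The Python sketch below uses customary illustrative settings (half-Cauchy scale 1 and a "vague" inverse-gamma(0.01, 0.01) on the variance), which are not values taken from the paper.

```python
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(0)
n = 100_000

# Half-Cauchy(0, A) prior on a standard deviation tau: |A * standard Cauchy|
A = 1.0
tau_hc = np.abs(A * rng.standard_cauchy(n))

# "Vague" inverse-gamma prior on the variance, a common conjugate default
tau_ig = np.sqrt(invgamma.rvs(a=0.01, scale=0.01, size=n, random_state=rng))

for name, tau in [("half-Cauchy(0, 1)", tau_hc), ("sqrt of IG(0.01, 0.01)", tau_ig)]:
    q = np.quantile(tau, [0.25, 0.5, 0.75])
    print(f"{name:24s} quartiles: {q}, P(tau < 0.1) = {(tau < 0.1).mean():.4f}")
```

The half-Cauchy draws keep appreciable mass near zero while allowing large values, whereas the "vague" inverse-gamma choice places essentially no mass on small scales, which is the behaviour the paper criticizes for sparse settings.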


Journal ArticleDOI
TL;DR: A unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.
Abstract: To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.

Proceedings Article
21 Mar 2012
TL;DR: It is proved that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality, and a general formulation is given for a class of Bayesian index policies that rely on quantiles of the posterior distribution.
Abstract: Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). But the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.
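For binary bandits the index is just an upper quantile of each arm's Beta posterior. The sketch below implements that index in Python with the quantile level 1 - 1/t (dropping the (log n)^c refinement analysed in the paper) on a made-up two-armed Bernoulli problem.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
true_p = np.array([0.45, 0.55])          # made-up Bernoulli arm means
K, horizon = len(true_p), 5000
successes = np.zeros(K)
failures = np.zeros(K)

for t in range(1, horizon + 1):
    # Bayes-UCB index: an upper quantile of each arm's Beta posterior
    # (uniform Beta(1, 1) priors assumed here).
    level = 1.0 - 1.0 / t
    index = beta.ppf(level, successes + 1, failures + 1)
    arm = int(np.argmax(index))
    reward = rng.random() < true_p[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

pulls = successes + failures
print("pulls per arm:", pulls)
print("regret (expected, given pulls):", true_p.max() * horizon - (true_p * pulls).sum())
```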

Journal ArticleDOI
TL;DR: A robust recurrent neural network is presented in a Bayesian framework based on echo state mechanisms that is robust in the presence of outliers and is superior to existing methods.
Abstract: In this paper, a robust recurrent neural network is presented in a Bayesian framework based on echo state mechanisms. Since the new model is capable of handling outliers in the training data set, it is termed a robust echo state network (RESN). The RESN inherits the basic idea of ESN learning in a Bayesian framework, but replaces the commonly used Gaussian distribution with a Laplace one, which is more robust to outliers, as the likelihood function of the model output. Moreover, the training of the RESN is facilitated by employing a bound optimization algorithm, based on which a proper surrogate function is derived and the Laplace likelihood function is approximated by a Gaussian one, while remaining robust to outliers. This leads to an efficient method for estimating model parameters, which can be solved by using a Bayesian evidence procedure in a fully autonomous way. Experimental results show that the proposed method is robust in the presence of outliers and is superior to existing methods.

Journal ArticleDOI
TL;DR: This tutorial describes the mean-field variational Bayesian approximation to inference in graphical models, using modern machine learning terminology rather than statistical physics concepts, and derives local node updates and reviews the recent Variational Message Passing framework.
Abstract: This tutorial describes the mean-field variational Bayesian approximation to inference in graphical models, using modern machine learning terminology rather than statistical physics concepts. It begins by seeking to find an approximate mean-field distribution close to the target joint in the KL-divergence sense. It then derives local node updates and reviews the recent Variational Message Passing framework.
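As a concrete companion to the tutorial, here is the textbook mean-field example in Python: a normal likelihood with unknown mean and precision, a factorized approximation q(mu)q(tau), and the standard coordinate-ascent updates iterated to convergence. The priors and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=100)      # illustrative data
N, xbar = x.size, x.mean()

# Model: x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(kappa0 * tau)), tau ~ Gamma(a0, b0)
mu0, kappa0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Mean-field factorization q(mu, tau) = q(mu) q(tau), with
# q(mu) = N(mu_N, 1/lam_N) and q(tau) = Gamma(a_N, b_N).
a_N = a0 + (N + 1) / 2.0                          # fixed by the model
mu_N = (kappa0 * mu0 + N * xbar) / (kappa0 + N)   # fixed by the model
E_tau = a0 / b0                                   # initial guess
for _ in range(50):
    # Update q(mu) given the current E[tau]
    lam_N = (kappa0 + N) * E_tau
    # Update q(tau) given the current q(mu)
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N
    b_N = (b0
           + 0.5 * kappa0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2)
           + 0.5 * np.sum(x**2 - 2 * x * E_mu + E_mu2))
    E_tau = a_N / b_N

print("q(mu)  mean, sd:", mu_N, (1.0 / lam_N) ** 0.5)
print("q(tau) mean    :", E_tau, "(true precision 0.25)")
```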

Journal ArticleDOI
TL;DR: This work assesses an alternative to MCMC based on a simple variational approximation to retain useful features of Bayesian variable selection at a reduced cost and illustrates how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.
Abstract: The Bayesian approach to variable selection in regression is a powerful tool for tackling many scientific problems. Inference for variable selection models is usually implemented using Markov chain Monte Carlo (MCMC). Because MCMC can impose a high computational cost in studies with a large number of variables, we assess an alternative to MCMC based on a simple variational approximation. Our aim is to retain useful features of Bayesian variable selection at a reduced cost. Using simulations designed to mimic genetic association studies, we show that this simple variational approximation yields posterior inferences in some settings that closely match exact values. In less restrictive (and more realistic) conditions, we show that posterior probabilities of inclusion for individual variables are often incorrect, but variational estimates of other useful quantities, including posterior distributions of the hyperparameters, are remarkably accurate. We illustrate how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.

Journal ArticleDOI
TL;DR: Using existing and new metrics for gauging Bayesian network model performance and uncertainty can vitally bolster model credibility, acceptance, and appropriate application, particularly when informing management decisions.


Journal ArticleDOI
TL;DR: In terms of both covariance matrix estimation and graphical structure learning, the Bayesian adaptive graphical lasso appears to be the top overall performer among a range of frequentist and Bayesian methods.
Abstract: Recently, the graphical lasso procedure has become popular in estimating Gaussian graphical models. In this paper, we introduce a fully Bayesian treatment of graphical lasso models. We first investigate the graphical lasso prior that has been relatively unexplored. Using data augmentation, we develop a simple but highly efficient block Gibbs sampler for simulating covariance matrices. We then generalize the Bayesian graphical lasso to the Bayesian adaptive graphical lasso. Finally, we illustrate and compare the results from our approach to those obtained using the standard graphical lasso procedures for real and simulated data. In terms of both covariance matrix estimation and graphical structure learning, the Bayesian adaptive graphical lasso appears to be the top overall performer among a range of frequentist and Bayesian methods.

Journal ArticleDOI
TL;DR: An approach of Thompson (1933) which makes use of samples from the posterior distributions for the instantaneous value of each action is considered, and a new algorithm, Optimistic Bayesian Sampling (OBS), which performs competitively when compared to recently proposed benchmark algorithms and outperforms Thompson's method throughout.
Abstract: In sequential decision problems in an unknown environment, the decision maker often faces a dilemma over whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to maximise click-through rates in web based applications, a task with significant commercial interest. In this article we consider an approach of Thompson (1933) which makes use of samples from the posterior distributions for the instantaneous value of each action. We extend the approach by introducing a new algorithm, Optimistic Bayesian Sampling (OBS), in which the probability of playing an action increases with the uncertainty in the estimate of the action value. This results in better directed exploratory behaviour. We prove that, under unrestrictive assumptions, both approaches result in optimal behaviour with respect to the average reward criterion of Yang and Zhu (2002). We implement OBS and measure its performance in simulated Bernoulli bandit and linear regression domains, and also when tested with the task of personalised news article recommendation on a Yahoo! Front Page Today Module data set. We find that OBS performs competitively when compared to recently proposed benchmark algorithms and outperforms Thompson's method throughout.
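Thompson's (1933) heuristic that the paper builds on takes only a few lines for Bernoulli bandits: draw one sample from each arm's Beta posterior and play the arm with the largest draw. The Python sketch below is that baseline with made-up arm means; the OBS modification, which increases the probability of playing more uncertain arms, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.40, 0.50, 0.55])    # illustrative Bernoulli arm means
K, horizon = len(true_p), 10_000
alpha = np.ones(K)                       # Beta(1, 1) priors on each arm's mean
beta = np.ones(K)

for t in range(horizon):
    # Thompson sampling: one draw per posterior, then play the argmax
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

pulls = alpha + beta - 2
print("pulls per arm  :", pulls)
print("posterior means:", (alpha / (alpha + beta)).round(3))
```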

Proceedings Article
21 Mar 2012
TL;DR: Bayesian model averaging is the coherent Bayesian way of combining multiple models only under certain restrictive assumptions; a general framework for Bayesian model combination (which differs from model averaging) is explored in the context of classification.
Abstract: Bayesian model averaging linearly mixes the probabilistic predictions of multiple models, each weighted by its posterior probability. This is the coherent Bayesian way of combining multiple models only under certain restrictive assumptions, which we outline. We explore a general framework for Bayesian model combination (which differs from model averaging) in the context of classification. This framework explicitly models the relationship between each model’s output and the unknown true label. The framework does not require that the models be probabilistic (they can even be human assessors), that they share prior information or receive the same training data, or that they be independent in their errors. Finally, the Bayesian combiner does not need to believe any of the models is in fact correct. We test several variants of this classifier combination procedure starting from a classic statistical model proposed by Dawid and Skene (1979) and using MCMC to add more complex but important features to the model. Comparisons on several data sets to simpler methods like majority voting show that the Bayesian methods not only perform well but result in interpretable diagnostics on the data points and the models.
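The Dawid and Skene (1979) style model underlying the combiner is easy to write down once its parameters are fixed: each component model k has a confusion matrix Theta_k, and the posterior over the true label multiplies a class prior by the relevant confusion-matrix entries. The Python sketch below performs that combination with hand-picked confusion matrices; the paper instead learns these parameters, and richer extensions, by MCMC.

```python
import numpy as np

# Two classes, three component classifiers.
prior = np.array([0.6, 0.4])                    # assumed prior over true labels

# confusion[k][c, j] = P(model k predicts j | true class is c), hand-picked
confusion = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    np.array([[0.7, 0.3], [0.3, 0.7]]),
    np.array([[0.6, 0.4], [0.1, 0.9]]),
]

def combine(predictions):
    """Posterior over the true label given each model's hard prediction."""
    post = prior.copy()
    for theta, pred in zip(confusion, predictions):
        post = post * theta[:, pred]
        post = post / post.sum()
    return post

# Example: models disagree (0, 1, 1); majority vote says class 1, but the
# combiner weighs how reliable each model is for each class and can disagree.
print(combine([0, 1, 1]))
```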

Journal ArticleDOI
TL;DR: A highly parallel implementation of the transitional Markov chain Monte Carlo for populating the posterior probability distribution of the MD force-field parameters and efficient scheduling algorithms are proposed to handle the MD model runs and to distribute the computations in clusters with heterogeneous architectures.
Abstract: We present a Bayesian probabilistic framework for quantifying and propagating the uncertainties in the parameters of force fields employed in molecular dynamics (MD) simulations. We propose a highly parallel implementation of the transitional Markov chain Monte Carlo for populating the posterior probability distribution of the MD force-field parameters. Efficient scheduling algorithms are proposed to handle the MD model runs and to distribute the computations in clusters with heterogeneous architectures. Furthermore, adaptive surrogate models are proposed in order to reduce the computational cost associated with the large number of MD model runs. The effectiveness and computational efficiency of the proposed Bayesian framework is demonstrated in MD simulations of liquid and gaseous argon.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets and introduce two new Metropolized Gibbs Samplers for moving through "tree space."
Abstract: Increasingly, large data sets pose a challenge for computationally intensive phylogenetic methods such as Bayesian Markov chain Monte Carlo (MCMC). Here, we investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets. We introduce two new Metropolized Gibbs Samplers for moving through "tree space." MCMC simulation using these new proposals shows faster average run time and dramatically improved predictability in performance, with a 20-fold reduction in the variance of the time to estimate the posterior distribution to a given accuracy. We also introduce conditional clade probabilities and demonstrate that they provide a superior means of approximating tree topology posterior probabilities from samples recorded during MCMC.

Journal ArticleDOI
TL;DR: A variational free-energy formulation of (partially observable) Markov decision problems in decision making under uncertainty is described, showing that optimal control can be cast as active inference and leading to a distinction between models with and without inference on hidden control states.
Abstract: This paper describes a variational free-energy formulation of (partially observable) Markov decision problems in decision making under uncertainty. We show that optimal control can be cast as active inference. In active inference, both action and posterior beliefs about hidden states minimise a free energy bound on the negative log-likelihood of observed states, under a generative model. In this setting, reward or cost functions are absorbed into prior beliefs about state transitions and terminal states. Effectively, this converts optimal control into a pure inference problem, enabling the application of standard Bayesian filtering techniques. We then consider optimal trajectories that rest on posterior beliefs about hidden states in the future. Crucially, this entails modelling control as a hidden state that endows the generative model with a representation of agency. This leads to a distinction between models with and without inference on hidden control states; namely, agency-free and agency-based models, respectively.