
Showing papers on "Bayes' theorem published in 2013"


Posted Content
TL;DR: In this paper, a stochastic variational inference and learning algorithm was proposed for directed probabilistic models with continuous latent variables and intractable posterior distributions, and it scales to large datasets.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
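A minimal numpy sketch of the reparameterized lower-bound estimator described above, applied to a toy one-dimensional Gaussian model rather than the paper's neural-network architecture; all names and numbers are illustrative assumptions.

    # Minimal sketch of the reparameterized variational lower bound (toy model, not the
    # paper's auto-encoder): prior z ~ N(0,1), likelihood x | z ~ N(z,1), one observation x.
    # The exact posterior is N(x/2, 1/2), so the fitted q(z) = N(mu, sigma^2) can be checked.
    import numpy as np

    rng = np.random.default_rng(0)
    x = 1.5                      # observed datapoint
    mu, log_sigma = 0.0, 0.0     # variational parameters of q(z) = N(mu, sigma^2)
    lr, n_steps, n_samples = 0.05, 2000, 16

    for _ in range(n_steps):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal(n_samples)
        z = mu + sigma * eps                      # reparameterization: z = mu + sigma * eps
        # Monte Carlo gradients of E_q[log p(x|z)] (the likelihood term)
        d_mu = np.mean(x - z)
        d_log_sigma = np.mean((x - z) * eps) * sigma
        # exact gradients of -KL(q || p) for the N(0,1) prior
        d_mu += -mu
        d_log_sigma += -(sigma**2 - 1.0)          # d/dlog_sigma of -0.5*(sigma^2 - 1 - 2*log sigma)
        mu += lr * d_mu
        log_sigma += lr * d_log_sigma

    print("fitted q:", mu, np.exp(log_sigma)**2, " exact posterior:", x / 2, 0.5)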

4,883 citations


Posted Content
TL;DR: Expectation Propagation (EP) as mentioned in this paper is a deterministic approximation technique in Bayesian networks that unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation.
Abstract: This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation", unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. All three algorithms try to recover an approximate distribution which is close in KL divergence to the true distribution. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining certain expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Expectation Propagation also extends belief propagation in the opposite direction - it can propagate richer belief states that incorporate correlations between nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers.
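The projection step shared by assumed-density filtering and Expectation Propagation is moment matching of a "tilted" distribution. The sketch below illustrates that single step for one probit site using grid quadrature; it is a toy illustration only, not the paper's full EP algorithm, and all values are assumed.

    # Sketch of the moment-matching step at the core of ADF/EP: combine a Gaussian "cavity"
    # q(theta) = N(m, v) with one non-Gaussian factor t(theta) = Phi(y * theta) (a probit
    # likelihood), and project the tilted distribution q * t back onto a Gaussian by
    # matching its mean and variance (computed here by simple grid quadrature).
    import numpy as np
    from scipy.stats import norm

    def moment_match(m, v, y):
        theta = np.linspace(m - 10 * np.sqrt(v), m + 10 * np.sqrt(v), 20001)
        d = theta[1] - theta[0]
        tilted = norm.pdf(theta, m, np.sqrt(v)) * norm.cdf(y * theta)   # unnormalized
        Z = tilted.sum() * d                                            # normalizing constant
        new_m = (theta * tilted).sum() * d / Z                          # matched mean
        new_v = (theta**2 * tilted).sum() * d / Z - new_m**2            # matched variance
        return Z, new_m, new_v

    print(moment_match(m=0.0, v=4.0, y=+1))   # Gaussian approximation after one site update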

1,365 citations


Book
03 Feb 2013
TL;DR: In this book, the authors present Bayesian methods for health-care evaluation, beginning with basic concepts such as probability, random variables, parameters and likelihood.
Abstract: Preface. List of examples. 1. Introduction. 1.1 What are Bayesian methods? 1.2 What do we mean by 'health-care evaluation'? 1.3 A Bayesian approach to evaluation. 1.4 The aim of this book and the intended audience. 1.5 Structure of the book. 2. Basic Concepts from Traditional Statistical Analysis. 2.1 Probability. 2.1.1 What is probability? 2.1.2 Odds and log-odds. 2.1.3 Bayes theorem for simple events. 2.2 Random variables, parameters and likelihood. 2.2.1 Random variables and their distributions. 2.2.2 Expectation, variance, covariance and correlation. 2.2.3 Parametric distributions and conditional independence. 2.2.4 Likelihoods. 2.3 The normal distribution. 2.4 Normal likelihoods. 2.4.1 Normal approximations for binary data. 2.4.2 Normal likelihoods for survival data. 2.4.3 Normal likelihoods for count responses. 2.4.4 Normal likelihoods for continuous responses. 2.5 Classical inference. 2.6 A catalogue of useful distributions. 2.6.1 Binomial and Bernoulli. 2.6.2 Poisson. 2.6.3 Beta. 2.6.4 Uniform. 2.6.5 Gamma. 2.6.6 Root-inverse-gamma. 2.6.7 Half-normal. 2.6.8 Log-normal. 2.6.9 Student's t. 2.6.10 Bivariate normal. 2.7 Key points. Exercises. 3. An Overview of the Bayesian Approach. 3.1 Subjectivity and context. 3.2 Bayes theorem for two hypotheses. 3.3 Comparing simple hypotheses: likelihood ratios and Bayes factors. 3.4 Exchangeability and parametric modelling. 3.5 Bayes theorem for general quantities. 3.6 Bayesian analysis with binary data. 3.6.1 Binary data with a discrete prior distribution. 3.6.2 Conjugate analysis for binary data. 3.7 Bayesian analysis with normal distributions. 3.8 Point estimation, interval estimation and interval hypotheses. 3.9 The prior distribution. 3.10 How to use Bayes theorem to interpret trial results. 3.11 The 'credibility' of significant trial results. 3.12 Sequential use of Bayes theorem. 3.13 Predictions. 3.13.1 Predictions in the Bayesian framework. 3.13.2 Predictions for binary data. 3.13.3 Predictions for normal data. 3.14 Decision-making. 3.15 Design. 3.16 Use of historical data. 3.17 Multiplicity, exchangeability and hierarchical models. 3.18 Dealing with nuisance parameters. 3.18.1 Alternative methods for eliminating nuisance parameters. 3.18.2 Profile likelihood in a hierarchical model. 3.19 Computational issues. 3.19.1 Monte Carlo methods. 3.19.2 Markov chain Monte Carlo methods. 3.19.3 WinBUGS. 3.20 Schools of Bayesians. 3.21 A Bayesian checklist. 3.22 Further reading. 3.23 Key points. Exercises. 4. Comparison of Alternative Approaches to Inference. 4.1 A structure for alternative approaches. 4.2 Conventional statistical methods used in health-care evaluation. 4.3 The likelihood principle, sequential analysis and types of error. 4.3.1 The likelihood principle. 4.3.2 Sequential analysis. 4.3.3 Type I and Type II error. 4.4 P-values and Bayes factors. 4.4.1 Criticism of P-values. 4.4.2 Bayes factors as an alternative to P-values: simple hypotheses. 4.4.3 Bayes factors as an alternative to P-values: composite hypotheses. 4.4.4 Bayes factors in preference studies. 4.4.5 Lindley's paradox. 4.5 Key points. Exercises. 5. Prior Distributions. 5.1 Introduction. 5.2 Elicitation of opinion: a brief review. 5.2.1 Background to elicitation. 5.2.2 Elicitation techniques. 5.2.3 Elicitation from multiple experts. 5.3 Critique of prior elicitation. 5.4 Summary of external evidence. 5.5 Default priors. 5.5.1 'Non-informative' or 'reference' priors. 5.5.2 'Sceptical' priors. 5.5.3 'Enthusiastic' priors.
5.5.4 Priors with a point mass at the null hypothesis ('lump-and-smear' priors). 5.6 Sensitivity analysis and 'robust' priors. 5.7 Hierarchical priors. 5.7.1 The judgement of exchangeability. 5.7.2 The form for the random-effects distribution. 5.7.3 The prior for the standard deviation of the random effects. 5.8 Empirical criticism of priors. 5.9 Key points. Exercises. 6. Randomised Controlled Trials. 6.1 Introduction. 6.2 Use of a loss function: is a clinical trial for inference or decision? 6.3 Specification of null hypotheses. 6.4 Ethics and randomisation: a brief review. 6.4.1 Is randomisation necessary? 6.4.2 When is it ethical to randomise? 6.5 Sample size of non-sequential trials. 6.5.1 Alternative approaches to sample-size assessment. 6.5.2 'Classical power': hybrid classical-Bayesian methods assuming normality. 6.5.3 'Bayesian power'. 6.5.4 Adjusting formulae for different hypotheses. 6.5.5 Predictive distribution of power and necessary sample size. 6.6 Monitoring of sequential trials. 6.6.1 Introduction. 6.6.2 Monitoring using the posterior distribution. 6.6.3 Monitoring using predictions: 'interim power'. 6.6.4 Monitoring using a formal loss function. 6.6.5 Frequentist properties of sequential Bayesian methods. 6.6.6 Bayesian methods and data monitoring committees. 6.7 The role of 'scepticism' in confirmatory studies. 6.8 Multiplicity in randomised trials. 6.8.1 Subset analysis. 6.8.2 Multi-centre analysis. 6.8.3 Cluster randomization. 6.8.4 Multiple endpoints and treatments. 6.9 Using historical controls. 6.10 Data-dependent allocation. 6.11 Trial designs other than two parallel groups. 6.12 Other aspects of drug development. 6.13 Further reading. 6.14 Key points. Exercises. 7. Observational Studies. 7.1 Introduction. 7.2 Alternative study designs. 7.3 Explicit modelling of biases. 7.4 Institutional comparisons. 7.5 Key points. Exercises. 8. Evidence Synthesis. 8.1 Introduction. 8.2 'Standard' meta-analysis. 8.2.1 A Bayesian perspective. 8.2.2 Some delicate issues in Bayesian meta-analysis. 8.2.3 The relationship between treatment effect and underlying risk. 8.3 Indirect comparison studies. 8.4 Generalised evidence synthesis. 8.5 Further reading. 8.6 Key points. Exercises. 9. Cost-effectiveness, Policy-Making and Regulation. 9.1 Introduction. 9.2 Contexts. 9.3 'Standard' cost-effectiveness analysis without uncertainty. 9.4 'Two-stage' and integrated approaches to uncertainty in cost-effectiveness modeling. 9.5 Probabilistic analysis of sensitivity to uncertainty about parameters: two-stage approach. 9.6 Cost-effectiveness analyses of a single study: integrated approach. 9.7 Levels of uncertainty in cost-effectiveness models. 9.8 Complex cost-effectiveness models. 9.8.1 Discrete-time, discrete-state Markov models. 9.8.2 Micro-simulation in cost-effectiveness models. 9.8.3 Micro-simulation and probabilistic sensitivity analysis. 9.8.4 Comprehensive decision modeling. 9.9 Simultaneous evidence synthesis and complex cost-effectiveness modeling. 9.9.1 Generalised meta-analysis of evidence. 9.9.2 Comparison of integrated Bayesian and two-stage approach. 9.10 Cost-effectiveness of carrying out research: payback models. 9.10.1 Research planning in the public sector. 9.10.2 Research planning in the pharmaceutical industry. 9.10.3 Value of information. 9.11 Decision theory in cost-effectiveness analysis, regulation and policy. 9.12 Regulation and health policy. 9.12.1 The regulatory context. 9.12.2 Regulation of pharmaceuticals.
9.12.3 Regulation of medical devices. 9.13 Conclusions. 9.14 Key points. Exercises. 10. Conclusions and Implications for Future Research. 10.1 Introduction. 10.2 General advantages and problems of a Bayesian approach. 10.3 Future research and development. Appendix: Websites and Software. A.1 The site for this book. A.2 Bayesian methods in health-care evaluation. A.3 Bayesian software. A.4 General Bayesian sites. References. Index.
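Section 2.1.3 (Bayes theorem for simple events) can be illustrated with a short calculation; the diagnostic-test numbers below are made up for illustration and are not taken from the book.

    # Bayes theorem for simple events (cf. Section 2.1.3): probability of disease given a
    # positive test. All numbers are illustrative.
    prevalence = 0.01     # P(disease)
    sensitivity = 0.90    # P(test+ | disease)
    specificity = 0.95    # P(test- | no disease)

    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)   # P(test+)
    p_disease_given_pos = sensitivity * prevalence / p_pos                    # Bayes theorem
    print(round(p_disease_given_pos, 3))   # ~0.154: most positives are false positives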

1,038 citations


Journal ArticleDOI
TL;DR: This work presents an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes, and leaves the distribution of selection parameters essentially unconstrained.
Abstract: Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection—an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: We illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).

939 citations


Journal ArticleDOI
TL;DR: This work applies Bayesian sparse linear mixed model (BSLMM) and compares it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction, and demonstrates that BSLMM considerably outperforms either of the other two methods.
Abstract: Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, so are expected to perform well in different situations. However, in practice, for a given dataset one typically does not know which assumptions will be more accurate. Motivated by this, we consider a hybrid of the two, which we refer to as a “Bayesian sparse linear mixed model” (BSLMM) that includes both these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html.

764 citations


Journal ArticleDOI
TL;DR: A new class of RFS distributions is proposed that is conjugate with respect to the multiobject observation likelihood and closed under the Chapman-Kolmogorov equation and is tested on a Bayesian multi-target tracking algorithm.
Abstract: The objective of multi-object estimation is to simultaneously estimate the number of objects and their states from a set of observations in the presence of data association uncertainty, detection uncertainty, false observations, and noise. This estimation problem can be formulated in a Bayesian framework by modeling the (hidden) set of states and set of observations as random finite sets (RFSs) that cover thinning, Markov shifts, and superposition. A prior for the hidden RFS together with the likelihood of the realization of the observed RFS gives the posterior distribution via the application of Bayes rule. We propose a new class of RFS distributions that is conjugate with respect to the multiobject observation likelihood and closed under the Chapman-Kolmogorov equation. This result is demonstrated with a Bayesian multi-target tracking algorithm.

762 citations


Journal ArticleDOI
TL;DR: Modifications of common standards of evidence are proposed to reduce the rate of nonreproducibility of scientific research by a factor of 5 or greater and to correct the problem of unjustifiably high levels of significance.
Abstract: Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggests that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25–50:1, and to 100–200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.
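For the one-sided normal-mean case, a uniformly most powerful Bayesian test with evidence threshold gamma rejects when the z statistic exceeds sqrt(2 ln gamma), which is one way to reproduce the correspondence described above; the sketch below assumes that special case only.

    # Correspondence between evidence thresholds (Bayes factors) and significance levels
    # for the one-sided normal-mean case, where a UMPBT with threshold gamma rejects when
    # z > sqrt(2 * ln(gamma)).  A sketch of that special case, not a general recipe.
    import numpy as np
    from scipy.stats import norm

    for gamma in (25, 50, 100, 200):
        z_crit = np.sqrt(2 * np.log(gamma))
        alpha = norm.sf(z_crit)              # implied one-sided significance level
        print(f"gamma = {gamma:4d}  ->  z > {z_crit:.3f},  alpha ~ {alpha:.4f}")
    # gamma = 25-50 lands near the 0.005 level, gamma = 100-200 near the 0.001 level.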

671 citations


Journal ArticleDOI
TL;DR: It is suggested that both pervasive trait abnormalities and florid failures of inference in the psychotic state can be linked to factors controlling post-synaptic gain – such as NMDA receptor function and (dopaminergic) neuromodulation.
Abstract: This paper considers psychotic symptoms in terms of false inferences or beliefs. It is based on the notion that the brain is an inference machine that actively constructs hypotheses to explain or predict its sensations. This perspective provides a normative (Bayes optimal) account of action and perception that emphasises probabilistic representations; in particular, the confidence or precision of beliefs about the world. We will consider hallucinosis, abnormal eye movements, sensory attenuation deficits, catatonia and delusions as various expressions of the same core pathology: namely, an aberrant encoding of precision. From a cognitive perspective, this represents a pernicious failure of metacognition (beliefs about beliefs) that can confound perceptual inference. In the embodied setting of active (Bayesian) inference, it can lead to behaviours that are paradoxically more accurate than Bayes optimal behaviour. Crucially, this normative account is accompanied by a neuronally plausible process theory based upon hierarchical predictive coding. In predictive coding, precision is thought to be encoded by the postsynaptic gain of neurons reporting prediction error. This suggests that both pervasive trait abnormalities and florid failures of inference in the psychotic state can be linked to factors controlling postsynaptic gain – such as NMDA receptor function and (dopaminergic) neuromodulation. We illustrate these points using biologically plausible simulations of perceptual synthesis, smooth pursuit eye movements and attribution of agency – that all use the same predictive coding scheme and pathology: namely, a reduction in the precision of prior beliefs, relative to sensory evidence.
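The role of precision can be shown in the simplest possible case: Bayes-optimal fusion of a Gaussian prior belief and a Gaussian sensory sample weights each by its precision, so lowering the precision of prior beliefs lets noisy sensory evidence dominate. This is a toy illustration of that idea, not the paper's predictive-coding simulations.

    # Precision-weighted inference: the posterior mean of a Gaussian prior combined with a
    # Gaussian observation is a precision-weighted average.  Reducing the precision of prior
    # beliefs makes noisy sensory evidence dominate.  Toy illustration only.
    def posterior(mu_prior, pi_prior, x_obs, pi_obs):
        pi_post = pi_prior + pi_obs
        mu_post = (pi_prior * mu_prior + pi_obs * x_obs) / pi_post
        return mu_post, pi_post

    print(posterior(mu_prior=0.0, pi_prior=4.00, x_obs=2.0, pi_obs=1.0))  # prior dominates
    print(posterior(mu_prior=0.0, pi_prior=0.25, x_obs=2.0, pi_obs=1.0))  # sensory evidence dominates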

631 citations


Journal ArticleDOI
TL;DR: Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that widens the realm of models for which statistical inference can be considered, although this wider applicability exacerbates the challenges of parameter estimation and model selection.
Abstract: Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).
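A minimal ABC rejection sampler for a toy problem, showing the likelihood-free logic described above; the model, prior, summary statistic and tolerance are all illustrative choices.

    # Minimal ABC rejection sampler: infer the mean of a normal with known sd = 1, using the
    # sample mean as summary statistic.  The likelihood is never evaluated, only simulated from.
    import numpy as np

    rng = np.random.default_rng(1)
    observed = rng.normal(2.0, 1.0, size=100)
    s_obs = observed.mean()

    accepted = []
    for _ in range(100_000):
        theta = rng.uniform(-10, 10)                       # draw a parameter from the prior
        sim = rng.normal(theta, 1.0, size=observed.size)   # simulate data under that parameter
        if abs(sim.mean() - s_obs) < 0.05:                 # accept if summaries are close
            accepted.append(theta)

    print(len(accepted), np.mean(accepted))   # approximate posterior sample for theta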

531 citations


Journal Article
TL;DR: In this paper, the authors define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples.
Abstract: A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. If otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined by the minus logarithm of Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for, or unrealizable by, the true distribution. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalized version of BIC for singular statistical models.
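A sketch of the WBIC recipe on a toy regular, conjugate model (chosen so the exact Bayes free energy is available for comparison); the sampler, prior and model are illustrative assumptions, and WBIC's intended use case is singular models where no such closed form exists.

    # WBIC on a toy conjugate model: data ~ N(mu, 1), prior mu ~ N(0, 10^2).  WBIC is the
    # posterior average of the total negative log likelihood, with the posterior tempered at
    # inverse temperature beta = 1 / log(n).  Illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0.5, 1.0, size=200)
    n = x.size
    beta = 1.0 / np.log(n)

    def neg_log_lik(mu):                       # total negative log likelihood
        return 0.5 * np.sum((x - mu) ** 2) + 0.5 * n * np.log(2 * np.pi)

    def log_post_tempered(mu):                 # log prior + beta * log likelihood
        return -0.5 * mu**2 / 10.0**2 - beta * neg_log_lik(mu)

    # random-walk Metropolis on the tempered posterior
    mu, samples = 0.0, []
    for t in range(50_000):
        prop = mu + rng.normal(0, 0.5)
        if np.log(rng.uniform()) < log_post_tempered(prop) - log_post_tempered(mu):
            mu = prop
        if t > 5_000:
            samples.append(mu)

    wbic = np.mean([neg_log_lik(m) for m in samples])

    # exact free energy (-log marginal likelihood) for this conjugate normal-normal model
    tau2 = 10.0**2
    post_var = 1.0 / (n + 1.0 / tau2)
    log_ml = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(x**2)
              + 0.5 * np.sum(x) ** 2 * post_var - 0.5 * np.log(tau2 / post_var))
    print("WBIC:", wbic, "  exact free energy:", -log_ml)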

459 citations


Journal ArticleDOI
TL;DR: The authors argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism, and examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision.
Abstract: A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory.

Journal ArticleDOI
TL;DR: This work introduces several novel algorithms based on Hamming graphs and Bayesian subclustering in the new error correction tool BayesHammer, which improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets.
Abstract: Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, algorithms actually used for single-cell error correction have been so far very simplistic. We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BayesHammer. While BayesHammer was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BayesHammer on both k-mer counts and actual assembly results with the SPAdes genome assembler.

Journal ArticleDOI
TL;DR: In this article, a generalization of the Gaussian Markov random field (GMRF) model is proposed for the analysis of multilocus sequence data, improving recovery of past population trajectories and the time to the most recent common ancestor (TMRCA).
Abstract: Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate the improved performance of our method to recover true population trajectories and the time to the most recent common ancestor (TMRCA). We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.

Journal ArticleDOI
01 Jul 2013-Genetics
TL;DR: It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
Abstract: Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
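The identifiability point can be seen directly: when p > n, effect vectors that differ by a direction in the null space of the genotype matrix give identical fitted phenotypes, so the data alone cannot choose between them and the prior does. A toy sketch with simulated numbers, not a reimplementation of any member of the alphabet.

    # Why the prior matters when p > n: two very different effect vectors can produce
    # identical fitted phenotypes, so the likelihood cannot distinguish them.
    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(0)
    n, p = 60, 500
    X = rng.normal(size=(n, p))                       # toy genotype matrix
    beta = np.zeros(p); beta[:10] = rng.normal(size=10)

    v = null_space(X)[:, 0] * 10.0                    # a direction the data cannot see
    beta_alt = beta + v                               # very different effect vector ...
    print(np.max(np.abs(X @ beta - X @ beta_alt)))    # ... identical fitted values (numerically zero)
    print(np.linalg.norm(beta - beta_alt))            # yet the effect estimates differ a lot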

Journal ArticleDOI
TL;DR: The Bayesian network method provides greater value than the bow-tie model since it can consider common cause failures and conditional dependencies along with performing probability updating and sequential learning using accident precursors.

Journal ArticleDOI
TL;DR: In this paper, the authors develop the formalism of quantum conditional states, which provides a unified description of experiments involving two systems at a single time and those involving a single system at two times; they also show that remote steering of quantum states can be described within their formalism as a mere updating of beliefs about one system given new information about another, and that retrodictive inferences can be expressed using the same belief propagation rule as is used for predictive inferences.
Abstract: Quantum theory can be viewed as a generalization of classical probability theory, but the analogy as it has been developed so far is not complete. Whereas the manner in which inferences are made in classical probability theory is independent of the causal relation that holds between the conditioned variable and the conditioning variable, in the conventional quantum formalism, there is a significant difference between how one treats experiments involving two systems at a single time and those involving a single system at two times. In this article, we develop the formalism of quantum conditional states, which provides a unified description of these two sorts of experiment. In addition, concepts that are distinct in the conventional formalism become unified: Channels, sets of states, and positive operator valued measures are all seen to be instances of conditional states; the action of a channel on a state, ensemble averaging, the Born rule, the composition of channels, and nonselective state-update rules are all seen to be instances of belief propagation. Using a quantum generalization of Bayes' theorem and the associated notion of Bayesian conditioning, we also show that the remote steering of quantum states can be described within our formalism as a mere updating of beliefs about one system given new information about another, and retrodictive inferences can be expressed using the same belief propagation rule as is used for predictive inferences. Finally, we show that previous arguments for interpreting the projection postulate as a quantum generalization of Bayesian conditioning are based on a misleading analogy and that it is best understood as a combination of belief propagation (corresponding to the nonselective state-update map) and conditioning on the measurement outcome.

Posted Content
TL;DR: SDA-Bayes as mentioned in this paper is a framework for streaming and distributed computation of a Bayesian posterior, which makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive.
Abstract: We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.
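A toy sketch of the streaming update rule described above (the posterior computed from one minibatch becomes the prior for the next), using an exact conjugate Beta-Bernoulli update as the batch primitive instead of the variational Bayes primitive used in the paper.

    # Streaming Bayesian updating with a conjugate batch primitive: posterior -> prior.
    import numpy as np

    rng = np.random.default_rng(0)
    a, b = 1.0, 1.0                            # Beta(1,1) prior on a coin's success probability
    stream = (rng.random(50) < 0.3).astype(int)

    for start in range(0, len(stream), 10):    # data arrive in minibatches of 10
        batch = stream[start:start + 10]
        a, b = a + batch.sum(), b + (batch == 0).sum()   # batch posterior becomes next prior
        print(f"after {start + 10:2d} points: posterior mean = {a / (a + b):.3f}")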

Journal ArticleDOI
TL;DR: Variational Bayes is considered as an alternative scheme that provides formal constraints on the computational anatomy of inference and action—constraints that are remarkably consistent with neuroanatomy.
Abstract: This paper considers agency in the setting of embodied or active inference. In brief, we associate a sense of agency with prior beliefs about action and ask what sorts of beliefs underlie optimal behaviour. In particular, we consider prior beliefs that action minimises the Kullback-Leibler divergence between desired states and attainable states in the future. This allows one to formulate bounded rationality as approximate Bayesian inference that optimises a free energy bound on model evidence. We show that constructs like expected utility, exploration bonuses, softmax choice rules and optimism bias emerge as natural consequences of this formulation. Previous accounts of active inference have focused on predictive coding and Bayesian filtering schemes for minimising free energy. Here, we consider variational Bayes as an alternative scheme that provides formal constraints on the computational anatomy of inference and action – constraints that are remarkably consistent with neuroanatomy. Furthermore, this scheme contextualises optimal decision theory and economic (utilitarian) formulations as pure inference problems. For example, expected utility theory emerges as a special case of free energy minimisation, where the sensitivity or inverse temperature (of softmax functions and quantal response equilibria) has a unique and Bayes-optimal solution – that minimises free energy. This sensitivity corresponds to the precision of beliefs about behaviour, such that attainable goals are afforded a higher precision or confidence. In turn, this means that optimal behaviour entails a representation of confidence about outcomes that are under an agent's control.

Journal ArticleDOI
TL;DR: It is found that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples, so a new empirical Bayes shrinkage estimate of the dispersion parameters is presented and improved DE detection is demonstrated.
Abstract: Recent developments in RNA-sequencing (RNA-seq) technology have led to a rapid increase in gene expression data in the form of counts. RNA-seq can be used for a variety of applications, however, identifying differential expression (DE) remains a key task in functional genomics. There have been a number of statistical methods for DE detection for RNA-seq data. One common feature of several leading methods is the use of the negative binomial (Gamma-Poisson mixture) model. That is, the unobserved gene expression is modeled by a gamma random variable and, given the expression, the sequencing read counts are modeled as Poisson. The distinct feature in various methods is how the variance, or dispersion, in the Gamma distribution is modeled and estimated. We evaluate several large public RNA-seq datasets and find that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples. We present a new empirical Bayes shrinkage estimate of the dispersion parameters and demonstrate improved DE detection.
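A simplified sketch of the borrowing-information idea: crude per-gene method-of-moments dispersions are shrunk toward their across-gene average on the log scale. This is a stand-in normal-normal shrinkage for illustration only, not the estimator derived in the paper, and the data and the assumed sampling variance are made up.

    # Simplified empirical Bayes shrinkage of negative-binomial dispersions (toy data).
    import numpy as np

    rng = np.random.default_rng(0)
    counts = rng.negative_binomial(n=5, p=0.2, size=(2000, 6))     # genes x samples

    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    phi_raw = np.maximum((var - mean) / mean**2, 1e-4)             # NB: var = mu + phi * mu^2

    log_phi = np.log(phi_raw)
    prior_mean, prior_var = log_phi.mean(), log_phi.var()          # across-gene "prior"
    obs_var = 2.0 / (counts.shape[1] - 1)                          # assumed sampling variance of log phi
    shrunk = (prior_var * log_phi + obs_var * prior_mean) / (prior_var + obs_var)
    phi_shrunk = np.exp(shrunk)                                    # stabilized dispersion estimates

    print(phi_raw[:5].round(3), phi_shrunk[:5].round(3))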

Journal ArticleDOI
TL;DR: In this paper, the authors developed, implemented and applied a Markov chain Monte Carlo (MCMC) Gibbs sampler for Bayesian estimation of a hybrid choice model (HCM), using stated data on both vehicle purchase decisions and environmental concerns.
Abstract: In this article we develop, implement and apply a Markov chain Monte Carlo (MCMC) Gibbs sampler for Bayesian estimation of a hybrid choice model (HCM), using stated data on both vehicle purchase decisions and environmental concerns. Our study has two main contributions. The first is the feasibility of the Bayesian estimator we derive. Whereas classical estimation of HCMs is fairly complex, we show that the Bayesian approach for HCMs is methodologically easier to implement than simulated maximum likelihood because the inclusion of latent variables translates into adding independent ordinary regressions; we also find that, using the Bayesian estimates, forecasting and deriving confidence intervals for willingness to pay measures is straightforward. The second is the capacity of HCMs to adapt to practical situations. Our empirical results coincide with a priori expectations, namely that environmentally-conscious consumers are willing to pay more for low-emission vehicles. The model outperforms standard discrete choice models.

Journal Article
TL;DR: A kernel method for realizing Bayes' rule is proposed, based on representations of probabilities in reproducing kernel Hilbert spaces, including Bayesian computation without likelihood and filtering with a nonparametric state-space model.
Abstract: A kernel method for realizing Bayes' rule is proposed, based on representations of probabilities in reproducing kernel Hilbert spaces. Probabilities are uniquely characterized by the mean of the canonical map to the RKHS. The prior and conditional probabilities are expressed in terms of RKHS functions of an empirical sample: no explicit parametric model is needed for these quantities. The posterior is likewise an RKHS mean of a weighted sample. The estimator for the expectation of a function of the posterior is derived, and rates of consistency are shown. Some representative applications of the kernel Bayes' rule are presented, including Bayesian computation without likelihood and filtering with a nonparametric state-space model.

Journal ArticleDOI
TL;DR: The results indicate that the weather condition variables, especially precipitation, play a key role in the crash occurrence models and imply that different active traffic management strategies should be designed based on seasons.

Journal ArticleDOI
TL;DR: The Intention-Driven Dynamics Model is proposed to probabilistically model the generative process of movements that are directed by the intention and allows the intention to be inferred from observed movements using Bayes’ theorem.
Abstract: Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations, and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e. target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.

Journal ArticleDOI
TL;DR: In this paper, a review of methods of inference for single and multiple change-points in time series, when data are of retrospective (off-line) type, is presented.
Abstract: The article reviews methods of inference for single and multiple change-points in time series, when data are of retrospective (off-line) type. The inferential methods reviewed for a single change-point in time series include likelihood, Bayes, Bayes-type and some relevant non-parametric methods. Inference for multiple change-points requires methods that can handle large data sets and can be implemented efficiently for estimating the number of change-points as well as their locations. Our review in this important area focuses on some of the recent advances in this direction. Greater emphasis is placed on multivariate data while reviewing inferential methods for a single change-point in time series. Throughout the article, more attention is paid to estimation of unknown change-point(s) in time series, and this is especially true in the case of multiple change-points. Some specific data sets for which change-point modelling has been carried out in the literature are provided as illustrative examples under both single and multiple change-point scenarios.
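For the single change-point case, retrospective (off-line) Bayesian inference can be written down directly when the segment models are conjugate; the sketch below computes the posterior over the change location for a Gaussian mean-shift model with known noise variance, purely as an illustration of the reviewed ideas.

    # Bayesian retrospective inference for a single change-point in the mean of a Gaussian
    # series: flat prior over the change location, conjugate N(0, s0^2) priors on the two
    # segment means, known noise variance.  Toy model for illustration.
    import numpy as np

    def log_seg_marginal(y, sigma2=1.0, s02=100.0):
        # log marginal likelihood of one segment with mean ~ N(0, s0^2), noise N(0, sigma2)
        m = len(y)
        a = m / sigma2 + 1.0 / s02
        return (-0.5 * m * np.log(2 * np.pi * sigma2) - 0.5 * np.log(s02 * a)
                - 0.5 * np.sum(y**2) / sigma2 + 0.5 * (np.sum(y) / sigma2) ** 2 / a)

    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0.0, 1.0, 70), rng.normal(1.5, 1.0, 30)])

    taus = np.arange(1, len(y))                     # change occurs after observation tau
    logpost = np.array([log_seg_marginal(y[:t]) + log_seg_marginal(y[t:]) for t in taus])
    post = np.exp(logpost - logpost.max()); post /= post.sum()
    print("posterior mode of change-point:", taus[post.argmax()])   # should be near 70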

Journal ArticleDOI
TL;DR: A probabilistic classifier based on Bayes' theorem and supervised learning with a perceptron convergence algorithm is used to address the problem of emotion recognition from electroencephalogram signals.

Book ChapterDOI
TL;DR: It is shown how in Bayesian GWAS false positives can be controlled by limiting the proportion of false-positive results among all positives to some small value, so that the power of detecting associations is not inversely related to the number of markers.
Abstract: Bayesian multiple-regression methods are being successfully used for genomic prediction and selection. These regression models simultaneously fit many more markers than the number of observations available for the analysis. Thus, the Bayes theorem is used to combine prior beliefs of marker effects, which are expressed in terms of prior distributions, with information from data for inference. Often, the analyses are too complex for closed-form solutions and Markov chain Monte Carlo (MCMC) sampling is used to draw inferences from posterior distributions. This chapter describes how these Bayesian multiple-regression analyses can be used for GWAS. In most GWAS, false positives are controlled by limiting the genome-wise error rate, which is the probability of one or more false-positive results, to a small value. As the number of tests in GWAS is very large, this results in very low power. Here we show how in Bayesian GWAS false positives can be controlled by limiting the proportion of false-positive results among all positives to some small value. The advantage of this approach is that the power of detecting associations is not inversely related to the number of markers.
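Controlling the proportion of false positives among declared associations can be done directly from posterior probabilities: declare the largest set of markers whose average posterior probability of being a false positive stays below the chosen level. The probabilities below are hypothetical; in practice they would come from the MCMC output of the multiple-regression model.

    # Selecting markers so that the expected proportion of false positives among the
    # declared associations stays below 5%.  The posterior probabilities are hypothetical.
    import numpy as np

    ppa = np.array([0.99, 0.97, 0.95, 0.90, 0.80, 0.60, 0.40, 0.20, 0.05])   # P(association)
    order = np.argsort(-ppa)
    cum_false = np.cumsum(1.0 - ppa[order]) / np.arange(1, len(ppa) + 1)     # running proportion
    selected = order[: int(np.sum(cum_false <= 0.05))]
    print("declared associations (indices):", selected,
          " expected false-positive proportion:", cum_false[len(selected) - 1].round(3))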

Proceedings Article
05 Dec 2013
TL;DR: SDA-Bayes is presented, a framework for streaming, distributed, asynchronous computation of a Bayesian posterior; with variational Bayes (VB) as the primitive, the usefulness of the framework is demonstrated by fitting the latent Dirichlet allocation model to two large-scale document collections.
Abstract: We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data—a case where SVI may be applied—and in the streaming setting, where SVI does not apply.

Journal ArticleDOI
TL;DR: Two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced Parzen-Rosenblatt Window.
Abstract: In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naive Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. This data set is also provided as a benchmark data set for future target prediction methods due to its size as well as the number of bioactivity classes it contains. In addition to evaluating the methods, different performance measures were explored. This is not as straightforward as in binary classification settings, due to the number of classes, the possibility of multiple class memberships, and the need to translate model scores into “yes/no” predictions for assessing model performance. Both algorithms achie...
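Below is a generic Bernoulli naive Bayes scorer over binary fingerprint bits with add-one (Laplace) smoothing, shown only to illustrate the idea of the probabilistic classifiers compared above; it is not the exact Laplacian-modified scoring scheme used in the study, and the fingerprints are randomly generated toy data rather than the ChEMBL set.

    # Generic Bernoulli naive Bayes over binary fingerprint bits (toy data, one target).
    import numpy as np

    rng = np.random.default_rng(0)
    n_bits = 64
    actives = (rng.random((200, n_bits)) < 0.30).astype(int)      # toy "active" fingerprints
    inactives = (rng.random((1000, n_bits)) < 0.10).astype(int)   # toy "inactive" fingerprints

    # per-bit Bernoulli parameters with add-one (Laplace) smoothing
    p_act = (actives.sum(0) + 1) / (len(actives) + 2)
    p_ina = (inactives.sum(0) + 1) / (len(inactives) + 2)
    log_prior = np.log(len(actives) / (len(actives) + len(inactives)))

    def score(fp):
        ll_act = np.sum(fp * np.log(p_act) + (1 - fp) * np.log(1 - p_act))
        ll_ina = np.sum(fp * np.log(p_ina) + (1 - fp) * np.log(1 - p_ina))
        return ll_act + log_prior - (ll_ina + np.log1p(-np.exp(log_prior)))   # log-odds of "active"

    query = (rng.random(n_bits) < 0.28).astype(int)
    print(score(query))   # positive score => ranked as more likely active for this target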

Journal ArticleDOI
TL;DR: In this article, the Bernstein-von Mises theorems for nonparametric Bayes priors in the Gaussian white noise model are proved and it is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete non-parametric problems.
Abstract: Bernstein–von Mises theorems for nonparametric Bayes priors in the Gaussian white noise model are proved. It is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete nonparametric problems. Particularly Bayesian credible sets are constructed that have asymptotically exact $1-\alpha$ frequentist coverage level and whose $L^{2}$-diameter shrinks at the minimax rate of convergence (within logarithmic factors) over Hölder balls. Other applications include general classes of linear and nonlinear functionals and credible bands for auto-convolutions. The assumptions cover nonconjugate product priors defined on general orthonormal bases of $L^{2}$ satisfying weak conditions.

Book
01 Jan 2013
TL;DR: A range of accessible examples are used to show how Bayes' rule is actually a natural consequence of commonsense reasoning, and Bayesian analysis is applied to parameter estimation using the MatLab programs provided.
Abstract: Discovered by an 18th century mathematician and preacher, Bayes' rule is a cornerstone of modern probability theory. In this richly illustrated book, a range of accessible examples is used to show how Bayes' rule is actually a natural consequence of commonsense reasoning. Bayes' rule is derived using intuitive graphical representations of probability, and Bayesian analysis is applied to parameter estimation using the MatLab programs provided. The tutorial style of writing, combined with a comprehensive glossary, makes this an ideal primer for the novice who wishes to become familiar with the basic principles of Bayesian analysis.