
Showing papers on "Bayesian probability published in 2007"


Journal ArticleDOI
TL;DR: The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm and to one based on maximum likelihood; the methods have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.

2,597 citations


Journal ArticleDOI
TL;DR: Three new methods are presented for sampling and evidence evaluation from distributions that may contain multiple modes and significant degeneracies in very high dimensions, together with an even more efficient technique for estimating the uncertainty on the evaluated evidence, leading to a further substantial improvement in sampling efficiency and robustness.
Abstract: In performing a Bayesian analysis of astronomical data, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multimodal or exhibit pronounced (curving) degeneracies, which can cause problems for traditional MCMC sampling methods. Second, in selecting between a set of competing models, calculation of the Bayesian evidence for each model is computationally expensive. The nested sampling method introduced by Skilling (2004) has greatly reduced the computational expense of calculating evidences and also produces posterior inferences as a by-product. This method has been applied successfully in cosmological applications by Mukherjee et al. (2006), but their implementation was efficient only for unimodal distributions without pronounced degeneracies. Shaw et al. (2007) recently introduced a clustered nested sampling method which is significantly more efficient in sampling from multimodal posteriors and also determines the expectation and variance of the final evidence from a single run of the algorithm, hence providing a further increase in efficiency. In this paper, we build on the work of Shaw et al. and present three new methods for sampling and evidence evaluation from distributions that may contain multiple modes and significant degeneracies; we also present an even more efficient technique for estimating the uncertainty on the evaluated evidence. These methods lead to a further substantial improvement in sampling efficiency and robustness, and are applied to toy problems to demonstrate the accuracy and economy of the evidence calculation and parameter estimation. Finally, we discuss the use of these methods in performing Bayesian object detection in astronomical datasets.
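For readers unfamiliar with nested sampling, the sketch below illustrates the basic evidence calculation the abstract builds on (Skilling's scheme of shrinking prior volumes), not the clustered or multimodal extensions that are this paper's contribution; the brute-force rejection step, function names, and defaults are illustrative assumptions.

```python
import numpy as np

def nested_sampling(log_like, prior_sample, n_live=400, n_iter=4000, rng=None):
    """Basic nested sampling sketch: estimate log Z, Z = integral of L(theta) over the prior.

    log_like(theta) -> log-likelihood; prior_sample(rng) -> one draw from the prior.
    New points are drawn by rejection from the prior, which is only workable for toy
    problems (sampling this step efficiently is what the paper is about).
    """
    rng = np.random.default_rng() if rng is None else rng
    live = [prior_sample(rng) for _ in range(n_live)]
    live_logL = np.array([log_like(t) for t in live])
    log_Z, log_X_prev = -np.inf, 0.0                 # prior volume starts at X = 1
    for i in range(1, n_iter + 1):
        worst = np.argmin(live_logL)
        log_X = -i / n_live                          # expected log prior volume after i shrinkages
        log_w = np.log(np.exp(log_X_prev) - np.exp(log_X))   # width of the current shell
        log_Z = np.logaddexp(log_Z, live_logL[worst] + log_w)
        log_X_prev = log_X
        # replace the worst live point with a prior draw above the current likelihood bound
        Lmin = live_logL[worst]
        while True:
            cand = prior_sample(rng)
            cand_logL = log_like(cand)
            if cand_logL > Lmin:
                live[worst], live_logL[worst] = cand, cand_logL
                break
    # add the contribution of the remaining live points
    log_Z = np.logaddexp(log_Z, np.logaddexp.reduce(live_logL) + log_X - np.log(n_live))
    return log_Z
```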

1,264 citations


Journal ArticleDOI
TL;DR: Based on the concept of automatic relevance determination, this paper uses an empirical Bayesian prior to estimate a convenient posterior distribution over candidate basis vectors and consistently places its prominent posterior mass on the appropriate region of weight-space necessary for simultaneous sparse recovery.
Abstract: Given a large overcomplete dictionary of basis vectors, the goal is to simultaneously represent L>1 signal vectors using coefficient expansions marked by a common sparsity profile. This generalizes the standard sparse representation problem to the case where multiple responses exist that were putatively generated by the same small subset of features. Ideally, the associated sparse generating weights should be recovered, which can have physical significance in many applications (e.g., source localization). The generic solution to this problem is intractable and, therefore, approximate procedures are sought. Based on the concept of automatic relevance determination, this paper uses an empirical Bayesian prior to estimate a convenient posterior distribution over candidate basis vectors. This particular approximation enforces a common sparsity profile and consistently places its prominent posterior mass on the appropriate region of weight-space necessary for simultaneous sparse recovery. The resultant algorithm is then compared with multiple response extensions of matching pursuit, basis pursuit, FOCUSS, and Jeffreys prior-based Bayesian methods, finding that it often outperforms the others. Additional motivation for this particular choice of cost function is also provided, including the analysis of global and local minima and a variational derivation that highlights the similarities and differences between the proposed algorithm and previous approaches.
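The automatic-relevance-determination idea described above can be sketched as an EM-style loop in which a single hyperparameter per dictionary column is re-estimated from all L responses, enforcing the common sparsity profile; this is a simplified illustration under a fixed noise variance, and the function name and defaults are mine rather than the paper's.

```python
import numpy as np

def msbl(Phi, Y, noise_var=1e-2, n_iter=200, tol=1e-8):
    """EM-style multiple-response sparse Bayesian learning (ARD) sketch.

    Phi: (n x m) overcomplete dictionary; Y: (n x L) responses assumed to share a
    common sparsity profile. Returns the posterior mean weights and the ARD
    hyperparameters gamma (columns with gamma near 0 are effectively pruned).
    """
    n, m = Phi.shape
    L = Y.shape[1]
    gamma = np.ones(m)
    for _ in range(n_iter):
        G = np.diag(gamma)
        Sigma_y = noise_var * np.eye(n) + Phi @ G @ Phi.T      # marginal covariance of each column of Y
        K = G @ Phi.T @ np.linalg.inv(Sigma_y)                 # Gamma Phi^T Sigma_y^{-1}
        Mu = K @ Y                                             # (m x L) posterior mean
        post_var = gamma - np.einsum('ij,ji->i', K, Phi @ G)   # diagonal of the posterior covariance
        gamma_new = (Mu ** 2).sum(axis=1) / L + post_var       # EM update shared across all L responses
        if np.max(np.abs(gamma_new - gamma)) < tol:
            gamma = gamma_new
            break
        gamma = gamma_new
    return Mu, gamma
```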

796 citations


Journal ArticleDOI
TL;DR: A new model is proposed, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster.
Abstract: Summary. Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity, meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not, homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.
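Since the abstract notes that the model makes it easy to simulate realistic networks with clustering, here is a minimal simulation sketch of a simplified two-dimensional version: latent positions drawn from a Gaussian mixture and ties formed with log-odds equal to an intercept minus latent distance. All parameter values are invented for illustration and this is not the latentnet implementation.

```python
import numpy as np

def simulate_lpcm(n_per_cluster=(20, 20, 20), centers=((0, 0), (4, 0), (2, 3)),
                  sigma=0.7, intercept=1.5, rng=None):
    """Simulate an undirected network from a simplified latent position cluster model.

    Actors' latent positions come from a Gaussian mixture (one component per cluster);
    ties form independently with log-odds = intercept - distance(z_i, z_j).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.vstack([rng.normal(c, sigma, size=(n, 2))
                   for n, c in zip(n_per_cluster, centers)])
    n = z.shape[0]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # pairwise latent distances
    p = 1.0 / (1.0 + np.exp(-(intercept - d)))                   # tie probabilities
    y = (rng.random((n, n)) < p).astype(int)
    y = np.triu(y, 1)
    y = y + y.T                                                  # symmetric, no self-ties
    return z, y
```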

785 citations


Journal ArticleDOI
01 Apr 2007
TL;DR: The authors present a Bayesian framework for understanding how adults and children learn the meanings of words, and explains how learners can generalize meaningfully from just one or a few positive examples of a novel word's referents.
Abstract: The authors present a Bayesian framework for understanding how adults and children learn the meanings of words. The theory explains how learners can generalize meaningfully from just one or a few positive examples of a novel word’s referents, by making rational inductive inferences that integrate prior knowledge about plausible word meanings with the statistical structure of the observed examples. The theory addresses shortcomings of the two best known approaches to modeling word learning, based on deductive hypothesis elimination and associative learning. Three experiments with adults and children test the Bayesian account’s predictions in the context of learning words for object categories at multiple levels of a taxonomic hierarchy. Results provide strong support for the Bayesian account over competing accounts, in terms of both quantitative model fits and the ability to explain important qualitative phenomena. Several extensions of the basic theory are discussed, illustrating the broader potential for Bayesian models of word learning.

738 citations


Journal ArticleDOI
TL;DR: The Deviance Information Criterion combines ideas from both the information-theoretic and Bayesian heritages; it is readily computed from Monte Carlo posterior samples and, unlike the AIC and BIC, allows for parameter degeneracy.
Abstract: Model selection is the problem of distinguishing competing models, perhaps featuring different numbers of parameters. The statistics literature contains two distinct sets of tools, those based on information theory such as the Akaike Information Criterion (AIC), and those on Bayesian inference such as the Bayesian evidence and Bayesian Information Criterion (BIC). The Deviance Information Criterion combines ideas from both heritages; it is readily computed from Monte Carlo posterior samples and, unlike the AIC and BIC, allows for parameter degeneracy. I describe the properties of the information criteria, and as an example compute them from Wilkinson Microwave Anisotropy Probe 3-yr data for several cosmological models. I find that at present the information theory and Bayesian approaches give significantly different conclusions from that data.
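The claim that the DIC is readily computed from Monte Carlo posterior samples can be made concrete in a few lines: with deviance D(theta) = -2 ln L(theta), DIC = D(theta_bar) + 2 p_D, where p_D = mean(D) - D(theta_bar) is the effective number of parameters. A minimal sketch, assuming the user supplies a log-likelihood function and posterior draws:

```python
import numpy as np

def dic(log_like, posterior_samples):
    """Deviance Information Criterion from posterior samples.

    D(theta) = -2 log L(theta); DIC = D(theta_bar) + 2 * p_D,
    where p_D = mean(D) - D(theta_bar) is the effective number of parameters.
    Returns (DIC, p_D).
    """
    D = np.array([-2.0 * log_like(t) for t in posterior_samples])
    theta_bar = np.mean(posterior_samples, axis=0)   # posterior mean of the parameters
    D_at_mean = -2.0 * log_like(theta_bar)
    p_D = D.mean() - D_at_mean
    return D_at_mean + 2.0 * p_D, p_D
```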

725 citations


Posted Content
TL;DR: Winner of the 2004 DeGroot Prize, this graduate-level textbook introduces Bayesian statistics and decision theory, covering both the basic ideas of statistical theory and some of the more modern and advanced topics of Bayesian statistics such as complete class theorems, the Stein effect, Bayesian model choice, hierarchical and empirical Bayes modeling, Monte Carlo integration including Gibbs sampling, and other MCMC techniques.
Abstract: Winner of the 2004 DeGroot Prize This paperback edition, a reprint of the 2001 edition, is a graduate-level textbook that introduces Bayesian statistics and decision theory. It covers both the basic ideas of statistical theory, and also some of the more modern and advanced topics of Bayesian statistics such as complete class theorems, the Stein effect, Bayesian model choice, hierarchical and empirical Bayes modeling, Monte Carlo integration including Gibbs sampling, and other MCMC techniques. It was awarded the 2004 DeGroot Prize by the International Society for Bayesian Analysis (ISBA) for setting "a new standard for modern textbooks dealing with Bayesian methods, especially those using MCMC techniques, and that it is a worthy successor to DeGroot's and Berger's earlier texts".

630 citations


Posted Content
TL;DR: This work examines the case where the model parameters before and after the changepoint are independent and derives an online algorithm for exact inference of the most recent changepoint; the implementation is highly modular, so the algorithm may be applied to a variety of types of data.
Abstract: Changepoints are abrupt variations in the generative parameters of a data sequence. Online detection of changepoints is useful in modelling and prediction of time series in application areas such as finance, biometrics, and robotics. While frequentist methods have yielded online filtering and prediction techniques, most Bayesian papers have focused on the retrospective segmentation problem. Here we examine the case where the model parameters before and after the changepoint are independent and we derive an online algorithm for exact inference of the most recent changepoint. We compute the probability distribution of the length of the current ``run,'' or time since the last changepoint, using a simple message-passing algorithm. Our implementation is highly modular so that the algorithm may be applied to a variety of types of data. We illustrate this modularity by demonstrating the algorithm on three different real-world data sets.
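A minimal sketch of the run-length recursion described above, assuming for concreteness a Gaussian observation model with known variance, a conjugate normal prior on the mean, and a constant hazard rate; the framework in the paper is modular in the choice of observation model, so this particular model and all settings are assumptions.

```python
import numpy as np
from scipy.stats import norm

def bocpd_gaussian(data, hazard=1 / 200, mu0=0.0, var0=10.0, obs_var=1.0):
    """Online changepoint detection: posterior over the run length r_t (time since last changepoint).

    Observation model: x | mu ~ N(mu, obs_var), with conjugate prior mu ~ N(mu0, var0).
    Returns a (T+1, T+1) matrix whose row t is the run-length distribution after x_t.
    """
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu_n, var_n = np.array([mu0]), np.array([var0])        # posterior params per run length
    for t, x in enumerate(data, start=1):
        # predictive probability of x under each current run length
        pred = norm.pdf(x, loc=mu_n, scale=np.sqrt(var_n + obs_var))
        growth = R[t - 1, :t] * pred * (1 - hazard)         # run continues: r -> r + 1
        cp = (R[t - 1, :t] * pred * hazard).sum()           # changepoint: r -> 0
        R[t, 0] = cp
        R[t, 1:t + 1] = growth
        R[t] /= R[t].sum()
        # conjugate update of the sufficient statistics; prepend the prior for run length 0
        prec = 1 / var_n + 1 / obs_var
        mu_upd = (mu_n / var_n + x / obs_var) / prec
        mu_n = np.concatenate(([mu0], mu_upd))
        var_n = np.concatenate(([var0], 1 / prec))
    return R
```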

622 citations


Journal ArticleDOI
TL;DR: This paper presents a newly developed simulation-based approach for Bayesian model updating, model class selection, and model averaging called the transitional Markov chain Monte Carlo (TMCMC) approach, motivated by the adaptive Metropolis–Hastings method.
Abstract: This paper presents a newly developed simulation-based approach for Bayesian model updating, model class selection, and model averaging called the transitional Markov chain Monte Carlo (TMCMC) approach. The idea behind TMCMC is to avoid the problem of sampling directly from difficult target probability density functions (PDFs) by instead sampling from a series of intermediate PDFs that converge to the target PDF and are easier to sample. The TMCMC approach is motivated by the adaptive Metropolis–Hastings method developed by Beck and Au in 2002 and is based on Markov chain Monte Carlo. It is shown that TMCMC is able to draw samples from some difficult PDFs (e.g., multimodal PDFs, very peaked PDFs, and PDFs with a flat manifold). The TMCMC approach can also estimate the evidence of the chosen probabilistic model class conditional on the measured data, a key component for Bayesian model class selection and model averaging. Three examples are used to demonstrate the effectiveness of the TMCMC approach in Bayesian model updating, ...
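A heavily simplified sketch of the tempering idea behind TMCMC: the population is moved through intermediate PDFs proportional to prior times likelihood^beta_j, reweighted and resampled at each stage, and the log-evidence is accumulated from the stage weights. The paper adapts the beta schedule and the proposal covariance; this sketch fixes both, and every function name and setting is illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

def tmcmc(log_prior, prior_sample, log_like, n=1000, n_stages=20, step=0.5, rng=None):
    """Transitional MCMC sketch with a fixed tempering schedule beta_0 = 0, ..., beta_J = 1.

    Each stage reweights and resamples the population toward p_j ~ prior * L^beta_j,
    then applies one random-walk Metropolis move per sample. The log-evidence is the
    sum of the log mean incremental weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas = np.linspace(0.0, 1.0, n_stages + 1)
    theta = prior_sample(rng, n)                       # (n, d) draws from the prior
    logL = np.array([log_like(t) for t in theta])
    log_evidence = 0.0
    for j in range(n_stages):
        dbeta = betas[j + 1] - betas[j]
        logw = dbeta * logL                            # incremental importance weights
        log_evidence += logsumexp(logw) - np.log(n)
        w = np.exp(logw - logsumexp(logw))
        idx = rng.choice(n, size=n, p=w)               # resample proportional to the weights
        theta, logL = theta[idx], logL[idx]
        # one random-walk Metropolis step targeting prior * L^beta_{j+1}
        prop = theta + step * rng.standard_normal(theta.shape)
        logL_prop = np.array([log_like(t) for t in prop])
        log_acc = (betas[j + 1] * (logL_prop - logL)
                   + np.array([log_prior(p) for p in prop])
                   - np.array([log_prior(t) for t in theta]))
        accept = np.log(rng.random(n)) < log_acc
        theta[accept], logL[accept] = prop[accept], logL_prop[accept]
    return theta, log_evidence
```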

616 citations


Book
12 Mar 2007
TL;DR: There are two related, but philosophically distinct, approaches to scientific inference: “Classical” statistical analysis is most closely allied with the frequency interpretation, whereas “Bayesian” analysis is allied with the subjective interpretation.
Abstract: Scientists perform experiments and gather data to gain evidence for or against various hypotheses about how the world works. This sounds straightforward, but exactly how we use this data to make quantitative statements about various hypotheses requires a bit more care. In fact, there are two related, but philosophically distinct, approaches to scientific inference. To some extent these two approaches parallel the frequency interpretation and the subjective interpretation of probability. “Classical” statistical analysis is most closely allied with the frequency interpretation, whereas “Bayesian” analysis is allied with the subjective interpretation.

590 citations


Journal ArticleDOI
TL;DR: This manuscript presents the user's manual for boa, which outlines the use of the software and the methodology upon which it is based, and includes a description of the menu system, data management capabilities, and statistical/graphical methods for convergence assessment and posterior inference.
Abstract: Markov chain Monte Carlo (MCMC) is the most widely used method of estimating joint posterior distributions in Bayesian analysis. The idea of MCMC is to iteratively produce parameter values that are representative samples from the joint posterior. Unlike frequentist analysis where iterative model fitting routines are monitored for convergence to a single point, MCMC output is monitored for convergence to a distribution. Thus, specialized diagnostic tools are needed in the Bayesian setting. To this end, the R package boa was created. This manuscript presents the user's manual for boa, which outlines the use of and methodology upon which the software is based. Included is a description of the menu system, data management capabilities, and statistical/graphical methods for convergence assessment and posterior inference. Throughout the manual, a linear regression example is used to illustrate the software.
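boa bundles the standard MCMC convergence diagnostics mentioned above. As an illustration of what such a diagnostic computes, here is a minimal Python sketch of the Gelman-Rubin potential scale reduction factor; it is not boa's implementation (boa is an R package), just the underlying calculation.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter.

    chains: array of shape (m, n) -- m independent MCMC chains of length n.
    Values near 1 suggest the chains have converged to a common distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # average within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)
```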

Book
10 May 2007
TL;DR: This book introduces Bayesian methods through analyses of averages, frequencies, regression, and analysis of variance, followed by case studies including mark-recapture analysis and population dynamics, with appendices providing a tutorial for running WinBUGS, probability distributions, and MCMC algorithms.
Abstract: 1. Introduction 2. Critiques of statistical methods 3. Analysing averages and frequencies 4. How good are the models? 5. Regression and correlation 6. Analysis of variance Case studies 7. Mark-recapture analysis 8. Effects of marking frogs 9. Population dynamics 10. Subjective priors 11. Conclusion Appendix A. A tutorial for running WinBUGS Appendix B. Probability distributions Appendix C. MCMC algorithms.

Journal ArticleDOI
TL;DR: In this paper, the authors develop a model for an investor with multiple priors and aversion to ambiguity, and they characterize the multiple priors by a "confidence interval" around the estimated expected returns.
Abstract: We develop a model for an investor with multiple priors and aversion to ambiguity. We characterize the multiple priors by a "confidence interval" around the estimated expected returns and we model ambiguity aversion via a minimization over the priors. Our model has several attractive features: (1) it has a solid axiomatic foundation; (2) it is flexible enough to allow for different degrees of uncertainty about expected returns for various subsets of assets and also about the return-generating model; and (3) it delivers closed-form expressions for the optimal portfolio. Our empirical analysis suggests that, compared with portfolios from classical and Bayesian models, ambiguity-averse portfolios are more stable over time and deliver a higher out-of-sample Sharpe ratio. (JEL G11) Copyright 2007, Oxford University Press.

Journal ArticleDOI
TL;DR: It is demonstrated that the proposed 'bayesian epistasis association mapping' method is significantly more powerful than existing approaches and that genome-wide case-control epistasis mapping with many thousands of markers is both computationally and statistically feasible.
Abstract: Epistatic interactions among multiple genetic variants in the human genome may be important in determining individual susceptibility to common diseases. Although some existing computational methods for identifying genetic interactions have been effective for small-scale studies, we here propose a method, denoted 'bayesian epistasis association mapping' (BEAM), for genome-wide case-control studies. BEAM treats the disease-associated markers and their interactions via a bayesian partitioning model and computes, via Markov chain Monte Carlo, the posterior probability that each marker set is associated with the disease. Testing this on an age-related macular degeneration genome-wide association data set, we demonstrate that the method is significantly more powerful than existing approaches and that genome-wide case-control epistasis mapping with many thousands of markers is both computationally and statistically feasible.

Journal ArticleDOI
TL;DR: This work presents a reformulation of the Bayesian approach to inverse problems that seeks to accelerate Bayesian inference by using polynomial chaos expansions to represent random variables, and evaluates the utility of this technique on a transient diffusion problem arising in contaminant source inversion.

Journal ArticleDOI
TL;DR: The Bayesian false-discovery probability (BFDP) shares the ease of calculation of the recently proposed false-positive report probability (FPRP) but uses more information, has a noteworthy threshold defined naturally in terms of the costs of false discovery and nondiscovery, and has a sound methodological foundation.
Abstract: In light of the vast amounts of genomic data that are now being generated, we propose a new measure, the Bayesian false-discovery probability (BFDP), for assessing the noteworthiness of an observed association. BFDP shares the ease of calculation of the recently proposed false-positive report probability (FPRP) but uses more information, has a noteworthy threshold defined naturally in terms of the costs of false discovery and nondiscovery, and has a sound methodological foundation. In addition, in a multiple-testing situation, it is straightforward to estimate the expected numbers of false discoveries and false nondiscoveries. We provide an in-depth discussion of FPRP, including a comparison with the q value, and examine the empirical behavior of these measures, along with BFDP, via simulation. Finally, we use BFDP to assess the association between 131 single-nucleotide polymorphisms and lung cancer in a case-control study.
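As a rough illustration of how such a measure can be computed for a single test, the sketch below uses a normal approximate Bayes factor built from an estimated log odds ratio and its standard error, combined with the prior probability of association; the parameterization and default values are assumptions chosen for illustration rather than the paper's exact prescription, and the cost-based noteworthiness threshold is left to the user.

```python
import numpy as np

def bfdp(theta_hat, se, prior_prob_assoc=1e-4, prior_sd=0.21):
    """Approximate Bayesian false-discovery probability for one association test.

    theta_hat, se: estimated log odds ratio and its standard error.
    prior_prob_assoc: prior probability that the variant is truly associated.
    prior_sd: standard deviation of the normal prior on the log odds ratio under the
    alternative (0.21 here is just an example, implying true odds ratios above ~1.5 are unlikely).
    Uses the normal-normal approximate Bayes factor P(theta_hat | H0) / P(theta_hat | H1).
    """
    V, W = se ** 2, prior_sd ** 2
    z2 = (theta_hat / se) ** 2
    log_abf = 0.5 * np.log((V + W) / V) - 0.5 * z2 * W / (V + W)
    prior_odds_null = (1 - prior_prob_assoc) / prior_prob_assoc
    log_post_odds_null = log_abf + np.log(prior_odds_null)
    return 1.0 / (1.0 + np.exp(-log_post_odds_null))   # posterior probability of no association
```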

Journal ArticleDOI
TL;DR: In this article, the predictive probability density functions (PDFs) for weather quantities are represented as a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are posterior probabilities of the models generating the forecasts and reflect the forecasts' relative contributions to predictive skill over a training period.
Abstract: Bayesian model averaging (BMA) is a statistical way of postprocessing forecast ensembles to create predictive probability density functions (PDFs) for weather quantities. It represents the predictive PDF as a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are posterior probabilities of the models generating the forecasts and reflect the forecasts’ relative contributions to predictive skill over a training period. It was developed initially for quantities whose PDFs can be approximated by normal distributions, such as temperature and sea level pressure. BMA does not apply in its original form to precipitation, because the predictive PDF of precipitation is nonnormal in two major ways: it has a positive probability of being equal to zero, and it is skewed. In this study BMA is extended to probabilistic quantitative precipitation forecasting. The predictive PDF corresponding to one ensemble member is a mixture of a discrete component at zero and a gam...
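The weighted-average construction in the first part of the abstract is easy to state in code for the Gaussian case (temperature, sea-level pressure); the precipitation extension with a point mass at zero and a skewed component is only noted in a comment, since the abstract is truncated. The weights, variances, and numbers in the usage line are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

def bma_predictive_pdf(y, forecasts, weights, sigma):
    """BMA predictive density for a normally distributed weather quantity.

    forecasts: bias-corrected ensemble member forecasts f_1..f_K;
    weights: posterior model probabilities (summing to 1), estimated over a training period;
    sigma: predictive standard deviation of each member's PDF.
    Returns p(y) = sum_k w_k * N(y; f_k, sigma^2).
    For precipitation, each member's PDF would instead mix a point mass at zero
    with a skewed (e.g. gamma) component for positive amounts.
    """
    forecasts, weights = np.asarray(forecasts), np.asarray(weights)
    return float(np.sum(weights * norm.pdf(y, loc=forecasts, scale=sigma)))

# Illustrative use (all numbers invented): a 3-member ensemble forecast of temperature.
p = bma_predictive_pdf(y=21.0, forecasts=[20.1, 22.3, 19.5],
                       weights=[0.5, 0.3, 0.2], sigma=1.8)
```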

Journal ArticleDOI
TL;DR: In this article, the authors discuss in the context of several ongoing public health and social surveys how to develop general families of multilevel probability models that yield reasonable Bayesian inferences.
Abstract: The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.

Journal ArticleDOI
TL;DR: The authors develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone.
Abstract: We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.

Journal ArticleDOI
TL;DR: It is suggested that hierarchical Bayesian models can help to explain how overhypotheses about feature variability and the grouping of categories into ontological kinds like objects and substances are acquired.
Abstract: Inductive learning is impossible without overhypotheses, or constraints on the hypotheses considered by the learner. Some of these overhypotheses must be innate, but we suggest that hierarchical Bayesian models can help to explain how the rest are acquired. To illustrate this claim, we develop models that acquire two kinds of overhypotheses--overhypotheses about feature variability (e.g. the shape bias in word learning) and overhypotheses about the grouping of categories into ontological kinds like objects and substances.

Book
01 Jan 2007
TL;DR: In this book, the authors present a self-contained entry to computational Bayesian statistics, focusing on standard statistical models and backed up by real datasets that are discussed in the text and available from the book website.
Abstract: This Bayesian modeling book is intended for practitioners and applied statisticians looking for a self-contained entry to computational Bayesian statistics. Focusing on standard statistical models and backed up by discussed real datasets available from the book website, it provides an operational methodology for conducting Bayesian inference, rather than focusing on its theoretical justifications. Special attention is paid to the derivation of prior distributions in each case and specific reference solutions are given for each of the models. Similarly, computational details are worked out to lead the reader towards an effective programming of the methods given in the book.

Proceedings Article
01 Jun 2007
TL;DR: This model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE.
Abstract: Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.

Journal ArticleDOI
TL;DR: The performance of bayesass, a recent Bayesian method that allows the estimation of recent migration rates among populations as well as the inbreeding coefficient of each local population, is evaluated.
Abstract: Bayesian methods have become extremely popular in molecular ecology studies because they allow us to estimate demographic parameters of complex demographic scenarios using genetic data. Articles presenting new methods generally include sensitivity studies that evaluate their performance, but they tend to be limited and need to be followed by a more thorough evaluation. Here we evaluate the performance of a recent method, bayesass, which allows the estimation of recent migration rates among populations, as well as the inbreeding coefficient of each local population. We expand the simulation study of the original publication by considering multi-allelic markers and scenarios with varying number of populations. We also investigate the effect of varying migration rates and F_ST more thoroughly in order to identify the region of parameter space where the method is and is not able to provide accurate estimates of migration rate. Results indicate that if the demographic history of the species being studied fits the assumptions of the inference model, and if genetic differentiation is not too low (F_ST ≥ 0.05), then the method can give fairly accurate estimates of migration rates even when they are fairly high (about 0.1). However, when the assumptions of the inference model are violated, accurate estimates are obtained only if migration rates are very low (m = 0.01) and genetic differentiation is high (F_ST ≥ 0.10). Our results also show that using posterior assignment probabilities as an indication of how much confidence we can place on the assignments is problematical since the posterior probability of assignment can be very high even when the individual assignments are very inaccurate.

Journal ArticleDOI
TL;DR: An investigation of the effect of improper data partitioning on phylogenetic accuracy, as well as of the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses, suggests that model partitioning is important for large data sets.
Abstract: As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors.

Journal ArticleDOI
TL;DR: In this article, a hierarchical framework for modeling speed and accuracy on test items is presented as an alternative to existing response-time models, allowing a "plug-and-play" approach with alternative choices of models for the response and response-time distributions as well as the distributions of their parameters.
Abstract: Current modeling of response times on test items has been strongly influenced by the paradigm of experimental reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with response time. Also, several response-time models seem to be unclear as to the level of parametrization they represent. A hierarchical framework for modeling speed and accuracy on test items is presented as an alternative to these models. The framework allows a "plug-and-play approach" with alternative choices of models for the response and response-time distributions as well as the distributions of their parameters. Bayesian treatment of the framework with Markov chain Monte Carlo (MCMC) computation facilitates the approach. Use of the framework is illustrated for the choice of a normal-ogive response model, a lognormal model for the response times, and multivariate normal models for their parameters with Gibbs sampling from the joint posterior distribution.

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work considers the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes chosen randomly from a fixed but unknown distribution, and models that distribution using a hierarchical Bayesian infinite mixture model.
Abstract: We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for modelbased Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments based on previous environments, while the use of a nonparametric model allows us to quickly adapt to environments we have not encountered before. In addition, the use of infinite mixtures allows for the model to automatically learn the number of underlying MDP components. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.

Journal ArticleDOI
TL;DR: A dual principle provides an extended Bayesian framework for understanding the functional reasons for the integration of spatial cues and brings some order to the diversity by taking into account the subjective discrepancy in the dictates of multiple cues.
Abstract: Spatial judgments and actions are often based on multiple cues. The authors review a multitude of phenomena on the integration of spatial cues in diverse species to consider how nearly optimally animals combine the cues. Under the banner of Bayesian perception, cues are sometimes combined and weighted in a near optimal fashion. In other instances when cues are combined, how optimal the integration is might be unclear. Only 1 cue may be relied on, or cues may seem to compete with one another. The authors attempt to bring some order to the diversity by taking into account the subjective discrepancy in the dictates of multiple cues. When cues are too discrepant, it may be best to rely on 1 cue source. When cues are not too discrepant, it may be advantageous to combine cues. Such a dual principle provides an extended Bayesian framework for understanding the functional reasons for the integration of spatial cues.
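The "near optimal" combination discussed above is usually formalized as precision (inverse-variance) weighting of the cues. A minimal sketch under a Gaussian noise assumption, with the cue names and numbers invented for illustration:

```python
def combine_cues(estimates, variances):
    """Precision-weighted (Bayes-optimal under Gaussian noise) combination of cues.

    Each cue provides an estimate with some variance; the combined estimate weights
    each cue by its reliability (1/variance), and the combined variance is
    1 / sum(1/variance), never larger than that of the best single cue.
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    mean = sum(p * x for p, x in zip(precisions, estimates)) / total
    return mean, 1.0 / total

# e.g. a landmark cue (estimate 10 deg, variance 4) and a self-motion cue (estimate 16 deg, variance 12)
mean, var = combine_cues([10.0, 16.0], [4.0, 12.0])   # -> (11.5, 3.0)
```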

Journal ArticleDOI
01 Apr 2007-Genetics
TL;DR: The notion of the mean population partition is developed, which is the partition of individuals to populations that minimizes the squared partition distance to the partitions sampled by the MCMC algorithm.
Abstract: Inferring population structure from genetic data sampled from some number of individuals is a formidable statistical problem. One widely used approach considers the number of populations to be fixed and calculates the posterior probability of assigning individuals to each population. More recently, the assignment of individuals to populations and the number of populations have both been considered random variables that follow a Dirichlet process prior. We examined the statistical behavior of assignment of individuals to populations under a Dirichlet process prior. First, we examined a best-case scenario, in which all of the assumptions of the Dirichlet process prior were satisfied, by generating data under a Dirichlet process prior. Second, we examined the performance of the method when the genetic data were generated under a population genetics model with symmetric migration between populations. We examined the accuracy of population assignment using a distance on partitions. The method can be quite accurate with a moderate number of loci. As expected, inferences on the number of populations are more accurate when θ = 4N_e u is large and when the migration rate (4N_e m) is low. We also examined the sensitivity of inferences of population structure to choice of the parameter of the Dirichlet process model. Although inferences could be sensitive to the choice of the prior on the number of populations, this sensitivity occurred when the number of loci sampled was small; inferences are more robust to the prior on the number of populations when the number of sampled loci is large. Finally, we discuss several methods for summarizing the results of a Bayesian Markov chain Monte Carlo (MCMC) analysis of population structure. We develop the notion of the mean population partition, which is the partition of individuals to populations that minimizes the squared partition distance to the partitions sampled by the MCMC algorithm.

Journal ArticleDOI
TL;DR: These findings quantify to what extent the inclusion of independent prior knowledge improves the network reconstruction accuracy, and the values of the hyperparameters inferred with the proposed scheme were found to be close to optimal with respect to minimizing the reconstruction error.
Abstract: There have been various attempts to reconstruct gene regulatory networks from microarray expression data in the past. However, owing to the limited amount of independent experimental conditions and noise inherent in the measurements, the results have been rather modest so far. For this reason it seems advisable to include biological prior knowledge, related, for instance, to transcription factor binding locations in promoter regions or partially known signalling pathways from the literature. In the present paper, we consider a Bayesian approach to systematically integrate expression data with multiple sources of prior knowledge. Each source is encoded via a separate energy function, from which a prior distribution over network structures in the form of a Gibbs distribution is constructed. The hyperparameters associated with the different sources of prior knowledge, which measure the influence of the respective prior relative to the data, are sampled from the posterior distribution with MCMC. We have evaluated the proposed scheme on the yeast cell cycle and the Raf signalling pathway. Our findings quantify to what extent the inclusion of independent prior knowledge improves the network reconstruction accuracy, and the values of the hyperparameters inferred with the proposed scheme were found to be close to optimal with respect to minimizing the reconstruction error.
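A small sketch of the prior construction the abstract describes: each knowledge source contributes an energy E_k(G), and the prior over network structures is a Gibbs distribution proportional to exp(-sum_k beta_k E_k(G)). The function names and the example energy are illustrative; in the paper the hyperparameters beta_k are themselves sampled with MCMC rather than held fixed.

```python
import numpy as np

def log_gibbs_prior(G, energies, betas):
    """Unnormalized log prior over a candidate network structure G.

    G: adjacency matrix of the candidate regulatory network.
    energies: list of energy functions E_k(G), one per source of prior knowledge
              (e.g. mismatch with transcription factor binding data or a known pathway).
    betas: hyperparameters weighting each source against the expression data.
    log P(G) = -sum_k beta_k * E_k(G) + const; the partition function is omitted,
    which suffices for Metropolis-Hastings ratios over structures at fixed betas.
    """
    return -sum(b * E(G) for b, E in zip(betas, energies))

# Example energy: number of edges that disagree with a prior network B (entries in [0, 1]).
def mismatch_energy(B):
    return lambda G: float(np.abs(np.asarray(G) - np.asarray(B)).sum())
```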

Journal ArticleDOI
TL;DR: A Bayesian probabilistic inferential framework, which provides a natural means for incorporating both errors and prior information about the source, is presented and the inverse source determination method is validated against real data sets acquired in a highly disturbed flow field in an urban environment.