
Showing papers on "Bayesian probability" published in 2003


Journal ArticleDOI
TL;DR: A new algorithm is introduced that combines the modeling strategy of one method with the computational strategies of another and outperforms all three existing methods for inferring haplotypes from genotype data in a population sample.
Abstract: In this report, we compare and contrast three previously published Bayesian methods for inferring haplotypes from genotype data in a population sample. We review the methods, emphasizing the differences between them in terms of both the models ("priors") they use and the computational strategies they employ. We introduce a new algorithm that combines the modeling strategy of one method with the computational strategies of another. In comparisons using real and simulated data, this new algorithm outperforms all three existing methods. The new algorithm is included in the software package PHASE, version 2.0, available online (http://www.stat.washington.edu/stephens/software.html).

3,556 citations


Proceedings Article
09 Dec 2003
TL;DR: A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.

1,055 citations
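
The nested Chinese restaurant process is simple to simulate: a document's root-to-leaf path through the topic tree is drawn by running an ordinary CRP at every node it visits. Below is a minimal Python sketch of that path-sampling step; the function name `ncrp_path`, the count bookkeeping, and the parameter values are illustrative, not the paper's implementation.

```python
import random
from collections import defaultdict

def ncrp_path(tree_counts, depth, gamma):
    """Sample one document's root-to-leaf path through the infinite tree.
    tree_counts maps a node (a tuple of child indices) to a list of
    visit counts for its children; gamma is the CRP concentration."""
    path = ()
    for _ in range(depth):
        counts = tree_counts[path]
        n = sum(counts)
        r = random.random() * (n + gamma)   # CRP draw at this node
        child, acc = len(counts), 0.0       # default: open a new branch
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                child = k
                break
        if child == len(counts):
            counts.append(0)                # new branch: arbitrary branching factor
        counts[child] += 1
        path += (child,)
    return path

tree_counts = defaultdict(list)
paths = [ncrp_path(tree_counts, depth=3, gamma=1.0) for _ in range(10)]
print(paths)   # shared prefixes correspond to shared coarse topics
```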


Journal ArticleDOI
TL;DR: Computer simulation is used to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP).
Abstract: Bayesian Markov chain Monte Carlo sampling has become increasingly popular in phylogenetics as a method for both estimating the maximum likelihood topology and for assessing nodal confidence. Despite the growing use of posterior probabilities, the relationship between the Bayesian measure of confidence and the most commonly used confidence measure in phylogenetics, the nonparametric bootstrap proportion, is poorly understood. We used computer simulation to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP). We simulated the evolution of DNA sequences on 17-taxon topologies under 18 evolutionary scenarios and examined the performance of these methods in assigning confidence to correct and incorrect monophyletic groups, and we examined the effects of increasing character number on support values. BMCMC-PP and ML-BP were often strongly correlated with one another but could provide substantially different estimates of support on short internodes. In contrast, BMCMC-PP correlated poorly with MP-BP across most of the simulation conditions that we examined. For a given threshold value, more correct monophyletic groups were supported by BMCMC-PP than by either ML-BP or MP-BP. When threshold values were chosen that fixed the rate of accepting incorrect monophyletic relationships as true at 5%, all three methods recovered most of the correct relationships on the simulated topologies, although BMCMC-PP and ML-BP performed better than MP-BP. BMCMC-PP was usually a less biased predictor of phylogenetic accuracy than either bootstrapping method. BMCMC-PP provided high support values for correct topological bipartitions with fewer characters than were needed for the nonparametric bootstrap.

949 citations
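
For readers unfamiliar with bootstrap proportions, the underlying recipe is generic: resample the data with replacement many times and record how often the replicates support the same conclusion as the original data. The Python sketch below uses a scalar statistic as a stand-in; phylogenetic bootstrapping resamples alignment columns and recomputes the tree, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.3, 1.0, size=100)     # illustrative data
obs_positive = x.mean() > 0            # the "conclusion" drawn from the data

B, agree = 1000, 0
for _ in range(B):
    xb = rng.choice(x, size=x.size, replace=True)   # resample with replacement
    agree += (xb.mean() > 0) == obs_positive
print("bootstrap proportion:", agree / B)
```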


Journal ArticleDOI
01 Jan 2003-Genetics
TL;DR: A Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design is introduced, suggesting that this method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist.
Abstract: We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design. The joint posterior distribution of the substructure and allele frequencies of the respective populations is available in an analytical form when the number of populations is small, whereas an approximation based on a Markov chain Monte Carlo simulation approach can be obtained for a moderate or large number of populations. Using the joint posterior distribution, posteriors can also be derived for any evolutionary population parameters, such as the traditional fixation indices. A major advantage compared to most earlier methods is that the number of populations is treated here as an unknown parameter. What is traditionally considered as two genetically distinct populations, either recently founded or connected by considerable gene flow, is here considered as one panmictic population with a certain probability based on marker data and prior information. Analyses of previously published data on the Moroccan argan tree (Argania spinosa) and of simulated data sets suggest that our method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist. The software (BAPS) used for the computations is freely available from http://www.rni.helsinki.fi/~mjs.

855 citations


Journal Article
TL;DR: The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.

763 citations
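
The quantities in this entry are straightforward to compute from MCMC output: pD is the posterior mean deviance minus the deviance at the posterior mean, and DIC adds pD back to the posterior mean deviance. A minimal Python sketch for a normal model with known variance (the model and the stand-in posterior draws are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def dic(y, mu_draws, sigma):
    """DIC and pD for y_i ~ N(mu, sigma^2), given MCMC draws of mu."""
    deviance = lambda mu: -2.0 * norm.logpdf(y, loc=mu, scale=sigma).sum()
    d_bar = np.mean([deviance(m) for m in mu_draws])  # posterior mean deviance
    d_hat = deviance(np.mean(mu_draws))               # deviance at posterior mean
    p_d = d_bar - d_hat                               # effective number of parameters
    return d_bar + p_d, p_d                           # DIC = Dbar + pD

y = np.random.default_rng(0).normal(1.0, 1.0, size=30)
# Stand-in for MCMC output: draws from the known conjugate posterior of mu
mu_draws = np.random.default_rng(1).normal(y.mean(), 1.0 / np.sqrt(30), size=2000)
print(dic(y, mu_draws, sigma=1.0))
```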


Journal ArticleDOI
TL;DR: It is shown that Bayesian posterior probabilities are significantly higher than corresponding nonparametric bootstrap frequencies for true clades, but also that erroneous conclusions will be made more often.
Abstract: Many empirical studies have revealed considerable differences between nonparametric bootstrapping and Bayesian posterior probabilities in terms of the support values for branches, despite claimed predictions about their approximate equivalence. We investigated this problem by simulating data, which were then analyzed by maximum likelihood bootstrapping and Bayesian phylogenetic analysis using identical models and reoptimization of parameter values. We show that Bayesian posterior probabilities are significantly higher than corresponding nonparametric bootstrap frequencies for true clades, but also that erroneous conclusions will be made more often. These errors are strongly accentuated when the models used for analyses are underparameterized. When data are analyzed under the correct model, nonparametric bootstrapping is conservative. Bayesian posterior probabilities are also conservative in this respect, but less so.

620 citations


Journal ArticleDOI
TL;DR: A Bayesian approach to supervised learning that leads to sparse solutions, in which irrelevant parameters are automatically set exactly to zero, and that involves no tuning or adjustment of sparseness-controlling hyperparameters.
Abstract: The goal of supervised learning is to infer a functional mapping based on a set of training examples. To achieve good generalization, it is necessary to control the "complexity" of the learned function. In Bayesian approaches, this is done by adopting a prior for the parameters of the function being learned. We propose a Bayesian approach to supervised learning, which leads to sparse solutions; that is, in which irrelevant parameters are automatically set exactly to zero. Other ways to obtain sparse classifiers (such as Laplacian priors, support vector machines) involve (hyper)parameters which control the degree of sparseness of the resulting classifiers; these parameters have to be somehow adjusted/estimated from the training data. In contrast, our approach does not involve any (hyper)parameters to be adjusted or estimated. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, which is then modified by the adoption of a Jeffreys' noninformative hyperprior. Implementation is carried out by an expectation-maximization (EM) algorithm. Experiments with several benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms SVMs and performs competitively with the best alternative techniques, although it involves no tuning or adjustment of sparseness-controlling hyperparameters.

579 citations
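
The hierarchical-Bayes reading of the Laplacian prior with a Jeffreys hyperprior yields an EM algorithm whose M-step reduces to a reweighted ridge solve. The Python sketch below implements one common form of that update, written with U = diag(|w|) so that weights driven to zero stay there; treat it as an assumption-laden illustration rather than the paper's exact algorithm.

```python
import numpy as np

def jeffreys_sparse_em(X, y, sigma2, n_iter=200):
    """EM-style reweighted ridge updates; U = diag(|w|) keeps exact zeros stable."""
    n, d = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares initialisation
    for _ in range(n_iter):
        U = np.diag(np.abs(w))
        A = U @ X.T @ X @ U + sigma2 * np.eye(d)      # always positive definite
        w = U @ np.linalg.solve(A, U @ (X.T @ y))     # irrelevant weights -> 0
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
w_true = np.array([2.0, -1.5] + [0.0] * 8)            # only 2 relevant features
y = X @ w_true + rng.normal(0, 0.5, size=100)
print(np.round(jeffreys_sparse_em(X, y, sigma2=0.25), 3))
```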


16 Sep 2003
TL;DR: This method constructs and optimises a lower bound on the marginal likelihood using variational calculus, resulting in an iterative algorithm which generalises the EM algorithm by maintaining posterior distributions over both latent variables and parameters.
Abstract: We present an efficient procedure for estimating the marginal likelihood of probabilistic models with latent variables or incomplete data. This method constructs and optimises a lower bound on the marginal likelihood using variational calculus, resulting in an iterative algorithm which generalises the EM algorithm by maintaining posterior distributions over both latent variables and parameters. We define the family of conjugate-exponential models—which includes finite mixtures of exponential family models, factor analysis, hidden Markov models, linear state-space models, and other models of interest—for which this bound on the marginal likelihood can be computed very simply through a modification of the standard EM algorithm. In particular, we focus on applying these bounds to the problem of scoring discrete directed graphical model structures (Bayesian networks). Extensive simulations comparing the variational bounds to the usual approach based on the Bayesian Information Criterion (BIC) and to a sampling-based gold standard method known as Annealed Importance Sampling (AIS) show that variational bounds substantially outperform BIC in finding the correct model structure at relatively little computational cost, while approaching the performance of the much more costly AIS procedure. Using AIS allows us to provide the first serious case study of the tightness of variational bounds. We also analyse the performance of AIS through a variety of criteria, discuss the use of other variational approaches to estimating marginal likelihoods based on Bethe and Kikuchi approximations, and outline directions in which this work can be extended.

527 citations
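
The comparison at the heart of this entry, approximating the marginal likelihood versus using BIC, can be illustrated in a toy conjugate model where the exact marginal likelihood is available in closed form (standing in for the variational bound, which the paper computes for far richer models). A hedged Python sketch:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Toy model: y_i = mu + eps_i, eps_i ~ N(0, sigma^2), prior mu ~ N(0, tau^2).
rng = np.random.default_rng(0)
sigma, tau, n = 1.0, 2.0, 50
y = rng.normal(0.7, sigma, size=n)

# Exact log marginal likelihood: marginally y ~ N(0, sigma^2 I + tau^2 J)
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
log_marginal = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

# BIC-style (Schwarz) approximation to the log marginal:
# maximised log likelihood minus (k/2) log n, with k = 1 free parameter
loglik_hat = norm.logpdf(y, loc=y.mean(), scale=sigma).sum()
bic_approx = loglik_hat - 0.5 * np.log(n)
print(log_marginal, bic_approx)
```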


Journal ArticleDOI
TL;DR: The nonparametric bootstrap resampling procedure is applied to the Bayesian approach, showing that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices.
Abstract: Owing to the exponential growth of genome databases, phylogenetic trees are now widely used to test a variety of evolutionary hypotheses. Nevertheless, computation time burden limits the application of methods such as maximum likelihood nonparametric bootstrap to assess reliability of evolutionary trees. As an alternative, the much faster Bayesian inference of phylogeny, which expresses branch support as posterior probabilities, has been introduced. However, marked discrepancies exist between nonparametric bootstrap proportions and Bayesian posterior probabilities, leading to difficulties in the interpretation of sometimes strongly conflicting results. As an attempt to reconcile these two indices of node reliability, we apply the nonparametric bootstrap resampling procedure to the Bayesian approach. The correlation between posterior probabilities, bootstrap maximum likelihood percentages, and bootstrapped posterior probabilities was studied for eight highly diverse empirical data sets and was also investigated using experimental simulation. Our results show that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices. Moreover, simulations corroborate empirical observations in suggesting that, being more conservative, the bootstrap approach might be less prone to strongly supporting a false phylogenetic hypothesis. Thus, apparent conflicts in topology recovered by the Bayesian approach were reduced after bootstrapping. Both posterior probabilities and bootstrap supports are of great interest to phylogeny as potential upper and lower bounds of node reliability, but they are surely not interchangeable and cannot be directly compared.

501 citations


Journal ArticleDOI
TL;DR: A hierarchical Bayesian model for gene (variable) selection is proposed and applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes.
Abstract: Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form and we need to use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.

382 citations
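
When the number of candidate variables is tiny, the mixture-prior idea can be illustrated without MCMC by enumerating all inclusion patterns and scoring each with an exact conjugate marginal likelihood; the truncated-sampling/MCMC machinery in the paper exists precisely because this enumeration is infeasible for thousands of genes. A minimal Python sketch (the priors and variances are illustrative assumptions):

```python
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def model_posterior(X, y, sigma2=1.0, tau2=4.0, prior_incl=0.2):
    """Posterior over inclusion patterns gamma for a handful of predictors,
    scored by the exact marginal y | gamma ~ N(0, sigma2 I + tau2 Xg Xg')."""
    n, p = X.shape
    logpost = {}
    for gamma in itertools.product([0, 1], repeat=p):
        cols = [j for j in range(p) if gamma[j]]
        cov = sigma2 * np.eye(n)
        if cols:
            Xg = X[:, cols]
            cov += tau2 * (Xg @ Xg.T)
        logm = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)
        logpr = sum(np.log(prior_incl if g else 1 - prior_incl) for g in gamma)
        logpost[gamma] = logm + logpr
    mx = max(logpost.values())
    z = sum(np.exp(v - mx) for v in logpost.values())
    return {g: np.exp(v - mx) / z for g, v in logpost.items()}

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.5, 0.0, -1.0, 0.0]) + rng.normal(0, 1.0, size=40)
post = model_posterior(X, y)
print(max(post, key=post.get))   # most probable inclusion pattern
```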


Journal ArticleDOI
TL;DR: A general framework for DBN modelling is outlined and both discrete and continuous DBN models are constructed systematically and criteria for learning network structures are introduced from a Bayesian statistical viewpoint.
Abstract: Dynamic Bayesian networks (DBNs) are considered as a promising model for inferring gene networks from time series microarray data. DBNs have overtaken Bayesian networks (BNs) as DBNs can construct cyclic regulations using time delay information. In this paper, a general framework for DBN modelling is outlined. Both discrete and continuous DBN models are constructed systematically and criteria for learning network structures are introduced from a Bayesian statistical viewpoint. This paper reviews the applications of DBNs over the past years. Real data applications for Saccharomyces cerevisiae time series gene expression data are also shown.
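A minimal continuous-DBN sketch in Python: for one gene, choose the lag-1 parent set that minimises BIC under a linear-Gaussian model. This is only one simple instance of the framework reviewed here; the paper's discrete models and Bayesian scoring criteria differ.

```python
import itertools
import numpy as np

def best_parents(data, target, max_parents=2):
    """First-order, linear-Gaussian DBN: pick the lag-1 parent set for one
    gene by minimising BIC of the regression x_target(t) ~ x_parents(t-1)."""
    T, G = data.shape
    X_prev, y = data[:-1], data[1:, target]
    best, best_bic = (), np.inf
    for k in range(max_parents + 1):
        for parents in itertools.combinations(range(G), k):
            X = np.column_stack([np.ones(T - 1)] + [X_prev[:, p] for p in parents])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
            sigma2 = resid @ resid / (T - 1)
            loglik = -0.5 * (T - 1) * (np.log(2 * np.pi * sigma2) + 1)
            bic = -2 * loglik + (k + 2) * np.log(T - 1)  # slopes + intercept + variance
            if bic < best_bic:
                best, best_bic = parents, bic
    return best

rng = np.random.default_rng(4)
data = rng.normal(size=(50, 5))
data[1:, 0] += 0.8 * data[:-1, 3]      # gene 0 is regulated by gene 3 at lag 1
print(best_parents(data, target=0))    # expected: (3,)
```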

Journal ArticleDOI
TL;DR: By applying the PAC-Bayesian theorem of McAllester (1999a), this paper proves distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques, giving a strong learning-theoretical justification for the use of these techniques.
Abstract: Approximate Bayesian Gaussian process (GP) classification techniques are powerful non-parametric learning methods, similar in appearance and performance to support vector machines. Based on simple probabilistic models, they render interpretable results and can be embedded in Bayesian frameworks for model selection, feature selection, etc. In this paper, by applying the PAC-Bayesian theorem of McAllester (1999a), we prove distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques. We also provide a new and much simplified proof for this powerful theorem, making use of the concept of convex duality which is a backbone of many machine learning techniques. We instantiate and test our bounds for two particular GPC techniques, including a recent sparse method which circumvents the unfavourable scaling of standard GP algorithms. As is shown in experiments on a real-world task, the bounds can be very tight for moderate training sample sizes. To the best of our knowledge, these results provide the tightest known distribution-free error bounds for approximate Bayesian GPC methods, giving a strong learning-theoretical justification for the use of these techniques.

Journal ArticleDOI
TL;DR: The results corroborate the findings of others that posterior probability values are excessively high and suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probabilities.
Abstract: Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values. (Bayesian analysis; Markov chain Monte Carlo sampling; maximum likelihood; phylogenetics.)

Book ChapterDOI
Michael E. Tipping
TL;DR: This article gives a basic introduction to the principles of Bayesian inference in a machine learning context, with an emphasis on the importance of marginalisation for dealing with uncertainty.
Abstract: This article gives a basic introduction to the principles of Bayesian inference in a machine learning context, with an emphasis on the importance of marginalisation for dealing with uncertainty. We begin by illustrating concepts via a simple regression task before relating ideas to practical, contemporary techniques with a description of ‘sparse Bayesian’ models and the ‘relevance vector machine’.

Journal ArticleDOI
David McAllester
TL;DR: A PAC-Bayesian performance guarantee for stochastic model selection that is superior to analogous guarantees for deterministic model selection and shown that the posterior optimizing the performance guarantee is a Gibbs distribution.
Abstract: PAC-Bayesian learning methods combine the informative priors of Bayesian methods with distribution-free PAC guarantees. Stochastic model selection predicts a class label by stochastically sampling a classifier according to a “posterior distribution” on classifiers. This paper gives a PAC-Bayesian performance guarantee for stochastic model selection that is superior to analogous guarantees for deterministic model selection. The guarantee is stated in terms of the training error of the stochastic classifier and the KL-divergence of the posterior from the prior. It is shown that the posterior optimizing the performance guarantee is a Gibbs distribution. Simpler posterior distributions are also derived that have nearly optimal performance guarantees.
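The guarantee has the following flavour: with probability at least 1 − δ over the training sample, the true error of the stochastic (Gibbs) classifier is bounded via its training error and KL(posterior ‖ prior). The Python sketch below numerically inverts one common form of the bound; the exact constants vary between statements of the theorem, so treat the form used here as an assumption.

```python
import math

def kl_bern(q, p):
    """Binary KL divergence kl(q || p)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_bound(train_err, kl_qp, n, delta=0.05):
    """Invert kl(train_err || e) <= (KL(Q||P) + ln((n+1)/delta)) / n by bisection."""
    rhs = (kl_qp + math.log((n + 1) / delta)) / n
    lo, hi = train_err, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if kl_bern(train_err, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi

print(pac_bayes_bound(train_err=0.05, kl_qp=10.0, n=5000))
```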

Journal ArticleDOI
TL;DR: In this paper, a Bayesian framework for exploratory data analysis based on posterior predictive checks is presented, which can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis.
Abstract: Summary. Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling)—which are generally considered as unrelated statistical paradigms—can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data y^rep and replicated parameters θ^rep follows a long tradition of generalizations in Bayesian theory. On the theoretical level, we present a predictive Bayesian formulation of goodness-of-fit testing, distinguishing between p-values (posterior probabilities that specified antisymmetric discrepancy measures will exceed 0) and u-values (data summaries with uniform sampling distributions). We explain that p-values, unlike u-values, are Bayesian probability statements in that they condition on observed data. Having reviewed the general theoretical framework, we discuss the implications for statistical graphics and exploratory data analysis, with the goal being to unify exploratory data analysis with more formal statistical methods based on probability models. We interpret various graphical displays as posterior predictive checks and discuss how Bayesian inference can be used to determine reference distributions. The goal of this work is not to downgrade descriptive statistics, or to suggest they be replaced by Bayesian modeling, but rather to suggest how exploratory data analysis fits into the probability-modeling paradigm. We conclude with a discussion of the implications for practical Bayesian inference. In particular, we anticipate that Bayesian software can be generalized to draw simulations of replicated data and parameters from their posterior predictive distribution, and these can in turn be used to calibrate EDA graphs.
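A posterior predictive check is mechanically simple: draw parameters from the posterior, simulate replicated data y^rep, and compare a discrepancy T(y^rep) with T(y). A minimal conjugate-normal Python sketch, with illustrative data and discrepancy:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=40)          # observed data (stand-in)

# Conjugate model: y_i ~ N(theta, 1), prior theta ~ N(0, tau2)
n, tau2 = len(y), 100.0
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = post_var * y.sum()

T_obs, exceed, S = y.max(), 0, 5000        # discrepancy T(y) = max(y)
for _ in range(S):
    theta = rng.normal(post_mean, np.sqrt(post_var))
    yrep = rng.normal(theta, 1.0, size=n)  # replicated data from the predictive
    exceed += yrep.max() >= T_obs
print("posterior predictive p-value:", exceed / S)
```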

Journal ArticleDOI
TL;DR: A family of hierarchical Bayesian models is developed which allows for the simultaneous inference of informant accuracy and social structure in the presence of measurement error and missing data.

Journal ArticleDOI
TL;DR: The multiple sources of uncertainty that are relevant to such models, and their relation to either probabilistic or deterministic sensitivity analysis, are described; a Bayesian approach appears natural in this context.
Abstract: Increasingly complex models are being used to evaluate the cost-effectiveness of medical interventions. We describe the multiple sources of uncertainty that are relevant to such models, and their relation to either probabilistic or deterministic sensitivity analysis. A Bayesian approach appears natural in this context. We explore how sensitivity analysis to patient heterogeneity and parameter uncertainty can be simultaneously investigated, and illustrate the necessary computation when expected costs and benefits can be calculated in closed form, such as in discrete-time discrete-state Markov models. Information about parameters can either be expressed as a prior distribution, or derived as a posterior distribution given a generalized synthesis of available data in which multiple sources of evidence can be differentially weighted according to their assumed quality. The resulting joint posterior distributions on costs and benefits can then provide inferences on incremental cost-effectiveness, best presented as posterior distributions over net-benefit and cost-effectiveness acceptability curves. These ideas are illustrated with a detailed running example concerning the cost-effectiveness of hip prostheses in different age-sex subgroups. All computations are carried out using freely available software for conducting Markov chain Monte Carlo analysis.
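A cost-effectiveness acceptability curve is obtained by sampling incremental costs and effects from their joint posterior and recording, for each willingness-to-pay threshold λ, the probability that the incremental net benefit λΔE − ΔC is positive. A Python sketch in which the "posterior" is an assumed set of draws, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
S = 10_000
# Illustrative stand-in for posterior draws of incremental cost and effect
dC = rng.normal(1500.0, 400.0, size=S)     # incremental cost
dE = rng.normal(0.05, 0.03, size=S)        # incremental effect (e.g., QALYs)

for lam in (10_000, 20_000, 30_000, 50_000):
    inb = lam * dE - dC                    # incremental net benefit at threshold lam
    print(lam, (inb > 0).mean())           # one point on the acceptability curve
```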

Journal ArticleDOI
TL;DR: Use is made of the Variational Bayesian (VB) framework, which approximates the true posterior density with a factorised density and provides a natural extension to previous Bayesian analyses that have used Empirical Bayes.

Dissertation
01 Jul 2003
TL;DR: The tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning are demonstrated, and generic schemes for automatic model selection with many (hyper)parameters are developed.
Abstract: Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the problem of the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design) and develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide regime of GP models far beyond the special cases we studied here.

16 Sep 2003
TL;DR: This work proposes to use a Gaussian Process model of the (log of the) posterior for most of the computations required by HMC, allowing Bayesian treatment of models with posteriors that are computationally demanding, such as models involving computer simulation.
Abstract: Hybrid Monte Carlo (HMC) is often the method of choice for computing Bayesian integrals that are not analytically tractable. However the success of this method may require a very large number of evaluations of the (un-normalized) posterior and its partial derivatives. In situations where the posterior is computationally costly to evaluate, this may lead to an unacceptable computational load for HMC. I propose to use a Gaussian Process model of the (log of the) posterior for most of the computations required by HMC. Within this scheme only occasional evaluation of the actual posterior is required to guarantee that the samples generated have exactly the desired distribution, even if the GP model is somewhat inaccurate. The method is demonstrated on a 10 dimensional problem, where 200 evaluations suffice for the generation of 100 roughly independent points from the posterior. Thus, the proposed scheme allows Bayesian treatment of models with posteriors that are computationally demanding, such as models involving computer simulation.
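For context, here is a minimal Python implementation of plain HMC with leapfrog integration, the ingredient this paper sets out to accelerate; the GP surrogate for the log posterior is not shown.

```python
import numpy as np

def hmc_sample(logp_grad, x0, eps=0.1, L=20, n_samples=500, rng=None):
    """Vanilla HMC; logp_grad returns (log p(x), dlogp/dx)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    lp, g = logp_grad(x)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)             # resample momentum
        x_new, p_new, g_new = x.copy(), p.copy(), g.copy()
        h0 = -lp + 0.5 * p @ p                   # initial Hamiltonian
        for _ in range(L):                       # leapfrog steps
            p_new = p_new + 0.5 * eps * g_new
            x_new = x_new + eps * p_new
            lp_new, g_new = logp_grad(x_new)
            p_new = p_new + 0.5 * eps * g_new
        h1 = -lp_new + 0.5 * p_new @ p_new
        if rng.random() < np.exp(h0 - h1):       # Metropolis accept/reject
            x, lp, g = x_new, lp_new, g_new
        samples.append(x.copy())
    return np.array(samples)

# Example: 10-dimensional standard normal target
logp_grad = lambda x: (-0.5 * x @ x, -x)
draws = hmc_sample(logp_grad, np.zeros(10))
```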

Book
01 Jan 2003
TL;DR: This book discusses Bayes' Theorem, Bayesian Inference in the General Linear Model, and Applications of Bayesian Statistical Science.
Abstract: Preface. Preface to the First Edition. A Bayesian Hall of Fame. PART I: FOUNDATIONS AND PRINCIPLES. 1. Background. 2. A Bayesian Perspective on Probability. 3. The Likelihood Function. 4. Bayes' Theorem. 5. Prior Distributions. PART II: NUMERICAL IMPLEMENTATION OF THE BAYESIAN PARADIGM. 6. Markov Chain Monte Carlo Methods (Siddhartha Chib). 7. Large Sample Posterior Distributions and Approximations. PART III: BAYESIAN STATISTICAL INFERENCE AND DECISION MAKING. 8. Bayesian Estimation. 9. Bayesian Hypothesis Testing. 10. Predictivism. 11. Bayesian Decision Making. PART IV: MODELS AND APPLICATIONS. 12. Bayesian Inference in the General Linear Model. 13. Model Averaging (Merlise Clyde). 14. Hierarchical Bayesian Modeling (Alan Zaslavsky). 15. Bayesian Factor Analysis. 16. Bayesian Inference in Classification and Discrimination. Description of Appendices. Appendix 1. Bayes, Thomas, (Hilary L. Seal). Appendix 2. Thomas Bayes. A Bibliographical Note (George A. Barnard). Appendix 3. Communication of Bayes' Essay to the Philosophical Transactions of the Royal Society of London (Richard Price). Appendix 4. An Essay Towards Solving a Problem in the Doctrine of Chances (Reverend Thomas Bayes). Appendix 5. Applications of Bayesian Statistical Science. Appendix 6. Selecting the Bayesian Hall of Fame. Appendix 7. Solutions to Selected Exercises. Bibliography. Subject Index. Author Index.

Journal ArticleDOI
TL;DR: The authors argue that the Dirichlet distribution, the multivariate equivalent of the beta distribution, is appropriate for this purpose and illustrate its use for generating a fully probabilistic transition matrix for a Markov model.
Abstract: In structuring decision models of medical interventions, it is commonly recommended that only 2 branches be used for each chance node to avoid logical inconsistencies that can arise during sensitivity analyses if the branching probabilities do not sum to 1. However, information may be naturally available in an unconditional form, and structuring a tree in conditional form may complicate rather than simplify the sensitivity analysis of the unconditional probabilities. Current guidance emphasizes using probabilistic sensitivity analysis, and a method is required to provide probabilistic probabilities over multiple branches that appropriately represents uncertainty while satisfying the requirement that mutually exclusive event probabilities should sum to 1. The authors argue that the Dirichlet distribution, the multivariate equivalent of the beta distribution, is appropriate for this purpose and illustrate its use for generating a fully probabilistic transition matrix for a Markov model. Furthermore, they demonstrate that by adopting a Bayesian approach, the problem of observing zero counts for transitions of interest can be overcome.
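Concretely, placing an independent Dirichlet prior on each row of the transition matrix and sampling rows as Dirichlet(counts + prior) guarantees rows that sum to 1 and gives zero-count transitions small but nonzero probability. A Python sketch with illustrative counts and a uniform Dirichlet(1, ..., 1) prior:

```python
import numpy as np

rng = np.random.default_rng(3)
# Observed transition counts between 3 health states (rows = from-state);
# note the zero counts, which a pure frequency estimate would fix at 0.
counts = np.array([[85, 12,  3],
                   [ 0, 60, 15],
                   [ 0,  0, 40]])

# Each row ~ Dirichlet(counts + 1): every sampled matrix is a valid
# transition matrix, and zero-count transitions stay possible.
P = np.array([rng.dirichlet(row + 1) for row in counts])
print(P, P.sum(axis=1))
```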

Journal Article
TL;DR: Adopting a Bayesian perspective, the authors move the Bayesian SSD problem from the rather elementary models addressed in the literature to date in the direction of the wide range of hierarchical models which dominate the current Bayesian landscape.
Abstract: Sample size determination (SSD) is a crucial aspect of experimental design. Two SSD problems are considered here. The first concerns how to select a sample size to achieve specified performance with regard to one or more features of a model. Adopting a Bayesian perspective, we move the Bayesian SSD problem from the rather elementary models addressed in the literature to date in the direction of the wide range of hierarchical models which dominate the current Bayesian landscape. Our approach is generic and thus, in principle, broadly applicable. However, it requires full model specification and computationally intensive simulation, perhaps limiting it practically to simple instances of such models. Still, insight from such cases is of useful design value. In addition, we present some theoretical tools for studying performance as a function of sample size, with a variety of illustrative results. Such results provide guidance with regard to what is achievable. We also offer two examples, a survival model with censoring and a logistic regression model. The second problem concerns how to select a sample size to achieve specified separation of two models. We approach this problem by adopting a screening criterion which in turn forms a model choice criterion. This criterion is set up to choose model 1 when the value is large, model 2 when the value is small. The SSD problem then requires choosing $n_{1}$ to make the probability of selecting model 1 when model 1 is true sufficiently large and choosing $n_{2}$ to make the probability of selecting model 2 when model 2 is true sufficiently large. The required n is $\max(n_{1}, n_{2})$. Here, we again provide two illustrations. One considers separating normal errors from t errors, the other separating a common growth curve model from a model with individual growth curves.
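The first SSD problem can be sketched by simulation in a conjugate normal model: for each candidate n, estimate the probability that the posterior delivers the desired performance, then take the smallest n that clears the target. All numbers in the Python sketch below are illustrative assumptions:

```python
import numpy as np

def success_prob(n, delta=0.5, sigma=1.0, tau=2.0, n_sim=4000, seed=4):
    """P(the posterior 95% interval for mu excludes 0) when the true mean is
    delta, under y_i ~ N(mu, sigma^2) with conjugate prior mu ~ N(0, tau^2)."""
    rng = np.random.default_rng(seed)
    post_var = 1.0 / (n / sigma**2 + 1 / tau**2)
    hits = 0
    for _ in range(n_sim):
        y = rng.normal(delta, sigma, size=n)
        post_mean = post_var * y.sum() / sigma**2
        hits += post_mean - 1.96 * post_var**0.5 > 0
    return hits / n_sim

for n in (10, 20, 40, 80):
    print(n, success_prob(n))   # choose the smallest n clearing, e.g., 0.9
```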

Proceedings ArticleDOI
18 Jun 2003
TL;DR: A classification driven stochastic structure search algorithm for learning the structure of Bayesian network classifiers is introduced, and it is shown that with moderate-size labeled training sets and a large amount of unlabeled data, the method can utilize unlabeled data to improve classification performance.
Abstract: Understanding human emotions is one of the necessary skills for the computer to interact intelligently with human users. The most expressive way humans display emotions is through facial expressions. In this paper, we report on several advances we have made in building a system for classification of facial expressions from continuous video input. We use Bayesian network classifiers for classifying expressions from video. One of the motivating factors for using Bayesian network classifiers is their ability to handle missing data, both during inference and training. In particular, we are interested in the problem of learning with both labeled and unlabeled data. We show that when using unlabeled data to learn classifiers, using correct modeling assumptions is critical for achieving improved classification performance. Motivated by this, we introduce a classification driven stochastic structure search algorithm for learning the structure of Bayesian network classifiers. We show that with moderate-size labeled training sets and a large amount of unlabeled data, our method can utilize unlabeled data to improve classification performance. We also provide results using the Naive Bayes (NB) and the Tree-Augmented Naive Bayes (TAN) classifiers, showing that the two can achieve good performance with labeled training sets, but perform poorly when unlabeled data are added to the training set.
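The labeled-plus-unlabeled setting can be sketched with a Gaussian naive Bayes model trained by EM: labeled rows keep fixed one-hot class responsibilities while unlabeled rows receive soft posterior responsibilities at each iteration. This is a generic illustration of learning from both data types, not the paper's structure-search algorithm.

```python
import numpy as np

def semi_supervised_gnb(Xl, yl, Xu, n_classes, n_iter=30):
    """Gaussian naive Bayes trained by EM on labeled (Xl, yl) and unlabeled Xu."""
    X = np.vstack([Xl, Xu])
    n_l = len(Xl)
    R = np.zeros((len(X), n_classes))          # class responsibilities
    R[np.arange(n_l), yl] = 1.0                # labeled rows: fixed one-hot
    R[n_l:] = 1.0 / n_classes                  # unlabeled rows: uniform start
    for _ in range(n_iter):
        # M-step: class priors, per-class feature means and variances
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        var = (R.T @ X**2) / Nk[:, None] - mu**2 + 1e-6
        # E-step (unlabeled rows only): posterior class responsibilities
        logp = np.log(pi) - 0.5 * (
            np.log(2 * np.pi * var).sum(axis=1)
            + (((Xu[:, None, :] - mu) ** 2) / var).sum(axis=2)
        )
        logp -= logp.max(axis=1, keepdims=True)
        R[n_l:] = np.exp(logp) / np.exp(logp).sum(axis=1, keepdims=True)
    return pi, mu, var

rng = np.random.default_rng(5)
Xl = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
yl = np.array([0] * 20 + [1] * 20)
Xu = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
pi, mu, var = semi_supervised_gnb(Xl, yl, Xu, n_classes=2)
```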

Journal ArticleDOI
TL;DR: An algorithm for building ensembles of simple Bayesian classifiers in random subspaces, including a hill-climbing-based refinement cycle that tries to improve the accuracy and diversity of the base classifiers built on random feature subsets.

Journal ArticleDOI
01 Jun 2003-Ecology
TL;DR: It is shown how standard time-series models for population dynamics can be extended to include both observational and process error and how to perform inference on parameters in these models in the Bayesian setting.
Abstract: Many standard statistical models used to examine population dynamics ignore significant sources of stochasticity. Usually only process error is included, and uncertainty due to errors in data collection is omitted or not directly specified in the model. We show how standard time-series models for population dynamics can be extended to include both observational and process error and how to perform inference on parameters in these models in the Bayesian setting. Using simulated data, we show how ignoring observation error can be misleading. We argue that the standard Bayesian techniques used to perform inference, including freely available software, are generally applicable to a variety of time-series models.
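The warning about ignoring observation error can be reproduced in a few lines: simulate a Gompertz-type state-space model with both error sources, then fit a naive AR(1) regression to the noisy observations and note the attenuated estimate of the dynamics. Parameter values in this Python sketch are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
T, a, b = 60, 0.4, 0.8             # Gompertz-type dynamics on log abundance
sig_proc, sig_obs = 0.15, 0.30     # process vs observation error sd

x = np.empty(T)                    # latent log population size
x[0] = a / (1 - b)                 # start at the stationary mean
for t in range(1, T):
    x[t] = a + b * x[t - 1] + rng.normal(0, sig_proc)   # process error
y = x + rng.normal(0, sig_obs, size=T)                  # observation error

# Naive AR(1) fit to the observations (i.e., ignoring observation error)
# attenuates the estimate of b relative to the truth:
b_hat = np.polyfit(y[:-1], y[1:], 1)[0]
print("true b:", b, "naive estimate:", round(b_hat, 3))
```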

Journal Article
TL;DR: This paper considers the class of Bayesian mixture algorithms, where an estimator is formed by constructing a data-dependent mixture over some hypothesis space, and demonstrates that mixture approaches are particularly robust, and allow for the construction of highly complex estimators, while avoiding undesirable overfitting effects.
Abstract: Bayesian approaches to learning and estimation have played a significant role in the Statistics literature over many years. While they are often provably optimal in a frequentist setting, and lead to excellent performance in practical applications, there have not been many precise characterizations of their performance for finite sample sizes under general conditions. In this paper we consider the class of Bayesian mixture algorithms, where an estimator is formed by constructing a data-dependent mixture over some hypothesis space. Similarly to what is observed in practice, our results demonstrate that mixture approaches are particularly robust, and allow for the construction of highly complex estimators, while avoiding undesirable overfitting effects. Our results, while being data-dependent in nature, are insensitive to the underlying model assumptions, and apply whether or not these hold. At a technical level, the approach applies to unbounded functions, constrained only by certain moment conditions. Finally, the bounds derived can be directly applied to non-Bayesian mixture approaches such as Boosting and Bagging.

Journal ArticleDOI
TL;DR: A method for comparing semiparametric Bayesian models, constructed under the Dirichlet process mixture (DPM) framework, with alternative semiparametric or parametric Bayesian models is presented, apparently the first method of its kind.
Abstract: We present a method for comparing semiparametric Bayesian models, constructed under the Dirichlet process mixture (DPM) framework, with alternative semiparametric or parametric Bayesian models. A distinctive feature of the method is that it can be applied to semiparametric models containing covariates and hierarchical prior structures, and is apparently the first method of its kind. Formally, the method is based on the marginal likelihood estimation approach of Chib (1995) and requires estimation of the likelihood and posterior ordinates of the DPM model at a single high-density point. An interesting computation is involved in the estimation of the likelihood ordinate, which is devised via collapsed sequential importance sampling. Extensive experiments with synthetic and real data involving semiparametric binary data regression models and hierarchical longitudinal mixed-effects models are used to illustrate the implementation, performance, and applicability of the method.
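Chib's (1995) identity, on which the method builds, evaluates log m(y) = log f(y | θ*) + log π(θ*) − log π(θ* | y) at a single high-density point θ*. In a conjugate toy model all three ordinates are exact, so the identity can be verified directly; the paper's collapsed sequential importance sampling for the DPM likelihood ordinate is far more involved and is not shown.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(6)
n, sigma, tau = 30, 1.0, 2.0
y = rng.normal(0.5, sigma, size=n)

# Conjugate posterior for theta in y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2)
post_var = 1.0 / (n / sigma**2 + 1 / tau**2)
post_mean = post_var * y.sum() / sigma**2

# Chib's identity at the high-density point theta* = posterior mean
t_star = post_mean
log_m = (norm.logpdf(y, t_star, sigma).sum()          # likelihood ordinate
         + norm.logpdf(t_star, 0, tau)                # prior ordinate
         - norm.logpdf(t_star, post_mean, np.sqrt(post_var)))  # posterior ordinate

# Check against the exact marginal: y ~ N(0, sigma^2 I + tau^2 J)
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
print(log_m, multivariate_normal.logpdf(y, np.zeros(n), cov))
```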

Journal ArticleDOI
TL;DR: An importance sampling approximation is considered that can be improved upon through replication of both random effects and data; the resulting approach is simple to apply and may aid the criticism of any Bayesian hierarchical model.
Abstract: When fitting complex hierarchical disease mapping models, it can be important to identify regions that diverge from the assumed model. Since full leave-one-out cross-validatory assessment is extremely time-consuming when using Markov chain Monte Carlo (MCMC) estimation methods, Stern and Cressie consider an importance sampling approximation. We show that this can be improved upon through replication of both random effects and data. Our approach is simple to apply, entirely generic, and may aid the criticism of any Bayesian hierarchical model.
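The importance-sampling approximation to leave-one-out uses only full-posterior draws: p(y_i | y_−i) is approximated by the harmonic mean of the per-draw likelihoods p(y_i | θ_s). A Python sketch for a simple normal model; the replication refinement proposed in this paper is not shown.

```python
import numpy as np
from scipy.stats import norm

def is_loo(y, theta_draws, sigma=1.0):
    """log p(y_i | y_{-i}) approximated by the harmonic mean of the
    per-draw likelihoods p(y_i | theta_s), for y_i ~ N(theta, sigma^2)."""
    ll = norm.logpdf(y[:, None], loc=theta_draws[None, :], scale=sigma)  # (n, S)
    S = theta_draws.size
    m = (-ll).max(axis=1, keepdims=True)          # stabilise logsumexp(-ll)
    lse = m.squeeze(1) + np.log(np.exp(-ll - m).sum(axis=1))
    return np.log(S) - lse                        # log harmonic mean

y = np.random.default_rng(6).normal(0.0, 1.0, size=25)
theta_draws = np.random.default_rng(7).normal(y.mean(), 0.2, size=4000)  # MCMC stand-in
print(is_loo(y, theta_draws)[:5])
```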