Showing papers on "Bayesian inference published in 2003"


Book
17 Dec 2003
TL;DR: A book-length treatment of hierarchical Bayesian modeling for point-referenced, areal, multivariate, and spatio-temporal spatial data, with computer tutorials and exercises throughout.
Abstract (table of contents):
OVERVIEW OF SPATIAL DATA PROBLEMS: Introduction to Spatial Data and Models; Fundamentals of Cartography; Exercises
BASICS OF POINT-REFERENCED DATA MODELS: Elements of Point-Referenced Modeling; Spatial Process Models; Exploratory Approaches for Point-Referenced Data; Classical Spatial Prediction; Computer Tutorials; Exercises
BASICS OF AREAL DATA MODELS: Exploratory Approaches for Areal Data; Brook's Lemma and Markov Random Fields; Conditionally Autoregressive (CAR) Models; Simultaneous Autoregressive (SAR) Models; Computer Tutorials; Exercises
BASICS OF BAYESIAN INFERENCE: Introduction to Hierarchical Modeling and Bayes Theorem; Bayesian Inference; Bayesian Computation; Computer Tutorials; Exercises
HIERARCHICAL MODELING FOR UNIVARIATE SPATIAL DATA: Stationary Spatial Process Models; Generalized Linear Spatial Process Modeling; Nonstationary Spatial Process Models; Areal Data Models; General Linear Areal Data Modeling; Exercises
SPATIAL MISALIGNMENT: Point-Level Modeling; Nested Block-Level Modeling; Nonnested Block-Level Modeling; Misaligned Regression Modeling; Exercises
MULTIVARIATE SPATIAL MODELING: Separable Models; Coregionalization Models; Other Constructive Approaches; Multivariate Models for Areal Data; Exercises
SPATIOTEMPORAL MODELING: General Modeling Formulation; Point-Level Modeling with Continuous Time; Nonseparable Spatio-Temporal Models; Dynamic Spatio-Temporal Models; Block-Level Modeling; Exercises
SPATIAL SURVIVAL MODELS: Parametric Models; Semiparametric Models; Spatio-Temporal Models; Multivariate Models; Spatial Cure Rate Models; Exercises
SPECIAL TOPICS IN SPATIAL PROCESS MODELING: Process Smoothness Revisited; Spatially Varying Coefficient Models; Spatial CDFs
APPENDICES: Matrix Theory and Spatial Computing Methods; Answers to Selected Exercises
REFERENCES; AUTHOR INDEX; SUBJECT INDEX
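
The point-referenced chapters build on Gaussian spatial process models and classical spatial prediction (kriging). As a hedged illustration of that core computation, not code from the book, here is a minimal simple-kriging sketch; the exponential covariance and the sill, range, and nugget values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_cov(d, sill=1.0, range_=2.0):
    """Exponential covariance: C(d) = sill * exp(-d / range_)."""
    return sill * np.exp(-d / range_)

# Toy observations at random 2-D sites.
sites = rng.uniform(0, 10, size=(25, 2))
y = np.sin(sites[:, 0]) + 0.1 * rng.standard_normal(25)
yc = y - y.mean()                              # work with a centred process

# Covariances among sites (nugget on the diagonal) and to the target s0.
D = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
C = exp_cov(D) + 0.1 * np.eye(len(sites))      # nugget = 0.1 (assumed)
s0 = np.array([5.0, 5.0])
c0 = exp_cov(np.linalg.norm(sites - s0, axis=1))

# Simple kriging: the weights solve C w = c0.
w = np.linalg.solve(C, c0)
pred = y.mean() + w @ yc
pred_var = exp_cov(0.0) - c0 @ w               # kriging variance at s0
print(f"prediction {pred:.3f}, kriging variance {pred_var:.3f}")
```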

2,991 citations


Journal ArticleDOI
TL;DR: This work proposes a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system, and suggests that the algorithms of particle filtering and Bayesian-belief propagation might model these interactive cortical computations.
Abstract: Traditional views of visual processing suggest that early visual neurons in areas V1 and V2 are static spatiotemporal filters that extract local features from a visual scene. The extracted information is then channeled through a feedforward chain of modules in successively higher visual areas for further analysis. Recent electrophysiological recordings from early visual neurons in awake behaving monkeys reveal that there are many levels of complexity in the information processing of the early visual cortex, as seen in the long-latency responses of its neurons. These new findings suggest that activity in the early visual cortex is tightly coupled and highly interactive with the rest of the visual system. They lead us to propose a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system. In this framework, the recurrent feedforward/feedback loops in the cortex serve to integrate top-down contextual priors and bottom-up observations so as to implement concurrent probabilistic inference along the visual hierarchy. We suggest that the algorithms of particle filtering and Bayesian-belief propagation might model these interactive cortical computations. We review some recent neurophysiological evidence that supports the plausibility of these ideas.
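
The particle-filtering idea the authors invoke can be sketched concretely. Below is a minimal bootstrap particle filter for a one-dimensional latent state, purely illustrative: the random-walk dynamics and Gaussian observation noise are assumptions of this demo, not the cortical model of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 1000                          # time steps, particles

# Simulate a latent trajectory and noisy observations (assumed model).
x_true = np.cumsum(rng.standard_normal(T) * 0.3)
obs = x_true + rng.standard_normal(T) * 0.5

particles = rng.standard_normal(N)
estimates = []
for y in obs:
    # Propagate through the (assumed) random-walk dynamics.
    particles = particles + rng.standard_normal(N) * 0.3
    # Weight by the Gaussian observation likelihood p(y | x).
    logw = -0.5 * ((y - particles) / 0.5) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    estimates.append(w @ particles)      # posterior-mean estimate
    # Resample to avoid weight degeneracy.
    particles = rng.choice(particles, size=N, p=w)

print("RMSE:", np.sqrt(np.mean((np.array(estimates) - x_true) ** 2)))
```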

1,431 citations


Book
25 Sep 2003
TL;DR: A book-length treatment of spatial point process models (Poisson, Cox, and Markov point processes) and of simulation-based Markov chain Monte Carlo inference for them.
Abstract (table of contents):
EXAMPLES OF SPATIAL POINT PATTERNS
INTRODUCTION TO POINT PROCESSES: Point Processes on R^d; Marked Point Processes and Multivariate Point Processes; Unified Framework; Space-Time Processes
POISSON POINT PROCESSES: Basic Properties; Further Results; Marked Poisson Processes
SUMMARY STATISTICS: First and Second Order Properties; Summary Statistics; Nonparametric Estimation; Summary Statistics for Multivariate Point Processes; Summary Statistics for Marked Point Processes
COX PROCESSES: Definition and Simple Examples; Basic Properties; Neyman-Scott Processes as Cox Processes; Shot Noise Cox Processes; Approximate Simulation of SNCPs; Log Gaussian Cox Processes; Simulation of Gaussian Fields and LGCPs; Multivariate Cox Processes
MARKOV POINT PROCESSES: Finite Point Processes with a Density; Pairwise Interaction Point Processes; Markov Point Processes; Extensions of Markov Point Processes to R^d; Inhomogeneous Markov Point Processes; Marked and Multivariate Markov Point Processes
METROPOLIS-HASTINGS ALGORITHMS: Description of Algorithms; Background Material for Markov Chains; Convergence Properties of Algorithms
SIMULATION-BASED INFERENCE: Monte Carlo Methods and Output Analysis; Estimation of Ratios of Normalising Constants; Approximate Likelihood Inference Using MCMC; Monte Carlo Error; Distribution of Estimates and Hypothesis Tests; Approximate Missing-Data Likelihoods
INFERENCE FOR MARKOV POINT PROCESSES: Maximum Likelihood Inference; Pseudo Likelihood; Bayesian Inference
INFERENCE FOR COX PROCESSES: Minimum Contrast Estimation; Conditional Simulation and Prediction; Maximum Likelihood Inference; Bayesian Inference
BIRTH-DEATH PROCESSES AND PERFECT SIMULATION: Spatial Birth-Death Processes; Perfect Simulation
APPENDICES: History, Bibliography, and Software; Measure Theoretical Details; Moment Measures and Palm Distributions; Perfect Simulation of SNCPs; Simulation of Gaussian Fields; Nearest-Neighbour Markov Point Processes; Results for Spatial Birth-Death Processes
References; Subject Index; Notation Index
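
Simulation is central to the book, starting from the Poisson process. A minimal sketch of simulating a homogeneous Poisson point process on a rectangular window (the intensity and window size below are arbitrary illustrative values): the point count is Poisson with mean intensity times area, and given the count the points are independent and uniform.

```python
import numpy as np

rng = np.random.default_rng(2)

def sim_poisson_process(intensity, width, height):
    """Homogeneous Poisson process on [0, width] x [0, height]."""
    n = rng.poisson(intensity * width * height)   # Poisson point count
    return np.column_stack([rng.uniform(0, width, n),
                            rng.uniform(0, height, n)])

pts = sim_poisson_process(intensity=5.0, width=2.0, height=1.0)
print(f"{len(pts)} points; expected count {5.0 * 2.0 * 1.0:.0f}")
```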

1,167 citations


Journal ArticleDOI
TL;DR: This paper axiomatizes an intertemporal version of multiple-priors utility that is consistent with a rich set of possibilities for dynamic behavior under ambiguity and argues that dynamic consistency is intuitive in a wide range of situations.

900 citations


Journal ArticleDOI
TL;DR: In this paper, a general large-sample likelihood apparatus is presented, in which limiting distributions and risk properties of estimators post-selection as well as of model average estimators are precisely described, also explicitly taking modeling bias into account.
Abstract: The traditional use of model selection methods in practice is to proceed as if the final selected model had been chosen in advance, without acknowledging the additional uncertainty introduced by model selection. This often means underreporting of variability and too optimistic confidence intervals. We build a general large-sample likelihood apparatus in which limiting distributions and risk properties of estimators post-selection as well as of model average estimators are precisely described, also explicitly taking modeling bias into account. This allows a drastic reduction in complexity, as competing model averaging schemes may be developed, discussed, and compared inside a statistical prototype experiment where only a few crucial quantities matter. In particular, we offer a frequentist view on Bayesian model averaging methods and give a link to generalized ridge estimators. Our work also leads to new model selection criteria. The methods are illustrated with real data applications.
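
The general model-averaging idea the paper formalizes can be illustrated with a toy sketch. The weights below are smoothed-AIC weights, a common generic choice; they are not the estimators derived in the paper, whose weights come from a local-misspecification analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.standard_normal((n, 3))
y = 1.0 + 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.standard_normal(n)

def fit_ls(Xs):
    """Least squares fit; return the focus estimate (coef of X[:, 0]) and AIC."""
    A = np.column_stack([np.ones(n), Xs])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    k = A.shape[1] + 1                   # coefficients plus error variance
    return beta[1], n * np.log(rss / n) + 2 * k

# Candidate nested models: how many covariates to include.
models = [X[:, :1], X[:, :2], X[:, :3]]
est, aic = zip(*[fit_ls(m) for m in models])
aic = np.array(aic)
w = np.exp(-0.5 * (aic - aic.min()))
w /= w.sum()                             # smoothed-AIC weights
print("weights:", np.round(w, 3),
      "| averaged estimate:", np.round(w @ np.array(est), 3))
```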

662 citations


Journal ArticleDOI
TL;DR: It is shown that Bayesian posterior probabilities are significantly higher than corresponding nonparametric bootstrap frequencies for true clades, but also that erroneous conclusions will be made more often.
Abstract: Many empirical studies have revealed considerable differences between nonparametric bootstrapping and Bayesian posterior probabilities in terms of the support values for branches, despite claims of their approximate equivalence. We investigated this problem by simulating data, which were then analyzed by maximum likelihood bootstrapping and Bayesian phylogenetic analysis using identical models and reoptimization of parameter values. We show that Bayesian posterior probabilities are significantly higher than corresponding nonparametric bootstrap frequencies for true clades, but also that erroneous conclusions will be made more often. These errors are strongly accentuated when the models used for analyses are underparameterized. When data are analyzed under the correct model, nonparametric bootstrapping is conservative. Bayesian posterior probabilities are also conservative in this respect, but less so.
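
The nonparametric bootstrap support values under study are proportions of resampled data sets in which a given conclusion is recovered. A toy, non-phylogenetic sketch of that logic (resampling observations and recording how often the originally inferred relationship recurs; all data here are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy "characters": two groups whose means differ slightly.
a = rng.normal(0.0, 1.0, 40)
b = rng.normal(0.3, 1.0, 40)
observed_call = a.mean() < b.mean()      # the originally inferred relation

B, hits = 1000, 0
for _ in range(B):
    ra = rng.choice(a, size=a.size, replace=True)   # resample with replacement
    rb = rng.choice(b, size=b.size, replace=True)
    hits += (ra.mean() < rb.mean()) == observed_call

print(f"bootstrap support for the observed call: {hits / B:.2f}")
```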

620 citations


Journal ArticleDOI
TL;DR: The findings demonstrate how the network inference performance varies with the training set size, the degree of inadequacy of prior assumptions, the experimental sampling strategy and the inclusion of further, sequence-based information.
Abstract: Motivation: Bayesian networks have been applied to infer genetic regulatory interactions from microarray gene expression data. This inference problem is particularly hard in that interactions between hundreds of genes have to be learned from very small data sets, typically containing only a few dozen time points during a cell cycle. Most previous studies have assessed the inference results on real gene expression data by comparing predicted genetic regulatory interactions with those known from the biological literature. This approach is controversial due to the absence of known gold standards, which renders the estimation of the sensitivity and specificity, that is, the true and (complementary) false detection rate, unreliable and difficult. The objective of the present study is to test the viability of the Bayesian network paradigm in a realistic simulation study. First, gene expression data are simulated from a realistic biological network involving DNAs, mRNAs, inactive protein monomers and active protein dimers. Then, interaction networks are inferred from these data in a reverse engineering approach, using Bayesian networks and Bayesian learning with Markov chain Monte Carlo. Results: The simulation results are presented as receiver operating characteristic (ROC) curves. This allows estimating the proportion of spurious gene interactions incurred for a specified target proportion of recovered true interactions. The findings demonstrate how the network inference performance varies with the training set size, the degree of inadequacy of prior assumptions, the experimental sampling strategy and the inclusion of further, sequence-based information. Availability: The programs and data used in the present study are available from http://www.bioss.sari.ac.uk/~dirk/ (Supplements).
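
As a hedged illustration of structure learning with MCMC (not the authors' code or their biological simulator), the sketch below samples Bayesian-network structures by Metropolis-Hastings over edge indicators, scoring each graph with a linear-Gaussian BIC approximation; a fixed node ordering is assumed purely to keep every sampled graph acyclic.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 5
X = rng.standard_normal((n, p))
X[:, 2] += 0.9 * X[:, 0]                      # planted edge 0 -> 2
X[:, 4] += 0.8 * X[:, 2]                      # planted edge 2 -> 4

def node_bic(j, parents):
    """BIC of a linear-Gaussian local model for node j given its parents."""
    A = np.column_stack([np.ones(n)] + [X[:, k] for k in parents])
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    rss = np.sum((X[:, j] - A @ beta) ** 2)
    return -0.5 * n * np.log(rss / n) - 0.5 * A.shape[1] * np.log(n)

def score(G):
    return sum(node_bic(j, [i for i in range(p) if G[i, j]]) for j in range(p))

# Metropolis-Hastings over edge indicators; allowing only i < j edges
# (a fixed ordering) guarantees acyclicity of every proposal.
G = np.zeros((p, p), dtype=bool)
s = score(G)
edge_freq = np.zeros((p, p))
for _ in range(4000):
    i, j = sorted(rng.choice(p, size=2, replace=False))
    Gp = G.copy()
    Gp[i, j] = ~Gp[i, j]                      # toggle one edge (symmetric move)
    sp = score(Gp)
    if rng.random() < np.exp(min(0.0, sp - s)):
        G, s = Gp, sp
    edge_freq += G

print(np.round(edge_freq / 4000, 2))          # approximate edge probabilities
```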

564 citations


Journal ArticleDOI
TL;DR: The nonparametric bootstrap resampling procedure is applied to the Bayesian approach and shows that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated onbootstrapped character matrices.
Abstract: Owing to the exponential growth of genome databases, phylogenetic trees are now widely used to test a variety of evolutionary hypotheses. Nevertheless, computation time burden limits the application of methods such as maximum likelihood nonparametric bootstrap to assess reliability of evolutionary trees. As an alternative, the much faster Bayesian inference of phylogeny, which expresses branch support as posterior probabilities, has been introduced. However, marked discrepancies exist between nonparametric bootstrap proportions and Bayesian posterior probabilities, leading to difficulties in the interpretation of sometimes strongly conflicting results. As an attempt to reconcile these two indices of node reliability, we apply the nonparametric bootstrap resampling procedure to the Bayesian approach. The correlation between posterior probabilities, bootstrap maximum likelihood percentages, and bootstrapped posterior probabilities was studied for eight highly diverse empirical data sets and was also investigated using experimental simulation. Our results show that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices. Moreover, simulations corroborate empirical observations in suggesting that, being more conservative, the bootstrap approach might be less prone to strongly supporting a false phylogenetic hypothesis. Thus, apparent conflicts in topology recovered by the Bayesian approach were reduced after bootstrapping. Both posterior probabilities and bootstrap supports are of great interest to phylogeny as potential upper and lower bounds of node reliability, but they are surely not interchangeable and cannot be directly compared.

501 citations


Journal ArticleDOI
TL;DR: This work introduces spatial autoregression parameters for multivariate conditional autoregressive models and proposes to employ these models as specifications for second-stage spatial effects in hierarchical models.
Abstract: In the past decade conditional autoregressive modelling specifications have found considerable application for the analysis of spatial data. Nearly all of this work is done in the univariate case and employs an improper specification. Our contribution here is to move to multivariate conditional autoregressive models and to provide rich, flexible classes which yield proper distributions. Our approach is to introduce spatial autoregression parameters. We first clarify what classes can be developed from the family of Mardia (1988) and contrast with recent work of Kim et al. (2000). We then present a novel parametric linear transformation which provides an extension with attractive interpretation. We propose to employ these models as specifications for second-stage spatial effects in hierarchical models. Two applications are discussed; one for the two-dimensional case modelling spatial patterns of child growth, the other for a four-dimensional situation modelling spatial variation in HLA-B allele frequencies. In each case, full Bayesian inference is carried out using Markov chain Monte Carlo simulation.
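
The univariate building block that the paper generalizes is the proper conditionally autoregressive (CAR) prior with precision matrix tau (D - rho W). A minimal sketch of constructing and sampling from it (the adjacency graph and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Adjacency for a 4-region line graph: 1-2-3-4.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))

tau, rho = 2.0, 0.9        # precision and spatial autoregression parameter
# Proper CAR: phi ~ N(0, [tau (D - rho W)]^{-1}); for |rho| < 1 the
# precision matrix D - rho W is positive definite, hence a proper prior.
Q = tau * (D - rho * W)
Sigma = np.linalg.inv(Q)

# Draw spatial random effects via the Cholesky factor of Sigma.
L = np.linalg.cholesky(Sigma)
phi = L @ rng.standard_normal(4)
print(np.round(phi, 3))
```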

447 citations


Journal ArticleDOI
TL;DR: This paper reviews both approaches to neural computation, with a particular emphasis on the latter, which the authors see as a very promising framework for future modeling and experimental work.
Abstract: In the vertebrate nervous system, sensory stimuli are typically encoded through the concerted activity of large populations of neurons. Classically, these patterns of activity have been treated as encoding the value of the stimulus (e.g., the orientation of a contour), and computation has been formalized in terms of function approximation. More recently, there have been several suggestions that neural computation is akin to a Bayesian inference process, with population activity patterns representing uncertainty about stimuli in the form of probability distributions (e.g., the probability density function over the orientation of a contour). This paper reviews both approaches, with a particular emphasis on the latter, which we see as a very promising framework for future modeling and experimental work.
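
The second view the authors review can be made concrete with a standard toy decoder: given independent Poisson spike counts with known tuning curves, the log posterior over the stimulus is sum_i [r_i log f_i(s) - f_i(s)] plus the log prior. A minimal sketch (the tuning-curve shapes and rates are assumed values):

```python
import numpy as np

rng = np.random.default_rng(7)

# Gaussian tuning curves over orientation for 20 neurons.
s_grid = np.linspace(0, 180, 181)
prefs = np.linspace(0, 180, 20)

def rates(s):
    return 5.0 * np.exp(-0.5 * ((s - prefs) / 20.0) ** 2) + 0.5

# One population response to a true stimulus of 60 degrees.
r = rng.poisson(rates(60.0))

# Posterior over s from independent Poisson counts, flat prior:
# log p(s | r) = sum_i [ r_i log f_i(s) - f_i(s) ] + const.
F = np.array([rates(s) for s in s_grid])      # grid x neurons
logpost = (r * np.log(F)).sum(axis=1) - F.sum(axis=1)
post = np.exp(logpost - logpost.max())
post /= post.sum()
print("posterior mean orientation:", (s_grid * post).sum())
```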

445 citations


Journal ArticleDOI
TL;DR: A key formal element of this much broader and less formal strategy that concerns rendering optimum hydrologic predictions by means of several competing deterministic or stochastic models and assessing their joint predictive uncertainty is described.
Abstract: Hydrologic analyses typically rely on a single conceptual-mathematical model. Yet hydrologic environments are open and complex, rendering them prone to multiple interpretations and mathematical descriptions. Adopting only one of these may lead to statistical bias and underestimation of uncertainty. A comprehensive strategy for constructing alternative conceptual-mathematical models of subsurface flow and transport, selecting the best among them, and using them jointly to render optimum predictions under uncertainty has recently been developed by Neuman and Wierenga (2003). This paper describes a key formal element of this much broader and less formal strategy that concerns rendering optimum hydrologic predictions by means of several competing deterministic or stochastic models and assessing their joint predictive uncertainty. The paper proposes a Maximum Likelihood Bayesian Model Averaging (MLBMA) method to accomplish this goal. MLBMA incorporates both site characterization and site monitoring data so as to base the outcome on an optimum combination of prior information (scientific knowledge plus data) and model predictions. A preliminary example based on real data is included in the paper.
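
The combination step of a scheme like MLBMA can be sketched schematically: approximate posterior model probabilities from maximized likelihoods (here via a BIC-style penalty) and combine predictions, with predictive variance split into within-model and between-model parts. All numbers below are invented for illustration; this is not the paper's calibration.

```python
import numpy as np

# Competing models' predictions, predictive variances, and maximized
# log likelihoods (all assumed numbers for illustration).
pred = np.array([12.1, 14.0, 13.2])
var = np.array([1.0, 2.2, 1.5])
loglik = np.array([-48.3, -50.1, -49.0])
k = np.array([3, 5, 4])                  # parameter counts
n = 40                                   # calibration data size

# BIC-style approximation to posterior model probabilities.
bic = -2 * loglik + k * np.log(n)
w = np.exp(-0.5 * (bic - bic.min()))
w /= w.sum()

# BMA mean and variance: within-model plus between-model components.
mean = w @ pred
variance = w @ var + w @ (pred - mean) ** 2
print(f"weights {np.round(w, 3)}, BMA mean {mean:.2f}, BMA var {variance:.2f}")
```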

Journal ArticleDOI
TL;DR: This work develops a novel approach to model selection, which is based on the Bayesian information criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework.
Abstract: Phylogenetic estimation has largely come to rely on explicitly model-based methods. This approach requires that a model be chosen and that that choice be justified. To date, justification has largely been accomplished through use of likelihood-ratio tests (LRTs) to assess the relative fit of a nested series of reversible models. While this approach certainly represents an important advance over arbitrary model selection, the best fit of a series of models may not always provide the most reliable phylogenetic estimates for finite real data sets, where all available models are surely incorrect. Here, we develop a novel approach to model selection, which is based on the Bayesian information criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. We evaluate this method by examining four real data sets and by using those data sets to define simulation conditions. In the real data sets, the DT method selects the same or simpler models than conventional LRTs. In order to lend generality to the simulations, codon-based models (with parameters estimated from the real data sets) were used to generate simulated data sets, which are therefore more complex than any of the models we evaluate. On average, the DT method selects models that are simpler than those chosen by conventional LRTs. Nevertheless, these simpler models provide estimates of branch lengths that are more accurate both in terms of relative error and absolute error than those derived using the more complex (yet still wrong) models chosen by conventional LRTs. This method is available in a program called DT-ModSel. (Bayesian model selection; decision theory; incorrect models; likelihood ratio test; maximum likelihood; nucleotide-substitution model; phylogeny.)

Journal ArticleDOI
TL;DR: A hierarchical Bayesian model for gene (variable) selection is proposed and applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes.
Abstract: Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form and we need to use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) based computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.
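
A minimal sketch of the kind of mixture-prior variable selection described here (not the authors' exact model): a Gibbs sampler for linear regression with a point-mass spike and a Gaussian slab on each coefficient, reporting posterior inclusion probabilities. The hyperparameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(14)

# Toy "expression" data: 50 samples, 20 genes, 2 truly relevant.
n, p = 50, 20
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_normal(n)

sigma2, tau2, pi = 1.0, 4.0, 0.1     # noise var, slab var, prior incl. prob.
beta = np.zeros(p)
gamma = np.zeros(p, dtype=bool)
incl = np.zeros(p)

for sweep in range(2000):
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]   # residual excluding gene j
        xtx = X[:, j] @ X[:, j]
        v = 1.0 / (xtx / sigma2 + 1.0 / tau2)  # conditional posterior variance
        m = v * (X[:, j] @ r) / sigma2
        # Log Bayes factor for including gene j (slab vs point-mass spike).
        log_bf = 0.5 * np.log(v / tau2) + 0.5 * m * m / v
        odds = pi / (1 - pi) * np.exp(log_bf)
        gamma[j] = rng.random() < odds / (1 + odds)
        beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
    if sweep >= 500:                           # discard burn-in sweeps
        incl += gamma

print(np.round(incl / 1500, 2))                # posterior inclusion probabilities
```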

Journal ArticleDOI
TL;DR: In this paper, a Bayesian model is proposed to address the anisotropy problem, where the correlation function of the spatial process is defined by reference to a latent space, denoted by D, where stationarity and isotropy hold.
Abstract: In geostatistics it is common practice to assume that the underlying spatial process is stationary and isotropic, i.e. the spatial distribution is unchanged when the origin of the index set is translated and under rotation about the origin. However, in environmental problems, such assumptions are not realistic since local influences in the correlation structure of the spatial process may be found in the data. The paper proposes a Bayesian model to address the anisotropy problem. Following Sampson and Guttorp, we define the correlation function of the spatial process by reference to a latent space, denoted by D, where stationarity and isotropy hold. The space where the gauged monitoring sites lie is denoted by G. We adopt a Bayesian approach in which the mapping between G and D is represented by an unknown function d(·). A Gaussian process prior distribution is defined for d(·). Unlike the Sampson–Guttorp approach, the mapping of both gauged and ungauged sites is handled in a single framework, and predictive inferences take explicit account of uncertainty in the mapping. Markov chain Monte Carlo methods are used to obtain samples from the posterior distributions. Two examples are discussed: a simulated data set and the solar radiation data set that also was analysed by Sampson and Guttorp.

Journal ArticleDOI
TL;DR: By applying the PAC-Bayesian theorem of McAllester (1999a), this paper proves distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques, giving a strong learning-theoretical justification for the use of these techniques.
Abstract: Approximate Bayesian Gaussian process (GP) classification techniques are powerful non-parametric learning methods, similar in appearance and performance to support vector machines. Based on simple probabilistic models, they render interpretable results and can be embedded in Bayesian frameworks for model selection, feature selection, etc. In this paper, by applying the PAC-Bayesian theorem of McAllester (1999a), we prove distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques. We also provide a new and much simplified proof for this powerful theorem, making use of the concept of convex duality which is a backbone of many machine learning techniques. We instantiate and test our bounds for two particular GPC techniques, including a recent sparse method which circumvents the unfavourable scaling of standard GP algorithms. As is shown in experiments on a real-world task, the bounds can be very tight for moderate training sample sizes. To the best of our knowledge, these results provide the tightest known distribution-free error bounds for approximate Bayesian GPC methods, giving a strong learning-theoretical justification for the use of these techniques.
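
Bounds of this family are typically stated through the binary KL divergence and evaluated by numerical inversion. The sketch below shows that mechanic with illustrative constants; see the paper for the exact statement and constants of the GP-classification bounds.

```python
import numpy as np

def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def kl_inverse_upper(emp_risk, bound):
    """Largest p with kl(emp_risk || p) <= bound, found by bisection."""
    lo, hi = emp_risk, 1.0 - 1e-12
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_risk, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative numbers: sample size, confidence, KL(Q||P), training error.
n, delta, kl_qp, emp = 500, 0.05, 25.0, 0.08
rhs = (kl_qp + np.log((n + 1) / delta)) / n
print(f"true risk <= {kl_inverse_upper(emp, rhs):.3f} with prob. {1 - delta}")
```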

Book
03 Jul 2003
TL;DR: This book surveys highly structured stochastic systems: graphical models and causal inference, MCMC and particle filtering, spatial and spatio-temporal modelling, Bayesian image analysis, epidemic models, genetics applications, model criticism, and Bayesian nonparametrics.
Abstract (table of contents):
Introduction
1. Some modern applications of graphical models; Analysing social science data with graphical Markov models; Analysis of DNA mixtures using Bayesian networks
2. Causal inference using influence diagrams: the problem of partial compliance; Commentary: causality and statistics; Semantics of causal DAG models and the identification of direct and indirect effects
3. Causal inference via ancestral graph models; Other approaches to description of conditional independence structures; On ancestral graph Markov models
4. Causality and graphical models in time series analysis; Graphical models for stochastic processes; Discussion of "Causality and graphical models in time series analysis"
5. Linking theory and practice of MCMC; Advances in MCMC: a discussion; On some current research in MCMC
6. Trans-dimensional Markov chain Monte Carlo; Proposal densities and product space methods; Trans-dimensional Bayesian nonparametrics with spatial point processes
7. Particle filtering methods for dynamic and static Bayesian problems; Some further topics on Monte Carlo methods for dynamic Bayesian problems; General principles in sequential Monte Carlo methods
8. Spatial models in epidemiological applications; Some remarks on Gaussian Markov random field models; A comparison of spatial point process models in epidemiological applications
9. Spatial hierarchical Bayesian models in ecological applications; Likelihood analysis of binary data in space and time; Some further aspects of spatio-temporal modelling
10. Advances in Bayesian image analysis; Probabilistic image modelling; Prospects in Bayesian image analysis
11. Preventing epidemics in heterogeneous environments; MCMC methods for stochastic epidemic models; Towards Bayesian inference in epidemic models
12. Genetic linkage analysis using Markov chain Monte Carlo techniques; Graphical models for mapping continuous traits; Statistical approaches to genetic mapping
13. The genealogy of neutral mutation; Linked versus unlinked DNA data: a comparison based on ancestral inference; The age of a rare mutation
14. HSSS model criticism; What 'base' distribution for model criticism?; Some comments on model criticism
15. Topics in nonparametric Bayesian statistics; Asymptotics of nonparametric posteriors; A predictive point of view on Bayesian nonparametrics

Journal ArticleDOI
TL;DR: The results corroborate the findings of others that posterior probability values are excessively high and suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probabilities.
Abstract: Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values. (Bayesian analysis; Markov chain Monte Carlo sampling; maximum likelihood; phylogenetics.)

Journal ArticleDOI
TL;DR: A Bayesian model employing subjective probability estimates was effective in predicting the success and failure of actual health care improvement projects.
Abstract: Objective: To test the effectiveness of a Bayesian model employing subjective probability estimates for predicting success and failure of health care improvement projects.

Dissertation
01 Jan 2003
TL;DR: This thesis devises a novel methodology based on probability theory, suitable for the construction of term-weighting models of Information Retrieval, and shows that even the language modelling approach can be exploited to assign term-frequency normalization to the models of divergence from randomness.
Abstract: This thesis devises a novel methodology based on probability theory, suitable for the construction of term-weighting models of Information Retrieval. Our term-weighting functions are created within a general framework made up of three components. Each of the three components is built independently from the others. We obtain the term-weighting functions from the general model in a purely theoretic way, instantiating each component with different probability distribution forms. The thesis begins by investigating the nature of the statistical inference involved in Information Retrieval. We explore the estimation problem underlying the process of sampling. De Finetti’s theorem is used to show how to convert the frequentist approach into Bayesian inference, and we display and employ the derived estimation techniques in the context of Information Retrieval. We initially pay great attention to the construction of the basic sample spaces of Information Retrieval. The notion of single or multiple sampling from different populations in the context of Information Retrieval is extensively discussed and used throughout the thesis. The language modelling approach and the standard probabilistic model are studied under the same foundational view and are experimentally compared to the divergence-from-randomness approach. In revisiting the main information retrieval models in the literature, we show that even the language modelling approach can be exploited to assign term-frequency normalization to the models of divergence from randomness. We finally introduce a novel framework for query expansion. This framework is based on the models of divergence-from-randomness and it can be applied to arbitrary models of IR, divergence-based, language modelling and probabilistic models included. We have run a very large number of experiments, and the results show that the framework generates highly effective Information Retrieval models.

Book ChapterDOI
Michael E. Tipping
TL;DR: This article gives a basic introduction to the principles of Bayesian inference in a machine learning context, with an emphasis on the importance of marginalisation for dealing with uncertainty.
Abstract: This article gives a basic introduction to the principles of Bayesian inference in a machine learning context, with an emphasis on the importance of marginalisation for dealing with uncertainty. We begin by illustrating concepts via a simple regression task before relating ideas to practical, contemporary, techniques with a description of ‘sparse Bayesian’ models and the ‘relevance vector machine’.
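
The marginalisation the article emphasizes is easiest to see in conjugate Bayesian linear regression, where the weights integrate out in closed form. A minimal sketch along those lines (the basis, noise precision, and prior scale are assumed values, not the article's):

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy regression data.
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(30)

# Polynomial basis; Gaussian prior w ~ N(0, I/alpha), noise variance 1/beta.
Phi = np.vander(x, 6, increasing=True)
alpha, beta = 1.0, 25.0                  # assumed hyperparameters

# Posterior over weights is Gaussian by conjugacy: N(m, S).
S = np.linalg.inv(alpha * np.eye(6) + beta * Phi.T @ Phi)
m = beta * S @ Phi.T @ y

# Predictive distribution at a new input marginalises over w:
# y* ~ N(phi* @ m, 1/beta + phi* @ S @ phi*).
phi_star = np.vander(np.array([0.5]), 6, increasing=True)[0]
pred_mean = phi_star @ m
pred_var = 1 / beta + phi_star @ S @ phi_star
print(f"predictive mean {pred_mean:.3f}, std {np.sqrt(pred_var):.3f}")
```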

Journal ArticleDOI
TL;DR: In this paper, a Bayesian framework for exploratory data analysis based on posterior predictive checks is presented, which can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis.
Abstract: Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling)—which are generally considered as unrelated statistical paradigms—can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data yrep and replicated parameters θrep follows a long tradition of generalizations in Bayesian theory. On the theoretical level, we present a predictive Bayesian formulation of goodness-of-fit testing, distinguishing between p-values (posterior probabilities that specified antisymmetric discrepancy measures will exceed 0) and u-values (data summaries with uniform sampling distributions). We explain that p-values, unlike u-values, are Bayesian probability statements in that they condition on observed data. Having reviewed the general theoretical framework, we discuss the implications for statistical graphics and exploratory data analysis, with the goal being to unify exploratory data analysis with more formal statistical methods based on probability models. We interpret various graphical displays as posterior predictive checks and discuss how Bayesian inference can be used to determine reference distributions. The goal of this work is not to downgrade descriptive statistics, or to suggest they be replaced by Bayesian modeling, but rather to suggest how exploratory data analysis fits into the probability-modeling paradigm. We conclude with a discussion of the implications for practical Bayesian inference. In particular, we anticipate that Bayesian software can be generalized to draw simulations of replicated data and parameters from their posterior predictive distribution, and these can in turn be used to calibrate EDA graphs.
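
A posterior predictive check of the kind described reduces to a few lines: draw parameters from the posterior, simulate replicated data yrep, and compare a discrepancy T(yrep) with T(y). A minimal sketch under a deliberately misspecified normal model with a noninformative prior (all modeling choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

# Observed data, (wrongly) modeled as normal: the check should flag misfit.
y = rng.exponential(1.0, size=100)
n = y.size

# Normal model, noninformative prior: sigma^2 | y ~ scaled inverse chi-square,
# mu | sigma^2, y ~ N(ybar, sigma^2 / n). Discrepancy: the sample maximum.
T = lambda data: np.max(data)
draws, exceed = 2000, 0
for _ in range(draws):
    sigma2 = np.sum((y - y.mean()) ** 2) / rng.chisquare(n - 1)
    mu = rng.normal(y.mean(), np.sqrt(sigma2 / n))
    yrep = rng.normal(mu, np.sqrt(sigma2), size=n)   # replicated data
    exceed += T(yrep) >= T(y)

print(f"posterior predictive p-value: {exceed / draws:.3f}")
```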

Journal ArticleDOI
TL;DR: In this paper, the authors show that Bayesian model averaging (BMA) is the best approach to account for model uncertainty in statistical inference and that BMA predictive distributions have optimal performance in the log score sense.
Abstract: In their article "Frequentist Model Average Estimators," Hjort and Claeskens (hereafter HC) make the point that statistical inference conditional on a model selected among several on the basis of data will tend to underestimate variability. We strongly agree. They argue that the way to overcome this is by model averaging, and again we agree. There is much support for these arguments: these points have been made by many authors in a long line of literature going back at least to Leamer (1977). HC point out that Bayesian model averaging (BMA) dominates the literature on accounting for model uncertainty in statistical inference. Their search for a frequentist alternative is largely motivated by the feeling that the performance of BMA in repeated datasets or experiments has been inadequately studied. Or, as they put it, "even though BMA 'works,' ..., rather little appears to be known about the actual performance or behavior of the consequent inferences such as estimator precision." This is a somewhat surprising statement, as the performance of Bayesian model selection and BMA has, in fact, been extensively studied. There are three main strands of results: general theoretical results going back to Jeffreys (1939), simulation studies, and results on out-of-sample predictive performance. HC do not refer to any of this literature. The theoretical results are well known but somewhat scattered in the literature. In brief, when used for model selection, the Bayes factor minimizes the total error rate (sum of Type I and Type II error probabilities); BMA point estimators and predictions minimize mean squared error (MSE); BMA estimation and prediction intervals are calibrated; and BMA predictive distributions have optimal performance in the log score sense. We bring these results together in our Section 2. These results for BMA are quite general and do not rely on the assumption that all uncertain parameters are small [essentially HC's local misspecification assumption, required by frequentist model averaging (FMA)]. They also do not require the standard regularity conditions assumed by HC in deriving FMA, which are violated in many models of practical interest, such as changepoint models, or models involving unknown population size. There are also several realistic simulation studies of the performance of BMA relative to other methods in a variety of situations, including linear regression (George and McCulloch 1993; Raftery, Madigan, and Hoeting 1997), log-linear models (Clyde

Journal ArticleDOI
TL;DR: A family of hierarchical Bayesian models is developed which allows for the simultaneous inference of informant accuracy and social structure in the presence of measurement error and missing data.

Journal ArticleDOI
TL;DR: This technical note describes the construction of posterior probability maps that enable conditional or Bayesian inferences about regionally specific effects in neuroimaging and compares Bayesian and classical inference through the equivalent PPMs and SPMs testing for the same effect in the same data.

Journal ArticleDOI
TL;DR: This work makes use of the Variational Bayesian (VB) framework, which approximates the true posterior density with a factorised density and provides a natural extension to previous Bayesian analyses that have used Empirical Bayes.

Journal ArticleDOI
TL;DR: This article shows that Bayesian inference for switching regression models and their generalizations can be achieved by specifying loss functions that overcome the label switching problem common to all mixture models.
Abstract: This article shows how Bayesian inference for switching regression models and their generalizations can be achieved by the specification of loss functions which overcome the label switching problem common to all mixture models. We also derive an extension to models where the number of components in the mixture is unknown, based on the birth-and-death technique developed in recent literature. The methods are illustrated on various real datasets.

Book
16 Jun 2003
TL;DR: A critical review of uncertainty in physics and the usual methods of handling it, followed by an outline of the Bayesian alternative: a probabilistic theory of measurement uncertainty.
Abstract (table of contents):
Critical review and outline of the Bayesian alternative: uncertainty in physics and the usual methods of handling it; a probabilistic theory of measurement uncertainty
A Bayesian primer: subjective probability and Bayes' theorem; probability distributions (a concise reminder); Bayesian inference of continuous quantities; Gaussian likelihood; counting experiments; bypassing Bayes' theorem for routine applications; Bayesian unfolding
Further comments, examples and applications: miscellanea on general issues in probability and inference; combination of experimental results: a closer look; asymmetric uncertainties and nonlinear propagation; which priors for frontier physics?
Concluding matter: conclusions and bibliography
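
The "counting experiments" entry corresponds to the canonical Poisson example: with a uniform prior on the rate, observing n counts gives a Gamma(n+1, 1) posterior. A minimal sketch (the observed count is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(10)

# Counting experiment: observe n counts, infer the Poisson rate lambda.
n_obs = 7

# With a uniform prior on lambda, p(lambda | n) ∝ lambda^n exp(-lambda),
# i.e. the posterior is Gamma(n_obs + 1, scale=1).
samples = rng.gamma(shape=n_obs + 1, scale=1.0, size=100_000)
lo, hi = np.percentile(samples, [16, 84])
print(f"posterior mean {samples.mean():.2f}, 68% interval [{lo:.2f}, {hi:.2f}]")
```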

Book
01 Jan 2003
TL;DR: Discusses the importance of risk and uncertainty assessments and how to use risk analysis to support decision-making.
Abstract (table of contents):
Preface
1 Introduction: 1.1 The Importance of Risk and Uncertainty Assessments; 1.2 The Need to Develop a Proper Risk Analysis Framework; Bibliographic Notes
2 Common Thinking about Risk and Risk Analysis: 2.1 Accident Risk (2.1.1 Accident Statistics; 2.1.2 Risk Analysis; 2.1.3 Reliability Analysis); 2.2 Economic Risk (2.2.1 General Definitions of Economic Risk in Business and Project Management; 2.2.2 A Cost Risk Analysis; 2.2.3 Finance and Portfolio Theory; 2.2.4 Treatment of Risk in Project Discounted Cash Flow Analysis); 2.3 Discussion and Conclusions (2.3.1 The Classical Approach; 2.3.2 The Bayesian Paradigm; 2.3.3 Economic Risk and Rational Decision-Making; 2.3.4 Other Perspectives and Applications; 2.3.5 Conclusions); Bibliographic Notes
3 How to Think about Risk and Risk Analysis: 3.1 Basic Ideas and Principles (3.1.1 Background Information; 3.1.2 Models and Simplifications in Probability Considerations; 3.1.3 Observable Quantities); 3.2 Economic Risk (3.2.1 A Simple Cost Risk Example; 3.2.2 Production Risk; 3.2.3 Business and Project Management; 3.2.4 Investing Money in a Stock Market; 3.2.5 Discounted Cash Flow Analysis); 3.3 Accident Risk; Bibliographic Notes
4 How to Assess Uncertainties and Specify Probabilities: 4.1 What Is a Good Probability Assignment? (4.1.1 Criteria for Evaluating Probabilities; 4.1.2 Heuristics and Biases; 4.1.3 Evaluation of the Assessors; 4.1.4 Standardization and Consensus); 4.2 Modelling (4.2.1 Examples of Models; 4.2.2 Discussion); 4.3 Assessing Uncertainty of Y (4.3.1 Assignments Based on Classical Statistical Methods; 4.3.2 Analyst Judgements Using All Sources of Information; 4.3.3 Formal Expert Elicitation; 4.3.4 Bayesian Analysis); 4.4 Uncertainty Assessments of a Vector X (4.4.1 Cost Risk; 4.4.2 Production Risk; 4.4.3 Reliability Analysis); 4.5 Discussion and Conclusions; Bibliographic Notes
5 How to Use Risk Analysis to Support Decision-Making: 5.1 What Is a Good Decision? (5.1.1 Features of a Decision-Making Model; 5.1.2 Decision-Support Tools; 5.1.3 Discussion); 5.2 Some Examples (5.2.1 Accident Risk; 5.2.2 Scrap in Place or Complete Removal of Plant; 5.2.3 Production System; 5.2.4 Reliability Target; 5.2.5 Health Risk; 5.2.6 Warranties; 5.2.7 Offshore Development Project; 5.2.8 Risk Assessment: National Sector; 5.2.9 Multi-Attribute Utility Example); 5.3 Risk Problem Classification Schemes (5.3.1 A Scheme Based on Potential Consequences and Uncertainties; 5.3.2 A Scheme Based on Closeness to Hazard and Level of Authority); Bibliographic Notes
6 Summary and Conclusions
Appendix A: Basic Theory of Probability and Statistics: A.1 Probability Theory (A.1.1 Types of Probabilities; A.1.2 Probability Rules; A.1.3 Random Quantities (Random Variables); A.1.4 Some Common Discrete Probability Distributions (Models); A.1.5 Some Common Continuous Distributions (Models); A.1.6 Some Remarks on Probability Models and Their Parameters; A.1.7 Random Processes); A.2 Classical Statistical Inference (A.2.1 Non-Parametric Estimation; A.2.2 Estimation of Distribution Parameters; A.2.3 Testing Hypotheses; A.2.4 Regression); A.3 Bayesian Inference (A.3.1 Statistical (Bayesian) Decision Analysis); Bibliographic Notes
Appendix B: Terminology
Bibliography; Index
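
Chapter 3's "simple cost risk example" style of analysis amounts to assigning probability distributions to observable cost items and propagating them by Monte Carlo. A minimal sketch (the three cost items and their distributions are invented for illustration, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(11)

# Uncertainty about three cost items, expressed as probability
# distributions (all distributional choices here are assumptions).
n = 100_000
engineering = rng.triangular(8, 10, 15, n)   # min 8, most likely 10, max 15
equipment = rng.normal(50, 5, n)
contingency = rng.exponential(3, n)          # occasional large overruns

total = engineering + equipment + contingency
print(f"mean cost {total.mean():.1f}, "
      f"P(total > 75) = {(total > 75).mean():.3f}, "
      f"P90 = {np.percentile(total, 90):.1f}")
```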

Proceedings Article
Gerald Tesauro
09 Dec 2003
TL;DR: This paper proposes a fundamentally different approach to Q-Learning, dubbed Hyper-Q, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference.
Abstract: Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
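
The Bayesian opponent-modelling ingredient of Hyper-Q can be sketched on its own: with a Dirichlet prior over the opponent's mixed strategy, observed actions update simple counts, and the posterior mean estimates the strategy. This toy omits the Hyper-Q value table itself, and the hidden opponent mix is an assumption of the demo.

```python
import numpy as np

rng = np.random.default_rng(12)
ACTIONS = ["rock", "paper", "scissors"]
BEATS = {0: 1, 1: 2, 2: 0}                 # paper beats rock, etc.

# Dirichlet(1, 1, 1) prior over the opponent's mixed strategy,
# updated by counting observed actions (conjugate update).
counts = np.ones(3)
opponent_mix = np.array([0.6, 0.3, 0.1])   # hidden truth, to be estimated

for t in range(500):
    a_opp = rng.choice(3, p=opponent_mix)
    counts[a_opp] += 1

posterior_mean = counts / counts.sum()     # E[strategy | observed history]
best_response = BEATS[int(np.argmax(posterior_mean))]
print("estimated mix:", np.round(posterior_mean, 2),
      "| best response:", ACTIONS[best_response])
```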

Journal ArticleDOI
TL;DR: Empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods, which leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference.
Abstract: Research has shown that empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods. This leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference. Different ways in which this goal might be accomplished are considered. The validity of the resultant posterior inferences is examined, as are frequentist properties of the Bayesian empirical likelihood intervals.
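
One of the constructions considered, using the empirical likelihood in place of a parametric likelihood inside Bayes' theorem, can be sketched numerically for a mean parameter. The profile empirical likelihood below is computed by solving the usual Lagrange-multiplier equation by bisection; the prior and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(13)
x = rng.normal(1.0, 1.0, size=50)

def log_el(mu):
    """Log empirical likelihood (up to a constant) for the mean mu:
    maximize sum(log w_i) s.t. sum(w_i (x_i - mu)) = 0, which gives
    w_i = 1 / (n (1 + t (x_i - mu))) with t solving the score equation."""
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:
        return -np.inf                       # mu outside the convex hull
    lo = -1.0 / d.max() + 1e-10              # keep all 1 + t d_i > 0
    hi = -1.0 / d.min() - 1e-10
    g = lambda t: np.sum(d / (1 + t * d))    # strictly decreasing in t
    for _ in range(200):                     # bisection for g(t) = 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    t = 0.5 * (lo + hi)
    return -np.sum(np.log(1 + t * d))

# Posterior on a grid: empirical likelihood times a N(0, 10^2) prior.
grid = np.linspace(x.min() + 0.01, x.max() - 0.01, 400)
logpost = np.array([log_el(m) for m in grid]) - 0.5 * (grid / 10.0) ** 2
post = np.exp(logpost - logpost.max())
post /= post.sum()
print("posterior mean of mu:", (grid * post).sum())
```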