
Showing papers on "Bayesian inference published in 2005"


Journal ArticleDOI
TL;DR: Causal effects are defined as comparisons of potential outcomes under different treatments on a common set of units; the assignment mechanism, a probabilistic model for the treatment each unit receives as a function of covariates and potential outcomes, determines which potential outcomes are observed.
Abstract: Causal effects are defined as comparisons of potential outcomes under different treatments on a common set of units. Observed values of the potential outcomes are revealed by the assignment mechanism—a probabilistic model for the treatment each unit receives as a function of covariates and potential outcomes. Fisher made tremendous contributions to causal inference through his work on the design of randomized experiments, but the potential outcomes perspective applies to other complex experiments and nonrandomized studies as well. As noted by Kempthorne in his 1976 discussion of Savage's Fisher lecture, Fisher never bridged his work on experimental design and his work on parametric modeling, a bridge that appears nearly automatic with an appropriate view of the potential outcomes framework, where the potential outcomes and covariates are given a Bayesian distribution to complete the model specification. Also, this framework crisply separates scientific inference for causal effects and decisions based on such inferences.

1,546 citations


Journal ArticleDOI
TL;DR: This paper presents the latest release of the program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor.
Abstract: Motivation: The computation of large phylogenetic trees with statistical models such as maximum likelihood or Bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree, or a tree which is topologically closer to the true tree, more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity, the size of trees which can be computed on a biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa. Results: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and Bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data, it clearly outperforms both programs on all real data alignments used, in terms of speed and final likelihood values. Availability and Supplementary information: RAxML-III, including all alignments and final trees mentioned in this paper, is freely available as open source code at http://wwwbode.cs.tum/~stamatak Contact: stamatak@cs.tum.edu

1,423 citations


Journal ArticleDOI
TL;DR: A framework for understanding how zero-inflated data sets originate and deciding how best to model them is proposed and the different kinds of zeros that occur in ecological data are defined and classified.
Abstract: A common feature of ecological data sets is their tendency to contain many zero values. Statistical inference based on such data is likely to be inefficient or wrong unless careful thought is given to how these zeros arose and how best to model them. In this paper, we propose a framework for understanding how zero-inflated data sets originate and deciding how best to model them. We define and classify the different kinds of zeros that occur in ecological data and describe how they arise: either from 'true zero' or 'false zero' observations. After reviewing recent developments in modelling zero-inflated data sets, we use practical examples to demonstrate how failing to account for the source of zero inflation can reduce our ability to detect relationships in ecological data and at worst lead to incorrect inference. The adoption of methods that explicitly model the sources of zero observations will sharpen insights and improve the robustness of ecological analyses.
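
As a minimal sketch of how these two kinds of zeros enter a likelihood, the zero-inflated Poisson below mixes 'false' (structural) zeros, arising with probability pi, with 'true' sampling zeros from the count process; the function name and parameter values are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import gammaln

def zip_loglik(y, pi, lam):
    """Log-likelihood of a zero-inflated Poisson model (illustrative sketch).

    pi  : probability of a 'false zero' (structural zero, e.g. the species
          was present but never detectable)
    lam : Poisson mean of the count process; 'true' sampling zeros arise
          here with probability exp(-lam)
    """
    y = np.asarray(y, dtype=float)
    zero = y == 0
    # P(y=0) mixes structural zeros with Poisson sampling zeros
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    # P(y=k) for k>0 comes only from the count process
    ll_pos = np.log(1.0 - pi) - lam + y * np.log(lam) - gammaln(y + 1.0)
    return np.sum(np.where(zero, ll_zero, ll_pos))

counts = np.array([0, 0, 0, 1, 0, 2, 0, 0, 4, 0])
print(zip_loglik(counts, pi=0.4, lam=1.2))
```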

870 citations


Book
23 May 2005
TL;DR: The how-to of Bayesian inference is explained, with a model-fitting guide covering linear and nonlinear model fitting and Markov chain Monte Carlo methods.
Abstract: Preface Acknowledgements 1. Role of probability theory in science 2. Probability theory as extended logic 3. The how-to of Bayesian inference 4. Assigning probabilities 5. Frequentist statistical inference 6. What is a statistic? 7. Frequentist hypothesis testing 8. Maximum entropy probabilities 9. Bayesian inference (Gaussian errors) 10. Linear model fitting (Gaussian errors) 11. Nonlinear model fitting 12. Markov Chain Monte Carlo 13. Bayesian spectral analysis 14. Bayesian inference (Poisson sampling) Appendix A. Singular value decomposition Appendix B. Discrete Fourier transforms Appendix C. Difference in two samples Appendix D. Poisson ON/OFF details Appendix E. Multivariate Gaussian from maximum entropy References Index.

806 citations


Journal ArticleDOI
TL;DR: Variational Message Passing is introduced, a general-purpose algorithm for applying variational inference to Bayesian networks; it can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation.
Abstract: Bayesian inference is now widely established as one of the principal foundations for machine learning. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called variational inference. In this paper we introduce Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to Bayesian Networks. Like belief propagation, VMP proceeds by sending messages between nodes in the network and updating posterior beliefs using local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approximation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing non-conjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational message passing has been implemented in the form of a general purpose inference engine called VIBES ('Variational Inference for BayEsian networkS') which allows models to be specified graphically and then solved variationally without recourse to coding.
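
To make the local-update idea concrete, here is a minimal mean-field sketch for a conjugate-exponential model: Gaussian observations with unknown mean and precision, under independent normal and gamma priors. Each factor is updated from the expected sufficient statistics of the other, in the spirit of the messages VMP passes between nodes; this is an illustrative sketch, not the VIBES engine.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=200)        # synthetic data
N, xbar = len(x), x.mean()

# Priors (hypothetical choices): mu ~ N(mu0, s0sq), tau ~ Gamma(a0, b0)
mu0, s0sq, a0, b0 = 0.0, 100.0, 1e-3, 1e-3

# Initialise the variational factors q(mu) = N(m, v), q(tau) = Gamma(a, b)
m, v, a, b = 0.0, 1.0, 1.0, 1.0
for _ in range(50):
    E_tau = a / b
    # 'Message' to mu: update q(mu) given the current moments of q(tau)
    prec = 1.0 / s0sq + N * E_tau
    m = (mu0 / s0sq + E_tau * N * xbar) / prec
    v = 1.0 / prec
    # 'Message' to tau: update q(tau) given the moments of q(mu)
    E_sq = np.sum((x - m) ** 2) + N * v   # E[sum_i (x_i - mu)^2]
    a = a0 + N / 2.0
    b = b0 + 0.5 * E_sq

print(f"E[mu] = {m:.3f}, E[tau] = {a / b:.3f}  (truth: 2.0, {1 / 1.5**2:.3f})")
```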

741 citations


Journal ArticleDOI
TL;DR: An object detection scheme with three innovations over existing approaches: the background is modeled as a single probability density over a joint domain-range representation, temporal persistence is used as a detection criterion, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph.
Abstract: Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged and it is asserted that useful correlation exists in intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. We propose a model of the background as a single probability density. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection, which detect objects by building adaptive models of the background, the foreground is also modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the proposed method is performed and presented on a diverse set of dynamic scenes.
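
A minimal sketch of the joint domain-range representation: each pixel becomes a five-dimensional (x, y, r, g, b) feature, and its likelihood under the background is a kernel density estimate over samples from earlier frames. Bandwidths and data here are hypothetical, and the temporal-persistence and MAP-MRF graph-cut stages are omitted.

```python
import numpy as np

def kde_score(pixel, bg_samples, bandwidth):
    """Kernel density of one pixel feature under the background model.

    pixel      : length-5 vector (x, y, r, g, b) -- joint domain-range
    bg_samples : (n, 5) array of background observations from past frames
    bandwidth  : length-5 vector of per-dimension Gaussian kernel widths
    """
    z = (bg_samples - pixel) / bandwidth
    # Product Gaussian kernel over the 5 joint domain-range dimensions
    k = np.exp(-0.5 * np.sum(z ** 2, axis=1))
    k /= np.prod(bandwidth) * (2.0 * np.pi) ** 2.5
    return k.mean()

rng = np.random.default_rng(1)
# Synthetic background: pixels spread over a 10x10 patch, colors near 120
bg = np.column_stack([rng.uniform(0, 10, 500), rng.uniform(0, 10, 500),
                      rng.normal(120, 5, (500, 3))])
h = np.array([2.0, 2.0, 10.0, 10.0, 10.0])

background_like = np.array([5.0, 5.0, 121.0, 118.0, 122.0])
foreground_like = np.array([5.0, 5.0, 30.0, 200.0, 40.0])
print(kde_score(background_like, bg, h) > kde_score(foreground_like, bg, h))
```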

685 citations


Journal ArticleDOI
TL;DR: The primary goal is to provide veterinary researchers with a concise presentation of the computational aspects involved in using the Bayesian framework for test evaluation.

490 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence, and that Bayesian model averaging cannot be minimax-rate optimal for regression estimation.
Abstract: A traditional approach to statistical inference is to identify the true or best model first, with little or no consideration of the specific goal of inference in the model identification stage. Can the pursuit of the true model also lead to optimal regression estimation? In model selection, it is well known that BIC is consistent in selecting the true model, and AIC is minimax-rate optimal for estimating the regression function. A recent promising direction is adaptive model selection, in which, in contrast to AIC and BIC, the penalty term is data-dependent. Some theoretical and empirical results have been obtained in support of adaptive model selection, but it is still not clear if it can really share the strengths of AIC and BIC. Model combining or averaging has attracted increasing attention as a means to overcome the model selection uncertainty. Can Bayesian model averaging be optimal for estimating the regression function in a minimax sense? We show that the answers to these questions are basically in the negative: for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation.

419 citations


Journal ArticleDOI
TL;DR: The book Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives is reviewed.
Abstract: (2005). Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives. Technometrics: Vol. 47, No. 4, pp. 519-519.

390 citations


Journal ArticleDOI
TL;DR: This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS, where good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced by employing WinBUGS-specific computational tricks.
Abstract: Penalized splines can be viewed as BLUPs in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. Good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced by employing WinBUGS-specific computational tricks.
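
The mixed-model trick rests on a design matrix built from a low-rank thin-plate spline basis. The sketch below constructs such a basis under the usual conventions (radial |x − k|^3 basis with the penalty matrix absorbed, so the spline coefficients become i.i.d. random effects); knot choices are illustrative, and the actual fitting would be done in WinBUGS or any mixed-model routine.

```python
import numpy as np

def lowrank_tps_basis(x, knots):
    """Low-rank thin-plate spline basis (1-D), as used to cast penalized
    splines as mixed models: y = X @ beta + Z @ u, u ~ N(0, sigma_u^2 I).
    """
    Z_raw = np.abs(x[:, None] - knots[None, :]) ** 3
    Omega = np.abs(knots[:, None] - knots[None, :]) ** 3
    # Absorb the penalty so the prior on u becomes a simple ridge:
    # Z = Z_raw @ Omega^{-1/2}, with the matrix root taken via SVD
    U, s, Vt = np.linalg.svd(Omega)
    Omega_inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ Vt
    return Z_raw @ Omega_inv_sqrt

x = np.linspace(0.0, 1.0, 100)
knots = np.linspace(0.05, 0.95, 15)         # illustrative knot grid
X = np.column_stack([np.ones_like(x), x])   # fixed effects: intercept, slope
Z = lowrank_tps_basis(x, knots)
print(X.shape, Z.shape)                     # (100, 2) (100, 15)
```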

346 citations


Journal ArticleDOI
TL;DR: This work reviews and compares Laplace's method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model, and presents a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling.
Abstract: Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace's method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace's method.
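
For concreteness, here is a compact sketch of the Laplace approximation's inner loop for binary GP classification: Newton iterations for the posterior mode of the latent function under a logistic likelihood, following the standard formulation in the GP literature. The kernel, data and fixed iteration count are simplifying assumptions.

```python
import numpy as np

def laplace_gp_mode(X, y, lengthscale=1.0, variance=1.0, iters=20):
    """Posterior mode of the latent GP under a logistic likelihood.

    y in {-1, +1}. Standard stabilised Newton scheme; a sketch, not a
    tuned implementation (no convergence test, fixed hyperparameters).
    """
    n = len(y)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = variance * np.exp(-0.5 * d2 / lengthscale**2)   # squared-exp kernel
    f = np.zeros(n)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-f))            # sigmoid(f)
        grad = (y + 1) / 2.0 - p                # d log p(y|f) / df
        W = p * (1.0 - p)                       # negative Hessian (diagonal)
        sW = np.sqrt(W)
        B = np.eye(n) + sW[:, None] * K * sW[None, :]
        L = np.linalg.cholesky(B)
        b = W * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f = K @ a
    return f

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (40, 1))
y = np.where(X[:, 0] > 0, 1, -1)
f_hat = laplace_gp_mode(X, y)
print(np.round(1.0 / (1.0 + np.exp(-f_hat[:5])), 2))   # fitted class probs
```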

Journal ArticleDOI
08 Jul 2005-Science
TL;DR: This work used Bayesian inference to derive a probability distribution that represents the unknown structure and its precision and implemented this approach by using Markov chain Monte Carlo techniques, providing an objective figure of merit and improves structural quality.
Abstract: Macromolecular structures calculated from nuclear magnetic resonance data are not fully determined by experimental data but depend on subjective choices in data treatment and parameter settings. This makes it difficult to objectively judge the precision of the structures. We used Bayesian inference to derive a probability distribution that represents the unknown structure and its precision. This probability distribution also determines additional unknowns, such as theory parameters, that previously had to be chosen empirically. We implemented this approach by using Markov chain Monte Carlo techniques. Our method provides an objective figure of merit and improves structural quality.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: It is shown that, for a wide range of benchmark datasets, naive Bayes models learned using EM have accuracy and learning time comparable to Bayesian networks with context-specific independence.
Abstract: Naive Bayes models have been widely used for clustering and classification. However, they are seldom used for general probabilistic learning and inference (i.e., for estimating and computing arbitrary joint, conditional and marginal distributions). In this paper we show that, for a wide range of benchmark datasets, naive Bayes models learned using EM have accuracy and learning time comparable to Bayesian networks with context-specific independence. Most significantly, naive Bayes inference is orders of magnitude faster than Bayesian network inference using Gibbs sampling and belief propagation. This makes naive Bayes models a very attractive alternative to Bayesian networks for general probability estimation, particularly in large or real-time domains.
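
A minimal sketch of the kind of model the paper studies: EM for a naive Bayes mixture over binary features, alternating responsibilities (E-step) with re-estimated mixing weights and per-class Bernoulli rates (M-step). Initialisation and smoothing constants are arbitrary illustrative choices.

```python
import numpy as np

def naive_bayes_em(X, K=2, iters=50, seed=0):
    """EM for a naive Bayes mixture over binary features (a sketch).

    Model: z ~ Cat(pi); x_j | z=k ~ Bernoulli(theta[k, j]), with features
    conditionally independent given the class -- the naive Bayes assumption.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.25, 0.75, (K, d))
    for _ in range(iters):
        # E-step: responsibilities r[i, k] = P(z_i = k | x_i)
        log_r = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: mixing weights and per-class Bernoulli rates
        nk = r.sum(axis=0)
        pi = nk / n
        theta = (r.T @ X + 1.0) / (nk[:, None] + 2.0)   # Laplace smoothing
    return pi, theta, r

rng = np.random.default_rng(3)
A = (rng.random((50, 6)) < 0.8).astype(float)   # cluster 1: mostly ones
B = (rng.random((50, 6)) < 0.2).astype(float)   # cluster 2: mostly zeros
pi, theta, r = naive_bayes_em(np.vstack([A, B]))
print(np.round(pi, 2), np.round(theta, 2))
```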

Journal ArticleDOI
TL;DR: In this article, the authors present a method for generating samples from an unnormalized posterior distribution using Markov chain Monte Carlo (MCMC) in which the evaluation of f(·) is very difficult or computationally demanding.
Abstract: This article presents a method for generating samples from an unnormalized posterior distribution f(·) using Markov chain Monte Carlo (MCMC) in which the evaluation of f(·) is very difficult or computationally demanding. Commonly, a less computationally demanding, perhaps local, approximation to f(·) is available, say f*x(·). An algorithm is proposed to generate an MCMC that uses such an approximation to calculate acceptance probabilities at each step of a modified Metropolis–Hastings algorithm. Once a proposal is accepted using the approximation, f(·) is calculated with full precision ensuring convergence to the desired distribution. We give sufficient conditions for the algorithm to converge to f(·) and give both theoretical and practical justifications for its usage. Typical applications are in inverse problems using physical data models where computing time is dominated by complex model simulation. We outline Bayesian inference and computing for inverse problems. A stylized example is given of recove...
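
The two-stage idea can be sketched compactly: screen each proposal with the cheap approximation and evaluate the expensive target only for proposals that survive, with a second acceptance ratio correcting for the screen so the chain still targets f(·). The Gaussian surrogate below is a toy; in the intended applications, log_f would wrap a costly forward simulation.

```python
import numpy as np

def delayed_acceptance_mh(x0, log_f, log_f_cheap, proposal, n_steps, seed=0):
    """Two-stage ('delayed acceptance') Metropolis-Hastings, a sketch.

    log_f       : expensive log target, evaluated only for screened proposals
    log_f_cheap : cheap approximation used for the first-stage screen
    proposal    : symmetric random-walk proposal function
    """
    rng = np.random.default_rng(seed)
    x, lf, lfc = x0, log_f(x0), log_f_cheap(x0)
    chain, n_expensive = [x0], 1
    for _ in range(n_steps):
        y = proposal(x, rng)
        lfc_y = log_f_cheap(y)
        # Stage 1: screen with the approximation
        if np.log(rng.random()) < lfc_y - lfc:
            # Stage 2: correct with the true target; the second ratio
            # compensates for the screen, preserving the right distribution
            lf_y = log_f(y)
            n_expensive += 1
            if np.log(rng.random()) < (lf_y - lf) - (lfc_y - lfc):
                x, lf, lfc = y, lf_y, lfc_y
        chain.append(x)
    return np.array(chain), n_expensive

# Toy demo: 'expensive' target N(0,1); cheap surrogate N(0, 1.2^2)
log_f = lambda x: -0.5 * x**2
log_f_cheap = lambda x: -0.5 * x**2 / 1.44
prop = lambda x, rng: x + rng.normal(0.0, 1.0)
chain, n_exp = delayed_acceptance_mh(0.0, log_f, log_f_cheap, prop, 5000)
print(chain.mean(), chain.std(), n_exp, "expensive evaluations")
```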

Journal ArticleDOI
TL;DR: In this paper, a Bayesian inference scheme was proposed to constrain the inversion of CO2 sources and sinks at the Earth surface by using a variational formulation rather than a pure matrix-based one in order to cope with the large amount of satellite data.
Abstract: Properly handling satellite data to constrain the inversion of CO2 sources and sinks at the Earth surface is a challenge motivated by the limitations of the current surface observation network. In this paper we present a Bayesian inference scheme to tackle this issue. It is based on the same theoretical principles as most inversions of the flask network but uses a variational formulation rather than a pure matrix-based one in order to cope with the large amount of satellite data. The minimization algorithm iteratively computes the optimum solution to the inference problem as well as an estimation of its error characteristics and some quantitative measures of the observation information content. A global climate model, guided by analyzed winds, provides information about the atmospheric transport to the inversion scheme. A surface flux climatology regularizes the inference problem. This new system has been applied to 1 year's worth of retrievals of vertically integrated CO2 concentrations from the Television Infrared Observation Satellite Operational Vertical Sounder (TOVS). Consistent with a recent study that identified regional biases in the TOVS retrievals, the inferred fluxes are not useful for biogeochemical analyses. In addition to the detrimental impact of these biases, we find a sensitivity of the results to the formulation of the prior uncertainty and to the accuracy of the transport model. Notwithstanding these difficulties, four-dimensional inversion schemes of the type presented here could form the basis of multisensor data assimilation systems for the estimation of the surface fluxes of key atmospheric compounds.

Journal ArticleDOI
TL;DR: Variational approximations are used to perform the analogous model selection task in the Bayesian context and place JunB and JunD at the centre of the mechanisms that control apoptosis and proliferation.
Abstract: Motivation: We have used state-space models (SSMs) to reverse engineer transcriptional networks from highly replicated gene expression profiling time series data obtained from a well-established model of T cell activation. SSMs are a class of dynamic Bayesian networks in which the observed measurements depend on some hidden state variables that evolve according to Markovian dynamics. These hidden variables can capture effects that cannot be directly measured in a gene expression profiling experiment, for example: genes that have not been included in the microarray, levels of regulatory proteins, the effects of mRNA and protein degradation, etc. Results: We have approached the problem of inferring the model structure of these state-space models using both classical and Bayesian methods. In our previous work, a bootstrap procedure was used to derive classical confidence intervals for parameters representing 'gene--gene' interactions over time. In this article, variational approximations are used to perform the analogous model selection task in the Bayesian context. Certain interactions are present in both the classical and the Bayesian analyses of these regulatory networks. The resulting models place JunB and JunD at the centre of the mechanisms that control apoptosis and proliferation. These mechanisms are key for clonal expansion and for controlling the long term behavior (e.g. programmed cell death) of these cells. Availability: Supplementary data is available at http://public.kgi.edu/wild/index.htm and Matlab source code for variational Bayesian learning of SSMs is available at http://www.cse.ebuffalo.edu/faculty/mbeal/software.html Contact: David_Wild@kgi.edu

Proceedings ArticleDOI
27 Dec 2005
TL;DR: This work describes a hierarchical model of invariant visual pattern recognition in the visual cortex that exhibits invariance across a wide variety of transformations and is robust in the presence of noise.
Abstract: We describe a hierarchical model of invariant visual pattern recognition in the visual cortex. In this model, the knowledge of how patterns change when objects move is learned and encapsulated in terms of high probability sequences at each level of the hierarchy. Configuration of object parts is captured by the patterns of coincident high probability sequences. This knowledge is then encoded in a highly efficient Bayesian network structure. The learning algorithm uses a temporal stability criterion to discover object concepts and movement patterns. We show that the architecture and algorithms are biologically plausible. The large scale architecture of the system matches the large scale organization of the cortex and the micro-circuits derived from the local computations match the anatomical data on cortical circuits. The system exhibits invariance across a wide variety of transformations and is robust in the presence of noise. Moreover, the model also offers alternative explanations for various known cortical phenomena.

Proceedings Article
09 Jul 2005
TL;DR: An efficient translation from Bayesian networks to weighted model counting is described, the best model-counting algorithms are extended to weighted model counting, an efficient method for computing all marginals in a single counting pass is developed, and the approach is evaluated on computationally challenging reasoning problems.
Abstract: Over the past decade, general satisfiability testing algorithms have proven to be surprisingly effective at solving a wide variety of constraint satisfaction problems, such as planning and scheduling (Kautz and Selman 2003). Solving such NP-complete tasks by "compilation to SAT" has turned out to be an approach that is of both practical and theoretical interest. Recently, Sang et al. (2004) have shown that state-of-the-art SAT algorithms can be efficiently extended to the harder task of counting the number of models (satisfying assignments) of a formula, by employing a technique called component caching. This paper begins to investigate the question of whether "compilation to model counting" could be a practical technique for solving real-world #P-complete problems, in particular Bayesian inference. We describe an efficient translation from Bayesian networks to weighted model counting, extend the best model-counting algorithms to weighted model counting, develop an efficient method for computing all marginals in a single counting pass, and evaluate the approach on computationally challenging reasoning problems.
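
The semantics of the reduction are simple to state: attach CPT entries as weights, and the weighted count of models consistent with some evidence equals that evidence's marginal probability. The brute-force enumeration below illustrates this on a hypothetical two-node network; real systems instead compile to weighted CNF and run component-caching model counters.

```python
from itertools import product

# Tiny hypothetical Bayes net A -> B:
#   P(A=1) = 0.3,  P(B=1 | A=1) = 0.9,  P(B=1 | A=0) = 0.2
P_A = {1: 0.3, 0: 0.7}
P_B_given_A = {(1, 1): 0.9, (0, 1): 0.1,    # key is (b, a)
               (1, 0): 0.2, (0, 0): 0.8}

def weight(a, b):
    """Weight of one complete assignment = product of its CPT entries."""
    return P_A[a] * P_B_given_A[(b, a)]

def weighted_model_count(evidence):
    """Sum of weights over assignments consistent with the evidence.
    This equals the marginal probability of the evidence."""
    total = 0.0
    for a, b in product([0, 1], repeat=2):
        if all({'A': a, 'B': b}[var] == val for var, val in evidence.items()):
            total += weight(a, b)
    return total

print(weighted_model_count({'B': 1}))                  # P(B=1) = 0.41
print(weighted_model_count({'A': 1, 'B': 1}) /
      weighted_model_count({'B': 1}))                  # P(A=1 | B=1)
```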

Journal ArticleDOI
01 Jun 2005-Test
TL;DR: In this paper, a review of methods used for analyzing ordered categorical (ordinal) response variables is presented, with the main emphasis on maximum likelihood inference, although for some models (e.g., marginal models, multi-level models) this can be computationally difficult.
Abstract: This article reviews methodologies used for analyzing ordered categorical (ordinal) response variables. We begin by surveying models for data with a single ordinal response variable. We also survey recently proposed strategies for modeling ordinal response variables when the data have some type of clustering or when repeated measurement occurs at various occasions for each subject, such as in longitudinal studies. Primary models in that case include marginal models and cluster-specific (conditional) models, for which effects apply conditionally at the cluster level. Related discussion refers to multi-level and transitional models. The main emphasis is on maximum likelihood inference, although we indicate certain models (e.g., marginal models, multi-level models) for which this can be computationally difficult. The Bayesian approach has also received considerable attention for categorical data in the past decade, and we survey recent Bayesian approaches to modeling ordinal response variables. Alternative, non-model-based approaches are also available for certain types of inference.
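
As a small worked example of the most common single-response model, the sketch below evaluates category probabilities under a proportional-odds (cumulative logit) specification; the cutpoints and slope are made-up numbers.

```python
import numpy as np

def cumulative_logit_probs(x, alphas, beta):
    """Category probabilities under a proportional-odds model (a sketch).

    logit P(Y <= j | x) = alphas[j] - beta * x, with alphas increasing;
    category probabilities follow by differencing the cumulative ones.
    """
    alphas = np.asarray(alphas)
    cum = 1.0 / (1.0 + np.exp(-(alphas - beta * x)))   # P(Y<=j), j=1..J-1
    cum = np.concatenate([cum, [1.0]])                 # P(Y<=J) = 1
    return np.diff(cum, prepend=0.0)                   # P(Y=j), j=1..J

# Hypothetical 4-category ordinal response with cutpoints -1, 0.5, 2
for x in (0.0, 1.0):
    print(x, np.round(cumulative_logit_probs(x, [-1.0, 0.5, 2.0], beta=0.8), 3))
```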

Journal ArticleDOI
TL;DR: The capabilities of the free software package BayesX for estimating regression models with structured additive predictor based on MCMC inference are described, which extends the capabilities of existing software for semiparametric regression included in S-PLUS, SAS, R or Stata.
Abstract: There has been much recent interest in Bayesian inference for generalized additive and related models. The increasing popularity of Bayesian methods for these and other model classes is mainly caused by the introduction of Markov chain Monte Carlo (MCMC) simulation techniques which allow realistic modeling of complex problems. This paper describes the capabilities of the free software package BayesX for estimating regression models with structured additive predictor based on MCMC inference. The program extends the capabilities of existing software for semiparametric regression included in S-PLUS, SAS, R or Stata. Many model classes well known from the literature are special cases of the models supported by BayesX. Examples are generalized additive (mixed) models, dynamic models, varying coefficient models, geoadditive models, geographically weighted regression and models for space-time regression. BayesX supports the most common distributions for the response variable. For univariate responses these are Gaussian, binomial, Poisson, gamma, negative binomial, zero-inflated Poisson and zero-inflated negative binomial. For multicategorical responses, both multinomial logit and probit models for unordered categories of the response as well as cumulative threshold models for ordered categories can be estimated. Moreover, BayesX allows the estimation of complex continuous-time survival and hazard rate models.

Journal ArticleDOI
TL;DR: The Bayesian estimation of stochastic rate constants in the context of dynamic models of intracellular processes is concerned with the estimation of parameters in a prokaryotic autoregulatory gene network.
Abstract: Summary. This article is concerned with the Bayesian estimation of stochastic rate constants in the context of dynamic models of intracellular processes. The underlying discrete stochastic kinetic model is replaced by a diffusion approximation (or stochastic differential equation approach) where a white noise term models stochastic behavior, and the model is identified using equispaced time course data. The estimation framework involves the introduction of m − 1 latent data points between every pair of observations. MCMC methods are then used to sample the posterior distribution of the latent process and the model parameters. The methodology is applied to the estimation of parameters in a prokaryotic autoregulatory gene network.
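
A sketch of the diffusion approximation itself: the Euler-Maruyama recursion below simulates a toy birth-death network whose drift and diffusion terms come from the reaction hazards. In the estimation scheme described above, the same discretisation is applied on a fine grid of imputed latent points between observations; the rates and step size here are illustrative.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=0):
    """Simulate X_{t+dt} = X_t + a(X_t) dt + sqrt(b(X_t) dt) * N(0, 1).

    The same recursion, applied on a fine grid of m - 1 latent points
    between each pair of observations, gives the likelihood approximation
    that the MCMC scheme alternates with updates of the rate constants.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        x[i + 1] = (x[i] + drift(x[i]) * dt
                    + np.sqrt(max(diffusion(x[i]), 0.0) * dt) * rng.normal())
    return x

# Toy birth-death process with hazards c1 (birth) and c2 * x (death):
c1, c2 = 10.0, 0.1
path = euler_maruyama(lambda x: c1 - c2 * x,      # drift a(x)
                      lambda x: c1 + c2 * x,      # diffusion b(x)
                      x0=50.0, dt=0.1, n_steps=500)
print(path[-5:].round(2))   # hovers near the equilibrium c1 / c2 = 100
```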

BookDOI
14 Jul 2005
TL;DR: Applied Bayesian modeling and causal inference are treated from incomplete-data perspectives.
Abstract: Applied Bayesian modeling and causal inference from incomplete-data perspectives.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A driver intent inference system that is based on lane positional information, vehicle parameters, and driver head motion, which is applied and evaluated on real-world data collected in a modular intelligent vehicle test bed.
Abstract: In this paper we demonstrate a driver intent inference system (DIIS) based on lane positional information, vehicle parameters, and driver head motion. We present robust computer vision methods for identifying and tracking freeway lanes and driver head motion. These algorithms are then applied and evaluated on real-world data collected in a modular intelligent vehicle test-bed. Analysis of the data for lane change intent is performed using a sparse Bayesian learning methodology. Finally, the system as a whole is evaluated using a novel metric and real-world data of vehicle parameters, lane position, and driver head motion.

Journal ArticleDOI
TL;DR: In this paper, the authors examine how one's propensity to use Bayes' rule is affected by whether this rule is aligned with reinforcement or clashes with it, and find that when these forces clash, around 50% of all decisions are inconsistent with Bayesian updating.
Abstract: We examine decision-making under risk and uncertainty in a laboratory experiment. The heart of our design examines how one’s propensity to use Bayes’ rule is affected by whether this rule is aligned with reinforcement or clashes with it. In some cases, we create environments where Bayesian updating after a successful outcome should lead a decision-maker to make a change, while no change should be made after observing an unsuccessful outcome. We observe striking patterns: When payoff reinforcement and Bayesian updating are aligned, nearly all people respond as expected. However, when these forces clash, around 50% of all decisions are inconsistent with Bayesian updating. While people tend to make costly initial choices that eliminate complexity in a subsequent decision, we find that complexity alone cannot explain our results. Finally, when a draw provides only information (and no payment), switching errors occur much less frequently, suggesting that the ‘emotional reinforcement’ (affect) induced by payments is a critical factor in deviations from Bayesian updating. There is considerable behavioral heterogeneity; we identify different types in the population and find that people who make ‘switching errors’ are more likely to have cross-period ‘reinforcement’ tendencies.

Journal ArticleDOI
TL;DR: A causal model is developed from the literature and, using a data set of 33 real-world software projects, how decision-making risks can be incorporated in the Bayesian networks are illustrated and compared with popular nonparametric neural-network and regression tree forecasting models.
Abstract: Recently, Bayesian probabilistic models have been used for predicting software development effort. One of the reasons for the interest in the use of Bayesian probabilistic models, when compared to traditional point forecast estimation models, is that Bayesian models provide tools for risk estimation and allow decision-makers to combine historical data with subjective expert estimates. In this paper, we use a Bayesian network model and illustrate how a belief updating procedure can be used to incorporate decision-making risks. We develop a causal model from the literature and, using a data set of 33 real-world software projects, we illustrate how decision-making risks can be incorporated in the Bayesian networks. We compare the predictive performance of the Bayesian model with popular nonparametric neural-network and regression tree forecasting models and show that the Bayesian model is a competitive model for forecasting software development effort.

Journal ArticleDOI
TL;DR: An easily implemented adaptive algorithm is developed that improves on the work of Gerlach et al. and promises to significantly reduce computing time in a variety of problems including mixture innovation, change-point, regime switching, and outlier detection.
Abstract: Time series subject to parameter shifts of random magnitude and timing are commonly modeled with a change-point approach using Chib's (1998) algorithm to draw the break dates. We outline some advantages of an alternative approach in which breaks come through mixture distributions in state innovations, and for which the sampler of Gerlach, Carter and Kohn (2000) allows reliable and efficient inference. We show how the same sampler can be used to (i) model shifts in variance that occur independently of shifts in other parameters and (ii) draw the break dates in O(n) rather than O(n³) operations in the change-point model of Koop and Potter (2004b), the most general to date. Finally, we introduce to the time series literature the concept of adaptive Metropolis-Hastings sampling for discrete latent variable models. We develop an easily implemented adaptive algorithm that improves on Gerlach et al. (2000) and promises to significantly reduce computing time in a variety of problems including mixture innovation, change-point, regime-switching, and outlier detection. The efficiency gains on two models for U.S. inflation and real interest rates are 257% and 341%.

Journal ArticleDOI
TL;DR: The resulting posterior and predictive inference enables summaries in the form of tables and maps, which help to reveal the nature of the spatiotemporal behaviour as well as the associated uncertainty.
Abstract: There is a considerable literature in spatiotemporal modelling. The approach adopted here applies to the setting where space is viewed as continuous but time is taken to be discrete. We view the data as a time series of spatial processes and work in the setting of dynamic models, achieving a class of dynamic models for such data. We seek rich, flexible, easy-to-specify, easy-to-interpret, computationally tractable specifications which allow very general mean structures and also non-stationary association structures. Our modelling contributions are as follows. In the case where univariate data are collected at the spatial locations, we propose the use of a spatiotemporally varying coefficient form. In the case where multivariate data are collected at the locations, we need to capture associations among measurements at a given location and time as well as dependence across space and time. We propose the use of suitable multivariate spatial process models developed through coregionalization. We adopt a Bayesian inference framework. The resulting posterior and predictive inference enables summaries in the form of tables and maps, which help to reveal the nature of the spatiotemporal behaviour as well as the associated uncertainty. We illuminate various computational issues and then apply our models to the analysis of climate data obtained from the National Center for Atmospheric Research to analyze precipitation and temperature measurements obtained in Colorado in 1997. Copyright © 2005 John Wiley & Sons, Ltd.

Book ChapterDOI
TL;DR: This presentation provides a brief overview of the Bayesian approach to the estimation of causal effects of treatments based on the concept of potential outcomes.
Abstract: A central problem in statistics is how to draw inferences about the causal effects of treatments (i.e., interventions) from randomized and nonrandomized data. For example, does the new job-training program really improve the quality of jobs for those trained, or does exposure to that chemical in drinking water increase cancer rates? This presentation provides a brief overview of the Bayesian approach to the estimation of such causal effects based on the concept of potential outcomes.

Dissertation
01 Jun 2005
TL;DR: The present work shows that mutual information, information gain, correlation, attribute importance, association and many other concepts are all merely special cases of a single general notion of interaction.
Abstract: Two attributes $A$ and $B$ are said to interact when it helps to observe the attribute values of both attributes together. This is an example of a $2$-way interaction. In general, a group of attributes ${\cal X}$ is involved in a $k$-way interaction when we cannot reconstruct their relationship merely with $\ell$-way interactions, $\ell < k$. These two definitions formalize the notion of an interaction in a nutshell. An additional notion is the one of context. We interpret context as just another attribute. There are two ways in which we can consider context. Context can be something that specifies our focus: we may examine interactions only in a given context, only for the instances that are in the context. Alternatively, context can be something that we are interested in: if we seek to predict weather, only the interactions involving the weather will be interesting to us. This is especially relevant for classification: we only want to examine the interactions involving the labelled class attribute and other attributes (unless there are missing or uncertain attribute values). But the definitions are not complete. We need to specify the model that assumes the interaction: how do we represent the pattern of co-appearance of several attributes? We also need to specify a model that does not assume the interaction: how do we reconstruct the pattern of co-appearance of several attributes without actually observing them all simultaneously? We need to specify a loss function that measures how good a particular model is, with respect to another model or with respect to the data. We need an algorithm that builds both models from the data. Finally, we need the data in order to assess whether it supports the hypothesis of interaction. The present work shows that mutual information, information gain, correlation, attribute importance, association and many other concepts are all merely special cases of the above principle. Furthermore, the analysis of interactions generalizes the notions of analysis of variance, variable clustering, structure learning of Bayesian networks, and several other problems. There is an intriguing history of reinvention in the area of information theory on the topic of interactions. In our work, we focus on models founded on probability theory, and employ entropy and Kullback-Leibler divergence as our loss functions. Generally, whether an interaction exists or not, and to what extent, depends on what kind of models we are working with. The concept of McGill's interaction information in information theory, for example, is based upon Kullback-Leibler divergence as the loss function, and non-normalized Kirkwood superposition approximation models. Pearson's correlation coefficient is based on the proportion of explained standard deviation as the loss function, and on the multivariate Gaussian model. Most applications of mutual information are based on Kullback-Leibler divergence and the multinomial model. When there is a limited amount of data, it becomes unclear what model can be used to interpret it. Even if we fix the family of models, we remain uncertain about what would be the best choice of a model in the family. In all, uncertainty pervades the choice of the model. The underlying idea of Bayesian statistics is that the uncertainty about the model is to be handled in the same way as the uncertainty about the correct prediction in nondeterministic domains.
The uncertainty, however, implies that we know neither whether there is an interaction with complete certainty, nor how important the interaction is. We propose a Bayesian approach to performing significance tests: an interaction is significant if it is very unlikely that a model assuming the interaction would suffer a greater loss than a model not assuming it, even if the interaction truly exists, among all the foreseeable posterior models. We also propose Bayesian confidence intervals to assess the probability distribution of the expected loss of assuming that an interaction does not exist. We compare significance tests based on permutations, bootstrapping, cross-validation, Bayesian statistics and asymptotic theory, and find that they often disagree. It is important, therefore, to understand the assumptions that underlie the tests. Interactions are a natural way of understanding the regularities in the data. We propose interaction analysis, a methodology for analyzing the data. It has a long history, but our novel contribution is a series of diagrams that illustrate the discovered interactions in data. The diagrams include information graphs, interaction graphs and dendrograms. We use interactions to identify concept drift and ignorability of missing data. We use interactions to cluster attribute values and build taxonomies automatically. When we say that there is an interaction, we still need to explain what it looks like. Generally, the interaction can be explained by inferring a higher-order construction. For that purpose, we provide visualizations for several models that allow for interactions. We also provide a probabilistic account of rule inference: a rule can be interpreted as a constructed attribute. We also describe interactions involving individual attribute values with other attributes: this can help us break complex attributes down into simpler components. We also provide an approach to handling the curse of dimensionality: we dynamically maintain a structure of attributes as individual attributes enter our model one by one. We conclude this work by presenting two practical algorithms: an efficient heuristic for selecting attributes within the naive Bayesian classifier, and a complete approach to prediction with interaction models, the Kikuchi-Bayes model. Kikuchi-Bayes combines Bayesian model averaging, a parsimonious prior, and search for interactions that determine the model. Kikuchi-Bayes outperforms most popular machine learning methods, such as classification trees, logistic regression, the naive Bayesian classifier, and sometimes even support vector machines. Moreover, Kikuchi-Bayes models are highly interpretable and can be easily visualized as interaction graphs.
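
As a concrete instance of the entropy-based quantities discussed, the sketch below computes McGill's three-way interaction information from a joint probability table and evaluates it on XOR, the canonical pure three-way interaction that no collection of two-way models can capture.

```python
import numpy as np

def entropy(joint):
    """Shannon entropy (bits) of a joint probability table."""
    p = joint[joint > 0]
    return -np.sum(p * np.log2(p))

def interaction_information(pxyz):
    """McGill's 3-way interaction information, I(X;Y;Z) = I(X;Y|Z) - I(X;Y).

    Positive values indicate synergy that no set of 2-way marginals can
    reproduce; the expression below is the equivalent entropy expansion.
    """
    px, py, pz = pxyz.sum((1, 2)), pxyz.sum((0, 2)), pxyz.sum((0, 1))
    pxy, pxz, pyz = pxyz.sum(2), pxyz.sum(1), pxyz.sum(0)
    return (entropy(pxy) + entropy(pxz) + entropy(pyz)
            - entropy(px) - entropy(py) - entropy(pz) - entropy(pxyz))

# XOR: Z = X xor Y with uniform X, Y -- a pure 3-way interaction
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25
print(interaction_information(p))   # 1.0 bit
```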

Journal ArticleDOI
Jeffrey N. Rouder, Jun Lu, Paul L. Speckman, Dongchu Sun, Yi Jiang
TL;DR: A hierarchical Bayesian model that provides a means of estimating the shape, scale, and location of RT distributions and provides a principled and efficient means of pooling information across disparate data from different individuals is presented.
Abstract: We present a statistical model for inference with response time (RT) distributions. The model has the following features. First, it provides a means of estimating the shape, scale, and location (shift) of RT distributions. Second, it is hierarchical and models between-subjects and within-subjects variability simultaneously. Third, inference with the model is Bayesian and provides a principled and efficient means of pooling information across disparate data from different individuals. Because the model efficiently pools information across individuals, it is particularly well suited for those common cases in which the researcher collects a limited number of observations from several participants. Monte Carlo simulations reveal that the hierarchical Bayesian model provides more accurate estimates than several popular competitors do. We illustrate the model by providing an analysis of the symbolic distance effect in which participants can more quickly ascertain the relationship between nonadjacent digits than that between adjacent digits.