
Showing papers on "Bayes' theorem published in 2004"


Journal ArticleDOI
TL;DR: The hierarchical model of Lonnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
Abstract: The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single-channel and two-color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.

11,864 citations
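
As a rough, illustrative sketch of the variance shrinkage and moderated t-statistic described above (not the limma implementation itself): the prior degrees of freedom d0 and prior variance s0² are assumed given here, whereas the paper estimates them from all genes by closed-form moment methods; all numbers below are made up.

import numpy as np
from scipy import stats

def moderated_t(beta_hat, s2, df_resid, v, d0, s0_2):
    """Moderated t-statistic in the style of Smyth (2004).

    beta_hat : estimated log-fold-change per gene
    s2       : residual sample variance per gene
    df_resid : residual degrees of freedom per gene
    v        : unscaled variance factor of beta_hat (from the design matrix)
    d0, s0_2 : prior degrees of freedom and prior variance
               (limma estimates these from all genes; here they are assumed given)
    """
    s2_post = (d0 * s0_2 + df_resid * s2) / (d0 + df_resid)   # shrunken variance
    t_mod = beta_hat / np.sqrt(s2_post * v)
    df_total = d0 + df_resid                                  # augmented degrees of freedom
    p = 2 * stats.t.sf(np.abs(t_mod), df_total)
    return t_mod, p

# toy usage on simulated genes (values are made up)
rng = np.random.default_rng(0)
beta_hat = rng.normal(0, 1, size=5)
s2 = rng.chisquare(3, size=5) / 3
print(moderated_t(beta_hat, s2, df_resid=3, v=0.5, d0=4.0, s0_2=0.05))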


Journal ArticleDOI
TL;DR: It is argued that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages.
Abstract: Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). (AIC; Bayes factors; BIC; likelihood ratio tests; model averaging; model uncertainty; model selection; multimodel inference.) It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian es- timation. We know that the use of one or other model affects many, if not all, stages of phylogenetic inference. For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002; Buckley

3,712 citations
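
A minimal sketch of the Akaike-weight calculation that underlies the AIC-based model averaging discussed above; the model names and AIC values are invented, and a real analysis would obtain them from fitted substitution models.

import numpy as np

def akaike_weights(aic):
    """Akaike weights from a vector of AIC scores (smaller AIC is better)."""
    delta = np.asarray(aic) - np.min(aic)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical AIC scores for three substitution models (numbers are made up)
models = ["JC69", "HKY85", "GTR+G"]
aic = [10250.3, 10180.7, 10178.2]
for m, wi in zip(models, akaike_weights(aic)):
    print(f"{m:7s} weight = {wi:.3f}")

# a model-averaged estimate of a parameter is the weight-sum of per-model estimates,
# and a parameter's relative importance is the sum of weights of the models containing it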


Journal ArticleDOI
TL;DR: A major challenge for neuroscientists is to test ideas for how this might be achieved in populations of neurons experimentally, and so determine whether and how neurons code information about sensory uncertainty.

2,067 citations


Journal ArticleDOI
TL;DR: A range of Bayesian hierarchical models using the Markov chain Monte Carlo software WinBUGS are presented that allow for variation in true treatment effects across trials, and models where the between-trials variance is homogeneous across treatment comparisons are considered.
Abstract: Mixed treatment comparison (MTC) meta-analysis is a generalization of standard pairwise meta-analysis for A vs B trials, to data structures that include, for example, A vs B, B vs C, and A vs C trials. There are two roles for MTC: one is to strengthen inference concerning the relative efficacy of two treatments, by including both 'direct' and 'indirect' comparisons. The other is to facilitate simultaneous inference regarding all treatments, in order for example to select the best treatment. In this paper, we present a range of Bayesian hierarchical models using the Markov chain Monte Carlo software WinBUGS. These are multivariate random effects models that allow for variation in true treatment effects across trials. We consider models where the between-trials variance is homogeneous across treatment comparisons as well as heterogeneous variance models. We also compare models with fixed (unconstrained) baseline study effects with models with random baselines drawn from a common distribution. These models are applied to an illustrative data set and posterior parameter distributions are compared. We discuss model critique and model selection, illustrating the role of Bayesian deviance analysis, and node-based model criticism. The assumptions underlying the MTC models and their parameterization are also discussed.

1,861 citations
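
The hierarchical MTC models above are fitted by MCMC in WinBUGS; the sketch below only shows the basic fixed-effect indirect-comparison and pooling logic that MTC generalizes, with made-up log-odds-ratio estimates.

import numpy as np

def indirect(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C vs B from A vs B and A vs C trials (Bucher-style)."""
    d_bc = d_ac - d_ab
    se_bc = np.sqrt(se_ab**2 + se_ac**2)
    return d_bc, se_bc

def pool(d1, se1, d2, se2):
    """Inverse-variance pooling of a direct and an indirect estimate."""
    w1, w2 = 1 / se1**2, 1 / se2**2
    d = (w1 * d1 + w2 * d2) / (w1 + w2)
    return d, np.sqrt(1 / (w1 + w2))

# made-up log-odds-ratio estimates: indirect C vs B combined with a direct C vs B trial
print(pool(*indirect(-0.30, 0.10, -0.55, 0.12), d2=-0.20, se2=0.15))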


Journal ArticleDOI
15 Jan 2004-Nature
TL;DR: This work shows that subjects internally represent both the statistical distribution of the task and their sensory uncertainty, combining them in a manner consistent with a performance-optimizing bayesian process.
Abstract: When we learn a new motor skill, such as playing an approaching tennis ball, both our sensors and the task possess variability. Our sensors provide imperfect information about the ball's velocity, so we can only estimate it. Combining information from multiple modalities can reduce the error in this estimate. On a longer time scale, not all velocities are a priori equally probable, and over the course of a match there will be a probability distribution of velocities. According to bayesian theory, an optimal estimate results from combining information about the distribution of velocities-the prior-with evidence from sensory feedback. As uncertainty increases, when playing in fog or at dusk, the system should increasingly rely on prior knowledge. To use a bayesian strategy, the brain would need to represent the prior distribution and the level of uncertainty in the sensory feedback. Here we control the statistical variations of a new sensorimotor task and manipulate the uncertainty of the sensory feedback. We show that subjects internally represent both the statistical distribution of the task and their sensory uncertainty, combining them in a manner consistent with a performance-optimizing bayesian process. The central nervous system therefore employs probabilistic models during sensorimotor learning.

1,811 citations
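
A minimal sketch of the prior-likelihood combination the study tests, assuming both the prior over the true value and the sensory likelihood are Gaussian; the numbers are invented. As sensory noise grows, the optimal estimate relies increasingly on the prior, which is the behaviour reported in the paper.

def bayes_estimate(x_sensed, sigma_sense, mu_prior, sigma_prior):
    """Optimal (posterior-mean) estimate combining a Gaussian prior with a
    Gaussian sensory likelihood: a precision-weighted average."""
    w_prior = sigma_sense**2 / (sigma_sense**2 + sigma_prior**2)
    return w_prior * mu_prior + (1 - w_prior) * x_sensed

# made-up numbers: prior centered at 1.0, sensed value of 2.0 (arbitrary units)
for sigma_sense in (0.1, 0.5, 2.0):   # clearer -> blurrier sensory feedback
    print(sigma_sense, bayes_estimate(2.0, sigma_sense, mu_prior=1.0, sigma_prior=0.5))
# as sensory noise grows, the estimate is pulled toward the prior mean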


Journal ArticleDOI
TL;DR: A Bayesian MCMC approach to the analysis of combined data sets was developed and its utility in inferring relationships among gall wasps based on data from morphology and four genes was explored, supporting the utility of morphological data in multigene analyses.
Abstract: The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex and Bayes factors do not support exclusion of apparently weak parameters from this model. Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.

1,758 citations


Proceedings Article
01 Jan 2004
TL;DR: A sufficient condition for the optimality of naive Bayes is presented and proved, in which dependence between attributes does exist, and evidence is provided that dependences among attributes may cancel each other out.
Abstract: Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of naive Bayes in classification? In this paper, we propose a novel explanation of the superb classification performance of naive Bayes. We show that, essentially, the dependence distribution, i.e., how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out), plays a crucial role. Therefore, no matter how strong the dependences among attributes are, naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of naive Bayes. Further, we investigate the optimality of naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of naive Bayes in which the dependence between attributes does exist. This provides evidence that dependences among attributes may cancel each other out. In addition, we explore when naive Bayes works well.
Naive Bayes and Augmented Naive Bayes: Classification is a fundamental issue in machine learning and data mining. In classification, the goal of a learning algorithm is to construct a classifier given a set of training examples with class labels. Typically, an example E is represented by a tuple of attribute values (x1, x2, ..., xn), where xi is the value of attribute Xi. Let C represent the classification variable, and let c be the value of C. In this paper, we assume that there are only two classes: + (the positive class) or − (the negative class). A classifier is a function that assigns a class label to an example. From the probability perspective, according to Bayes' rule, the probability of an example E = (x1, x2, ..., xn) being of class c is p(c|E) = p(E|c)p(c) / p(E). E is classified as the class C = + if and only if fb(E) = p(C = +|E) / p(C = −|E) ≥ 1, where fb(E) is called a Bayesian classifier. Assume that all attributes are independent given the value of the class variable; that is, p(E|c) = p(x1, x2, ..., xn|c) = ∏_{i=1}^{n} p(xi|c).

1,536 citations
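
A small, self-contained sketch of the Bayesian classifier fb(E) under the naive Bayes independence assumption described above, using categorical features with Laplace smoothing; the toy data are invented and the implementation is only illustrative.

import numpy as np
from collections import defaultdict

def train_nb(X, y, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing."""
    classes = sorted(set(y))
    prior = {c: (np.sum(np.array(y) == c) + alpha) / (len(y) + alpha * len(classes))
             for c in classes}
    cond = defaultdict(lambda: defaultdict(float))   # cond[(i, c)][value] = p(x_i = value | c)
    n_feat = len(X[0])
    for i in range(n_feat):
        values = sorted({row[i] for row in X})
        for c in classes:
            rows = [row for row, lab in zip(X, y) if lab == c]
            for v in values:
                count = sum(1 for row in rows if row[i] == v)
                cond[(i, c)][v] = (count + alpha) / (len(rows) + alpha * len(values))
    return classes, prior, cond

def predict(x, classes, prior, cond):
    """Return the class maximizing p(c) * prod_i p(x_i | c), plus the scores."""
    scores = {c: prior[c] * np.prod([cond[(i, c)].get(v, 1e-9) for i, v in enumerate(x)])
              for c in classes}
    return max(scores, key=scores.get), scores

# toy data: (outlook, windy) -> class label, all made up
X = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes"), ("rain", "no")]
y = ["+", "-", "+", "-", "+"]
print(predict(("sunny", "no"), *train_nb(X, y)))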


Journal ArticleDOI
TL;DR: The results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.
Abstract: Most current models of sequence evolution assume that all sites of a protein evolve under the same substitution process, characterized by a 20 x 20 substitution matrix. Here, we propose to relax this assumption by developing a Bayesian mixture model that allows the amino-acid replacement pattern at different sites of a protein alignment to be described by distinct substitution processes. Our model, named CAT, assumes the existence of distinct processes (or classes) differing by their equilibrium frequencies over the 20 residues. Through the use of a Dirichlet process prior, the total number of classes and their respective amino-acid profiles, as well as the affiliations of each site to a given class, are all free variables of the model. In this way, the CAT model is able to adapt to the complexity actually present in the data, and it yields an estimate of the substitutional heterogeneity through the posterior mean number of classes. We show that a significant level of heterogeneity is present in the substitution patterns of proteins, and that the standard one-matrix model fails to account for this heterogeneity. By evaluating the Bayes factor, we demonstrate that the standard model is outperformed by CAT on all of the data sets which we analyzed. Altogether, these results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.

1,399 citations


Journal ArticleDOI
TL;DR: Experimental results suggest that 2- to 4-year-old children construct new causal maps and that their learning is consistent with the Bayes net formalism.
Abstract: We propose that children employ specialized cognitive systems that allow them to recover an accurate causal map of the world: an abstract, coherent, learned representation of the causal relations among events. This kind of knowledge can be perspicuously understood in terms of the formalism of directed graphical causal models, or Bayes nets. Children's causal learning and inference may involve computations similar to those for learning causal Bayes nets and for predicting with them. Experimental results suggest that 2- to 4-year-old children construct new causal maps and that their learning is consistent with the Bayes net formalism.

970 citations


Journal ArticleDOI
TL;DR: The combined use of Bayes factors and DCM allows one to evaluate competing scientific theories about the architecture of large-scale neural networks and the neuronal interactions that mediate perception and cognition.

849 citations


Journal ArticleDOI
James S. Clark
TL;DR: Hierarchical Bayes represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown, and to draw inference on large numbers of latent variables and parameters that describe complex relationships.
Abstract: Advances in computational statistics provide a general framework for the high-dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of latent variables and parameters that describe complex relationships. Here I summarize the structure of HB and provide examples for common spatiotemporal problems. The flexible framework means that parameters, variables and latent variables can represent broader classes of model elements than are treated in traditional models. Inference and prediction depend on two types of stochasticity, including (1) uncertainty, which describes our knowledge of fixed quantities, it applies to all ‘unobservables’ (latent variables and parameters), and it declines asymptotically with sample size, and (2) variability, which applies to fluctuations that are not explained by deterministic processes and does not decline asymptotically with sample size. Examples demonstrate how different sources of stochasticity impact inference and prediction and how allowance for stochastic influences can guide research.
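
A minimal simulation (not a hierarchical Bayes fit) illustrating the distinction drawn above: uncertainty about a fixed quantity declines with sample size, while process variability does not; all numbers are invented.

import numpy as np

rng = np.random.default_rng(1)
mu_true, tau_true = 0.02, 0.15   # mean growth rate and year-to-year (process) sd, made up

for n_years in (5, 20, 100, 500):
    theta = rng.normal(mu_true, tau_true, size=n_years)   # latent yearly rates
    # uncertainty about the fixed quantity mu shrinks as years accumulate ...
    se_mu = theta.std(ddof=1) / np.sqrt(n_years)
    # ... but the variability of the process itself does not
    print(n_years, round(se_mu, 4), round(theta.std(ddof=1), 4))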

Book
26 Apr 2004
TL;DR: In this paper, the authors discuss the role of statistics in the scientific method, cover scientific data gathering, and present ways of displaying and summarizing data, such as graphically displaying a single variable, graphically comparing two samples, and displaying relationships between two or more variables.
Abstract: Preface. Preface to First Edition. 1. Introduction to Statistical Science. 1.1 The Scientific Method: A Process for Learning. 1.2 The Role of Statistics in the Scientific Method. 1.3 Main Approaches to Statistics. 1.4 Purpose and Organization of This Text. 2. Scientific Data Gathering. 2.1 Sampling from a Real Population. 2.2 Observational Studies and Designed Experiments. Monte Carlo Exercises. 3. Displaying and Summarizing Data. 3.1 Graphically Displaying a Single Variable. 3.2 Graphically Comparing Two Samples. 3.3 Measures of Location. 3.4 Measures of Spread. 3.5 Displaying Relationships Between Two or More Variables. 3.6 Measures of Association for Two or More Variables. Exercises. 4. Logic, Probability, and Uncertainty. 4.1 Deductive Logic and Plausible Reasoning. 4.2 Probability. 4.3 Axioms of Probability. 4.4 Joint Probability and Independent Events. 4.5 Conditional Probability. 4.6 Bayes' Theorem. 4.7 Assigning Probabilities. 4.8 Odds Ratios and Bayes Factor. 4.9 Beat the Dealer. Exercises. 5. Discrete Random Variables. 5.1 Discrete Random Variables. 5.2 Probability Distribution of a Discrete Random Variable. 5.3 Binomial Distribution. 5.4 Hypergeometric Distribution. 5.5 Poisson Distribution. 5.6 Joint Random Variables. 5.7 Conditional Probability for Joint Random Variables. Exercises. 6. Bayesian Inference for Discrete Random Variables. 6.1 Two Equivalent Ways of Using Bayes' Theorem. 6.2 Bayes' Theorem for Binomial with Discrete Prior. 6.3 Important Consequences of Bayes' Theorem. 6.4 Bayes' Theorem for Poisson with Discrete Prior. Exercises. Computer Exercises. 7. Continuous Random Variables. 7.1 Probability Density Function. 7.2 Some Continuous Distributions. 7.3 Joint Continuous Random Variables. 7.4 Joint Continuous and Discrete Random Variables. Exercises. 8. Bayesian Inference for Binomial Proportion. 8.1 Using a Uniform Prior. 8.2 Using a Beta Prior. 8.3 Choosing Your Prior. 8.4 Summarizing the Posterior Distribution. 8.5 Estimating the Proportion. 8.6 Bayesian Credible Interval. Exercises. Computer Exercises. 9. Comparing Bayesian and Frequentist Inferences for Proportion. 9.1 Frequentist Interpretation of Probability and Parameters. 9.2 Point Estimation. 9.3 Comparing Estimators for Proportion. 9.4 Interval Estimation. 9.5 Hypothesis Testing. 9.6 Testing a One-Sided Hypothesis. 9.7 Testing a Two-Sided Hypothesis. Exercises. Monte Carlo Exercises. 10. Bayesian Inference for Poisson. 10.1 Some Prior Distributions for Poisson. 10.2 Inference for Poisson Parameter. Exercises. Computer Exercises. 11. Bayesian Inference for Normal Mean. 11.1 Bayes' Theorem for Normal Mean with a Discrete Prior. 11.2 Bayes' Theorem for Normal Mean with a Continuous Prior. 11.3 Choosing Your Normal Prior. 11.4 Bayesian Credible Interval for Normal Mean. 11.5 Predictive Density for Next Observation. Exercises. Computer Exercises. 12. Comparing Bayesian and Frequentist Inferences for Mean. 12.1 Comparing Frequentist and Bayesian Point Estimators. 12.2 Comparing Confidence and Credible Intervals for Mean. 12.3 Testing a One-Sided Hypothesis about a Normal Mean. 12.4 Testing a Two-Sided Hypothesis about a Normal Mean. Exercises. 13. Bayesian Inference for Difference Between Means. 13.1 Independent Random Samples from Two Normal Distributions. 13.2 Case 1: Equal Variances. 13.3 Case 2: Unequal Variances. 13.4 Bayesian Inference for Difference Between Two Proportions Using Normal Approximation. 13.5 Normal Random Samples from Paired Experiments. Exercises. 14. Bayesian Inference for Simple Linear Regression. 14.1 Least Squares Regression. 14.2 Exponential Growth Model. 14.3 Simple Linear Regression Assumptions. 14.4 Bayes' Theorem for the Regression Model. 14.5 Predictive Distribution for Future Observation. Exercises. Computer Exercises. 15. Bayesian Inference for Standard Deviation. 15.1 Bayes' Theorem for Normal Variance with a Continuous Prior. 15.2 Some Specific Prior Distributions and the Resulting Posteriors. 15.3 Bayesian Inference for Normal Standard Deviation. Exercises. Computer Exercises. 16. Robust Bayesian Methods. 16.1 Effect of Misspecified Prior. 16.2 Bayes' Theorem with Mixture Priors. Exercises. Computer Exercises. A. Introduction to Calculus. B. Use of Statistical Tables. C. Using the Included Minitab Macros. D. Using the Included R Functions. E. Answers to Selected Exercises. References. Topic Index.
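
As a concrete illustration of the "Bayes' Theorem for Binomial with Discrete Prior" calculation covered in Chapter 6 of the book, here is a minimal sketch (not the book's included Minitab macros or R functions); the candidate proportions, prior weights and data are invented.

import numpy as np
from scipy.stats import binom

def discrete_prior_update(pi_values, prior, n, k):
    """Posterior over a discrete set of candidate binomial proportions
    after observing k successes in n trials (Bayes' theorem)."""
    likelihood = binom.pmf(k, n, pi_values)
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

pi_values = np.array([0.2, 0.4, 0.6, 0.8])   # candidate proportions (made up)
prior = np.array([0.25, 0.25, 0.25, 0.25])   # flat discrete prior
print(discrete_prior_update(pi_values, prior, n=10, k=7))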

Journal ArticleDOI
TL;DR: This work states that Bayesian information-theoretic methods provide robust measures of the probability of alternative models, and multiple models can be averaged into a single model that reflects uncertainty in model construction and selection.
Abstract: Bayesian inference is an important statistical tool that is increasingly being used by ecologists. In a Bayesian analysis, information available before a study is conducted is summarized in a quantitative model or hypothesis: the prior probability distribution. Bayes Theorem uses the prior probability distribution and the likelihood of the data to generate a posterior probability distribution. Posterior probability distributions are an epistemological alternative to P-values and provide a direct measure of the degree of belief that can be placed on models, hypotheses, or parameter estimates. Moreover, Bayesian information-theoretic methods provide robust measures of the probability of alternative models, and multiple models can be averaged into a single model that reflects uncertainty in model construction and selection. These methods are demonstrated through a simple worked example. Ecologists are using Bayesian inference in studies that range from predicting single-species population dynamics to understanding ecosystem processes. Not all ecologists, however, appreciate the philosophical underpinnings of Bayesian inference. In particular, Bayesians and frequentists differ in their definition of probability and in their treatment of model parameters as random variables or estimates of true values. These assumptions must be addressed explicitly before deciding whether or not to use Bayesian methods to analyse ecological data.

Journal ArticleDOI
TL;DR: A novel influence score for DBNs is developed that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables and reduces a significant portion of false positive interactions in the recovered networks.
Abstract: Motivation: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. Results: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. Availability: Source code and simulated data are available upon request. Supplementary information: http://www.jarvislab.net/Bioinformatics/BNAdvances/

Book ChapterDOI
01 Jan 2004
TL;DR: A novel “Bayes-frequentist compromise” is proposed that combines honest subjective non- or semiparametric Bayesian inference with good frequentist behavior, even in cases where the model is so large and the likelihood function so complex that standard Bayes procedures have poor frequentist performance.
Abstract: I describe two new methods for estimating the optimal treatment regime (equivalently, protocol, plan or strategy) from very high dimensional observational and experimental data: (i) g-estimation of an optimal double-regime structural nested mean model (drSNMM) and (ii) g-estimation of a standard single regime SNMM combined with sequential dynamic-programming (DP) regression. These methods are compared to certain regression methods found in the sequential decision and reinforcement learning literatures and to the regret modelling methods of Murphy (2003). I consider both Bayesian and frequentist inference. In particular, I propose a novel “Bayes-frequentist compromise” that combines honest subjective non- or semiparametric Bayesian inference with good frequentist behavior, even in cases where the model is so large and the likelihood function so complex that standard (uncompromised) Bayes procedures have poor frequentist performance.

Journal ArticleDOI
TL;DR: In this article, it was shown that the naïve Bayes classifier, which assumes independent covariates, greatly outperforms the Fisher linear discriminant rule under broad conditions when the number of variables grows faster than number of observations, in the classical problem of discriminating between two normal populations.
Abstract: We show that the ‘naive Bayes’ classifier, which assumes independent covariates, greatly outperforms the Fisher linear discriminant rule under broad conditions when the number of variables grows faster than the number of observations, in the classical problem of discriminating between two normal populations. We also introduce a class of rules spanning the range between independence and arbitrary dependence. These rules are shown to achieve Bayes consistency for the Gaussian ‘coloured noise’ model and to adapt to a spectrum of convergence rates, which we conjecture to be minimax.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian probabilistic approach is presented for selecting the most plausible class of models for a structural or mechanical system within some specified set of model classes, based on system response data.
Abstract: A Bayesian probabilistic approach is presented for selecting the most plausible class of models for a structural or mechanical system within some specified set of model classes, based on system response data. The crux of the approach is to rank the classes of models based on their probabilities conditional on the response data which can be calculated based on Bayes’ theorem and an asymptotic expansion for the evidence for each model class. The approach provides a quantitative expression of a principle of model parsimony or of Ockham’s razor which in this context can be stated as "simpler models are to be preferred over unnecessarily complicated ones." Examples are presented to illustrate the method using a single-degree-of-freedom bilinear hysteretic system, a linear two-story frame, and a ten-story shear building, all of which are subjected to seismic excitation.
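
A rough sketch of ranking model classes by posterior probability, using a BIC-style penalty as a stand-in for the paper's asymptotic expansion of the evidence; the maximized log-likelihoods, parameter counts and data size below are invented.

import numpy as np

def model_class_posterior(log_like_max, n_params, n_data, prior=None):
    """Rank model classes by posterior probability, approximating the evidence
    with a BIC-style asymptotic term (a stand-in for a full Laplace expansion)."""
    log_like_max = np.asarray(log_like_max, dtype=float)
    n_params = np.asarray(n_params, dtype=float)
    log_evidence = log_like_max - 0.5 * n_params * np.log(n_data)   # Ockham penalty on parameters
    if prior is None:
        prior = np.full(len(log_evidence), 1.0 / len(log_evidence))
    log_post = np.log(prior) + log_evidence
    log_post -= log_post.max()               # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# made-up fits of three structural model classes to the same response data
print(model_class_posterior(log_like_max=[-1200.0, -1185.0, -1183.5],
                            n_params=[2, 6, 14], n_data=500))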

Journal ArticleDOI
TL;DR: An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated, using a mixture of an atom of probability at zero and a heavy-tailed density γ, with the mixing weight chosen by marginal maximum likelihood.
Abstract: An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavy-tailed density γ, with the mixing weight chosen by marginal maximum likelihood, in the hope of adapting between sparse and dense sequences. If estimation is then carried out using the posterior median, this is a random thresholding procedure. Other thresholding rules employing the same threshold can also be used. Probability bounds on the threshold chosen by the marginal maximum likelihood approach lead to overall risk bounds over classes of signal sequences of length n, allowing for sparsity of various kinds and degrees. The signal classes considered are nearly black sequences where only a proportion η is allowed to be nonzero, and sequences with normalized ℓp norm bounded by η, for η > 0 and 0 < p ≤ 2. Simulations show excellent performance. For appropriately chosen functions γ, the method is computationally tractable and software is available. The extension to a modified thresholding method relevant to the estimation of very sparse sequences is also considered.
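
An illustrative, simplified sketch of the empirical Bayes idea above: it uses a Gaussian slab and the posterior mean rather than the paper's heavy-tailed density γ and posterior median, and it fixes the slab scale; only the marginal-maximum-likelihood choice of the mixing weight is shown, and all data are simulated.

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def fit_weight(x, tau=3.0):
    """Choose the mixing weight w by marginal maximum likelihood for the prior
    (1 - w) * delta_0 + w * N(0, tau^2), with unit-variance Gaussian noise."""
    def neg_loglik(w):
        marg = (1 - w) * norm.pdf(x, 0, 1) + w * norm.pdf(x, 0, np.sqrt(1 + tau**2))
        return -np.sum(np.log(marg))
    return minimize_scalar(neg_loglik, bounds=(1e-4, 1 - 1e-4), method="bounded").x

def shrinkage_estimate(x, w, tau=3.0):
    """Posterior-mean shrinkage: posterior probability of 'signal' times the
    usual Gaussian-Gaussian posterior mean (a stand-in for the posterior median)."""
    p_signal = w * norm.pdf(x, 0, np.sqrt(1 + tau**2))
    p_signal = p_signal / (p_signal + (1 - w) * norm.pdf(x, 0, 1))
    return p_signal * (tau**2 / (1 + tau**2)) * x

rng = np.random.default_rng(2)
theta = np.r_[np.zeros(950), rng.normal(0, 3, 50)]   # a sparse signal, made up
x = theta + rng.normal(size=theta.size)
w_hat = fit_weight(x)
print(w_hat, np.mean((shrinkage_estimate(x, w_hat) - theta)**2))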

Proceedings ArticleDOI
14 Mar 2004
TL;DR: It is shown that, despite their simple structure, naive Bayes classifiers provide very competitive results, and the performance of Bayes nets compares well with the best existing results on KDD'99.
Abstract: Bayes networks are powerful tools for decision and reasoning under uncertainty. A very simple form of Bayes networks is called naive Bayes, which is particularly efficient for inference tasks. However, naive Bayes is based on a very strong independence assumption. This paper offers an experimental study of the use of naive Bayes in intrusion detection. We show that even with a simple structure, naive Bayes provides very competitive results. The experimental study is done on the KDD'99 intrusion data sets. We consider three levels of attack granularity depending on whether we deal with whole attacks, group them in four main categories, or just focus on normal and abnormal behaviours. Across all experiments, we compare the performance of naive Bayes networks with that of a well-known machine learning technique, the decision tree. Moreover, we show that the performance of Bayes nets compares well with the best existing results reported on KDD'99.

Journal ArticleDOI
TL;DR: The results show that Bayesian disease-mapping models are essentially conservative, with high specificity even in situations with very sparse data but low sensitivity if the raised-risk areas have only a moderate (< 2-fold) excess or are not based on substantial expected counts (> 50 per area).
Abstract: There is currently much interest in conducting spatial analyses of health outcomes at the small-area scale. This requires sophisticated statistical techniques, usually involving Bayesian models, to smooth the underlying risk estimates because the data are typically sparse. However, questions have been raised about the performance of these models for recovering the "true" risk surface, about the influence of the prior structure specified, and about the amount of smoothing of the risks that is actually performed. We describe a comprehensive simulation study designed to address these questions. Our results show that Bayesian disease-mapping models are essentially conservative, with high specificity even in situations with very sparse data but low sensitivity if the raised-risk areas have only a moderate (less than 2-fold) excess or are not based on substantial expected counts (> 50 per area). Semiparametric spatial mixture models typically produce less smoothing than their conditional autoregressive counterpart when there is sufficient information in the data (moderate-size expected count and/or high true excess risk). Sensitivity may be improved by exploiting the whole posterior distribution to try to detect true raised-risk areas rather than just reporting and mapping the mean posterior relative risk. For the widely used conditional autoregressive model, we show that a decision rule based on computing the probability that the relative risk is above 1 with a cutoff between 70 and 80% gives a specific rule with reasonable sensitivity for a range of scenarios having moderate expected counts (approximately 20) and excess risks (approximately 1.5- to 2-fold). Larger (3-fold) excess risks are detected almost certainly using this rule, even when based on small expected counts, although the mean of the posterior distribution is typically smoothed to about half the true value.
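
The decision rule described above, flagging an area when the posterior probability that its relative risk exceeds 1 passes a cutoff, can be sketched directly from MCMC output; the posterior draws here are simulated rather than coming from a fitted conditional autoregressive model.

import numpy as np

def flag_raised_risk(rr_samples, cutoff=0.8):
    """Flag an area when the posterior probability that its relative risk
    exceeds 1 is above the cutoff.

    rr_samples : array of shape (n_mcmc_draws, n_areas) of posterior RR draws,
                 e.g. from a disease-mapping model fitted by MCMC."""
    exceed_prob = np.mean(rr_samples > 1.0, axis=0)
    return exceed_prob, exceed_prob > cutoff

# made-up posterior draws for three areas with true RR of about 1.0, 1.5 and 3.0
rng = np.random.default_rng(3)
draws = np.column_stack([rng.lognormal(np.log(1.0), 0.2, 2000),
                         rng.lognormal(np.log(1.5), 0.2, 2000),
                         rng.lognormal(np.log(3.0), 0.3, 2000)])
print(flag_raised_risk(draws))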

01 Jan 2004
TL;DR: Extensions of penalized spline generalized additive models for analyzing space-time regression data are proposed and studied from a Bayesian perspective using MCMC techniques.
Abstract: We propose extensions of penalized spline generalized additive models for analyzing space-time regression data and study them from a Bayesian perspective. Non-linear effects of continuous covariates and time trends are modelled through Bayesian versions of penalized splines, while correlated spatial effects follow a Markov random field prior. This allows all functions and effects to be treated within a unified general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference can be performed either with full (FB) or empirical Bayes (EB) posterior analysis. FB inference using MCMC techniques is a slight extension of previous work. For EB inference, a computationally efficient solution is developed on the basis of a generalized linear mixed model representation. The second approach can be viewed as posterior mode estimation and is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are then estimated by marginal likelihood. We carefully compare both inferential procedures in simulation studies and illustrate them through data applications. The methodology is available in the open domain statistical package BayesX and as an S-plus/R function.
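
A minimal penalized-spline sketch using a truncated-line basis and a ridge penalty on the knot coefficients, just to illustrate the penalization idea; the paper's Bayesian P-splines, Markov random field spatial effects and the BayesX implementation go well beyond this. The smoothing parameter lam corresponds to a ratio of variance components in the mixed-model representation mentioned above; all data are simulated.

import numpy as np

def pspline_fit(x, y, knots, lam):
    """Penalized-spline fit with a truncated-line basis: the intercept and slope
    are unpenalized, while the knot coefficients receive a ridge penalty."""
    Z = np.maximum(np.subtract.outer(x, knots), 0.0)    # (x - kappa_k)_+
    X = np.column_stack([np.ones_like(x), x, Z])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))        # penalize only knot terms
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)
knots = np.linspace(0.05, 0.95, 20)
fit = pspline_fit(x, y, knots, lam=1.0)
print(np.round(fit[:5], 3))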

Journal ArticleDOI
01 Nov 2004
TL;DR: A systematic procedure for constructing Bayesian networks from domain knowledge of experts using the causal mapping approach and how the graphical structure of causal maps can be modified to construct Bayes nets is described.
Abstract: This paper describes a systematic procedure for constructing Bayesian networks (BNs) from domain knowledge of experts using the causal mapping approach. We outline how causal knowledge of experts can be represented as causal maps, and how the graphical structure of causal maps can be modified to construct Bayes nets. Probability encoding techniques can be used to assess the numerical parameters of the resulting Bayes nets. We illustrate the construction of a Bayes net starting from a causal map of a systems analyst in the context of an information technology application outsourcing decision.

Journal ArticleDOI
TL;DR: This work develops different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates and introduces a heuristic modification to the Gibbs sampler, based on the 'reverse annealing' principle, that effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions.
Abstract: Motivation: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability. Results: We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and the real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed overall better than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters. Availability: The MS Windows™ based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm Supplemental information: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html

Journal ArticleDOI
TL;DR: A dynamic Bayesian network and nonparametric regression model is proposed for constructing a gene network from time series microarray gene expression data, and a new criterion for evaluating an estimated network is derived from a Bayes approach.
Abstract: We propose a dynamic Bayesian network and nonparametric regression model for constructing a gene network from time series microarray gene expression data. The proposed method can overcome a shortcoming of the Bayesian network model with respect to the construction of cyclic regulations. The proposed method can analyze the microarray data as continuous data and can capture even nonlinear relations among genes. It can be expected that this model will give a deeper insight into complicated biological systems. We also derive a new criterion for evaluating an estimated network from a Bayes approach. We conduct Monte Carlo experiments to examine the effectiveness of the proposed method. We also demonstrate the proposed method through the analysis of the Saccharomyces cerevisiae gene expression data.

01 Jan 2004
TL;DR: A heuristic analysis is presented in this paper based on a simplified version of RF denoted RF0 that supports the empirical results from RF and illuminates why RF is able to handle large numbers of input variables and what the role of mtry is.
Abstract: A heuristic analysis is presented in this paper based on a simplified version of RF denoted RF0. The results from RF0 support the empirical results from RF. RF0 regression is consistent using a value of mtry that does not depend on the number of cases N. The rate of convergence to the Bayes rule depends only on the number of strong variables and not on how many noise variables are also present. This also implies consistency for the two-class RF0 classification. The analysis also illuminates why RF is able to handle large numbers of input variables and what the role of mtry is.

Journal ArticleDOI
TL;DR: This report shows how the hierarchical summary receiver operating characteristic (HSROC) model may be fitted using the SAS procedure NLMIXED and compares the results to a fully Bayesian analysis using an example.

Posted Content
TL;DR: In this article, the authors consider random coefficient models (RCMs) for time-series-cross-section data and assess several issues in specifying RCMs, and then consider the finite sample properties of some standard RCM estimators, and show that the most common one associated with Hsiao has very poor properties.
Abstract: This paper considers random coefficient models (RCMs) for time-series-cross-section data. These models allow for unit-to-unit variation in the model parameters. After laying out the various models, we assess several issues in specifying RCMs. We then consider the finite sample properties of some standard RCM estimators, and show that the most common one, associated with Hsiao, has very poor properties. These analyses also show that a somewhat awkward combination of estimators based on Swamy's work performs reasonably well; this awkward estimator and a Bayes estimator with an uninformative prior (due to Smith) seem to perform best. But we also see that estimators which assume full pooling perform well unless there is a large degree of unit-to-unit parameter heterogeneity. We also argue that the various data-driven methods (whether classical or empirical Bayes or Bayes with gentle priors) tend to lead to much more heterogeneity than most political scientists would like. We speculate that fully Bayesian models, with a variety of informative priors, may be the best way to approach RCMs.

Journal ArticleDOI
TL;DR: This work proposes a statistical method for estimating a gene network based on Bayesian networks from microarray gene expression data together with biological knowledge including protein-protein interactions, protein-DNA interactions, binding site information, existing literature and so on.
Abstract: We propose a statistical method for estimating a gene network based on Bayesian networks from microarray gene expression data together with biological knowledge including protein-protein interactions, protein-DNA interactions, binding site information, existing literature and so on. Microarray data do not contain enough information for constructing gene networks accurately in many cases. Our method adds biological knowledge to the estimation method of gene networks under a Bayesian statistical framework, and also controls the trade-off between microarray information and biological knowledge automatically. We conduct Monte Carlo simulations to show the effectiveness of the proposed method. We analyze Saccharomyces cerevisiae gene expression data as an application.

Journal ArticleDOI
TL;DR: An important aspect of the approach I advocate is modeling the relationship between a trial's primary endpoint and early indications of patient performance (auxiliary endpoints).
Abstract: The Bayesian approach is being used increasingly in medical research. The flexibility of the Bayesian approach allows for building designs of clinical trials that have good properties of any desired sort. Examples include maximizing effective treatment of patients in the trial, maximizing information about the slope of a dose–response curve, minimizing costs, minimizing the number of patients treated, minimizing the length of the trial and combinations of these desiderata. They also include standard frequentist operating characteristics when these are important considerations. Posterior probabilities are updated via Bayes’ theorem on the basis of accumulating data. These are used to effect modifications of the trial’s course, including stopping accrual, extending accrual beyond that originally planned, dropping treatment arms, adding arms, etc. An important aspect of the approach I advocate is modeling the relationship between a trial’s primary endpoint and early indications of patient performance—auxiliary endpoints. This has several highly desirable consequences. One is that it improves the efficiency of adaptive trials because information is available sooner than otherwise.
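
As a toy illustration of the posterior updating used to modify a trial's course, the sketch below monitors a single-arm binary endpoint with a conjugate Beta prior and suggests stopping accrual when the posterior probability of exceeding a target rate p0 passes a threshold; the prior, threshold and interim data are invented, and this is not the auxiliary-endpoint modelling the author describes.

from scipy.stats import beta

def prob_better_than(successes, n, p0, a=1.0, b=1.0):
    """Posterior probability that the response rate exceeds p0 under a
    Beta(a, b) prior after observing `successes` out of `n` patients."""
    return beta.sf(p0, a + successes, b + n - successes)

# hypothetical interim looks at an accruing single-arm trial (numbers made up)
for successes, n in [(4, 10), (9, 20), (15, 30)]:
    p = prob_better_than(successes, n, p0=0.30)
    print(n, round(p, 3), "stop accrual for efficacy" if p > 0.95 else "continue")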

Journal ArticleDOI
TL;DR: An expectation-maximization algorithm is derived to efficiently compute a maximum a posteriori point estimate of the various parameters; experimental comparisons demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.
Abstract: This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation-maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.
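
The sparsity-through-heavy-tailed-priors idea can be illustrated loosely: MAP estimation under a Laplace prior on the weights is equivalent to L1-penalized logistic regression, which the short sketch below fits with scikit-learn on synthetic data. This is not the paper's EM algorithm or its joint kernel-basis and feature selection, just the simplest related construction; all data and settings are invented.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# synthetic data with a few informative features among many irrelevant ones
X, y = make_classification(n_samples=300, n_features=30, n_informative=4,
                           n_redundant=0, random_state=0)

# L1-penalized logistic regression == MAP under a Laplace prior on the weights
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
print("nonzero weights:", int(np.sum(clf.coef_ != 0)), "of", X.shape[1])
print("training accuracy:", round(clf.score(X, y), 3))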