
Showing papers on "Bayes' theorem published in 2005"


01 Jan 2005
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Abstract: The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
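If, as the wording and citation count suggest, this is the Schwarz model-selection result, the leading terms of the asymptotic expansion take the familiar form sketched below; the notation (k_j free parameters, n observations, maximized likelihood L̂_j) is an assumption of this sketch, not taken from the abstract.

```latex
% Large-sample approximation to the log marginal likelihood (the Bayes solution)
% for model M_j with k_j free parameters, n observations and maximized
% likelihood \hat{L}_j; choosing the model with the largest value is the
% criterion whose leading terms do not depend on the a priori distribution.
\log p(x \mid M_j) \;\approx\; \log \hat{L}_j \;-\; \frac{k_j}{2}\,\log n
```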

36,760 citations


Journal ArticleDOI
TL;DR: It is found that in most cases the estimated 'log probability of data' does not provide a correct estimate of the number of clusters, K, but that, using an ad hoc statistic ΔK based on the rate of change in the log probability between successive K values, STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios the authors tested.
Abstract: The identification of genetically homogeneous groups of individuals is a long standing issue in population genetics. A recent Bayesian algorithm implemented in the software STRUCTURE allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated 'log probability of data' does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic DeltaK based on the rate of change in the log probability of data between successive K values, we found that STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.
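A minimal sketch of the ad hoc ΔK statistic as described above (the second-order rate of change of the mean 'log probability of data' across successive K values, scaled by the spread across replicate runs); the input format, function name and toy numbers are illustrative assumptions, not the paper's software.

```python
import numpy as np

def delta_k(log_prob_runs):
    """Sketch of the Delta-K statistic.

    log_prob_runs: dict mapping K -> array of 'log probability of data'
    values from replicate STRUCTURE runs at that K.
    Returns a dict mapping K -> Delta-K for interior values of K.
    """
    ks = sorted(log_prob_runs)
    mean_l = {k: np.mean(log_prob_runs[k]) for k in ks}
    sd_l = {k: np.std(log_prob_runs[k], ddof=1) for k in ks}
    result = {}
    for k in ks[1:-1]:  # needs both K-1 and K+1
        # second-order rate of change of L(K), divided by the spread
        # of L(K) across replicate runs at that K
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        result[k] = second_diff / sd_l[k]
    return result

# Toy example: the K with the largest Delta-K is taken as the uppermost
# level of structure.
runs = {1: [-5200, -5210], 2: [-4300, -4310], 3: [-4280, -4290], 4: [-4275, -4285]}
dk = delta_k(runs)
print(max(dk, key=dk.get))
```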

18,572 citations


Journal ArticleDOI
TL;DR: A Bayes empirical Bayes (BEB) approach is developed for identifying positively selected sites under codon-based substitution models; it assigns a prior to the model parameters and integrates over their uncertainties, and the results suggest that in small data sets the new BEB method does not generate the false positives produced by the old NEB approach.
Abstract: Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (d(N)/d(S), denoted omega) is used as a measure of selective pressure at the protein level, with omega > 1 indicating positive selection. Statistical distributions are used to model the variation in omega among sites, allowing a subset of sites to have omega > 1 while the rest of the sequence may be under purifying selection with omega < 1. An empirical Bayes (EB) approach is then used to calculate the posterior probability that a site comes from the class with omega > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and omega ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
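A hedged sketch of the contrast between the two approaches in generic notation, with θ the model parameters (site-class proportions and ω ratios), x_h the data at site h and X the full alignment; how the integral is evaluated in practice is an implementation detail not specified in the abstract.

```latex
% Naive empirical Bayes: plug in the maximum likelihood estimate \hat{\theta}
P_{\mathrm{NEB}}(\omega_h > 1 \mid x_h) \;=\; P(\omega_h > 1 \mid x_h, \hat{\theta})
% Bayes empirical Bayes: assign a prior to the parameters and integrate over
% their uncertainty
P_{\mathrm{BEB}}(\omega_h > 1 \mid x_h) \;=\; \int P(\omega_h > 1 \mid x_h, \theta)\, f(\theta \mid X)\, d\theta
```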

2,076 citations


Journal ArticleDOI
TL;DR: Influence diagrams are graphical models for structuring decision scenarios, particularly scenarios consisting of a predefined sequence of actions and observations; they extend the Bayesian network framework with nodes for decisions and utilities.
Abstract: Influence diagrams are graphical models for structuring decision scenarios, particularly scenarios consisting of a predefined sequence of actions and observations. Influence diagrams were originally introduced by [3] as a compact representation of symmetric decision trees [8], but they may also be thought of as extensions of Bayesian networks. This article is based on the article on Bayesian graphical models (referred to as BGM), and the reader is advised to read BGM before proceeding. UTILITIES The basis of influence diagrams is probabilities and utilities. Utilities are quantified measures of preference; that is, a real number is attached to each possible scenario in question. The beliefs in the scenarios are expressed as probabilities. EXAMPLE The Bayesian network in BGM Figure 1 can, for example, be used to calculate the expected distribution for River flow given various observations. Assume that we can decide on a set of different types of Land use, and we have to balance the decision with the impact on River flow. This can be modelled by extending the Bayesian network framework with nodes for decisions and utilities: the Land use node is changed to a decision node, and we give River flow as well as Land use a diamond-shaped child indicating that we attach utilities to these variables (see Figure 1). With this representation, the computer can easily calculate the impact of the various decisions and thereby the expected utilities that arise from taking each decision. The user is advised to take the decision with the highest expected utility.
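A minimal sketch of the expected-utility calculation in the Land use / River flow example; all probabilities, utilities and category names are illustrative assumptions, not values from the article.

```python
# Tiny influence diagram: decision node Land use, chance node River flow,
# utilities attached to both. All numbers are made up for illustration.

p_flow_given_use = {              # P(River flow | Land use)
    "forest":      {"low": 0.7, "high": 0.3},
    "agriculture": {"low": 0.3, "high": 0.7},
}
utility_use = {"forest": 10, "agriculture": 40}   # utility attached to Land use
utility_flow = {"low": 50, "high": -20}           # utility attached to River flow

def expected_utility(decision):
    """Utility of the decision plus the River flow utility averaged over its distribution."""
    eu_flow = sum(p * utility_flow[f] for f, p in p_flow_given_use[decision].items())
    return utility_use[decision] + eu_flow

for d in p_flow_given_use:
    print(d, expected_utility(d))

# Recommend the decision with the highest expected utility.
best = max(p_flow_given_use, key=expected_utility)
print("recommended decision:", best)
```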

1,000 citations


Journal ArticleDOI
TL;DR: A novel framework for small-sample inference of graphical models from gene expression data that focuses on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes is introduced.
Abstract: Motivation: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem. Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degree of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. Results: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes. Availability: The authors have implemented the approach in the R package 'GeneTS' that is freely available from http://www.stat.uni-muenchen.de/~strimmer/genets/, from the R archive (CRAN) and from the Bioconductor website. Contact: korbinian.strimmer@lmu.de
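A generic sketch of step (1), a regularized small-sample estimate of partial correlations obtained by shrinking the correlation matrix toward the identity and inverting it. The fixed shrinkage weight is a simplification (the paper's estimator chooses the amount of regularization from the data), and this is not the GeneTS implementation.

```python
import numpy as np

def shrunken_partial_correlations(X, shrinkage=0.2):
    """Regularized small-sample partial correlation estimate (generic sketch).

    X is an (n samples x p genes) matrix. The fixed shrinkage weight is an
    assumption made for illustration only.
    """
    R = np.corrcoef(X, rowvar=False)                       # sample correlation matrix
    R_shrunk = (1 - shrinkage) * R + shrinkage * np.eye(R.shape[0])
    omega = np.linalg.inv(R_shrunk)                        # precision matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)                         # scale to partial correlations
    np.fill_diagonal(pcor, 1.0)
    return pcor

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))          # small n, larger p: the 'ill-posed' regime
print(shrunken_partial_correlations(X)[:3, :3])
```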

866 citations


Journal ArticleDOI
TL;DR: This work presents an extension of a previously described frequentist (maximum likelihood or ML) approach to estimate individual admixture that allows for uncertainty in ancestral allele frequencies and demonstrates increased robustness when compared to an existing partial ML approach.
Abstract: The genome of an admixed individual represents a mixture of alleles from different ancestries. In the United States, the two largest minority groups, African-Americans and Hispanics, are both admixed. An understanding of the admixture proportion at an individual level (individual admixture, or IA) is valuable for both population geneticists and epidemiologists who conduct case-control association studies in these groups. Here we present an extension of a previously described frequentist (maximum likelihood or ML) approach to estimate individual admixture that allows for uncertainty in ancestral allele frequencies. We compare this approach both to prior partial likelihood based methods as well as more recently described Bayesian MCMC methods. Our full ML method demonstrates increased robustness when compared to an existing partial ML approach. Simulations also suggest that this frequentist estimator achieves similar efficiency, measured by the mean squared error criterion, as Bayesian methods but requires just a fraction of the computational time to produce point estimates, allowing for extensive analysis (e.g., simulations) not possible by Bayesian methods. Our simulation results demonstrate that inclusion of ancestral populations or their surrogates in the analysis is required by any method of IA estimation to obtain reasonable results.
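A simplified sketch of maximum likelihood estimation of individual admixture for one individual and two ancestral populations, treating the ancestral allele frequencies as known; this corresponds to the partial-likelihood setting, whereas the paper's full ML method additionally allows for uncertainty in those frequencies. All data and function names are made up.

```python
import numpy as np
from math import comb

def admixture_loglik(q, genotypes, p_anc1, p_anc2):
    """Log-likelihood of admixture proportion q for one individual.

    genotypes holds 0/1/2 counts of the reference allele at independent loci;
    p_anc1 and p_anc2 are the (assumed known) ancestral allele frequencies.
    """
    f = q * p_anc1 + (1 - q) * p_anc2          # allele frequency in the mixture
    ll = 0.0
    for g, freq in zip(genotypes, f):
        # binomial genotype probability under Hardy-Weinberg equilibrium
        ll += np.log(comb(2, g) * freq**g * (1 - freq)**(2 - g))
    return ll

def estimate_ia(genotypes, p_anc1, p_anc2, grid=np.linspace(0.001, 0.999, 999)):
    """Grid-search maximum likelihood estimate of individual admixture."""
    lls = [admixture_loglik(q, genotypes, p_anc1, p_anc2) for q in grid]
    return grid[int(np.argmax(lls))]

# toy example with made-up frequencies and genotypes
p1 = np.array([0.9, 0.8, 0.7, 0.9, 0.6])
p2 = np.array([0.1, 0.2, 0.3, 0.2, 0.3])
geno = np.array([2, 2, 1, 2, 1])
print(estimate_ia(geno, p1, p2))
```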

607 citations


Journal ArticleDOI
TL;DR: The primary goal is to provide veterinary researchers with a concise presentation of the computational aspects involved in using the Bayesian framework for test evaluation.

490 citations


Journal ArticleDOI
TL;DR: The statistic p(rep) estimates the probability of replicating an effect, and provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.
Abstract: The statistic p(rep) estimates the probability of replicating an effect. It captures traditional publication criteria for signal-to-noise ratio, while avoiding parametric inference and the resulting Bayesian dilemma. In concert with effect size and replication intervals, p(rep) provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.

429 citations


Journal ArticleDOI
TL;DR: It is shown that reductions in parameter uncertainty, and thus in output uncertainty, can be effected by increasing the variety of data, increasing the accuracy of measurements and increasing the length of time series.
Abstract: Process-based forest models generally have many parameters, multiple outputs of interest and a small underlying empirical database. These characteristics hamper parameterization. Bayesian calibration offers a solution to the calibration problem because it applies to models of any type or size. It provides parameter estimates, with measures of uncertainty and correlation among the parameters. The procedure begins by quantifying the uncertainty about parameter values in the form of a prior probability distribution. Then data on the output variables are used to update the parameter distribution by means of Bayes' Theorem. This yields a posterior calibrated distribution for the parameters, which can be summarized in the form of a mean vector and variance matrix. The predictive uncertainty of the model can be quantified by running it with different parameter settings, sampled from the posterior distribution. In a further step, one may evaluate the posterior probability of the model itself (rather than that of the parameters) and compare that against the probability of other models, to aid in model selection or improvement. Bayesian calibration of process-based models cannot be performed analytically, so the posterior parameter distribution must be approximated in the form of a representative sample of parameter values. This can be achieved by means of Markov Chain Monte Carlo simulation, which is suitable for process-based models because of its simplicity and because it does not require advance knowledge of the shape of the posterior distribution. Despite the suitability of Bayesian calibration, the technique has rarely been used in forestry research. We introduce the method, using the example of a typical forest model. Further, we show that reductions in parameter uncertainty, and thus in output uncertainty, can be effected by increasing the variety of data, increasing the accuracy of measurements and increasing the length of time series.
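A minimal Metropolis sketch of the calibration workflow described above: a prior on the parameter, a likelihood from data on an output variable, an MCMC sample from the posterior, and predictive uncertainty obtained by re-running the model with sampled parameter values. The one-parameter toy model stands in for a real process-based forest model, and every number is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def process_model(growth_rate, t):
    # toy stand-in for a process-based model's output variable over time
    return 100 * (1 - np.exp(-growth_rate * t))

t_obs = np.arange(1, 11)
y_obs = process_model(0.15, t_obs) + rng.normal(0, 3, size=t_obs.size)
sigma_obs = 3.0                                  # assumed measurement error

def log_prior(theta):
    # prior uncertainty about the parameter: uniform on (0, 1)
    return 0.0 if 0 < theta < 1 else -np.inf

def log_likelihood(theta):
    resid = y_obs - process_model(theta, t_obs)
    return -0.5 * np.sum((resid / sigma_obs) ** 2)

def log_posterior(theta):
    lp = log_prior(theta)
    return lp + log_likelihood(theta) if np.isfinite(lp) else -np.inf

# Markov Chain Monte Carlo: random-walk Metropolis sampling of the posterior
theta, chain = 0.5, []
for _ in range(20000):
    proposal = theta + rng.normal(0, 0.02)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    chain.append(theta)

posterior = np.array(chain[5000:])               # drop burn-in
print("posterior mean:", posterior.mean(), "sd:", posterior.std())

# Predictive uncertainty: run the model with parameter values sampled
# from the posterior distribution.
pred = np.array([process_model(th, t_obs) for th in posterior[::100]])
print("predictive sd at final time:", pred[:, -1].std())
```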

353 citations


Journal ArticleDOI
08 Jul 2005-Science
TL;DR: This work used Bayesian inference to derive a probability distribution that represents the unknown structure and its precision, implemented this approach using Markov chain Monte Carlo techniques, and thereby provides an objective figure of merit and improves structural quality.
Abstract: Macromolecular structures calculated from nuclear magnetic resonance data are not fully determined by experimental data but depend on subjective choices in data treatment and parameter settings. This makes it difficult to objectively judge the precision of the structures. We used Bayesian inference to derive a probability distribution that represents the unknown structure and its precision. This probability distribution also determines additional unknowns, such as theory parameters, that previously had to be chosen empirically. We implemented this approach by using Markov chain Monte Carlo techniques. Our method provides an objective figure of merit and improves structural quality.

323 citations


Journal ArticleDOI
TL;DR: This article provides an introduction to Bayesian statistics, hierarchical modeling, and Markov chain Monte Carlo computational techniques and shows that a signal detection analysis of recognition memory data leads to asymptotic underestimation of sensitivity.
Abstract: Although many nonlinear models of cognition have been proposed in the past 50 years, there has been little consideration of corresponding statistical techniques for their analysis. In analyses with nonlinear models, unmodeled variability from the selection of items or participants may lead to asymptotically biased estimation. This asymptotic bias, in turn, renders inference problematic. We show, for example, that a signal detection analysis of recognition memory data leads to asymptotic underestimation of sensitivity. To eliminate asymptotic bias, we advocate hierarchical models in which participant variability, item variability, and measurement error are modeled simultaneously. By accounting for multiple sources of variability, hierarchical models yield consistent and accurate estimates of participant and item effects in recognition memory. This article is written in tutorial format; we provide an introduction to Bayesian statistics, hierarchical modeling, and Markov chain Monte Carlo computational techniques.

Journal ArticleDOI
TL;DR: A key feature appears to be that the estimate of sparsity adapts to three different zones of estimation, first where the signal is not sparse enough for thresholding to be of benefit, second where an appropriately chosen threshold results in substantially improved estimation, and third where the signals are so sparse that the zero estimate gives the optimum accuracy rate.
Abstract: This paper explores a class of empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavy-tailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold. Details of the calculations needed for implementing the procedure are included. In practice, the estimates are quick to compute and there is software available. Simulations on the standard model functions show excellent performance, and applications to data drawn from various fields of application are used to explore the practical performance of the approach. By using a general result on the risk of the corresponding marginal maximum likelihood approach for a single sequence, overall bounds on the risk of the method are found subject to membership of the unknown function in one of a wide range of Besov classes, covering also the case of f of bounded variation. The rates obtained are optimal for any value of the parameter p in (0, ∞], simultaneously for a wide range of loss functions, each dominating the L_q norm of the σth derivative, with σ ≥ 0 and 0

Proceedings ArticleDOI
07 Aug 2005
TL;DR: It is shown that, for a wide range of benchmark datasets, naive Bayes models learned using EM have accuracy and learning time comparable to Bayesian networks with context-specific independence.
Abstract: Naive Bayes models have been widely used for clustering and classification. However, they are seldom used for general probabilistic learning and inference (i.e., for estimating and computing arbitrary joint, conditional and marginal distributions). In this paper we show that, for a wide range of benchmark datasets, naive Bayes models learned using EM have accuracy and learning time comparable to Bayesian networks with context-specific independence. Most significantly, naive Bayes inference is orders of magnitude faster than Bayesian network inference using Gibbs sampling and belief propagation. This makes naive Bayes models a very attractive alternative to Bayesian networks for general probability estimation, particularly in large or real-time domains.
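A sketch of the kind of computation involved: a naive Bayes model over binary variables learned with EM, then used to answer an arbitrary conditional query by summing over the latent class. The dataset, dimensions and smoothing are illustrative assumptions, not the benchmark setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 6, 2
true_theta = rng.uniform(0.1, 0.9, size=(k, d))
z = rng.integers(k, size=n)
X = (rng.uniform(size=(n, d)) < true_theta[z]).astype(float)   # synthetic binary data

# initialize mixture weights and per-class Bernoulli parameters
pi = np.full(k, 1.0 / k)
theta = rng.uniform(0.3, 0.7, size=(k, d))

for _ in range(100):
    # E-step: responsibilities p(class | x) under the naive Bayes factorization
    log_p = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update parameters (light smoothing keeps them away from 0 and 1)
    nk = resp.sum(axis=0)
    pi = nk / n
    theta = (resp.T @ X + 1) / (nk[:, None] + 2)

# Arbitrary conditional query, e.g. P(X_3 = 1 | X_0 = 1): sum over the hidden
# class, which is cheap in a naive Bayes model.
joint = np.sum(pi * theta[:, 0] * theta[:, 3])      # P(X_0 = 1, X_3 = 1)
marg = np.sum(pi * theta[:, 0])                     # P(X_0 = 1)
print("P(X_3=1 | X_0=1) ~", joint / marg)
```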

Journal ArticleDOI
TL;DR: A simple solution to this problem is presented, involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies, and eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies.
Abstract: Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short branch lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.

Book
29 Nov 2005
TL;DR: This book covers Bayesian theory, variational Bayes approximations, principal component analysis and matrix decompositions, functional analysis of medical image sequences, on-line inference of time-invariant and time-variant parameters, and a mixture-based extension of the AR model (MEAR).
Abstract: Bayesian Theory.- Off-line Distributional Approximations and the Variational Bayes Method.- Principal Component Analysis and Matrix Decompositions.- Functional Analysis of Medical Image Sequences.- On-line Inference of Time-Invariant Parameters.- On-line Inference of Time-Variant Parameters.- The Mixture-based Extension of the AR Model (MEAR).- Concluding Remarks.

Journal ArticleDOI
TL;DR: This work uses a spatial prior on regression coefficients, which embodies the prior knowledge that evoked responses are spatially contiguous and locally homogeneous, and uses a computationally efficient Variational Bayes framework to let the data determine the optimal amount of smoothing.

Journal ArticleDOI
TL;DR: Variational approximations are used to perform the analogous model selection task in the Bayesian context; the resulting models place JunB and JunD at the centre of the mechanisms that control apoptosis and proliferation.
Abstract: Motivation: We have used state-space models (SSMs) to reverse engineer transcriptional networks from highly replicated gene expression profiling time series data obtained from a well-established model of T cell activation. SSMs are a class of dynamic Bayesian networks in which the observed measurements depend on some hidden state variables that evolve according to Markovian dynamics. These hidden variables can capture effects that cannot be directly measured in a gene expression profiling experiment, for example: genes that have not been included in the microarray, levels of regulatory proteins, the effects of mRNA and protein degradation, etc. Results: We have approached the problem of inferring the model structure of these state-space models using both classical and Bayesian methods. In our previous work, a bootstrap procedure was used to derive classical confidence intervals for parameters representing 'gene--gene' interactions over time. In this article, variational approximations are used to perform the analogous model selection task in the Bayesian context. Certain interactions are present in both the classical and the Bayesian analyses of these regulatory networks. The resulting models place JunB and JunD at the centre of the mechanisms that control apoptosis and proliferation. These mechanisms are key for clonal expansion and for controlling the long term behavior (e.g. programmed cell death) of these cells. Availability: Supplementary data is available at http://public.kgi.edu/wild/index.htm and Matlab source code for variational Bayesian learning of SSMs is available at http://www.cse.ebuffalo.edu/faculty/mbeal/software.html Contact: David_Wild@kgi.edu

Journal ArticleDOI
TL;DR: A combination of extended-connectivity fingerprints (ECFPs) and Laplacian-modified Bayesian analysis in a study of the inhibition of Escherichia coli dihydrofolate reductase shows that 2D methods offer surprisingly competitive results with a low computational cost.
Abstract: This article describes the use of a combination of extended-connectivity fingerprints (ECFPs) and Laplacian-modified Bayesian analysis in a study of the inhibition of Escherichia coli dihydrofolate reductase. The McMaster High-Throughput Screening Lab at McMaster University proposed a competition to predict the hits in a separate test set of 50,000 compounds. Although the problem seemed best approached with 3D methods, the authors show that 2D methods offer surprisingly competitive results with a low computational cost.

Journal ArticleDOI
TL;DR: The goal of this paper is to develop effective statistical tools to identify genomic loci that show transcriptional or protein binding patterns of interest and a two-step approach is proposed and is implemented in TileMap.
Abstract: Motivation: Tiling array is a new type of microarray that can be used to survey genomic transcriptional activities and transcription factor binding sites at high resolution. The goal of this paper is to develop effective statistical tools to identify genomic loci that show transcriptional or protein binding patterns of interest. Results: A two-step approach is proposed and is implemented in TileMap. In the first step, a test-statistic is computed for each probe based on a hierarchical empirical Bayes model. In the second step, the test-statistics of probes within a genomic region are used to infer whether the region is of interest or not. Hierarchical empirical Bayes model shrinks variance estimates and increases sensitivity of the analysis. It allows complex multiple sample comparisons that are essential for the study of temporal and spatial patterns of hybridization across different experimental conditions. Neighboring probes are combined through a moving average method (MA) or a hidden Markov model (HMM). Unbalanced mixture subtraction is proposed to provide approximate estimates of false discovery rate for MA and model parameters for HMM. Availability: TileMap is freely available at http://biogibbs.stanford.edu/~jihk/TileMap/index.htm Contact: whwong@stanford.edu Supplementary information:http://biogibbs.stanford.edu/~jihk/TileMap/index.htm (includes coloured versions of all figures)

Journal ArticleDOI
05 May 2005-BMJ
TL;DR: This article explains how Bayesian reasoning is a natural part of clinical decision making, particularly as it pertains to the clinical history and physical examination, and why Bayesian approaches offer a powerful and intuitive route to the differential diagnosis.
Abstract: Thought you didn't understand Bayesian statistics? Read on and find out why doctors are expert in applying the theory, whether they realise it or not.

Journal ArticleDOI
TL;DR: Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed compared with other variable selection and estimation methods.
Abstract: We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows us to take advantage of the recently developed quick LASSO algorithm to compute the empirical Bayes estimate, and provides a new way to select the tuning parameter in the LASSO method. Unlike previous empirical Bayes variable selection methods, which in most practical situations can be implemented only through a greedy stepwise algorithm, our method gives a global solution efficiently. Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed compared with other variable selection and estimation methods.
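The connection to the LASSO rests on the standard fact that the LASSO solution is the posterior mode under independent Laplace (double-exponential) priors on the coefficients; the sketch below uses generic notation and is not the paper's exact hierarchical formulation.

```latex
% Laplace prior on each coefficient, Gaussian errors with variance \sigma^2:
p(\beta_j) = \tfrac{\tau}{2}\, e^{-\tau |\beta_j|}, \qquad
y \mid X, \beta \sim N(X\beta, \sigma^2 I)
% The posterior mode solves the LASSO problem with \lambda = 2\sigma^2\tau:
\hat{\beta} \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 \;+\; \lambda \|\beta\|_1
```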

Journal ArticleDOI
TL;DR: In this paper, the authors examine how one's propensity to use Bayes' rule is affected by whether this rule is aligned with reinforcement or clashes with it, and find that when these forces clash, around 50% of all decisions are inconsistent with Bayesian updating.
Abstract: We examine decision-making under risk and uncertainty in a laboratory experiment. The heart of our design examines how one’s propensity to use Bayes’ rule is affected by whether this rule is aligned with reinforcement or clashes with it. In some cases, we create environments where Bayesian updating after a successful outcome should lead a decision-maker to make a change, while no change should be made after observing an unsuccessful outcome. We observe striking patterns: When payoff reinforcement and Bayesian updating are aligned, nearly all people respond as expected. However, when these forces clash, around 50% of all decisions are inconsistent with Bayesian updating. While people tend to make costly initial choices that eliminate complexity in a subsequent decision, we find that complexity alone cannot explain our results. Finally, when a draw provides only information (and no payment), switching errors occur much less frequently, suggesting that the ‘emotional reinforcement’ (affect) induced by payments is a critical factor in deviations from Bayesian updating. There is considerable behavioral heterogeneity; we identify different types in the population and find that people who make ‘switching errors’ are more likely to have cross-period ‘reinforcement’ tendencies.

Journal ArticleDOI
TL;DR: The methodology includes uncertainty in the experimental measurement, and the posterior and prior distributions of the model output are used to compute a validation metric based on Bayesian hypothesis testing.

Journal ArticleDOI
TL;DR: It is concluded that Bayesian diagnosticity is normatively flawed and empirically unjustified.
Abstract: Several norms for how people should assess a question's usefulness have been proposed, notably Bayesian diagnosticity, information gain (mutual information), Kullback-Leibler distance, probability gain (error minimization), and impact (absolute change). Several probabilistic models of previous experiments on categorization, covariation assessment, medical diagnosis, and the selection task are shown to not discriminate among these norms as descriptive models of human intuitions and behavior. Computational optimization found situations in which information gain, probability gain, and impact strongly contradict Bayesian diagnosticity. In these situations, diagnosticity's claims are normatively inferior. Results of a new experiment strongly contradict the predictions of Bayesian diagnosticity. Normative theoretical concerns also argue against use of diagnosticity. It is concluded that Bayesian diagnosticity is normatively flawed and empirically unjustified.
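A sketch of several of the competing norms for the usefulness of a yes/no query about a binary hypothesis, following common textbook definitions; conventions (for example, summing versus averaging over hypotheses for impact, or ratio versus log-ratio diagnosticity) vary, so this is not necessarily the exact formalization used in the paper.

```python
import numpy as np

def entropy(p):
    q = np.array([p, 1 - p])
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

def query_norms(prior_h1, p_yes_given_h1, p_yes_given_h2):
    """Expected usefulness of asking the yes/no question, under several norms."""
    p_h = np.array([prior_h1, 1 - prior_h1])
    likelihood = {"yes": np.array([p_yes_given_h1, p_yes_given_h2]),
                  "no":  np.array([1 - p_yes_given_h1, 1 - p_yes_given_h2])}
    out = {"information_gain": entropy(prior_h1),      # prior entropy minus ...
           "probability_gain": -max(p_h),              # ... and max prior subtracted
           "impact": 0.0,
           "expected_log_diagnosticity": 0.0}
    for ans, lik in likelihood.items():
        p_ans = np.dot(p_h, lik)
        posterior = p_h * lik / p_ans
        out["information_gain"] -= p_ans * entropy(posterior[0])   # expected posterior entropy
        out["probability_gain"] += p_ans * max(posterior)          # expected best-guess accuracy
        out["impact"] += p_ans * abs(posterior[0] - p_h[0])        # expected absolute belief change
        out["expected_log_diagnosticity"] += p_ans * abs(np.log(lik[0] / lik[1]))
    return out

# Example: rare hypothesis, moderately diagnostic test (numbers are made up)
print(query_norms(prior_h1=0.01, p_yes_given_h1=0.9, p_yes_given_h2=0.2))
```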

Journal ArticleDOI
TL;DR: In this article, a Bayesian approach to evaluate analysis of variance or analysis of covariance models with inequality constraints on the (adjusted) means is presented and contains two issues: estimation of the parameters given the restrictions using the Gibbs sampler and model selection using Bayes factors in the case of competing theories.
Abstract: Researchers often have one or more theories or expectations with respect to the outcome of their empirical research. When researchers talk about the expected relations between variables if a certain theory is correct, their statements are often in terms of one or more parameters expected to be larger or smaller than one or more other parameters. Stated otherwise, their statements are often formulated using inequality constraints. In this article, a Bayesian approach to evaluate analysis of variance or analysis of covariance models with inequality constraints on the (adjusted) means is presented. This evaluation contains two issues: estimation of the parameters given the restrictions using the Gibbs sampler and model selection using Bayes factors in the case of competing theories. The article concludes with two illustrations: a one-way analysis of covariance and an analysis of a three-way table of ordered means.
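A sketch of the two ingredients for a one-way analysis with the order constraint mu1 < mu2 < mu3, using the encompassing-prior idea that the Bayes factor of the constrained model against the unconstrained one can be approximated by the ratio of the posterior to the prior probability of the constraint. The conjugate shortcut (known error variance) and all numbers are simplifying assumptions; the article itself uses the Gibbs sampler for estimation.

```python
import numpy as np

rng = np.random.default_rng(2)
data = {1: rng.normal(0.0, 1, 20), 2: rng.normal(0.4, 1, 20), 3: rng.normal(0.9, 1, 20)}
prior_mean, prior_sd, error_sd = 0.0, 10.0, 1.0      # vague prior on each group mean

def posterior_draws(y, size):
    """Conjugate normal posterior for one group mean, assuming known error variance."""
    n = len(y)
    post_var = 1 / (1 / prior_sd**2 + n / error_sd**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + y.sum() / error_sd**2)
    return rng.normal(post_mean, np.sqrt(post_var), size)

m = 100000
post = np.column_stack([posterior_draws(data[g], m) for g in (1, 2, 3)])
prior = rng.normal(prior_mean, prior_sd, size=(m, 3))

# Bayes factor of the inequality-constrained model against the unconstrained
# model: proportion of posterior draws satisfying the ordering divided by the
# proportion of prior draws satisfying it (roughly 1/6 here).
ordered = lambda s: np.mean((s[:, 0] < s[:, 1]) & (s[:, 1] < s[:, 2]))
bf = ordered(post) / ordered(prior)
print("Bayes factor (mu1<mu2<mu3 vs unconstrained):", round(bf, 2))
```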

Journal ArticleDOI
TL;DR: The VOBN model can distinguish these 238 sites from a set of 472 intergenic 'non-promoter' sequences with a higher accuracy than fixed-order Markov models or Bayesian trees.
Abstract: Motivation: We propose a new class of variable-order Bayesian network (VOBN) models for the identification of transcription factor binding sites (TFBSs). The proposed models generalize the widely used position weight matrix (PWM) models, Markov models and Bayesian network models. In contrast to these models, where for each position a fixed subset of the remaining positions is used to model dependencies, in VOBN models, these subsets may vary based on the specific nucleotides observed, which are called the context. This flexibility turns out to be of advantage for the classification and analysis of TFBSs, as statistical dependencies between nucleotides in different TFBS positions (not necessarily adjacent) may be taken into account efficiently---in a position-specific and context-specific manner. Results: We apply the VOBN model to a set of 238 experimentally verified sigma-70 binding sites in Escherichia coli. We find that the VOBN model can distinguish these 238 sites from a set of 472 intergenic 'non-promoter' sequences with a higher accuracy than fixed-order Markov models or Bayesian trees. We use a replicated stratified-holdout experiment having a fixed true-negative rate of 99.9%. We find that for a foreground inhomogeneous VOBN model of order 1 and a background homogeneous variable-order Markov (VOM) model of order 5, the obtained mean true-positive (TP) rate is 47.56%. In comparison, the best TP rate for the conventional models is 44.39%, obtained from a foreground PWM model and a background 2nd-order Markov model. As the standard deviation of the estimated TP rate is ∼0.01%, this improvement is highly significant. Availability: All datasets are available upon request from the authors. A web server for utilizing the VOBN and VOM models is available at http://www.eng.tau.ac.il/~bengal/ Contact: bengal@eng.tau.ac.il

Journal ArticleDOI
TL;DR: In this article, a nonparametric mixture-of-normal model is proposed to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples, which is similar to a popular empirical Bayes approach that is used for the same inference problem.
Abstract: We propose a model-based approach to the identification of genes that are differentially expressed between two conditions. The probability model is a mixture of normal distributions. The resulting inference is similar to a popular empirical Bayes approach that is used for the same inference problem. The use of fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normal models. The approach proposed is motivated by a microarray experiment that was carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples. Additionally, we carried out a small simulation study to verify the methods proposed. In the motivating case-studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates. We also show how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.
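A hedged sketch of the kind of quantity meant by "posterior expected false discovery rate", in generic notation rather than the paper's own: with p_i the posterior probability that gene i is differentially expressed and d_i in {0, 1} the decision to flag it,

```latex
% Posterior expected FDR of a decision rule d = (d_1, ..., d_n): the expected
% proportion of flagged genes that are not differentially expressed, given the data.
\overline{\mathrm{FDR}}(d) \;=\; \frac{\sum_{i} (1 - p_i)\, d_i}{\max\!\left(1, \sum_{i} d_i\right)}
```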

Journal ArticleDOI
TL;DR: In this article, the accuracy of several relaxed-clock methods (penalized likelihood and Bayesian inference using various models of rate change) is tested using nucleotide sequences simulated on a nine-taxon tree.
Abstract: In recent years, a number of phylogenetic methods have been developed for estimating molecular rates and divergence dates under models that relax the molecular clock constraint by allowing rate change throughout the tree. These methods are being used with increasing frequency, but there have been few studies into their accuracy. We tested the accuracy of several relaxed-clock methods (penalized likelihood and Bayesian inference using various models of rate change) using nucleotide sequences simulated on a nine-taxon tree. When the sequences evolved with a constant rate, the methods were able to infer rates accurately, but estimates were more precise when a molecular clock was assumed. When the sequences evolved under a model of auto-correlated rate change, rates were accurately estimated using penalized likelihood and by Bayesian inference using lognormal and exponential models of rate change, while other models did not perform as well. When the sequences evolved under a model of uncorrelated rate change, only Bayesian inference using an exponential rate model performed well. Collectively, the results provide a strong recommendation for using the exponential model of rate change if a conservative approach to divergence time estimation is required. A case study is presented in which we use a simulation-based approach to examine the hypothesis of elevated rates in the Cambrian period, and it is found that these high rate estimates might be an artifact of the rate estimation method. If this bias is present, then the ages of metazoan divergences would be systematically underestimated. The results of this study have implications for studies of molecular rates and divergence dates.

Book
01 Jan 2005
TL;DR: This paper uses local false discovery rate methods to carry out size and power calculations on large-scale data sets and an empirical Bayes approach allows the fdr analysis to proceed from a minimum of frequentist or Bayesian modeling assumptions.
Abstract: Modern scientific technology is providing a new class of large-scale simultaneous inference problems, with hundreds or thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology but similar problems arise in proteomics, time of flight spectroscopy, flow cytometry, FMRI imaging, and massive social science surveys. This paper uses local false discovery rate methods to carry out size and power calculations on large-scale data sets. An empirical Bayes approach allows the fdr analysis to proceed from a minimum of frequentist or Bayesian modeling assumptions. Microarray and simulated data sets are used to illustrate a convenient estimation methodology whose accuracy can be calculated in closed form. A crucial part of the methodology is an fdr assessment of “thinned counts”, what the histogram of test statistics would look like for just the non-null cases.
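A minimal sketch of a local false discovery rate calculation of the form fdr(z) = pi0 * f0(z) / f(z), with f0 a theoretical null density and f the mixture density estimated from all test statistics. The kernel density estimate and pi0 = 1 are simplifying assumptions rather than the estimation methodology of the paper, and the data are simulated.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(3)
z = np.concatenate([rng.normal(0, 1, 9000),      # null cases
                    rng.normal(3, 1, 1000)])     # non-null cases

f_hat = gaussian_kde(z)      # estimate of the mixture density f(z) from all statistics
pi0 = 1.0                    # conservative (upper-bound) null proportion

def local_fdr(z0):
    """Local false discovery rate at z0: pi0 * f0(z0) / f(z0), capped at 1."""
    return min(1.0, pi0 * norm.pdf(z0) / f_hat(z0)[0])

for z0 in (0.0, 2.0, 3.0, 4.0):
    print(f"z = {z0}: local fdr ~ {local_fdr(z0):.2f}")
```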

Journal ArticleDOI
TL;DR: There are techniques from the Classical approach that are closer, namely those based directly on the likelihood, and, as this letter argues, WTS failed to make comparisons with these.
Abstract: In a recent Statistics in Medicine paper, Warn, Thompson and Spiegelhalter (WTS) made a comparison between the Bayesian approach to the meta-analysis of binary outcomes and a popular Classical approach that uses summary (two-stage) techniques. They included approximate summary (two-stage) Bayesian techniques in their comparisons in an attempt, undoubtedly, to make the comparison less unfair. But, as this letter will argue, there are techniques from the Classical approach that are closer (those based directly on the likelihood) and they failed to make comparisons with these. Here the differences between Bayesian and Classical approaches in meta-analysis applications reside solely in how the likelihood functions are converted into either credibility intervals or confidence intervals. Both summarize, contrast and combine data using likelihood functions. Conflating what Bayes actually offers to meta-analysts (a means of converting likelihood functions to credibility intervals) with the use of likelihood functions themselves to summarize, contrast and combine studies is at best misleading.