
Showing papers by "Bret Larget" published in 2006


Journal Article
TL;DR: Introduces a one-parameter prior distribution on concordance among gene trees and a novel 2-stage Markov chain Monte Carlo (MCMC) method: the first stage obtains independent Bayesian posterior distributions for individual genes using standard methods, and the second uses them to estimate a posterior distribution of gene-to-tree maps.
Abstract: Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.
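
The summary step described in the abstract (estimating the sample-wide concordance factor from a posterior sample of GTMs) can be illustrated with a minimal sketch. This is not the authors' implementation; the data representation (each tree as a frozenset of clades, each GTM as a list with one tree per gene) and the function name `concordance_factor_samples` are assumptions made for illustration, and the toy data are invented.

```python
from statistics import mean

def concordance_factor_samples(gtm_samples, clade):
    """For each sampled gene-to-tree map, return the proportion of genes
    whose assigned tree contains the given clade.

    Assumed representation (hypothetical): a GTM is a list with one tree
    per gene, and a tree is a frozenset of clades (frozensets of taxa).
    """
    clade = frozenset(clade)
    return [sum(clade in tree for tree in gtm) / len(gtm) for gtm in gtm_samples]

# Toy example: two candidate topologies, each reduced to a single clade.
tree_ab = frozenset({frozenset({"A", "B"})})
tree_ac = frozenset({frozenset({"A", "C"})})

# Two posterior samples of GTMs over four genes (invented data).
gtm_samples = [
    [tree_ab, tree_ab, tree_ac, tree_ab],
    [tree_ab, tree_ac, tree_ac, tree_ab],
]

cf = concordance_factor_samples(gtm_samples, {"A", "B"})
posterior_mean = mean(cf)  # posterior mean of the sample-wide concordance factor
```

With a realistic number of posterior samples, quantiles of `cf` would give a credibility interval for the sample-wide concordance factor; extrapolating to the genome-wide factor requires the additional sampling assumption stated in the abstract and is not covered here.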

485 citations


Journal Article
21 Apr 2006 - Science
TL;DR: Mossel and Vigoda show that nearest neighbor interchange transitions, commonly used in phylogenetic Markov chain Monte Carlo (MCMC) algorithms, perform poorly on mixtures of dissimilar trees; this comment argues that the conditions leading to their results are artificial and that standard convergence diagnostics would detect the problem in real data.
Abstract: Mossel and Vigoda (Reports, 30 September 2005, p. 2207) show that nearest neighbor interchange transitions, commonly used in phylogenetic Markov chain Monte Carlo (MCMC) algorithms, perform poorly on mixtures of dissimilar trees. However, the conditions leading to their results are artificial. Standard MCMC convergence diagnostics would detect the problem in real data, and correction of the model misspecification would solve it.

30 citations