Posted Content

The conditioned reconstructed process

TL;DR: A neutral model for speciation and extinction, the constant rate birth-death process, is investigated, and the distribution of the reconstructed trees (i.e., the trees without the extinct species) is studied.
Abstract: We investigate a neutral model for speciation and extinction, the constant rate birth-death process. The process is conditioned to have $n$ extant species today, and we look at the distribution of the reconstructed trees, i.e., the trees without the extinct species. Whereas the tree shape distribution is well known and is the same as under the pure birth process, no analytic results for the speciation times were known. We provide the distribution of the speciation times and calculate their expectations analytically. This characterizes the reconstructed trees completely. We will show how the results can be used to date phylogenies.
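The speciation-time distribution described in this abstract lends itself to simulation. Below is a minimal Python sketch, assuming (a stated assumption here, not a quotation from the paper) that, given the time of origin $t$ and $n$ extant species, the $n-1$ speciation times of the reconstructed tree are i.i.d. with density proportional to $p_1(s) = (\lambda-\mu)^2 e^{-(\lambda-\mu)s} / (\lambda-\mu e^{-(\lambda-\mu)s})^2$ on $[0, t]$, with $\lambda > \mu \ge 0$. All function names are hypothetical. Because $G(s) = (1-e^{-(\lambda-\mu)s})/(\lambda-\mu e^{-(\lambda-\mu)s})$ is an antiderivative of $p_1$ with $G(0) = 0$, the times can be drawn by inverting $G(s)/G(t)$.

```python
import math
import random


def p1(s, lam, mu):
    """Assumed density kernel p_1(s) for a speciation time s (ages, present = 0)."""
    r = lam - mu
    e = math.exp(-r * s)
    return r * r * e / (lam - mu * e) ** 2


def big_g(s, lam, mu):
    """Antiderivative of p_1 with big_g(0) = 0, used as an unnormalized CDF."""
    r = lam - mu
    e = math.exp(-r * s)
    return (1.0 - e) / (lam - mu * e)


def time_from_uniform(u, t, lam, mu):
    """Inverse CDF: solve big_g(s) / big_g(t) = u for s in [0, t]."""
    r = lam - mu
    c = u * big_g(t, lam, mu)
    # (1 - e^{-r s}) = c (lam - mu e^{-r s})  =>  e^{-r s} = (1 - c lam) / (1 - c mu)
    return -math.log((1.0 - c * lam) / (1.0 - c * mu)) / r


def sample_speciation_times(n, t, lam, mu, rng=random):
    """Draw n - 1 i.i.d. speciation times given origin time t, sorted oldest first."""
    times = [time_from_uniform(rng.random(), t, lam, mu) for _ in range(n - 1)]
    return sorted(times, reverse=True)


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_speciation_times(5, t=3.0, lam=1.0, mu=0.2, rng=rng))
```

Sorting the i.i.d. draws gives the ranked node ages; a tree topology can then be attached independently, since the shape distribution is the same as under the pure birth process.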
Citations
Journal Article
TL;DR: The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly, and provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates.
Abstract: Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

18,718 citations


Cites background or methods from "The conditioned reconstructed proce..."

  • ...The birth–death prior model on clock trees has been expanded to incorporate recent progress in the understanding of the linear constant birth–death process with complete sampling (Gernhard 2008), with random incomplete sampling (Stadler 2009), or with clustered or diversified sampling (Höhna et al. 2011)....

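To make the difference between complete and random incomplete sampling concrete, the minimal Python sketch below evaluates the probability $p_1(t)$ that a lineage alive at time $t$ before the present leaves exactly one sampled extant descendant. It assumes the commonly used constant-rate form with sampling fraction $\rho$, $p_1(t) = \rho(\lambda-\mu)^2 e^{-(\lambda-\mu)t} / \left(\rho\lambda + (\lambda(1-\rho)-\mu)e^{-(\lambda-\mu)t}\right)^2$; this formula and the function name `p1_sampled` are our assumptions, not taken from MrBayes. Setting $\rho = 1$ recovers the complete-sampling case.

```python
import math


def p1_sampled(t, lam, mu, rho=1.0):
    """Probability that a lineage alive t time units ago has exactly one
    sampled extant descendant (assumed form; rho = sampling fraction)."""
    r = lam - mu
    e = math.exp(-r * t)
    return rho * r * r * e / (rho * lam + (lam * (1.0 - rho) - mu) * e) ** 2


# Complete sampling (rho = 1) versus sampling half of the extant species.
for rho in (1.0, 0.5):
    print(rho, [round(p1_sampled(t, 1.0, 0.2, rho), 4) for t in (0.0, 1.0, 5.0)])
```

At $t = 0$ the function returns $\rho$, the chance that an extant lineage is sampled at all.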

Journal Article
TL;DR: It is demonstrated that both BEST and the new Bayesian Markov chain Monte Carlo method for the multispecies coalescent have much better estimation accuracy for species tree topology than concatenation, and the method outperforms BEST in divergence time and population size estimation.
Abstract: Until recently, it has been common practice for a phylogenetic analysis to use a single gene sequence from a single individual organism as a proxy for an entire species. With technological advances, it is now becoming more common to collect data sets containing multiple gene loci and multiple individuals per species. These data sets often reveal the need to directly model intraspecies polymorphism and incomplete lineage sorting in phylogenetic estimation procedures. For a single species, coalescent theory is widely used in contemporary population genetics to model intraspecific gene trees. Here, we present a Bayesian Markov chain Monte Carlo method for the multispecies coalescent. Our method coestimates multiple gene trees embedded in a shared species tree along with the effective population size of both extant and ancestral species. The inference is made possible by multilocus data from multiple individuals per species. Using a multiindividual data set and a series of simulations of rapid species radiations, we demonstrate the efficacy of our new method. These simulations give some insight into the behavior of the method as a function of sampled individuals, sampled loci, and sequence length. Finally, we compare our new method to both an existing method (BEST 2.2) with similar goals and the supermatrix (concatenation) method. We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation.

2,401 citations


Cites methods from "The conditioned reconstructed proce..."

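  • ...For the divergence times, we use the reconstructed birth–death process (Gernhard 2008), parameterized by lineage birth and death rates λ and µ:
      $$f_{BD}(S) = n_s!\,\lambda^{n_s-1}\,\frac{(\lambda-\mu)\,e^{-(\lambda-\mu)x_1}}{\lambda-\mu\,e^{-(\lambda-\mu)x_1}} \times \prod_{i=1}^{n_s-1}\frac{(\lambda-\mu)^2\,e^{-(\lambda-\mu)x_i}}{\left(\lambda-\mu\,e^{-(\lambda-\mu)x_i}\right)^2}, \qquad (5)$$
    where $n_s$ is the number of species and $x_1, x_2, \ldots$…...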

  • ...One hundred species trees were simulated using a birth–death process with λ = 1 and µ = 0.2 (Gernhard 2008)....

  • ...…on a popular existing software package for Bayesian phylogenetics, and as a result it can exploit existing models for gene trees such as relaxed molecular clocks (Drummond et al. 2006) and previously implemented priors for species tree such as the reconstructed birth–death prior (Gernhard 2008)....

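Equation (5) can be evaluated directly. The following minimal Python sketch computes its logarithm, assuming complete sampling, λ > µ ≥ 0, and that `x` holds the $n_s - 1$ speciation times as ages measured back from the present, with the largest one taken as the root age $x_1$. The function name `log_f_bd` is hypothetical and not taken from any software package.

```python
import math


def log_f_bd(x, lam, mu):
    """Log of Eq. (5): density of the ranked speciation times x of a
    reconstructed tree with n_s = len(x) + 1 species (lam > mu >= 0)."""
    n_s = len(x) + 1
    r = lam - mu
    x1 = max(x)  # root age: the oldest speciation time
    # log(n_s!) + (n_s - 1) log(lam)
    out = math.lgamma(n_s + 1) + (n_s - 1) * math.log(lam)
    # extra root factor (lam - mu) e^{-r x1} / (lam - mu e^{-r x1})
    out += math.log(r) - r * x1 - math.log(lam - mu * math.exp(-r * x1))
    # product over all n_s - 1 speciation times, root included
    for xi in x:
        out += 2.0 * math.log(r) - r * xi - 2.0 * math.log(lam - mu * math.exp(-r * xi))
    return out


print(log_f_bd([2.0, 1.1, 0.4], lam=1.0, mu=0.2))
```

Working on the log scale avoids underflow for large trees. With µ = 0 the expression reduces to $n_s!\,\lambda^{n_s-1} e^{-\lambda x_1}\prod_i e^{-\lambda x_i}$, the pure birth case.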

Journal Article
TL;DR: The fossilized birth–death process is introduced—a fossil calibration method that unifies extinct and extant species with a single macroevolutionary model, eliminating the need for ad hoc calibration priors and yielding more accurate node age estimates while providing a coherent measure of statistical uncertainty.
Abstract: Time-calibrated species phylogenies are critical for addressing a wide range of questions in evolutionary biology, such as those that elucidate historical biogeography or uncover patterns of coevolution and diversification. Because molecular sequence data are not informative on absolute time, external data—most commonly, fossil age estimates—are required to calibrate estimates of species divergence dates. For Bayesian divergence time methods, the common practice for calibration using fossil information involves placing arbitrarily chosen parametric distributions on internal nodes, often disregarding most of the information in the fossil record. We introduce the “fossilized birth–death” (FBD) process—a model for calibrating divergence time estimates in a Bayesian framework, explicitly acknowledging that extant species and fossils are part of the same macroevolutionary process. Under this model, absolute node age estimates are calibrated by a single diversification model and arbitrary calibration densities are not necessary. Moreover, the FBD model allows for inclusion of all available fossils. We performed analyses of simulated data and show that node age estimation under the FBD model results in robust and accurate estimates of species divergence times with realistic measures of statistical uncertainty, overcoming major limitations of standard divergence time estimation methods. We used this model to estimate the speciation times for a dataset composed of all living bears, indicating that the genus Ursus diversified in the Late Miocene to Middle Pliocene.

588 citations


Cites methods from "The conditioned reconstructed proce..."

  • ...For all analyses using calibration priors (fixed-scaled, fixed-true, and hyperprior), we assumed a constant-rate reconstructed birth-death process [39, 40] as a prior on speciation times....

Journal Article
TL;DR: A comparison with recent implementations of path sampling and stepping-stone sampling shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model.
Abstract: Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike’s information criterion through Markov chain Monte Carlo (AICM), in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.

556 citations

Journal Article
TL;DR: The authors' large time-calibrated phylogeny provides a significant step towards completing a fully sampled species-level phylogeny for Solanaceae, and provides age estimates for the whole family, and is one of the best sampled angiosperm family phylogenies both in terms of taxon sampling and resolution published thus far.
Abstract: The Solanaceae is a plant family of great economic importance. Despite a wealth of phylogenetic work on individual clades and a deep knowledge of particular cultivated species such as tomato and potato, a robust evolutionary framework with a dated molecular phylogeny for the family is still lacking. Here we investigate molecular divergence times for Solanaceae using a densely-sampled species-level phylogeny. We also review the fossil record of the family to derive robust calibration points, and estimate a chronogram using an uncorrelated relaxed molecular clock. Our densely-sampled phylogeny shows strong support for all previously identified clades of Solanaceae and strongly supported relationships between the major clades, particularly within Solanum. The Tomato clade is shown to be sister to section Petota, and the Regmandra clade is the first branching member of the Potato clade. The minimum age estimates for major splits within the family provided here correspond well with results from previous studies, indicating splits between tomato & potato around 8 million years ago (Ma) with a 95% highest posterior density (HPD) 7–10 Ma, Solanum & Capsicum c. 19 Ma (95% HPD 17–21), and Solanum & Nicotiana c. 24 Ma (95% HPD 23–26). Our large time-calibrated phylogeny provides a significant step towards completing a fully sampled species-level phylogeny for Solanaceae, and provides age estimates for the whole family. The chronogram now includes 40% of known species and all but two monotypic genera, and is one of the best sampled angiosperm family phylogenies both in terms of taxon sampling and resolution published thus far. The increased resolution in the chronogram combined with the large increase in species sampling will provide much needed data for the examination of many biological questions using Solanaceae as a model system.

421 citations


Cites methods from "The conditioned reconstructed proce..."

  • ...A Birth-Death tree prior was used, which accounts for both speciation and extinction [110]....

References
Journal Article

16,450 citations


"The conditioned reconstructed proce..." refers methods in this paper

  • ...We investigate the constant rate birth-death process (Feller, 1968; Kendall, 1948) as it is probably the most popular homogeneous model....

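The constant rate birth-death process referred to here can be sketched as an event-driven (Gillespie-style) simulation in which each lineage speciates at rate λ and goes extinct at rate µ. The minimal Python sketch below, with hypothetical function names, compares a Monte Carlo estimate of the probability that a single lineage survives to time $t$ with the standard closed form $(\lambda-\mu)/(\lambda-\mu e^{-(\lambda-\mu)t})$, which we state as an assumption; it tracks lineage counts only, not tree structure.

```python
import math
import random


def lineages_at_time(t, lam, mu, rng):
    """Simulate a constant-rate birth-death process from one lineage for time t
    and return the number of lineages alive at time t (0 if extinct)."""
    n, clock = 1, 0.0
    while n > 0:
        clock += rng.expovariate(n * (lam + mu))  # waiting time to the next event
        if clock > t:
            break
        # a speciation with probability lam / (lam + mu), otherwise an extinction
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return n


def survival_probability(t, lam, mu):
    """Assumed closed form P(at least one lineage at time t), for lam > mu."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))


rng = random.Random(42)
runs = 20000
estimate = sum(lineages_at_time(2.0, 1.0, 0.5, rng) > 0 for _ in range(runs)) / runs
print(estimate, survival_probability(2.0, 1.0, 0.5))
```

Because the total event rate is proportional to the number of living lineages, the waiting time to the next event is exponential with rate $n(\lambda+\mu)$.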

Journal Article
TL;DR: The following work is founded on that conception of evolution, the most recent and precise formulation of which is due to Dr. J. C. Willis, and represents an attempt to develop the quantitative consequences of the conception.
Abstract: The following work is founded on that conception of evolution, the most recent and precise formulation of which is due to Dr. J. C. Willis, and represents an attempt to develop the quantitative consequences of the conception. By his statistical studies of distribution Dr. Willis was led to two conclusions: (1) Species occupying large areas are, on the whole, older than those occupying small areas, provided that allied forms are compared

1,564 citations

Journal Article
TL;DR: The results of the method are found to be insensitive to changes in the rate parameter of the branching process, and the best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions.
Abstract: A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process.

1,508 citations


"The conditioned reconstructed proce..." refers background in this paper

  • ...For the general birth-death process, the joint probability for the shape and all speciation times has been established in Rannala and Yang (1996); the joint probability for the speciation times disregarding the shape has been established in Yang and Rannala (1997)....

  • ...In Rannala and Yang (1996), joint probabilities for $x_1, \ldots, x_{n-1}$ are given....

Journal Article
TL;DR: An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree, which has a probability of approximately 95%.
Abstract: An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.

1,230 citations


"The conditioned reconstructed proce..." refers background or methods in this paper

  • ...From Yang and Rannala (1997), Equation (3), we obtain the density g of the ordered speciation times, $x_2 > x_3 > \cdots > x_{n-1}$, given $n$ and $x_1 = t_1$:
      $$g(x_2, x_3, \ldots, x_{n-1} \mid t_1 = t, n) = (n-2)!\,\prod_{i=2}^{n-1} \frac{\mu\, p_1(x_i)}{p_0(t)}....$$

  • ...For the general birth-death process, the joint probability for the shape and all speciation times has been established in Rannala and Yang (1996); the joint probability for the speciation times disregarding the shape has been established in Yang and Rannala (1997)....

  • ...This joint density is used in Yang and Rannala (1997) in order to infer reconstructed trees with Bayesian methods....

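As a concrete check of the conditional density g quoted above, the minimal Python sketch below evaluates it, assuming the standard constant-rate forms $p_0(t) = \mu(1-e^{-(\lambda-\mu)t})/(\lambda-\mu e^{-(\lambda-\mu)t})$ (extinction within time $t$) and $p_1(t) = (\lambda-\mu)^2 e^{-(\lambda-\mu)t}/(\lambda-\mu e^{-(\lambda-\mu)t})^2$. These definitions are not shown in the snippet and are our assumption, as are the function names. Under them each factor $\mu p_1(x_i)/p_0(t)$ integrates to 1 over $[0, t]$, so g is the density of $n-2$ i.i.d. times put in decreasing order.

```python
import math


def p0(t, lam, mu):
    """Assumed standard form: probability that a lineage goes extinct within time t."""
    r = lam - mu
    e = math.exp(-r * t)
    return mu * (1.0 - e) / (lam - mu * e)


def p1(t, lam, mu):
    """Assumed standard form: probability of exactly one descendant after time t."""
    r = lam - mu
    e = math.exp(-r * t)
    return r * r * e / (lam - mu * e) ** 2


def g(xs, t, lam, mu):
    """Density of the ordered speciation times xs = (x_2 > ... > x_{n-1}) given
    the root age t and n = len(xs) + 2 species; needs mu > 0 because of p0."""
    out = math.factorial(len(xs))  # (n - 2)!
    for x in xs:
        out *= mu * p1(x, lam, mu) / p0(t, lam, mu)
    return out


print(g([2.0, 0.7], t=3.0, lam=1.0, mu=0.4))
```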