Inference of Ancient Whole-Genome Duplications and the Evolution of Gene Duplication and Loss Rates

01 Jul 2019, Vol. 36, Iss. 7, pp. 1384-1404

Abstract:

Gene tree-species tree reconciliation methods have been employed for studying ancient whole-genome duplication (WGD) events across the eukaryotic tree of life. Most approaches have relied on using maximum likelihood trees and the maximum parsimony reconciliation thereof to count duplication events on specific branches of interest in a reference species tree. Such approaches do not account for uncertainty in the gene tree and reconciliation, or do so only heuristically. The effects of these simplifications on the inference of ancient WGDs are unclear. In particular, the effects of variation in gene duplication and loss rates across the species tree have not been considered. Here, we developed a full probabilistic approach for phylogenomic reconciliation-based WGD inference, accounting for both gene tree and reconciliation uncertainty using a method based on the principle of amalgamated likelihood estimation. The model and methods are implemented in a maximum likelihood and Bayesian setting and account for variation of duplication and loss rates across the species tree, using methods inspired by phylogenetic divergence time estimation. We applied our newly developed framework to ancient WGDs in land plants and investigated the effects of duplication and loss rate variation on reconciliation and gene count based assessment of these earlier proposed WGDs.

This is a post-peer-review, pre-copyedit version of an article published in Molecular Biology & Evolution. The final authenticated version is available online at: https://doi.org/10.1093/molbev/msz088

Inference of ancient whole genome duplications and the evolution of the gene duplication and loss rate

Arthur Zwaenepoel (1, 2, 3, *)

Yves Van de Peer (1, 2, 3, 4, *)

1. Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium

2. Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium

3. Bioinformatics Institute Ghent, 9052 Ghent, Belgium

4. Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa

* Corresponding authors: arzwa@psb.vib-ugent.be, yvpee@psb.vib-ugent.be

Introduction

In the past decades, examination of genomic data has revealed many signatures of ancient whole genome

duplications (WGDs) across the eukaryotic tree of life (reviewed in Van de Peer et al. 2017). These findings

have initiated an active research field concerned with the evolutionary importance of polyploidy, especially

in plants where polyploidization seems to have been rampant. The apparent widespread incidence of

polyploidy in the phylogeny of land plants is often cited as strong evidence for the evolutionary importance

of polyploidy. But just how widespread is ancient polyploidy in land plants? While there has been strong

evidence for many (relatively recent) WGD events, inference of these events, especially very ancient ones,

remains highly challenging. Evidently, the signal of an ancient WGD event erodes through time, and all

current methods suffer a strong loss of power the more ancient the hypothesized event. As a result, some of

the claimed ancient WGD events have been contested, such as the two hypothesized events in land plants

(Jiao et al. 2011; Ruprecht et al. 2017), the 2R hypothesis in vertebrates (Abbasi 2010; Van de Peer et al.

2010; Smith and Keinath 2015), or ancient WGD events in hexapods (Zheng Li et al. 2018; Li et al. 2019;

Nakatani and McLysaght 2019).

Methods for unveiling ancient WGDs can be classified crudely into three main approaches. The first

approach takes advantage of the expectation that a WGD leaves a signature in the distribution of duplicate

divergence times. One commonly estimates the synonymous distance (K_S, which serves as a proxy

for the divergence time) for all paralogous pairs in a genome and visualizes the resulting distribution. In

such a K_S distribution, ancient WGDs will be visible as peaks against the background exponential decay

distribution from small scale duplication (SSD) events (Lynch and Conery 2000; Blanc and Wolfe 2004).

There are a couple of pitfalls with this approach, which have been discussed in detail (Vanneste et al. 2013;

Tiley et al. 2018; Zwaenepoel et al. 2018). Importantly, these distributions are not suitable for inferring very

ancient events due to saturation of the synonymous distance. The second main approach is based on the

expectation that a WGD should lead to large co-linear blocks in the genome. Such co-linearity or synteny

based information has often been considered as the strongest evidence for ancient WGDs. In particular the

combination of syntenic and K_S information has been vital for the discrimination of WGD-derived and

SSD-derived paralogs. The major drawback is however that high quality genome assemblies are required,

and these are still non-trivial to obtain. Nevertheless, even with high quality assemblies, interpretation

of syntenic signal for very ancient putative WGDs is not always unequivocal. In particular the temporal

(either relative or absolute) framing of a WGD event based on syntenic data is complicated and requires

high quality genomes of multiple related lineages. The last set of methods are united by their usage of

phylogenetic information in individual gene families. Both methods using gene counts and gene tree

topologies have been used, either in a model-based or heuristic framework. Especially heuristic gene tree -

species tree reconciliation methods have been widely employed to unveil evidence for ancient WGDs (e.g.

Jiao et al. 2011; Li et al. 2015; Li et al. 2016; McKain et al. 2016; Thomas et al. 2017; Zheng Li et al. 2018;

Yang et al. 2018), often in combination with other sources of evidence. Here, we take heuristic to mean that

the gene tree is inferred independently from its reconciliation. In these approaches, a larger than expected

number of duplication events inferred for a particular branch of the species tree is regarded as indicative

for an ancient WGD. So far, most of the support for very ancient WGD events has been obtained using gene

tree reconciliation approaches. These approaches naturally provide a temporal view on the hypothesized

event as they assume a known, either dated or undated, species tree.
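
As a rough illustration of the first approach described above, the following sketch simulates a K_S distribution as a mixture of an exponentially decaying SSD background and a single log-normal WGD peak, and then locates that peak in a histogram. The mixture sizes, decay rate, peak location and all function names are illustrative assumptions, not values or methods taken from this study.

```python
import math
import random


def simulate_ks(n_ssd=4000, n_wgd=2000, decay=2.0, peak=1.0, spread=0.1, seed=1):
    """Draw hypothetical K_S values: exponential SSD background plus a log-normal WGD peak."""
    rng = random.Random(seed)
    background = [rng.expovariate(decay) for _ in range(n_ssd)]
    wgd = [rng.lognormvariate(math.log(peak), spread) for _ in range(n_wgd)]
    return background + wgd


def histogram(values, width=0.1, upper=3.0):
    """Count values in equal-width bins on [0, upper)."""
    n_bins = int(round(upper / width))
    counts = [0] * n_bins
    for v in values:
        if 0.0 <= v < upper:
            counts[min(int(v / width), n_bins - 1)] += 1
    return counts


def peak_bin_centre(counts, width=0.1, skip=0.5):
    """Centre of the fullest bin, ignoring the young-duplicate region below `skip`."""
    first = int(round(skip / width))
    best = max(range(first, len(counts)), key=lambda i: counts[i])
    return (best + 0.5) * width


ks_values = simulate_ks()
print(peak_bin_centre(histogram(ks_values)))
```

In this toy setting the peak is easy to find because it is narrow and well separated from the background; the saturation of K_S for very ancient events, noted in the text, is exactly what such a simple picture does not capture.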

There are however several potential pitfalls when employing heuristic gene tree - species tree reconciliation

approaches. The first, and probably most obvious, is the need for an arbitrary cut-off on the number of

duplications before some species tree branch is associated with a WGD. This is especially troubling for

putative WGD events on tip branches, as large numbers of SSD events can easily be confused with a WGD

event (Zwaenepoel et al. 2018). The number of duplication events inferred for specific branches can also be

very sensitive to taxon sampling, and some signal for a putative WGD event on a particular branch may be

absent or weakened when the branch is subdivided by adding more taxa to the analysis. Perhaps more

important are the problems with the methodology per se. In most cases, reconciliation approaches rely on a

single gene tree topology for every gene family, inferred by maximum likelihood methods, and a single

reconciliation thereof, typically employing a least common ancestor (LCA) approach which minimizes the

total number of duplication and loss events (Zmasek and Eddy 2001). A gene tree topology is however

a probabilistic model of the phylogeny of that gene family, and for a single gene family there may be a

considerable number of different topologies with near equal support (Salter 2001). Similarly, a reconciliation

of a gene tree to a species tree can also be considered probabilistically, and, although less well studied,

relying on the single most parsimonious reconciliation may be similarly problematic. In particular the

joint effects of these two issues may be of crucial importance, as the reconciliation of uncertain topologies

by means of LCA reconciliation will result in conflicting views on the evolution of the gene family and

systematic biases (see e.g. Hahn 2007). To overcome some of these problems, researchers have typically

filtered out nodes with low bootstrap support, evaluated some type of duplication consistency scores or

have used heuristic branch swapping methods in the reconciliation step (as implemented for example in

Notung (Chen et al. 2000)).
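
The LCA reconciliation discussed above can be sketched in a few lines. The example below is a minimal illustration, assuming binary trees written as nested tuples and gene names of the hypothetical form `species_id`; it is not the implementation used in any of the cited tools. A gene tree node is labeled a duplication when it maps to the same species tree clade as one of its children.

```python
def clades(tree):
    """Leaf sets of all nodes in a nested-tuple species tree (root clade last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clades(tree[0]), clades(tree[1])
    return left + right + [left[-1] | right[-1]]


def lca(species_clades, species):
    """Smallest species tree clade that contains all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def species_of(gene):
    """Assumed naming convention: 'A_1' is gene 1 from species 'A'."""
    return gene.split("_")[0]


def reconcile(gene_tree, species_clades, dups):
    """Map each gene tree node to its LCA clade; tally duplications per clade in `dups`."""
    if isinstance(gene_tree, str):
        return lca(species_clades, frozenset([species_of(gene_tree)]))
    left = reconcile(gene_tree[0], species_clades, dups)
    right = reconcile(gene_tree[1], species_clades, dups)
    here = lca(species_clades, left | right)
    if here == left or here == right:  # same mapping as a child: duplication
        dups[here] = dups.get(here, 0) + 1
    return here


species_clades = clades((("A", "B"), "C"))

true_dups = {}
reconcile(((("A_1", "B_1"), ("A_2", "B_2")), "C_1"), species_clades, true_dups)

# A single-copy family whose inferred topology conflicts with the species tree.
spurious_dups = {}
reconcile((("A_1", "C_1"), "B_1"), species_clades, spurious_dups)
```

The second call shows the bias mentioned in the text: a topological error in a single-copy family is explained by a duplication at the root (plus implied losses), although no duplication occurred.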

Probabilistic methods for WGD inference in a phylogenetic context, both employing gene trees and gene

counts, were recently proposed by Rabier et al. (2014). In a gene count based method (Hahn et al. 2005),

one does not employ topological information but effectively integrates over all possible gene trees that

could have generated the observed counts at the species tree leaves. Such methods therefore naturally

handle uncertainty in the gene tree, albeit in a somewhat crude fashion. The observed gene family is

modeled as the outcome of a birth-death Markov chain, allowing likelihood based inference of duplication

and loss rates, ancestral gene counts and, in the framework of Rabier et al. (2014), WGD retention rates.

While they have yielded great insights in genome evolution, gene count based methods do not consider

all of the information in genomic data sets, as sequence data for the genes provides information about

their phylogeny. Therefore, a gene tree - species tree reconciliation approach that estimates parameters of a

model of gene family evolution is expected to be more accurate. We expect this in particular when models

are employed that allow variation in the duplication and loss rate across the species tree. Additionally,

reconciliation based methods have the obvious advantage of providing the researcher with an actual

reconciled gene tree, i.e. a tree with nodes labeled as either a speciation, duplication or loss node. In our

case, this labeling should also include whether a particular duplication node is inferred to be a WGD or

SSD-derived duplication. This therefore also provides a model-based framework for selecting gene families

for Bayesian molecular dating analyses to estimate absolute ages of ancient WGDs (as in e.g. Vanneste

et al. 2014; and Clark and Donoghue 2017) or to study functional biases in gene retention patterns (e.g.

Li et al. 2016). Contrary to expectations, Rabier et al. (2014) reported a lower power to detect WGDs for

their reconciliation approach compared to their gene count approach, and they recommend usage of the

gene count method for testing WGD hypotheses in a phylogenetic context. However, they attributed these

observations mainly to computational limitations in their reconciliation method.
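
To make the birth-death model behind gene count methods more concrete, the sketch below evaluates the standard closed-form transition probability of a linear birth-death process that starts from a single lineage. It covers one branch only; a full gene count likelihood combines such terms over the whole species tree, which is not shown here. The function name and the example rates are illustrative assumptions.

```python
import math


def birth_death_pmf(n, lam, mu, t):
    """P(n lineages after time t | 1 lineage at time 0), linear birth-death process.

    lam: duplication rate, mu: loss rate (both per unit of time t).
    """
    if abs(lam - mu) < 1e-12:
        alpha = beta = lam * t / (1.0 + lam * t)
    else:
        e = math.exp((lam - mu) * t)
        alpha = mu * (e - 1.0) / (lam * e - mu)   # extinction probability
        beta = lam * (e - 1.0) / (lam * e - mu)
    if n == 0:
        return alpha
    # Conditional on survival, the count is geometric with parameter (1 - beta).
    return (1.0 - alpha) * (1.0 - beta) * beta ** (n - 1)


# Hypothetical rates on a branch of length 1 (in whatever time unit the tree uses).
probabilities = [birth_death_pmf(n, lam=0.2, mu=0.3, t=1.0) for n in range(5)]
```

Because the expected count after time t is exp((lam - mu) t), counts alone mostly inform the difference between the two rates on a branch, which is one way to see why topological information is expected to help.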

Here we introduce a novel method for WGD inference using gene trees designed to overcome the issues

of the reconciliation method in Rabier et al. (2014). We draw inspiration from the growing body of

literature on gene tree inference under a known species tree (reviewed in Szöllősi et al. 2015) and develop

an approach that allows us to assess the statistical support for WGD hypotheses from alignments of

multi-copy gene families. Our approach is based on the principle of amalgamated likelihood estimation

(ALE) for probabilistic gene tree - species tree reconciliation, first proposed and developed by Szöllősi,

Rosikiewicz, et al. (2013). We develop an ALE approach, called Whale, employing the probabilistic model

of Rabier et al. (2014) to estimate duplication, loss and WGD retention rates and test WGD hypotheses in

a phylogenetic context. By using the amalgamation principle with a probabilistic model of gene family

evolution in the presence of WGDs, Whale jointly accounts for uncertainty in the gene tree topology and

reconciliation. As in Szöllősi, Rosikiewicz, et al. (2013), our approach is fully probabilistic, and does not

employ parsimony-guided reconciliation as in Rabier et al. (2014). We employed the Whale method both

in a maximum likelihood and Bayesian setting and reveal the crucial importance of considering duplication

and loss rate heterogeneity across the species tree when assessing WGD hypotheses. To accommodate

this, we implemented models of duplication and loss rate evolution inspired by molecular divergence time

estimation. Revisiting some of the ancient WGDs reported in the land plant phylogeny, we evaluated our

new approaches and discuss caveats when assessing WGDs using gene tree reconciliation.

New approaches

• We implemented algorithms to compute the joint gene tree - reconciliation likelihood under the probabilistic model of Rabier et al. (2014) using the principle of amalgamation.

• Through analysis of simulated and empirical data sets we show that likelihood based inference of whole genome duplications (WGDs) sensu Rabier et al. (2014) is very sensitive to rate variation across branches of the species tree. This also has implications for simulation-based assessment of putative ‘bursts’ in the number of duplications in data sets of reconciled gene trees.

• We implemented models that can accommodate variation in duplication and loss rates inspired by Bayesian divergence time estimation and employ these to study the evolution of the duplication and loss rate together with putative ancient WGDs.

Results

Validation using simulated data

The ALE approach and the dynamic programming algorithm for probabilistic reconciliation inference

have been extensively validated using simulations (Szöllősi et al. 2012; Szöllősi, Rosikiewicz, et al. 2013;

Szöllősi, Tannier, et al. 2013). However, our adoption of these methods is considerably different from

these studies, which focused mainly on horizontal gene transfer and improved gene tree inference under

a known species tree. We introduce the WGD model as well as the prior distribution on the number of

lineages at the root first developed by Rabier et al. (2014) in the ALE context, and estimate duplication

and loss rates not family-wise as in ALE (Szöllősi, Rosikiewicz, et al. 2013), but across families similar to

Rasmussen and Kellis (2011) and Rabier et al. (2014) (see methods). We verified the correctness of our

new approach and its implementation using simulated data. Importantly, while the ALE approach takes

gene tree uncertainty into consideration by employing samples from the posterior distribution for gene

tree topologies, we simulated only a single unrooted gene tree topology per family, and do not consider

gene tree uncertainty here. This was previously done already using extensive simulations in Szöllősi,

Rosikiewicz, et al. (2013), where the basic merits of an ALE approach were shown, and we do not revisit

these highly computationally intensive simulation studies here. Note that all reported rate estimates are

dependent on the time scale used in the species tree, which in our case is in units of 100 million years.
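
The way a sample of gene tree topologies can stand in for gene tree uncertainty may be illustrated with conditional clade probabilities, the quantity that amalgamation builds on. The sketch below is a simplification under stated assumptions: it uses rooted binary trees written as nested tuples (ALE itself works from unrooted tree samples), all names are hypothetical, and the combination with the reconciliation likelihood by dynamic programming is not shown.

```python
from collections import Counter


def leafset(tree):
    """Set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leafset(tree[0]) | leafset(tree[1])


def splits(tree):
    """Yield (clade, {left child clade, right child clade}) for every internal node."""
    if isinstance(tree, str):
        return
    left, right = leafset(tree[0]), leafset(tree[1])
    yield left | right, frozenset([left, right])
    yield from splits(tree[0])
    yield from splits(tree[1])


def conditional_clade_counts(sample):
    """Count how often each clade, and each split of a clade, occurs in the sample."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sample:
        for clade, split in splits(tree):
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts


def tree_probability(tree, clade_counts, split_counts):
    """Approximate topology probability as a product of conditional split frequencies."""
    p = 1.0
    for clade, split in splits(tree):
        if clade_counts[clade] == 0:
            return 0.0
        p *= split_counts[(clade, split)] / clade_counts[clade]
    return p


L1, L2 = (("a", "b"), "c"), (("a", "c"), "b")
R1, R2 = (("d", "e"), "f"), (("d", "f"), "e")
sample = [(L1, R1)] * 2 + [(L2, R2)] * 2
clade_counts, split_counts = conditional_clade_counts(sample)
unsampled = tree_probability((L1, R2), clade_counts, split_counts)
```

Here the topology `(L1, R2)` never occurs in the sample, yet it receives a non-zero probability because each of its clades was observed, which is why a modest sample can represent a much larger set of candidate topologies.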

Numerical optimization of the likelihood under the basic constant-rates duplication-loss (DL) model

(i.e. using a single duplication (λ) and loss (µ) rate for the full species tree) with a geometric prior

distribution on the number of lineages at the root provides accurate maximum likelihood estimates (MLEs)

for the simulated duplication and loss rates (Figure S1). In general, rates are estimated more accurately

when the duplication and loss rate are similar whereas slight biases are observed when the rates are quite

different. If the loss rate is higher than the duplication rate, both rates tend to be underestimated. If the

duplication rate is higher than the loss rate, the duplication rate seems to be slightly overestimated. Not

unexpectedly, our simulations suggest that the variance of the MLEs increases with the rate. Estimates of the

duplication (λ) and loss (µ) rate are quite robust to the parametrization of the geometric prior distribution

on the number of genes at the root (Figure S2). As expected, assuming a very low prior probability on

multiple genes at the root (1/η ≈ 1) leads to overestimation of λ and underestimation of µ. Conversely,

assigning a strong prior on multiple ancestral lineages (1/η ≫ 1) leads to an underestimation of λ and

overestimation of µ to compensate for unobserved lineages assumed at the root. These observations hold

Figures

Table 1: Putative WGDs, their estimated dates and retention rate. Maximum-likelihood estimates under different local-clock models. The constant rates model has one duplication and one loss rate for the whole tree. Local clock 1 estimates a duplication and loss rate for six rate classes (core angiosperms, Amborella, gymnosperms, Selaginella, bryophytes and all other branches). Local clock 2 estimates rates for the same rate classes as local clock 1, but with additional different rate classes for the stem branches of angiosperms, spermatophytes and tracheophytes.

Figure 1: MLEs for the retention rate for different simulated WGD scenarios and duplication and loss rates. Simulations of 10 times 500 gene families were done for a 10-taxon tree with constant duplication and loss rates across the tree. The species tree topology used is shown in the upper left corner of each set of simulations, and the star indicates the simulated WGD event.

Figure 5: Bayesian inference of duplication, loss and retention rates under a geometric Brownian motion (GBM) (autocorrelated rates) prior for the nine-taxon angiosperm analysis. Inference is based on a random subset of 1000 gene families. (A) Posterior mean duplication (left) and loss (right) rates estimated under a GBM prior with ν = 0.1 colored on the species tree lineages. Black bars indicate WGDs significantly different from zero and are annotated with the posterior mean for the relevant retention rate. The other bars indicate WGDs with a retention rate estimate not significantly different from zero. (B) Kernel density estimates for the marginal posterior distributions for the retention rates of all twelve WGDs. The different distributions show posteriors for different priors on the duplication and loss rate and different values of ν.

Figure 4: Bayesian posterior inference with Whale for simulated data sets. We simulated three data sets of 1000 gene families with duplication and loss rates sampled from the GBM prior with ν = 0.10 (top row), ν = 0.25 (middle row) and ν = 0.50 (bottom row). We used a log-normal distribution log(N (0.1, 0.1)) for the duplication rate at the root (λτ) and log-normal distribution log(N (0.15, 0.1)) for the loss rate at the root (µτ). We set the minimal duplication and loss rate to 0.05 and 0.1 respectively. Retention rates were sampled from a Beta(2, 4) distribution and the geometric prior probability for the number of lineages at the root η was sampled from a Beta(10, 1) distribution. We then performed Bayesian inference under the GBM prior with ν = 0.1, λτ , µτ ∼ log(N (0.15, 0.5)), q ∼ Beta(1, 1) and η ∼ Beta(4, 2). The true (simulated) values are marked by dots, whereas the boxplots show the sample from the posterior distribution obtained by MCMC with Whale.

Figure 3: The nine-taxon species tree and associated WGDs used in the Whale analyses using maximum likelihood estimation. More information on the WGDs marked along this species tree can be found in Table 1.

Figure 8: Node-averaged whole paranome K_S distributions for Ginkgo biloba, Picea abies, A. trichopoda and Pinus taeda, with the K_S distributions for those duplications reconciled to the hypothetical gymnosperm and seed plant WGD overlaid where appropriate. Note that the scale on the y-axis is a probability density, not a frequency or absolute number of duplications.

References

Fredrik Ronquist and 9 others. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice across a Large Model Space. Systematic Biology, 01 May 2012.

Michael Lynch and 1 other. The evolutionary fate and consequences of duplicate genes. Science, 10 Nov 2000.

Gerald A. Tuskan and 115 others. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 15 Sep 2006.

Ziheng Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution, 01 Sep 1994.

Radford M. Neal. MCMC using Hamiltonian dynamics. arXiv: Computation, 09 Jun 2012.

Lam Tung Nguyen and 3 others. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. Molecular Biology and Evolution, 01 Jan 2015.

Alexandros Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 01 May 2014.

Kazutaka Katoh and 1 other. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution, 01 Apr 2013.

Ziheng Yang. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Molecular Biology and Evolution, 01 Aug 2007.
