
Showing papers in "Molecular Biology and Evolution in 2015"


Journal ArticleDOI
TL;DR: A combination of hill-climbing approaches and a stochastic perturbation method can be implemented time-efficiently; given the same CPU time as RAxML and PhyML, IQ-TREE found higher likelihoods for between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space.
Abstract: Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3-97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.

13,668 citations


Journal ArticleDOI
TL;DR: FastME is based on balanced minimum evolution, which is the very principle of Neighbor Joining (NJ), and improves over NJ by performing topological moves using fast, sophisticated algorithms.
Abstract: FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of Neighbor Joining (NJ). FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange. The new 2.0 version also includes Subtree Pruning and Regrafting, while remaining as fast as NJ and providing a number of facilities: Distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations. FastME is available using several interfaces: Command-line (to be integrated in pipelines), PHYLIP-like, and a Web server (http://www.atgc-montpellier.fr/fastme/).

886 citations


Journal ArticleDOI
TL;DR: A global timetree of life, synthesized from 2,274 studies representing 50,632 species, is used to examine the pattern and rate of diversification as well as the timing of speciation; the clock-like change observed suggests that speciation and diversification are processes dominated by random events and that adaptive change is largely a separate process.
Abstract: Genomic data are rapidly resolving the tree of living species calibrated to time, the timetree of life, which will provide a framework for research in diverse fields of science. Previous analyses of taxonomically restricted timetrees have found a decline in the rate of diversification in many groups of organisms, often attributed to ecological interactions among species. Here, we have synthesized a global timetree of life from 2,274 studies representing 50,632 species and examined the pattern and rate of diversification as well as the timing of speciation. We found that species diversity has been mostly expanding overall and in many smaller groups of species, and that the rate of diversification in eukaryotes has been mostly constant. We also identified, and avoided, potential biases that may have influenced previous analyses of diversification including low levels of taxon sampling, small clade size, and the inclusion of stem branches in clade analyses. We found consistency in time-to-speciation among plants and animals, ∼2 My, as measured by intervals of crown and stem species times. Together, this clock-like change at different levels suggests that speciation and diversification are processes dominated by random events and that adaptive change is largely a separate process.

809 citations


Journal ArticleDOI
TL;DR: Adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion, delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster.
Abstract: Over the past two decades, comparative sequence analysis using codon-substitution models has been honed into a powerful and popular approach for detecting signatures of natural selection from molecular data. A substantial body of work has focused on developing a class of “branch-site” models which permit selective pressures on sequences, quantified by the ω ratio, to vary among both codon sites and individual branches in the phylogeny. We develop and present a method in this class, adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion. By applying models of different complexity to different branches in the phylogeny, aBSREL delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster. Based on simulated data analysis, we offer guidelines for what extent and strength of diversifying positive selection can be detected reliably and suggest that there is a natural limit on the optimal parametric complexity for “branch-site” models. An aBSREL analysis of 8,893 Euteleostomes gene alignments demonstrates that over 80% of branches in typical gene phylogenies can be adequately modeled with a single ω ratio model, that is, current models are unnecessarily complicated. However, there are a relatively small number of key branches, whose identities are derived from the data using a model selection procedure, for which it is essential to accurately model evolutionary complexity.

501 citations


Journal ArticleDOI
TL;DR: Patterson's D is found to be unreliable when applied to small genomic regions, as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity; a related statistic, f_d, a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture, is proposed as an alternative.
Abstract: Several methods have been proposed to test for introgression across genomes. One method tests for a genome-wide excess of shared derived alleles between taxa using Patterson's D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Several recent studies have extended the use of D by applying the statistic to small genomic regions, rather than genome-wide. Here, we use simulations and whole-genome data from Heliconius butterflies to investigate the behavior of D in small genomic regions. We find that D is unreliable in this situation as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity. As an alternative, we propose a related statistic f_d, a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture. f_d is not subject to the same biases as D, and is better at identifying introgressed loci. Finally, we show that both D and f_d outliers tend to cluster in regions of low absolute divergence (d_XY), which can confound a recently proposed test for differentiating introgression from shared ancestral variation at individual loci.
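
The abstract above does not spell out either statistic, so the following is only a minimal sketch under stated assumptions. It uses the commonly cited frequency-based ABBA/BABA formulation for populations P1, P2, P3 and an outgroup assumed to be fixed for the ancestral allele, and it assumes the admixture statistic is normalized by replacing P2 and P3 at each site with whichever has the higher derived-allele frequency. The function names and the input layout (one `(p1, p2, p3)` tuple of derived-allele frequencies per site) are hypothetical.

```python
def abba_baba(p1, p2, p3, p4=0.0):
    """Per-site ABBA and BABA pattern weights from derived-allele frequencies.

    p4 is the outgroup frequency; 0.0 assumes the outgroup is fixed ancestral.
    """
    abba = (1 - p1) * p2 * p3 * (1 - p4)
    baba = p1 * (1 - p2) * p3 * (1 - p4)
    return abba, baba


def patterson_d(sites):
    """D = sum(ABBA - BABA) / sum(ABBA + BABA) over the sites in a region."""
    num = den = 0.0
    for p1, p2, p3 in sites:
        abba, baba = abba_baba(p1, p2, p3)
        num += abba - baba
        den += abba + baba
    return num / den if den else float("nan")


def f_d(sites):
    """Assumed form: observed excess divided by the excess expected if P2 and
    P3 both carried the higher of their two derived-allele frequencies."""
    num = den = 0.0
    for p1, p2, p3 in sites:
        abba, baba = abba_baba(p1, p2, p3)
        num += abba - baba
        pd = max(p2, p3)  # "donor" frequency used for the normalization
        abba_d, baba_d = abba_baba(p1, pd, pd)
        den += abba_d - baba_d
    return num / den if den else float("nan")
```

With a single informative site such as `(0.0, 0.5, 1.0)`, D takes its maximum value of 1 while the normalized statistic is 0.5, which illustrates how a ratio computed over very few sites in a small window can be extreme even when the shared variation is only partial.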

489 citations


Journal ArticleDOI
TL;DR: This work presents a general hypothesis testing framework (RELAX) for detecting relaxed selection in a codon-based phylogenetic framework and demonstrates the power of RELAX in a variety of biological scenarios where relaxation of selection has been hypothesized or demonstrated previously.
Abstract: Relaxation of selective strength, manifested as a reduction in the efficiency or intensity of natural selection, can drive evolutionary innovation and presage lineage extinction or loss of function. Mechanisms through which selection can be relaxed range from the removal of an existing selective constraint to a reduction in effective population size. Standard methods for estimating the strength and extent of purifying or positive selection from molecular sequence data are not suitable for detecting relaxed selection, because they lack power and can mistake an increase in the intensity of positive selection for relaxation of both purifying and positive selection. Here, we present a general hypothesis testing framework (RELAX) for detecting relaxed selection in a codon-based phylogenetic framework. Given two subsets of branches in a phylogeny, RELAX can determine whether selective strength was relaxed or intensified in one of these subsets relative to the other. We establish the validity of our test via simulations and show that it can distinguish between increased positive selection and a relaxation of selective strength. We also demonstrate the power of RELAX in a variety of biological scenarios where relaxation of selection has been hypothesized or demonstrated previously. We find that obligate and facultative γ-proteobacteria endosymbionts of insects are under relaxed selection compared with their free-living relatives and obligate endosymbionts are under relaxed selection compared with facultative endosymbionts. Selective strength is also relaxed in asexual Daphnia pulex lineages, compared with sexual lineages. Endogenous, nonfunctional, bornavirus-like elements are found to be under relaxed selection compared with exogenous Borna viruses. Finally, selection on the short-wavelength sensitive, SWS1, opsin genes in echolocating and nonecholocating bats is relaxed only in lineages in which this gene underwent pseudogenization; however, selection on the functional medium/long-wavelength sensitive opsin, M/LWS1, is found to be relaxed in all echolocating bats compared with nonecholocating bats.

444 citations


Journal ArticleDOI
TL;DR: BUSTED is a new approach to identifying gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate; it includes a computationally inexpensive evidence metric for identifying sites subject to episodic positive selection on any foreground branches.
Abstract: We present BUSTED, a new approach to identifying gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate. BUSTED can be used either on an entire phylogeny (without requiring an a priori hypothesis regarding which branches are under positive selection) or on a pre-specified subset of foreground lineages (if a suitable a priori hypothesis is available). Selection is modeled as varying stochastically over branches and sites, and we propose a computationally inexpensive evidence metric for identifying sites subject to episodic positive selection on any foreground branches. We compare BUSTED with existing models on simulated and empirical data. An implementation is available on www.datamonkey.org/busted, with a widget allowing the interactive specification of foreground branches.

387 citations


Journal ArticleDOI
TL;DR: A phylogenetic analysis of the chloroplast genomes of 34 citrus genotypes constitutes the most comprehensive and detailed study to date on the evolution and variability of the genus Citrus.
Abstract: Citrus genus includes some of the most important cultivated fruit trees worldwide. Despite being extensively studied because of its commercial relevance, the origin of cultivated citrus species and the history of its domestication still remain an open question. Here, we present a phylogenetic analysis of the chloroplast genomes of 34 citrus genotypes which constitutes the most comprehensive and detailed study to date on the evolution and variability of the genus Citrus. A statistical model was used to estimate divergence times between the major citrus groups. Additionally, a complete map of the variability across the genome of different citrus species was produced, including single nucleotide variants, heteroplasmic positions, indels (insertions and deletions), and large structural variants. The distribution of all these variants provided further independent support to the phylogeny obtained. An unexpected finding was the high level of heteroplasmy found in several of the analyzed genomes. The use of the complete chloroplast DNA not only paves the way for a better understanding of the phylogenetic relationships within the Citrus genus but also provides original insights into other elusive evolutionary processes, such as chloroplast inheritance, heteroplasmy, and gene selection.

237 citations


Journal ArticleDOI
TL;DR: In this article, the authors sequenced the mitochondrial genomes of 48 termite species and combined them with 18 previously sequenced termite mitochondrial genomes for phylogenetic and molecular clock analyses using multiple fossil calibrations.
Abstract: Termites have colonized many habitats and are among the most abundant animals in tropical ecosystems, which they modify considerably through their actions. The timing of their rise in abundance and of the dispersal events that gave rise to modern termite lineages is not well understood. To shed light on termite origins and diversification, we sequenced the mitochondrial genome of 48 termite species and combined them with 18 previously sequenced termite mitochondrial genomes for phylogenetic and molecular clock analyses using multiple fossil calibrations. The 66 genomes represent most major clades of termites. Unlike previous phylogenetic studies based on fewer molecular data, our phylogenetic tree is fully resolved for the lower termites. The phylogenetic positions of Macrotermitinae and Apicotermitinae are also resolved as the basal groups in the higher termites, but in the crown termitid groups, including Termitinae + Syntermitinae + Nasutitermitinae + Cubitermitinae, the position of some nodes remains uncertain. Our molecular clock tree indicates that the lineages leading to termites and Cryptocercus roaches diverged 170 Ma (153-196 Ma 95% confidence interval [CI]), that modern Termitidae arose 54 Ma (46-66 Ma 95% CI), and that the crown termitid group arose 40 Ma (35-49 Ma 95% CI). This indicates that the distribution of basal termite clades was influenced by the final stages of the breakup of Pangaea. Our inference of ancestral geographic ranges shows that the Termitidae, which includes more than 75% of extant termite species, most likely originated in Africa or Asia, and acquired their pantropical distribution after a series of dispersal and subsequent diversification events.

232 citations


Journal ArticleDOI
TL;DR: Support is found for the hypothesis that sex chromosome systems can readily become trap-like and it is found that adding even a small number of species from understudied clades can greatly enhance hypothesis testing in a model-based phylogenetic framework.
Abstract: Sex chromosomes have evolved many times in animals and studying these replicate evolutionary “experiments” can help broaden our understanding of the general forces driving the origin and evolution of sex chromosomes. However, this plan of study has been hindered by the inability to identify the sex chromosome systems in the large number of species with cryptic, homomorphic sex chromosomes. Restriction site-associated DNA sequencing (RAD-seq) is a critical enabling technology that can identify the sex chromosome systems in many species where traditional cytogenetic methods have failed. Using newly generated RAD-seq data from 12 gecko species, along with data from the literature, we reinterpret the evolution of sex-determining systems in lizards and snakes and test the hypothesis that sex chromosomes can routinely act as evolutionary traps. We uncovered between 17 and 25 transitions among gecko sex-determining systems. This is approximately one-half to two-thirds of the total number of transitions observed among all lizards and snakes. We find support for the hypothesis that sex chromosome systems can readily become trap-like and show that adding even a small number of species from understudied clades can greatly enhance hypothesis testing in a model-based phylogenetic framework. RAD-seq will undoubtedly prove useful in evaluating other species for male or female heterogamety, particularly the majority of fish, amphibian, and reptile species that lack visibly heteromorphic sex chromosomes, and will significantly accelerate the pace of biological discovery.

222 citations


Journal ArticleDOI
TL;DR: The spontaneous mutation rate in Heliconius melpomene is estimated by genome sequencing of a pair of parents and 30 of their offspring, based on the ratio of the number of de novo heterozygotes to the number of callable site-individuals; the inferred recent effective population size is substantially lower than the census size, suggesting a role for natural selection reducing diversity.
Abstract: We estimated the spontaneous mutation rate in Heliconius melpomene by genome sequencing of a pair of parents and 30 of their offspring, based on the ratio of number of de novo heterozygotes to the number of callable site-individuals. We detected nine new mutations, each one affecting a single site in a single offspring. This yields an estimated mutation rate of 2.9 × 10⁻⁹ (95% confidence interval, 1.3 × 10⁻⁹–5.5 × 10⁻⁹), which is similar to recent estimates in Drosophila melanogaster, the only other insect species in which the mutation rate has been directly estimated. We infer that recent effective population size of H. melpomene is about 2 million, a substantially lower value than its census size, suggesting a role for natural selection reducing diversity. We estimate that H. melpomene diverged from its Mullerian comimic H. erato about 6 Ma, a somewhat later date than estimates based on a local molecular clock.
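
The abstract above reports nine mutations and a 95% interval but does not say how the interval was obtained. As a hedged illustration only, the sketch below assumes the mutation count is Poisson distributed and computes an exact (Garwood-style) interval on the count by bisection, which can then be rescaled to the rate. The function names are hypothetical and only the Python standard library is used.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def _bisect(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f changes sign exactly once."""
    f_lo = f(lo)
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given an observed count k."""
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: the mean at which P(X >= k) equals alpha / 2.
        lower = _bisect(lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha / 2,
                        0.0, float(k))
    # Upper bound: the mean at which P(X <= k) equals alpha / 2.
    upper = _bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2,
                    float(k), k + 10 * math.sqrt(k + 1) + 10)
    return lower, upper


def scale_ci_to_rate(rate, k, alpha=0.05):
    """Rescale the count interval to a rate estimated from k events."""
    lower, upper = exact_poisson_ci(k, alpha)
    return rate * lower / k, rate * upper / k
```

Under this assumption, a count of nine gives an interval of roughly 4.1 to 17.1 expected mutations, and rescaling the reported point estimate of 2.9 × 10⁻⁹ gives roughly 1.3 × 10⁻⁹ to 5.5 × 10⁻⁹, in line with the interval quoted in the abstract.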

Journal ArticleDOI
TL;DR: A major update of the underlying data is presented, almost doubling the number of entities; the broader coverage across the tree of life improves phylogenetic analyses and the capability of ITS2 as a DNA barcode.
Abstract: The internal transcribed spacer 2 (ITS2) is a well-established marker for phylogenetic analyses in eukaryotes. A reliable resource for reference sequences and their secondary structures is the ITS2 database (http://its2.bioapps.biozentrum.uni-wuerzburg.de/). However, the database was last updated in 2011. Here, we present a major update of the underlying data almost doubling the number of entities. This increases the number of taxa represented within all major eukaryotic clades. Moreover, additional data has been added to underrepresented groups and some new groups have been added. The broader coverage across the tree of life improves phylogenetic analyses and the capability of ITS2 as a DNA barcode.

Journal ArticleDOI
TL;DR: This work demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups, and finds that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives.
Abstract: Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our approach demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.

Journal ArticleDOI
TL;DR: The timing of duplications in the gene trees indicates that the papilionoid WGD occurred in the common ancestor of all papilionoids, and there is evidence for several independent WGDs near the base of other major legume lineages, including the Mimosoideae-Cassiinae-Caesalpinieae (MCC), Detarieae, and Cercideae clades.
Abstract: Unresolved questions about evolution of the large and diverse legume family include the timing of polyploidy (whole-genome duplication; WGDs) relative to the origin of the major lineages within the Fabaceae and to the origin of symbiotic nitrogen fixation. Previous work has established that a WGD affects most lineages in the Papilionoideae and occurred sometime after the divergence of the papilionoid and mimosoid clades, but the exact timing has been unknown. The history of WGD has also not been established for legume lineages outside the Papilionoideae. We investigated the presence and timing of WGDs in the legumes by querying thousands of phylogenetic trees constructed from transcriptome and genome data from 20 diverse legumes and 17 outgroup species. The timing of duplications in the gene trees indicates that the papilionoid WGD occurred in the common ancestor of all papilionoids. The earliest diverging lineages of the Papilionoideae include both nodulating taxa, such as the genistoids (e.g., lupin), dalbergioids (e.g., peanut), phaseoloids (e.g., beans), and galegoids (=Hologalegina, e.g., clovers), and clades with nonnodulating taxa including Xanthocercis and Cladrastis (evaluated in this study). We also found evidence for several independent WGDs near the base of other major legume lineages, including the Mimosoideae-Cassiinae-Caesalpinieae (MCC), Detarieae, and Cercideae clades. Nodulation is found in the MCC and papilionoid clades, both of which experienced ancestral WGDs. However, there are numerous nonnodulating lineages in both clades, making it unclear whether the phylogenetic distribution of nodulation is due to independent gains or a single origin followed by multiple losses.

Journal ArticleDOI
TL;DR: The results indicate that retrieving ancient genomes from similarly warm Mediterranean environments such as the Near East is technically feasible and suggest that both Cardial and LBK peoples derived from a common ancient population located in or around the Balkan Peninsula.
Abstract: The spread of farming out of the Balkans and into the rest of Europe followed two distinct routes: An initial expansion represented by the Impressa and Cardial traditions, which followed the Northern Mediterranean coastline; and another expansion represented by the LBK (Linearbandkeramik) tradition, which followed the Danube River into Central Europe. Although genomic data now exist from samples representing the second migration, such data have yet to be successfully generated from the initial Mediterranean migration. To address this, we generated the complete genome of a 7,400-year-old Cardial individual (CB13) from Cova Bonica in Vallirana (Barcelona), as well as partial nuclear data from five others excavated from different sites in Spain and Portugal. CB13 clusters with all previously sequenced early European farmers and modern-day Sardinians. Furthermore, our analyses suggest that both Cardial and LBK peoples derived from a common ancient population located in or around the Balkan Peninsula. The Iberian Cardial genome also carries a discernible hunter-gatherer genetic signature that likely was not acquired by admixture with local Iberian foragers. Our results indicate that retrieving ancient genomes from similarly warm Mediterranean environments such as the Near East is technically feasible.

Journal ArticleDOI
TL;DR: It is found that the S. eubayanus subgenomes of lager-brewing yeasts have experienced increased rates of evolution since hybridization, and that certain genes involved in metabolism may have been particularly affected.
Abstract: The dramatic phenotypic changes that occur in organisms during domestication leave indelible imprints on their genomes. Although many domesticated plants and animals have been systematically compared with their wild genetic stocks, the molecular and genomic processes underlying fungal domestication have received less attention. Here, we present a nearly complete genome assembly for the recently described yeast species Saccharomyces eubayanus and compare it to the genomes of multiple domesticated alloploid hybrids of S. eubayanus × S. cerevisiae (S. pastorianus syn. S. carlsbergensis), which are used to brew lager-style beers. We find that the S. eubayanus subgenomes of lager-brewing yeasts have experienced increased rates of evolution since hybridization, and that certain genes involved in metabolism may have been particularly affected. Interestingly, the S. eubayanus subgenome underwent an especially strong shift in selection regimes, consistent with more extensive domestication of the S. cerevisiae parent prior to hybridization. In contrast to recent proposals that lager-brewing yeasts were domesticated following a single hybridization event, the radically different neutral site divergences between the subgenomes of the two major lager yeast lineages strongly favor at least two independent origins for the S. cerevisiae × S. eubayanus hybrids that brew lager beers. Our findings demonstrate how this industrially important hybrid has been domesticated along similar evolutionary trajectories on multiple occasions.

Journal ArticleDOI
TL;DR: A shotgun sequencing approach is tested, whereby mitochondrial genomes are assembled from complex ecological mixtures through mitochondrial metagenomics, and it is demonstrated how the approach overcomes many of the taxonomic impediments to the study of biodiversity.
Abstract: In spite of the growth of molecular ecology, systematics and next-generation sequencing, the discovery and analysis of diversity is not currently integrated with building the tree-of-life. Tropical arthropod ecologists are well placed to accelerate this process if all specimens obtained through mass-trapping, many of which will be new species, could be incorporated routinely into phylogeny reconstruction. Here we test a shotgun sequencing approach, whereby mitochondrial genomes are assembled from complex ecological mixtures through mitochondrial metagenomics, and demonstrate how the approach overcomes many of the taxonomic impediments to the study of biodiversity. DNA from approximately 500 beetle specimens, originating from a single rainforest canopy fogging sample from Borneo, was pooled and shotgun sequenced, followed by de novo assembly of complete and partial mitogenomes for 175 species. The phylogenetic tree obtained from this local sample was highly similar to that from existing mitogenomes selected for global coverage of major lineages of Coleoptera. When all sequences were combined only minor topological changes were induced against this reference set, indicating an increasingly stable estimate of coleopteran phylogeny, while the ecological sample expanded the tip-level representation of several lineages. Robust trees generated from ecological samples now enable an evolutionary framework for ecology. Meanwhile, the inclusion of uncharacterized samples in the tree-of-life rapidly expands taxon and biogeographic representation of lineages without morphological identification. Mitogenomes from shotgun sequencing of unsorted environmental samples and their associated metadata, placed robustly into the phylogenetic tree, constitute novel DNA "superbarcodes" for testing hypotheses regarding global patterns of diversity.

Journal ArticleDOI
TL;DR: Overall, the first comparative transcriptome-wide analysis of caste determination among three major hymenopteran social lineages found few shared caste differentially expressed transcripts across the three social lineages, but there is substantially more overlap at the levels of pathways and biological functions.
Abstract: An area of great interest in evolutionary genomics is whether convergently evolved traits are the result of convergent molecular mechanisms. The presence of queen and worker castes in insect societies is a spectacular example of convergent evolution and phenotypic plasticity. Multiple insect lineages have evolved environmentally induced alternative castes. Given multiple origins of eusociality in Hymenoptera (bees, ants, and wasps), it has been proposed that insect castes evolved from common genetic "toolkits" consisting of deeply conserved genes. Here, we combine data from previously published studies on fire ants and honey bees with new data for Polistes metricus paper wasps to assess the toolkit idea by presenting the first comparative transcriptome-wide analysis of caste determination among three major hymenopteran social lineages. Overall, we found few shared caste differentially expressed transcripts across the three social lineages. However, there is substantially more overlap at the levels of pathways and biological functions. Thus, there are shared elements but not on the level of specific genes. Instead, the toolkit appears to be relatively "loose," that is, different lineages show convergent molecular evolution involving similar metabolic pathways and molecular functions but not the exact same genes. Additionally, our paper wasp data do not support a complementary hypothesis that "novel" taxonomically restricted genes are related to caste differences.

Journal ArticleDOI
TL;DR: Bayesian phylogenetic analyses of simulated virus data are conducted to evaluate the performance of the date-randomization test and propose guidelines for interpretation of its results, finding that the test sometimes fails to detect rate estimates from data with no temporal signal.
Abstract: Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.
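
The two pass criteria described in the abstract above can be written down as a small check. This is a minimal sketch that assumes the rate estimates and 95% credible intervals have already been obtained from separate Bayesian analyses; the function name, argument layout, and the example numbers in the note below are hypothetical.

```python
def passes_date_randomization(mean_rate, ci, randomized_cis, conservative=False):
    """Apply the date-randomization test to precomputed rate estimates.

    mean_rate      -- mean rate estimated with the correct sampling times
    ci             -- (low, high) 95% credible interval for that estimate
    randomized_cis -- list of (low, high) intervals from date-randomized replicates
    conservative   -- if True, require that the real interval does not overlap
                      any randomized interval; otherwise only require that the
                      real mean lies outside every randomized interval
    """
    low, high = ci
    if conservative:
        # Two intervals are disjoint when one ends before the other starts.
        return all(r_high < low or r_low > high
                   for r_low, r_high in randomized_cis)
    return all(not (r_low <= mean_rate <= r_high)
               for r_low, r_high in randomized_cis)
```

For example, a mean rate of 1e-3 with interval (8e-4, 1.2e-3) passes the standard criterion against randomized intervals (1e-5, 5e-4) and (2e-5, 9e-4), but fails the conservative criterion because the second randomized interval overlaps the real one.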

Journal ArticleDOI
TL;DR: A de novo genome of a Tibetan chicken is assembled and whole genomes of 32 additional chickens, including Tibetan chickens, village chickens, game fowl, and Red Junglefowl are resequenced, and it is found that the Tibetan chickens could broadly be placed into two groups.
Abstract: Much like other indigenous domesticated animals, Tibetan chickens living at high altitudes (2,200-4,100 m) show specific physiological adaptations to the extreme environmental conditions of the Tibetan Plateau, but the genetic bases of these adaptations are not well characterized. Here, we assembled a de novo genome of a Tibetan chicken and resequenced whole genomes of 32 additional chickens, including Tibetan chickens, village chickens, game fowl, and Red Junglefowl, and found that the Tibetan chickens could broadly be placed into two groups. Further analyses revealed that several candidate genes in the calcium-signaling pathway are possibly involved in adaptation to the hypoxia experienced by these chickens, as these genes appear to have experienced directional selection in the two Tibetan chicken populations, suggesting a potential genetic mechanism underlying high altitude adaptation in Tibetan chickens. The candidate selected genes identified in this study, and their variants, may be useful targets for clarifying our understanding of the domestication of chickens in Tibet, and might be useful in current breeding efforts to develop improved breeds for the highlands.

Journal ArticleDOI
TL;DR: The sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51× yields 13,261 high-confidence SNPs, 65.9% of which are previously unreported, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation.
Abstract: Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.

Journal ArticleDOI
TL;DR: It is shown that FOT genes confer a strong competitive advantage during grape must fermentation by increasing the number and diversity of oligopeptides that yeast can utilize as a source of nitrogen, thereby improving biomass formation, fermentation efficiency, and cell viability.
Abstract: Although an increasing number of horizontal gene transfers have been reported in eukaryotes, experimental evidence for their adaptive value is lacking. Here, we report the recent transfer of a 158-kb genomic region between Torulaspora microellipsoides and Saccharomyces cerevisiae wine yeasts or closely related strains. This genomic region has undergone several rearrangements in S. cerevisiae strains, including gene loss and gene conversion between two tandemly duplicated FOT genes encoding oligopeptide transporters. We show that FOT genes confer a strong competitive advantage during grape must fermentation by increasing the number and diversity of oligopeptides that yeast can utilize as a source of nitrogen, thereby improving biomass formation, fermentation efficiency, and cell viability. Thus, the acquisition of FOT genes has favored yeast adaptation to the nitrogen-limited wine fermentation environment. This finding indicates that anthropic environments offer substantial ecological opportunity for evolutionary diversification through gene exchange between distant yeast species.

Journal ArticleDOI
TL;DR: A high-confidence transcriptional regulatory map covering 388 TFs from 47 families in Arabidopsis reveals distinct functional and evolutionary features of novel TFs, suggesting a plausible mechanism for their contribution to the evolution of multicellular organisms.
Abstract: Transcription factors (TFs) play key roles in both development and stress responses. By integrating into and rewiring original systems, novel TFs contribute significantly to the evolution of transcriptional regulatory networks. Here, we report a high-confidence transcriptional regulatory map covering 388 TFs from 47 families in Arabidopsis. Systematic analysis of this map revealed the architectural heterogeneity of developmental and stress response subnetworks and identified three types of novel network motifs that are absent from unicellular organisms and essential for multicellular development. Moreover, TFs of novel families that emerged during plant landing present higher binding specificities and are preferentially wired into developmental processes and these novel network motifs. A further connection unveiled between the binding specificity and the wiring preference of TFs explains the wiring preferences of novel-family TFs. These results reveal distinct functional and evolutionary features of novel TFs, suggesting a plausible mechanism for their contribution to the evolution of multicellular organisms.

Journal ArticleDOI
TL;DR: The influence of context-dependent mutation on genome architecture is strongest in M. florum, consistent with the reduced efficiency of selection in organisms with low effective population size.
Abstract: Despite the general assumption that site-specific mutation rates are independent of the local sequence context, a growing body of evidence suggests otherwise. To further examine context-dependent patterns of mutation, we amassed 5,645 spontaneous mutations in wild-type (WT) and mismatch-repair deficient (MMR(-)) mutation-accumulation (MA) lines of the gram-positive model organism Bacillus subtilis. We then analyzed >7,500 spontaneous base-substitution mutations across B. subtilis, Escherichia coli, and Mesoplasma florum WT and MMR(-) MA lines, finding a context-dependent mutation pattern that is asymmetric around the origin of replication. Different neighboring nucleotides can alter site-specific mutation rates by as much as 75-fold, with sites neighboring G:C base pairs or dimers involving alternating pyrimidine-purine and purine-pyrimidine nucleotides having significantly elevated mutation rates. The influence of context-dependent mutation on genome architecture is strongest in M. florum, consistent with the reduced efficiency of selection in organisms with low effective population size. If not properly accounted for, the disparities arising from patterns of context-dependent mutation can significantly influence interpretations of positive and purifying selection.

Journal ArticleDOI
TL;DR: The distribution of intragenic epistatic effects within this region in seven Hsp90 point mutant backgrounds of neutral to slightly deleterious effect is reported, resulting in a pattern that indicates a drastic change in the distribution of fitness effects one step away from the wild type.
Abstract: Mutations are the source of evolutionary variation. The interactions of multiple mutations can have important effects on fitness and evolutionary trajectories. We have recently described the distribution of fitness effects of all single mutations for a nine-amino-acid region of yeast Hsp90 (Hsp82) implicated in substrate binding. Here, we report and discuss the distribution of intragenic epistatic effects within this region in seven Hsp90 point mutant backgrounds of neutral to slightly deleterious effect, resulting in an analysis of more than 1,000 double mutants. We find negative epistasis between substitutions to be common, and positive epistasis to be rare—resulting in a pattern that indicates a drastic change in the distribution of fitness effects one step away from the wild type. This can be well explained by a concave relationship between phenotype and genotype (i.e., a concave shape of the local fitness landscape), suggesting mutational robustness intrinsic to the local sequence space. Structural analyses indicate that, in this region, epistatic effects are most pronounced when a solvent-inaccessible position is involved in the interaction. In contrast, all 18 observations of positive epistasis involved at least one mutation at a solvent-exposed position. By combining the analysis of evolutionary and biophysical properties of an epistatic landscape, these results contribute to a more detailed understanding of the complexity of protein evolution.

Journal ArticleDOI
TL;DR: This work investigated diverse genomic selections using high-density single nucleotide polymorphism data of five distinct cattle breeds to provide a glimpse into diverse genomic selection during cattle domestication, breed formation, and recent genetic improvement.
Abstract: We investigated diverse genomic selections using high-density single nucleotide polymorphism data of five distinct cattle breeds. Based on allele frequency differences, we detected hundreds of candidate regions under positive selection across Holstein, Angus, Charolais, Brahman, and N’Dama. In addition to well-known genes such as KIT, MC1R, ASIP, GHR, LCORL, NCAPG, WIF1, and ABCA12, we found evidence for a variety of novel and less-known genes under selection in cattle, such as LAP3, SAR1B, LRIG3, FGF5, and NUDCD3. Selective sweeps near LAP3 were then validated by next-generation sequencing. Genome-wide association analysis involving 26,362 Holsteins confirmed that LAP3 and SAR1B were related to milk production traits, suggesting that our candidate regions were likely functional. In addition, haplotype network analyses further revealed distinct selective pressures and evolution patterns across these five cattle breeds. Our results provided a glimpse into diverse genomic selection during cattle domestication, breed formation, and recent genetic improvement. These findings will facilitate genome-assisted breeding to improve animal production and health.

Journal ArticleDOI
TL;DR: Comparative analysis of gene expression patterns in parasitic and nonparasitic angiosperms suggests that parasitism genes are derived primarily from root and floral tissues, but with some genes co-opted from other tissues.
Abstract: The origin of novel traits is recognized as an important process underlying many major evolutionary radiations. We studied the genetic basis for the evolution of haustoria, the novel feeding organs of parasitic flowering plants, using comparative transcriptome sequencing in three species of Orobanchaceae. Around 180 genes are upregulated during haustorial development following host attachment in at least two species, and these are enriched in proteases, cell wall modifying enzymes, and extracellular secretion proteins. Additionally, about 100 shared genes are upregulated in response to haustorium inducing factors prior to host attachment. Collectively, we refer to these newly identified genes as putative “parasitism genes.” Most of these parasitism genes are derived from gene duplications in a common ancestor of Orobanchaceae and Mimulus guttatus, a related nonparasitic plant. Additionally, the signature of relaxed purifying selection and/or adaptive evolution at specific sites was detected in many haustorial genes, and may play an important role in parasite evolution. Comparative analysis of gene expression patterns in parasitic and nonparasitic angiosperms suggests that parasitism genes are derived primarily from root and floral tissues, but with some genes co-opted from other tissues. Gene duplication, often taking place in a nonparasitic ancestor of Orobanchaceae, followed by regulatory neofunctionalization, was an important process in the origin of parasitic haustoria.

Journal ArticleDOI
TL;DR: Although genetic diversity has been greatly reduced during domestication, the remaining mutations were disproportionally biased toward nonsynonymous substitutions, providing unequivocal evidence that deleterious mutations accumulate in low recombining regions of the genome, due to the reduced efficacy of purifying selection.
Abstract: For populations to maintain optimal fitness, harmful mutations must be efficiently purged from the genome. Yet, under circumstances that diminish the effectiveness of natural selection, such as the process of plant and animal domestication, deleterious mutations are predicted to accumulate. Here, we compared the load of deleterious mutations in 21 accessions from natural populations and 19 domesticated accessions of the common sunflower using whole-transcriptome single nucleotide polymorphism data. Although we find that genetic diversity has been greatly reduced during domestication, the remaining mutations were disproportionally biased toward nonsynonymous substitutions. Bioinformatically predicted deleterious mutations affecting protein function were especially strongly over-represented. We also identify similar patterns in two other domesticated species of the sunflower family (globe artichoke and cardoon), indicating that this phenomenon is not due to idiosyncrasies of sunflower domestication or the sunflower genome. Finally, we provide unequivocal evidence that deleterious mutations accumulate in low recombining regions of the genome, due to the reduced efficacy of purifying selection. These results represent a conundrum for crop improvement efforts. Although the elimination of harmful mutations should be a long-term goal of plant and animal breeding programs, it will be difficult to weed them out because of limited recombination.

Journal ArticleDOI
TL;DR: This study finds no genomic evidence for higher-than-neutral levels of molecular convergence, but suggests the presence of abundant epistasis that decreases the likelihood of molecular convergence between distantly related lineages.
Abstract: Convergent and parallel amino acid substitutions in protein evolution, collectively referred to as molecular convergence here, have small probabilities under neutral evolution. For this reason, molecular convergence is commonly viewed as evidence for similar adaptations of different species. The surge in the number of reports of molecular convergence in the last decade raises the intriguing question of whether molecular convergence occurs substantially more frequently than expected under neutral evolution. We here address this question using all one-to-one orthologous proteins encoded by the genomes of 12 fruit fly species and those encoded by 17 mammals. We found that the expected amount of molecular convergence varies greatly depending on the specific neutral substitution model assumed at each amino acid site and that the observed amount of molecular convergence is explainable by neutral models incorporating site-specific information of acceptable amino acids. Interestingly, the total number of convergent and parallel substitutions between two lineages, relative to the neutral expectation, decreases with the genetic distance between the two lineages, regardless of the model used in computing the neutral expectation. We hypothesize that this trend results from differences in the amino acids acceptable at a given site among different clades of a phylogeny, due to prevalent epistasis, and provide simulation as well as empirical evidence for this hypothesis. Together, our study finds no genomic evidence for higher-than-neutral levels of molecular convergence, but suggests the presence of abundant epistasis that decreases the likelihood of molecular convergence between distantly related lineages.

Journal ArticleDOI
TL;DR: Using comparative genomics, the immune system of all the major groups of arthropods beyond insects for the first time is characterized—studying five chelicerates, a myriapod, and a crustacean, finding clear traces of an ancient origin of innate immunity.
Abstract: This project was funded by a Royal Society University Research Fellowship and a European Research Council grant DrosophilaInfection (281668) to F.M.J., and a Medical Research Council studentship to W.J.P.