scispace - formally typeset

Showing papers in "Molecular Biology and Evolution in 2016"


Journal Article
TL;DR: The latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine, has been optimized for use on 64-bit computing systems for analyzing larger datasets.
Abstract: We present the latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, Mega has been optimized for use on 64-bit computing systems for analyzing larger datasets. Researchers can now explore and analyze tens of thousands of sequences in Mega. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit Mega is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OS X. The command line Mega is available as native applications for Windows, Linux, and Mac OS X, intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.

33,048 citations
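Gene duplication inference in a gene family tree is often illustrated with the species-overlap heuristic: an internal node is labeled a duplication when the species sets of its two child subtrees intersect. The sketch below shows that heuristic only; it is not necessarily the procedure Mega implements, and the nested-tuple tree format and the hypothetical `gene|species` leaf naming are assumptions made for illustration.

```python
from typing import List, Set, Union

# A gene tree is either a leaf label "gene|species" or a (left, right) tuple.
Tree = Union[str, tuple]

def species_of(leaf: str) -> str:
    # Assumed (hypothetical) naming convention: "geneID|species".
    return leaf.split("|")[1]

def annotate_duplications(tree: Tree, found: List[Tree]) -> Set[str]:
    """Return the species set below `tree`; append duplication nodes to `found`."""
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    s_left = annotate_duplications(left, found)
    s_right = annotate_duplications(right, found)
    if s_left & s_right:
        found.append(tree)  # the same species on both sides => inferred duplication
    return s_left | s_right

# Toy gene family: paralogs a1/a2 in human and mouse, single copy in fly.
gene_tree = ((("a1|human", "a1|mouse"), ("a2|human", "a2|mouse")), "a|fly")
dups: List[Tree] = []
annotate_duplications(gene_tree, dups)
print(len(dups))  # 1
```

Only the node joining the a1 and a2 clades is flagged, because human and mouse appear on both of its sides.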


Journal Article
TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.
Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.

3,445 citations
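Choosing among partitioning schemes comes down to comparing information-criterion scores. A minimal sketch of that comparison using the Bayesian information criterion, BIC = -2 ln L + k ln n, is shown below; the scheme names, log-likelihoods, parameter counts, and alignment length are hypothetical, and the sketch omits PartitionFinder's actual model fitting and scheme search.

```python
import math
from typing import Dict, Tuple

def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

# Hypothetical scores: scheme name -> (summed log-likelihood, free parameters).
schemes: Dict[str, Tuple[float, int]] = {
    "one_partition": (-10500.0, 10),
    "by_gene": (-10420.0, 30),
    "by_codon_position": (-10300.0, 30),
}
n_sites = 3000  # hypothetical alignment length

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in schemes.items()}
best = min(scores, key=scores.get)
print(best)  # by_codon_position
```

In this toy case the "by_gene" scheme improves the likelihood yet scores no better than a single partition, because BIC penalizes its 20 extra parameters.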


Journal Article
TL;DR: The Environment for Tree Exploration v3 is presented, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics.
Abstract: The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org.

1,452 citations
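Tree distances such as the Robinson-Foulds (RF) distance count the clades (or splits) found in one tree but not the other. ETE provides its own tree-comparison routines, including for trees of different size; the dependency-free sketch below only illustrates the idea for rooted trees written as nested tuples with identical leaf sets, which is a simplifying assumption.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]

def clusters(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Collect the leaf set of every internal node (rooted clusters) into `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.add(leaves)
    return leaves

def rooted_rf(t1: Tree, t2: Tree) -> int:
    """Number of clusters present in exactly one of the two trees."""
    c1: Set[FrozenSet[str]] = set()
    c2: Set[FrozenSet[str]] = set()
    if clusters(t1, c1) != clusters(t2, c2):
        raise ValueError("this sketch assumes identical leaf sets")
    return len(c1 ^ c2)

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rooted_rf(t1, t2))  # 2: {A,B} is only in t1, {A,C} is only in t2
```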


Journal Article
TL;DR: This article proposes a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees and evaluates the precision and recall of the local PP on a wide set of simulated and biological datasets.
Abstract: Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the MSC to compute (1) the local posterior probability (PP) that the branch is in the species tree and (2) the length of the branch in coalescent units. We evaluate the precision and recall of the local PP on a wide set of simulated and biological datasets, and show that it has very high precision and improved recall compared with multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene tree estimation error is low, but are underestimated when gene tree estimation error increases. Computation of both the branch length and local PP is implemented as new features in ASTRAL.

578 citations
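The branch-length estimate rests on a standard multispecies coalescent result: for an internal branch of length t coalescent units, a gene tree agrees with the species-tree quartet around that branch with probability 1 - (2/3)e^(-t). Inverting this relation turns an observed main-topology quartet frequency f1 into a length, t = -ln(3/2 (1 - f1)). The sketch below assumes f1 has already been computed from the gene trees and clamps values at or below 1/3 to zero (an assumption); it does not reproduce ASTRAL's local posterior probability calculation, and ASTRAL's own implementation should be consulted for details.

```python
import math

def expected_main_freq(t: float) -> float:
    """MSC probability that a gene tree shows the species-tree quartet
    around an internal branch of length t (coalescent units)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

def branch_length(f1: float) -> float:
    """Invert expected_main_freq for an observed main-topology frequency f1."""
    if f1 <= 1.0 / 3.0:
        return 0.0       # assumption: no signal for the branch, clamp to zero
    if f1 >= 1.0:
        return math.inf  # every quartet agrees: the length is unbounded
    return -math.log(1.5 * (1.0 - f1))

# Example: 500 of 1,000 quartets around a branch match the species tree.
print(round(branch_length(500 / 1000), 4))  # 0.2877
```

The logarithm explains the pattern in the abstract: gene tree estimation error pushes f1 toward 1/3, which shrinks the estimated length.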


Journal Article
TL;DR: The spatial phylogenetic reconstruction of evolutionary dynamics software is overhauled, now called SpreaD3 to emphasize the use of data-driven documents, as an analysis and visualization package that primarily complements Bayesian inference in BEAST.
Abstract: Model-based phylogenetic reconstructions increasingly consider spatial or phenotypic traits in conjunction with sequence data to study evolutionary processes. Alongside parameter estimation, visualization of ancestral reconstructions represents an integral part of these analyses. Here, we present a complete overhaul of the spatial phylogenetic reconstruction of evolutionary dynamics software, now called SpreaD3 to emphasize the use of data-driven documents, as an analysis and visualization package that primarily complements Bayesian inference in BEAST (http://beast.bio.ed.ac.uk, last accessed 9 May 2016). The integration of JavaScript D3 libraries (www.d3.org, last accessed 9 May 2016) offers novel interactive web-based visualization capacities that are not restricted to spatial traits and extend to any discrete or continuously valued trait for any organism of interest.

352 citations


Journal Article
TL;DR: DNA methylation was found to be widespread, detected in all orders examined except Diptera (flies); a gene duplication event in the maintenance DNA methyltransferase 1 (DNMT1) is shared by some Hymenoptera, and the divergent, nonneutral evolution of the paralogs suggests that alternative DNA methylation pathways may exist.
Abstract: DNA methylation contributes to gene and transcriptional regulation in eukaryotes, and therefore has been hypothesized to facilitate the evolution of plastic traits such as sociality in insects. However, DNA methylation is sparsely studied in insects. Therefore, we documented patterns of DNA methylation across a wide diversity of insects. We predicted that underlying enzymatic machinery is concordant with patterns of DNA methylation. Finally, given the suggestion that DNA methylation facilitated social evolution in Hymenoptera, we tested the hypothesis that the DNA methylation system will be associated with presence/absence of sociality among other insect orders. We found DNA methylation to be widespread, detected in all orders examined except Diptera (flies). Whole-genome bisulfite sequencing showed that orders differed in levels of DNA methylation. Hymenoptera (ants, bees, wasps, and sawflies) had some of the lowest levels, including several potential losses. Blattodea (cockroaches and termites) showed all possible patterns, including a potential loss of DNA methylation in a eusocial species, whereas solitary species had the highest levels. Species with DNA methylation do not always possess the typical enzymatic machinery. We identified a gene duplication event in the maintenance DNA methyltransferase 1 (DNMT1) that is shared by some Hymenoptera, and the paralogs have experienced divergent, nonneutral evolution. This diversity and nonneutral evolution of the underlying machinery suggest that alternative DNA methylation pathways may exist. Phylogenetically corrected comparisons revealed no evidence that supports evolutionary association between sociality and DNA methylation. Future functional studies will be required to advance our understanding of DNA methylation in insects.

259 citations


Journal Article
TL;DR: A novel inference scheme based on the statistical analysis of large alignments of homologs of the protein of interest is developed, which is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear.
Abstract: The quantitative characterization of mutational landscapes is a task of outstanding importance in evolutionary and medical biology: It is, for example, of central importance for our understanding of the phenotypic effect of mutations related to disease and antibiotic drug resistance. Here we develop a novel inference scheme for mutational landscapes, which is based on the statistical analysis of large alignments of homologs of the protein of interest. Our method is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear. Compared with recent large-scale mutagenesis data of the beta-lactamase TEM-1, a protein providing resistance against beta-lactam antibiotics, our method leads to an increase of about 40% in explicative power as compared with approaches neglecting epistasis. We find that the informative sequence context extends to residues at native distances of about 20 Å from the mutated site, thus reaching far beyond residues in direct physical contact.

245 citations
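Inference schemes of this kind are commonly built on a Potts model, in which a sequence's statistical energy combines single-site fields h_i and pairwise couplings J_ij, and a mutation's effect is the energy difference between mutant and wild type. The toy sketch below, with hypothetical parameters on a two-letter alphabet, is not the authors' fitted model; it only shows why couplings make the same substitution score differently in different sequence backgrounds, whereas a fields-only (independent-site) model would give one fixed score.

```python
from typing import Dict, Tuple

# Hypothetical toy parameters on alphabet "AB" for a 3-site protein.
fields: Dict[Tuple[int, str], float] = {
    (0, "A"): 0.5, (0, "B"): 0.0,
    (1, "A"): 0.2, (1, "B"): 0.0,
    (2, "A"): 0.0, (2, "B"): 0.1,
}
# couplings[(i, j, a, b)] for i < j; pairs not listed are 0.
couplings: Dict[Tuple[int, int, str, str], float] = {(0, 1, "B", "B"): 1.0}

def energy(seq: str) -> float:
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); with P(s) ~ exp(-E),
    lower energy means a more probable sequence."""
    e = -sum(fields[(i, a)] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            e -= couplings.get((i, j, seq[i], seq[j]), 0.0)
    return e

def mutation_effect(seq: str, site: int, new: str) -> float:
    """Energy change caused by substituting `new` at `site` in background `seq`."""
    mutant = seq[:site] + new + seq[site + 1:]
    return energy(mutant) - energy(seq)

# The same A->B substitution at site 0 in two backgrounds:
print(round(mutation_effect("AAA", 0, "B"), 3))  # 0.5  (deleterious)
print(round(mutation_effect("ABA", 0, "B"), 3))  # -0.5 (favorable, via the coupling)
```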


Journal Article
TL;DR: Reconstruction of ancestral morphological states during the Brassicaceae evolution indicates prevalent parallel (convergent) evolution of several traits over deep times across the entire family.
Abstract: Brassicaceae is one of the most diverse and economically valuable angiosperm families with widely cultivated vegetable crops and scientifically important model plants, such as Arabidopsis thaliana. The evolutionary history, ecological, morphological, and genetic diversity, and abundant resources and knowledge of Brassicaceae make it an excellent model family for evolutionary studies. Recent phylogenetic analyses of the family revealed three major lineages (I, II, and III), but relationships among and within these lineages remain largely unclear. Here, we present a highly supported phylogeny with six major clades using nuclear markers from newly sequenced transcriptomes of 32 Brassicaceae species and large data sets from additional taxa for a total of 55 species spanning 29 out of 51 tribes. Clade A, consisting of Lineage I and Macropodium nivale, is sister to the combined Clade B (with Lineage II and others) and a new Clade C. The ABC clade is sister to Clade D, which contains species previously weakly associated with Lineage II, and Clade E (Lineage III) is sister to the ABCD clade. Clade F (the tribe Aethionemeae) is sister to the remainder of the entire family. Molecular clock estimation reveals an early radiation of major clades near or shortly after the Eocene-Oligocene boundary and subsequent nested divergences of several tribes of the previously polytomous Expanded Lineage II. Reconstruction of ancestral morphological states during the Brassicaceae evolution indicates prevalent parallel (convergent) evolution of several traits over deep times across the entire family. These results form a foundation for future evolutionary analyses of structures and functions across Brassicaceae.

240 citations


Journal Article
TL;DR: It is proposed that WGDs and environmental factors, including animals, contributed to the evolution of the many fruit types in Rosaceae; these findings provide a foundation for understanding fruit evolution.
Abstract: Fruits are the defining feature of angiosperms, have likely contributed to angiosperm successes by protecting and dispersing seeds, and provide food to humans and other animals, with many morphological types and important ecological and agricultural implications. Rosaceae is a family with ∼3000 species and an extraordinary spectrum of distinct fruits, including fleshy peach, apple, and strawberry prized by their consumers, as well as dry achenetum and follicetum with features facilitating seed dispersal, making it excellent for studying fruit evolution. To address Rosaceae fruit evolution and other questions, we generated 125 new transcriptomic and genomic datasets and identified hundreds of nuclear genes to reconstruct a well-resolved Rosaceae phylogeny with highly supported monophyly of all subfamilies and tribes. Molecular clock analysis revealed an estimated age of ∼101.6 Ma for crown Rosaceae and divergence times of tribes and genera, providing a geological and climate context for fruit evolution. Phylogenomic analysis yielded strong evidence for numerous whole genome duplications (WGDs), supporting the hypothesis that the apple tribe had a WGD and revealing another one shared by fleshy fruit-bearing members of this tribe, with moderate support for WGDs in the peach tribe and other groups. Ancestral character reconstruction for fruit types supports independent origins of fleshy fruits from dry-fruit ancestors, including the evolution of drupes (e.g., peach) and pomes (e.g., apple) from follicetum, and drupetum (raspberry and blackberry) from achenetum. We propose that WGDs and environmental factors, including animals, contributed to the evolution of the many fruit types in Rosaceae; these results provide a foundation for understanding fruit evolution.

207 citations
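Ancestral character reconstruction can be illustrated with Fitch parsimony on a binary trait. This is a swapped-in, simpler method (the study's own reconstruction approach may differ), and the five-taxon tree and fruit-type assignments below are hypothetical. On this toy tree the root is reconstructed as dry and fleshy fruits arise twice, the kind of independent origin from dry-fruit ancestors that the abstract describes.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name or a (left, right) tuple.
Tree = Union[str, tuple]

def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass: candidate state set at this node and minimum changes."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (s_left, c_left), (s_right, c_right) = (fitch(child, states) for child in tree)
    common = s_left & s_right
    if common:
        return common, c_left + c_right
    # No shared state: take the union and count one state change.
    return s_left | s_right, c_left + c_right + 1

# Hypothetical toy tree and tip states (not the study's data).
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
tips = {"t1": "fleshy", "t2": "dry", "t3": "dry", "t4": "fleshy", "t5": "dry"}
root_set, changes = fitch(tree, tips)
print(root_set, changes)  # {'dry'} 2
```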


Journal Article
TL;DR: These analyses provide a well-resolved phylogeny of landfowl, including strong support for previously problematic relationships such as those among junglefowl (Gallus), and clarify the position of two enigmatic galliform genera not sampled in previous molecular phylogenetic studies.
Abstract: Production of massive DNA sequence data sets is transforming phylogenetic inference, but best practices for analyzing such data sets are not well established. One uncertainty is robustness to missing data, particularly in coalescent frameworks. To understand the effects of increasing matrix size and loci at the cost of increasing missing data, we produced a 90-taxon, 2.2-megabase, 4,800-locus sequence matrix of landfowl using target capture of ultraconserved elements. We then compared phylogenies estimated with concatenated maximum likelihood, quartet-based methods executed on concatenated matrices, and gene tree reconciliation methods, across five thresholds of missing data. Results of maximum likelihood and quartet analyses were similar, well resolved, and demonstrated increasing support with increasing matrix size and sparseness. Conversely, gene tree reconciliation produced unexpected relationships when we included all informative loci, with certain taxa placed toward the root compared with other approaches. Inspection of these taxa identified a prevalence of short average contigs, which potentially biased gene tree inference and caused erroneous results in gene tree reconciliation. This suggests that the more problematic missing data in gene tree-based analyses are partial sequences rather than entire missing sequences from locus alignments. Limiting gene tree reconciliation to the most informative loci solved this problem, producing well-supported topologies congruent with concatenation and quartet methods. Collectively, our analyses provide a well-resolved phylogeny of landfowl, including strong support for previously problematic relationships such as those among junglefowl (Gallus), and clarify the position of two enigmatic galliform genera (Lerwa, Melanoperdix) not sampled in previous molecular phylogenetic studies.

204 citations
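Building matrices at several missing-data thresholds amounts to filtering loci by taxon occupancy, and the diagnosis in the abstract points to a second check: taxa whose contigs are short on average. Both steps are sketched below; apart from the 90-taxon count taken from the abstract, the locus names, occupancy counts, thresholds, and the 300 bp cutoff are hypothetical, not the values used in the study.

```python
from statistics import mean
from typing import Dict, List

N_TAXA = 90  # taxon count from the abstract

def loci_at_threshold(occupancy: Dict[str, int], min_complete: float) -> List[str]:
    """Loci sampled in at least `min_complete` (a fraction) of all taxa."""
    return sorted(locus for locus, k in occupancy.items() if k / N_TAXA >= min_complete)

def short_contig_taxa(lengths: Dict[str, List[int]], cutoff: float) -> List[str]:
    """Taxa whose mean contig length falls below `cutoff` base pairs."""
    return sorted(taxon for taxon, ls in lengths.items() if mean(ls) < cutoff)

# Hypothetical data: number of taxa sampled per locus.
occupancy = {"uce-1": 90, "uce-2": 72, "uce-3": 45, "uce-4": 20}
for threshold in (0.25, 0.5, 0.75, 1.0):
    print(threshold, loci_at_threshold(occupancy, threshold))

# Hypothetical contig lengths (bp) per taxon.
contigs = {"taxon_a": [800, 750, 820], "taxon_b": [150, 90, 200]}
print(short_contig_taxa(contigs, cutoff=300))  # ['taxon_b']
```

Stricter thresholds yield smaller but more complete matrices; the contig check targets partial sequences, which the abstract identifies as the more problematic kind of missing data for gene tree methods.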


Journal Article
TL;DR: This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change.
Abstract: Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8-9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland ( 1500 m) versus low-altitude region ( 600 mm), and arid zone ( 400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change.

Journal Article
TL;DR: It is suggested that insertion sequences play an important role in plasmid evolution by maintaining the plasticity necessary to alleviate plasmid–host constraints, and that the evolutionary strategy consistently followed by all evolved E. coli lineages exposes a trade-off between horizontal and vertical transmission.
Abstract: Large conjugative plasmids are important drivers of bacterial evolution and contribute significantly to the dissemination of antibiotic resistance. Although plasmid-borne multidrug resistance is recognized as one of the main challenges in modern medicine, the adaptive forces shaping the evolution of these plasmids within pathogenic hosts are poorly understood. Here we study plasmid-host adaptations following transfer of a 73 kb conjugative multidrug resistance plasmid to naive clinical isolates of Klebsiella pneumoniae and Escherichia coli. We use experimental evolution, mathematical modelling, and population sequencing to show that the long-term persistence and molecular integrity of the plasmid is highly influenced by multiple factors within a 25 kb plasmid region constituting a host-dependent burden. In the E. coli hosts investigated here, improved plasmid stability readily evolves via IS26-mediated deletions of costly regions from the plasmid backbone, effectively expanding the host range of the plasmid. Although these adaptations were also beneficial to plasmid persistence in a naive K. pneumoniae host, they were never observed in this species, indicating that differential evolvability can limit opportunities for plasmid adaptation. While insertion sequences are well known to supply plasmids with adaptive traits, our findings suggest that they also play an important role in plasmid evolution by maintaining the plasticity necessary to alleviate plasmid-host constraints. Further, the observed evolutionary strategy consistently followed by all evolved E. coli lineages exposes a trade-off between horizontal and vertical transmission that may ultimately limit the dissemination potential of clinical multidrug resistance plasmids in these hosts.

Journal Article
TL;DR: The origin of ligninolytic enzymes was reconstructed, and the results suggest that white-rot fungi evolved later in the Agaricomycetes than previously reported, with the first class II peroxidases reconstructed in the ancestor of the Auriculariales and residual Agaricomycetes.
Abstract: Evolution of lignocellulose decomposition was one of the most ecologically important innovations in fungi. White-rot fungi in the Agaricomycetes (mushrooms and relatives) are the most effective microorganisms in degrading both cellulose and lignin components of woody plant cell walls (PCW). However, the precise evolutionary origins of lignocellulose decomposition are poorly understood, largely because certain early-diverging clades of Agaricomycetes and its sister group, the Dacrymycetes, have yet to be sampled, or have been undersampled, in comparative genomic studies. Here, we present new genome sequences of ten saprotrophic fungi, including members of the Dacrymycetes and early-diverging clades of Agaricomycetes (Cantharellales, Sebacinales, Auriculariales, and Trechisporales), which we use to refine the origins and evolutionary history of the enzymatic toolkit of lignocellulose decomposition. We reconstructed the origin of ligninolytic enzymes, focusing on class II peroxidases (AA2), as well as enzymes that attack crystalline cellulose. Despite previous reports of white rot appearing as early as the Dacrymycetes, our results suggest that white-rot fungi evolved later in the Agaricomycetes, with the first class II peroxidases reconstructed in the ancestor of the Auriculariales and residual Agaricomycetes. The exemplars of the most ancient clades of Agaricomycetes that we sampled all lack class II peroxidases, and are thus concluded to use a combination of plesiomorphic and derived PCW degrading enzymes that predate the evolution of white rot.

Journal Article
TL;DR: Phylo.io is introduced, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and the possibility to store and share visualizations.
Abstract: Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license.

Journal Article
TL;DR: By assessing the interactions between miRNAs and NBS-LRRs, the authors found that nucleotide diversity at the wobble position of the codons in the target site drives the diversification of miRNAs.
Abstract: High expression of plant nucleotide binding site leucine-rich repeat (NBS-LRR) defense genes is often lethal to plant cells, a phenotype perhaps associated with fitness costs. Plants implement several mechanisms to control the transcript level of NBS-LRR defense genes. As negative transcriptional regulators, diverse miRNAs target NBS-LRRs in eudicots and gymnosperms. To understand the evolutionary benefits of this miRNA-NBS-LRR regulatory system, we investigated the NBS-LRRs of 70 land plants, coupling this analysis with extensive small RNA data. A tight association between the diversity of NBS-LRRs and miRNAs was found. The miRNAs typically target highly duplicated NBS-LRRs. In comparison, families of heterogeneous NBS-LRRs were rarely targeted by miRNAs in Poaceae and Brassicaceae genomes. We observed that duplicated NBS-LRRs from different gene families periodically gave birth to new miRNAs. Most of these newly emerged miRNAs target the same conserved, encoded protein motif of NBS-LRRs, consistent with a model of convergent evolution for these miRNAs. By assessing the interactions between miRNAs and NBS-LRRs, we found that nucleotide diversity in the wobble position of the codons in the target site drives the diversification of miRNAs. Taken together, we propose a co-evolutionary model of plant NBS-LRRs and miRNAs hypothesizing how plants balance the benefits and costs of NBS-LRR defense genes.
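Nucleotide diversity by codon position in a target site can be computed directly from aligned sequences. In the sketch below, which uses three hypothetical target-site fragments encoding the same two amino acids (Gly-Lys), diversity is confined to the third (wobble) position; the metric (mean pairwise difference per site) is a common choice and an assumption here, not necessarily the one used in the study.

```python
from itertools import combinations
from typing import List

def site_diversity(column: str) -> float:
    """Fraction of sequence pairs that differ at one alignment column."""
    pairs = list(combinations(column, 2))
    return sum(a != b for a, b in pairs) / len(pairs)

def diversity_by_codon_position(seqs: List[str]) -> List[float]:
    """Mean per-site pairwise diversity at codon positions 1, 2, and 3.

    Assumes gap-free, in-frame sequences of equal length; a trailing
    partial codon is ignored.
    """
    length = len(seqs[0])
    totals, counts = [0.0, 0.0, 0.0], [0, 0, 0]
    for i in range(length - length % 3):
        column = "".join(s[i] for s in seqs)
        totals[i % 3] += site_diversity(column)
        counts[i % 3] += 1
    return [t / c for t, c in zip(totals, counts)]

# Hypothetical, synonymous target-site fragments: GGN (Gly) + AAR (Lys).
seqs = ["GGTAAA", "GGCAAG", "GGAAAA"]
print([round(d, 3) for d in diversity_by_codon_position(seqs)])  # [0.0, 0.0, 0.833]
```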

Journal Article
TL;DR: It is found that the number and allelic frequencies of sites that are uniquely shared between archaic humans and specific present-day populations are particularly useful for detecting adaptive introgression.
Abstract: Comparisons of DNA from archaic and modern humans show that these groups interbred, and in some cases received an evolutionary advantage from doing so. This process (adaptive introgression) may lead to a faster rate of adaptation than is predicted from models with mutation and selection alone. Within the last couple of years, a series of studies have identified regions of the genome that are likely examples of adaptive introgression. In many cases, once a region was ascertained as being introgressed, commonly used statistics based on both haplotype and allele frequency information were employed to test for positive selection. Introgression by itself, however, changes both the haplotype structure and the distribution of allele frequencies, thus confounding traditional tests for detecting positive selection. Therefore, patterns generated by introgression alone may lead to false inferences of positive selection. Here we explore models involving both introgression and positive selection to investigate the behavior of various statistics under adaptive introgression. In particular, we find that the number and allelic frequencies of sites that are uniquely shared between archaic humans and specific present-day populations are particularly useful for detecting adaptive introgression. We then examine the 1000 Genomes dataset to characterize the landscape of uniquely shared archaic alleles in human populations. Finally, we identify regions that were likely subject to adaptive introgression and discuss some of the most promising candidate genes located in these regions.
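Statistics of the kind described above can be sketched as two summaries over biallelic sites in a genomic window: the count of sites where the archaic genome carries the derived allele, that allele is rare or absent in an outgroup population, and it exceeds some frequency in the test population; and a high quantile of test-population frequencies at archaic-derived sites. The thresholds (1% and 20%), the 95% quantile, the nearest-rank quantile rule, and the `Site` record are illustrative assumptions, not the exact definitions from the paper.

```python
import math
from typing import List, NamedTuple

class Site(NamedTuple):
    outgroup_freq: float   # derived-allele frequency in the outgroup population
    target_freq: float     # derived-allele frequency in the test population
    archaic_derived: bool  # archaic genome carries the derived allele

def uniquely_shared(sites: List[Site], max_outgroup: float = 0.01,
                    min_target: float = 0.2) -> int:
    """Count archaic-derived sites that are rare in the outgroup but common in the target."""
    return sum(1 for s in sites
               if s.archaic_derived and s.outgroup_freq < max_outgroup
               and s.target_freq > min_target)

def quantile_95(sites: List[Site], max_outgroup: float = 0.01) -> float:
    """Nearest-rank 95% quantile of target frequencies at eligible archaic-derived sites."""
    freqs = sorted(s.target_freq for s in sites
                   if s.archaic_derived and s.outgroup_freq < max_outgroup)
    if not freqs:
        return 0.0
    return freqs[math.ceil(0.95 * len(freqs)) - 1]

# Hypothetical window of five sites.
sites = [Site(0.0, 0.6, True), Site(0.0, 0.3, True), Site(0.0, 0.05, True),
         Site(0.5, 0.7, True), Site(0.0, 0.9, False)]
print(uniquely_shared(sites), quantile_95(sites))  # 2 0.6
```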

Journal Article
TL;DR: Diversification analyses showed Xenarthra to be an ancient clade with a constant diversification rate through time and a species turnover driven by high but constant extinction; the authors also propose splitting armadillos into two distinct families, Dasypodidae and Chlamyphoridae, to better reflect their ancient divergence.
Abstract: Xenarthra (armadillos, sloths, and anteaters) constitutes one of the four major clades of placental mammals. Despite their phylogenetic distinctiveness in mammals, a reference phylogeny is still lacking for the 31 described species. Here we used Illumina shotgun sequencing to assemble 33 new complete mitochondrial genomes, establishing Xenarthra as the first major placental clade to be fully sequenced at the species level for mitogenomes. The resulting data set allowed the reconstruction of a robust phylogenetic framework and timescale that are consistent with previous studies conducted at the genus level using nuclear genes. Incorporating the full species diversity of extant xenarthrans points to a number of inconsistencies in xenarthran systematics and species definition. We propose to split armadillos into two distinct families Dasypodidae (dasypodines) and Chlamyphoridae (euphractines, chlamyphorines, and tolypeutines) to better reflect their ancient divergence, estimated around 42 Ma. Species delimitation within long-nosed armadillos (genus Dasypus) appeared more complex than anticipated, with the discovery of a divergent lineage in French Guiana. Diversification analyses showed Xenarthra to be an ancient clade with a constant diversification rate through time with a species turnover driven by high but constant extinction. We also detected a significant negative correlation between speciation rate and past temperature fluctuations with an increase in speciation rate corresponding to the general cooling observed during the last 15 My. Biogeographic reconstructions identified the tropical rainforest biome of Amazonia and the Guiana Shield as the cradle of xenarthran evolutionary history with subsequent dispersions into more open and dry habitats.

Journal Article
TL;DR: Inversion frequencies, estimated with diagnostic single nucleotide polymorphism (SNP) markers from 28 whole-genome Pool-seq samples collected from 10 populations along the North American east coast, provide strong evidence that inversion clines are maintained by spatially (and perhaps also temporally) varying selection.
Abstract: Clines in chromosomal inversion polymorphisms, presumably driven by climatic gradients, are common, but there is surprisingly little evidence for selection acting on them. Here we address this long-standing issue in Drosophila melanogaster by using diagnostic single nucleotide polymorphism (SNP) markers to estimate inversion frequencies from 28 whole-genome Pool-seq samples collected from 10 populations along the North American east coast. Inversions In(3L)P, In(3R)Mo, and In(3R)Payne showed clear latitudinal clines, and for In(2L)t, In(2R)NS, and In(3R)Payne the steepness of the clinal slopes changed between summer and fall. Consistent with an effect of seasonality on inversion frequencies, we detected small but stable seasonal fluctuations of In(2R)NS and In(3R)Payne in a temperate Pennsylvanian population over 4 years. In support of spatially varying selection, we observed that the cline in In(3R)Payne has remained stable for >40 years and that the frequencies of In(2L)t and In(3R)Payne are strongly correlated with climatic factors that vary latitudinally, independent of population structure. To test whether these patterns are adaptive, we compared the amount of genetic differentiation of inversions versus neutral SNPs and found that the clines in In(2L)t and In(3R)Payne are maintained nonneutrally and independent of admixture. We also identified numerous clinal inversion-associated SNPs, many of which exhibit parallel differentiation along the Australian cline and reside in genes known to affect fitness-related traits. Together, our results provide strong evidence that inversion clines are maintained by spatially (and perhaps also temporally) varying selection. We interpret our data in light of current hypotheses about how inversions are established and maintained.
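With Pool-seq data, an inversion's frequency in a population can be estimated from the read frequencies of alleles that are diagnostic for the inverted arrangement, and a cline then shows up as a nonzero slope of that frequency against latitude. A minimal sketch of both steps follows; averaging over diagnostic SNPs (a median would be a robust alternative), ordinary least squares, and all read counts and latitudes are assumptions, and the sketch does not control for population structure or admixture as the study did.

```python
from statistics import mean
from typing import List, Tuple

def inversion_frequency(counts: List[Tuple[int, int]]) -> float:
    """Mean frequency of the inversion-diagnostic allele across SNPs.

    counts: (reads carrying the diagnostic allele, total reads) per SNP.
    """
    return mean(k / n for k, n in counts if n > 0)

def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical populations: latitude (degrees N) -> diagnostic-SNP read counts.
samples = {
    25.0: [(30, 100), (34, 100)],
    33.0: [(20, 80), (24, 80)],
    40.0: [(15, 100), (17, 100)],
    44.0: [(5, 50), (7, 50)],
}
lats = sorted(samples)
freqs = [inversion_frequency(samples[lat]) for lat in lats]
print([round(f, 3) for f in freqs])  # [0.32, 0.275, 0.16, 0.12]
print(ols_slope(lats, freqs) < 0)    # True: frequency declines with latitude
```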

Journal Article
TL;DR: An improved resource that provides D. melanogaster genomes from multiple sources, together with an aligned D. simulans genome to facilitate divergence comparisons, will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species.
Abstract: The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline that involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations.

Journal ArticleDOI
TL;DR: It is suggested that positive Darwinian selection might be the driving force underlying the formation and evolution of miRNA clustering and the functional co-adaptation between new and old miRNAs in the miR-17–92 cluster.
Abstract: MicroRNAs (miRNAs) are endogenously expressed small noncoding RNAs. The genomic locations of animal miRNAs are significantly clustered in discrete loci. We found that duplication and de novo formation were important mechanisms for creating miRNA clusters and that clustered miRNAs tend to be evolutionarily conserved. We proposed a "functional co-adaptation" model to explain how clustering helps newly emerged miRNAs survive and develop functions. We presented evidence that the abundances of miRNAs in the same cluster were highly correlated and that those miRNAs exerted cooperative repressive effects on target genes in human tissues. By transfecting miRNAs into human and fly cells and extensively profiling the transcriptome alterations with deep sequencing, we further demonstrated the functional co-adaptation between new and old miRNAs in the miR-17-92 cluster. Our population genomic analysis suggests that positive Darwinian selection might be the driving force underlying the formation and evolution of miRNA clustering. Our model provides novel insights into the mechanisms and evolutionary significance of miRNA clustering.

Journal ArticleDOI
TL;DR: Evidence of widespread convergence at the gene level is presented by identifying parallel shifts in evolutionary rate during three independent episodes of mammalian adaptation to the marine environment: hundreds of genes accelerated their evolutionary rates in all three marine mammal lineages during their transition to aquatic life.
Abstract: Mammal species have made the transition to the marine environment several times, and their lineages represent one of the classical examples of convergent evolution in morphological and physiological traits. Nevertheless, the genetic mechanisms of their phenotypic transition are poorly understood, and investigations into convergence at the molecular level have been inconclusive. While past studies have searched for convergent changes at specific amino acid sites, we propose an alternative strategy to identify those genes that experienced convergent changes in their selective pressures, visible as changes in evolutionary rate specifically in the marine lineages. We present evidence of widespread convergence at the gene level by identifying parallel shifts in evolutionary rate during three independent episodes of mammalian adaptation to the marine environment. Hundreds of genes accelerated their evolutionary rates in all three marine mammal lineages during their transition to aquatic life. These marine-accelerated genes are highly enriched for pathways that control recognized functional adaptations in marine mammals, including muscle physiology, lipid metabolism, sensory systems, and skin and connective tissue. The accelerations resulted both from adaptive evolution, as seen in skin and lung genes, and from loss of function, as in gustatory and olfactory genes. With regard to sensory systems, this finding provides further evidence that reduced senses of taste and smell are ubiquitous in marine mammals. Our analysis demonstrates the feasibility of identifying genes underlying convergent organism-level characteristics on a genome-wide scale and without prior knowledge of adaptations, and provides a powerful approach for investigating the physiological functions of mammalian genes.

Journal ArticleDOI
TL;DR: By conducting the first measurements of rates of DNA turnover in seed plant mitogenomes, it is discovered that turnover rates vary by orders of magnitude among species.
Abstract: Mitochondrial genomes (mitogenomes) of flowering plants are well known for their extreme diversity in size, structure, gene content, and rates of sequence evolution and recombination. In contrast, little is known about mitogenomic diversity and evolution within gymnosperms. Only a single complete genome sequence is available, from the cycad Cycas taitungensis, while limited information is available for the one draft sequence, from Norway spruce (Picea abies). To examine mitogenomic evolution in gymnosperms, we generated complete genome sequences for the ginkgo tree (Ginkgo biloba) and a gnetophyte (Welwitschia mirabilis). There is great disparity in size, sequence conservation, levels of shared DNA, and functional content among gymnosperm mitogenomes. The Cycas and Ginkgo mitogenomes are relatively small, have low substitution rates, and possess numerous genes, introns, and edit sites; we infer that these properties were present in the ancestral seed plant. By contrast, the Welwitschia mitogenome has an expanded size coupled with accelerated substitution rates and extensive loss of these functional features. The Picea genome has expanded further, to more than 4 Mb. With regard to structural evolution, the Cycas and Ginkgo mitogenomes share a remarkable amount of intergenic DNA, which may be related to the limited recombinational activity detected at repeats in Ginkgo. Conversely, the Welwitschia mitogenome shares almost no intergenic DNA with any other seed plant. By conducting the first measurements of rates of DNA turnover in seed plant mitogenomes, we discovered that turnover rates vary by orders of magnitude among species.

Journal ArticleDOI
TL;DR: It is demonstrated that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed and a sufficiently large number of genes are sampled.
Abstract: Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on its impact on species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and a sufficiently large number of genes are sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of randomly distributed missing data require exhaustive levels of gene sampling, likely exceeding those of most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, which becomes further exacerbated when combined with high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era.

Journal ArticleDOI
TL;DR: PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.
Abstract: To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we analyze the 1000 Genomes data (phase 1), comprising 850 individuals from Africa, Asia, and Europe. The roughly 36 million genetic variants were obtained with low-coverage sequencing (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.

Journal ArticleDOI
TL;DR: A fully probabilistic approach for the joint reconstruction of phylodynamic history in structured populations (such as geographic structure) based on a multitype birth–death process is presented and can be used to quantify the spread of a pathogen in a structured population.
Abstract: When viruses spread, outbreaks can be spawned in previously unaffected regions. Depending on the time and mode of introduction, each regional outbreak can have its own epidemic dynamics. The migration and phylodynamic processes are often intertwined and need to be taken into account when analyzing temporally and spatially structured virus data. In this article, we present a fully probabilistic approach for the joint reconstruction of phylodynamic history in structured populations (such as geographic structure) based on a multitype birth-death process. This approach can be used to quantify the spread of a pathogen in a structured population. Changes in epidemic dynamics through time within subpopulations are incorporated through piecewise constant changes in transmission parameters. We analyze a global human influenza H3N2 virus data set from a geographically structured host population to demonstrate how seasonal dynamics can be inferred simultaneously with the phylogeny and migration process. Our results suggest that the main migration path among the northern, tropical, and southern regions represented in the sample analyzed here is the one leading from the tropics to the northern region. Furthermore, the time-dependent transmission dynamics between and within two HIV risk groups, heterosexuals and injecting drug users, in the Latvian HIV epidemic are investigated. Our analyses confirm that the Latvian HIV epidemic peaking around 2001 was mainly driven by the injecting drug user risk group.

Journal ArticleDOI
TL;DR: This work used genes from 64 new transcriptome datasets and others to reconstruct a robust Asteraceae phylogeny, covering 73 species from 18 tribes in six subfamilies, and provides different evidence for several WGDs in Asteraceae and reveals distinct associations among WGD events, dramatic environmental changes, and species radiations.
Abstract: Biodiversity results from multiple evolutionary mechanisms, including genetic variation and natural selection. Whole-genome duplications (WGDs), or polyploidizations, provide opportunities for large-scale genetic modifications. Many evolutionarily successful lineages, including angiosperms and vertebrates, are ancient polyploids, suggesting that WGDs are a driving force in evolution. However, this hypothesis is challenged by the observed lower speciation and higher extinction rates of recently formed polyploids compared with diploids. Asteraceae, which includes about 10% of angiosperm species, is thus undoubtedly one of the most successful lineages, and paleopolyploidization early in this family was suggested using a small number of datasets. Here, we used genes from 64 new transcriptome datasets and others to reconstruct a robust Asteraceae phylogeny, covering 73 species from 18 tribes in six subfamilies. We estimated their divergence times and further identified multiple potential ancient WGDs within several tribes and shared by the Heliantheae alliance, core Asteraceae (Asteroideae-Mutisioideae), and also with the sister family Calyceraceae. Two of the WGD events were followed by great increases in biodiversity: the older one preceded the divergence of at least 10 subfamilies within 10 My, with great variation in morphology and physiology, whereas the other was followed by extremely high species richness in the Heliantheae alliance clade. Our results provide different evidence for several WGDs in Asteraceae and reveal distinct associations among WGD events, dramatic environmental changes, and species radiations, providing a possible scenario for polyploids to overcome the disadvantages of WGDs and to evolve into lineages with high biodiversity.

Journal ArticleDOI
TL;DR: The first extensive phylogenomic analysis of stramenopiles, including representatives of most major lineages, provides a robust phylogenetic framework to investigate the evolution and diversification of this group of ecologically relevant protists.
Abstract: Stramenopiles or heterokonts constitute one of the most speciose and diverse clades of protists. The clade includes ecologically important algae (such as diatoms or large multicellular brown seaweeds), as well as heterotrophic (e.g., bicosoecids, MAST groups) and parasitic (e.g., Blastocystis, oomycetes) species. Despite their evolutionary and ecological relevance, deep phylogenetic relationships among stramenopile groups, inferred mostly from small-subunit rDNA phylogenies, remain unresolved, especially for the heterotrophic taxa. Taking advantage of recently released stramenopile transcriptome and genome sequences, as well as data from the genomic assembly of the MAST-3 species Incisomonas marina generated in our laboratory, we have carried out the first extensive phylogenomic analysis of stramenopiles, including representatives of most major lineages. Our analyses, based on a large data set of 339 widely distributed proteins, strongly support a root of stramenopiles lying between two clades, Bigyra and Gyrista (Pseudofungi plus Ochrophyta). Additionally, our analyses challenge the Phaeista-Khakista dichotomy of photosynthetic stramenopiles (ochrophytes), because two groups previously considered to be part of the Phaeista (Pelagophyceae and Dictyochophyceae) branch with strong support with the Khakista (Bolidophyceae and Diatomeae). We propose a new classification of ochrophytes within the two groups Chrysista and Diatomista to reflect the new phylogenomic results. Our stramenopile phylogeny provides a robust phylogenetic framework to investigate the evolution and diversification of this group of ecologically relevant protists.

Journal ArticleDOI
TL;DR: As evolved plasmids were able to persist longer in multiple naïve hosts, acquisition of this transposon also expanded the plasmid's host range, which has important implications for the spread of antibiotic resistance.
Abstract: The World Health Organization has declared the emergence of antibiotic resistance to be a global threat to human health. Broad-host-range plasmids have a key role in causing this health crisis because they transfer multiple resistance genes to a wide range of bacteria. To limit the spread of antibiotic resistance, we need to gain insight into the mechanisms by which the host range of plasmids evolves. Although initially unstable plasmids have been shown to improve their persistence through evolution of the plasmid, the host, or both, the means by which this occurs are poorly understood. Here, we sought to identify the underlying genetic basis of expanded plasmid host range and increased persistence of an antibiotic resistance plasmid using a combined experimental-modeling approach that included whole-genome resequencing, molecular genetics, and a plasmid population dynamics model. In nine of the ten previously evolved clones, changes in host and plasmid each slightly improved plasmid persistence, but their combination resulted in a much larger improvement, which indicated positive epistasis. The only genetic change in the plasmid was the acquisition of a transposable element from a plasmid native to the Pseudomonas host used in these studies. The analysis of genetic deletions showed that the critical genes on this transposon encode a putative toxin–antitoxin (TA) and a cointegrate resolution system. As evolved plasmids were able to persist longer in multiple naive hosts, acquisition of this transposon also expanded the plasmid’s host range, which has important implications for the spread of antibiotic resistance.

Journal ArticleDOI
TL;DR: It is demonstrated that statistical theory can be applied to adjust composite likelihoods and perform robust computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS.
Abstract: Many population genetics tools employ composite likelihoods, because fully modeling genomic linkage is challenging. But traditional approaches to estimating parameter uncertainties and performing model selection require full likelihoods, so these tools have relied on computationally expensive maximum-likelihood estimation (MLE) on bootstrapped data. Here, we demonstrate that statistical theory can be applied to adjust composite likelihoods and perform robust computationally efficient statistical inference in two demographic inference tools: ∂a∂i and TRACTS. On both simulated and real data, the adjustments perform comparably to MLE bootstrapping while using orders of magnitude less computational time.

Journal ArticleDOI
TL;DR: Analyses of population history show that long-term global temperature has strongly influenced the demographic history of A. m. sinisxinyuan and its divergence from other subspecies.
Abstract: Studying the genetic signatures of climate-driven selection can produce insights into local adaptation and the potential impacts of climate change on populations. The honey bee (Apis mellifera) is an interesting species for studying local adaptation because it originated in tropical/subtropical climatic regions and subsequently spread into temperate regions. However, little is known about the genetic basis of its adaptation to temperate climates. Here, we resequenced the whole genomes of ten individual bees from a newly discovered population in temperate China and downloaded resequenced data from 35 individuals from other populations. We found that the new population is an undescribed subspecies in the M-lineage of A. mellifera (Apis mellifera sinisxinyuan). Analyses of population history show that long-term global temperature has strongly influenced the demographic history of A. m. sinisxinyuan and its divergence from other subspecies. Further analyses comparing temperate and tropical populations identified several candidate genes related to fat body and the Hippo signaling pathway that are potentially involved in adaptation to temperate climates. Our results provide insights into the demographic history of the newly discovered A. m. sinisxinyuan, as well as the genetic basis of adaptation of A. mellifera to temperate climates at the genomic level. These findings will facilitate the selective breeding of A. mellifera to improve the survival of overwintering colonies.