
Showing papers in "Molecular Biology and Evolution" in 2020


Journal Article
TL;DR: Some notable features of IQ-TREE version 2 are described and the key advantages over other software are highlighted.
Abstract: IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

4,337 citations


Journal Article
TL;DR: The macOS version of the MEGA software, which eliminates the need for virtualization and emulation programs, has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux.
Abstract: The Molecular Evolutionary Genetics Analysis (MEGA) software enables comparative analysis of molecular sequences in phylogenetics and evolutionary medicine. Here, we introduce the macOS version of the MEGA software. This new version eliminates the need for virtualization and emulation programs previously required to use MEGA on Apple computers. MEGA for macOS utilizes memory and computing resources efficiently for conducting evolutionary analyses on macOS. It has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux. MEGA for macOS is available from www.megasoftware.net free of charge.

896 citations


Journal Article
TL;DR: ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively, and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions.
Abstract: ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest , last accessed September 2, 2019.

783 citations
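
Tools in this family (jModelTest, ProtTest, ModelTest-NG) rank candidate substitution models with information criteria. The Python below is a minimal sketch of that general idea, not ModelTest-NG's code or interface: it scores each model with AIC = 2k − 2 lnL, AICc (AIC with a small-sample correction), and BIC = k ln n − 2 lnL, where lnL is the maximized log-likelihood, k the number of free parameters, and n the sample size (assumed here to be the number of alignment sites). The model names, log-likelihoods, and parameter counts are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fit:
    name: str   # candidate model label (illustrative)
    lnl: float  # maximized log-likelihood of the model
    k: int      # number of free parameters


def aic(f: Fit) -> float:
    return 2 * f.k - 2 * f.lnl


def aicc(f: Fit, n: int) -> float:
    # Small-sample correction of AIC; undefined when n <= k + 1.
    if n <= f.k + 1:
        raise ValueError("AICc requires n > k + 1")
    return aic(f) + (2 * f.k * (f.k + 1)) / (n - f.k - 1)


def bic(f: Fit, n: int) -> float:
    return f.k * math.log(n) - 2 * f.lnl


def best_model(fits: List[Fit], n: int, criterion: str = "bic") -> str:
    """Return the name of the model with the lowest score under `criterion`."""
    scorers: Dict[str, Callable[[Fit], float]] = {
        "aic": aic,
        "aicc": lambda f: aicc(f, n),
        "bic": lambda f: bic(f, n),
    }
    return min(fits, key=scorers[criterion]).name


# Hypothetical fits for an alignment of n = 1000 sites.
fits = [Fit("JC", -5200.0, 1), Fit("HKY+G", -5050.0, 6), Fit("GTR+G", -5040.0, 10)]
for crit in ("aic", "aicc", "bic"):
    print(crit, best_model(fits, 1000, crit))
# aic GTR+G
# aicc GTR+G
# bic HKY+G
```

With these made-up numbers, AIC and AICc favor the richer GTR+G, while BIC, which penalizes extra parameters more heavily at n = 1000, prefers HKY+G; the choice of criterion is itself a modeling decision.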


Journal Article
TL;DR: The treeio package is designed to connect phylogenetic tree input and output, and can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context.
Abstract: Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.

274 citations


Journal Article
TL;DR: gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites, and are implemented in the IQ-TREE software package.
Abstract: We implement two measures for quantifying genealogical concordance in phylogenomic data sets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of "decisive" gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy-to-use implementation and tutorial are freely available in the IQ-TREE software package (http://www.iqtree.org/doc/Concordance-Factor, last accessed May 13, 2020).

267 citations
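
The two definitions above can be sketched in Python under deliberately simplified assumptions; this is not IQ-TREE's implementation, which is more involved (for example, sCF is evaluated over quartets of taxa drawn around each branch). Here a tree is a set of bipartitions, a gene tree is treated as decisive for a reference branch when both halves of the pruned split keep at least two taxa, and a site is decisive for a quartet ab|cd when it shows two states, each twice. The helper names (`canonical`, `make_gene_tree`, `gcf`, `scf_quartet`) are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple

Split = FrozenSet[str]
GeneTree = Tuple[FrozenSet[str], Set[Split]]  # (taxa present, canonical splits)


def canonical(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Encode the bipartition side|rest by the half that excludes the smallest taxon."""
    side, taxa = frozenset(side), frozenset(taxa)
    return taxa - side if min(taxa) in side else side


def make_gene_tree(taxa: Iterable[str], sides: Iterable[Iterable[str]]) -> GeneTree:
    """Build a gene tree from its taxa and one side of each internal branch."""
    taxa = frozenset(taxa)
    return taxa, {canonical(s, taxa) for s in sides}


def gcf(ref_side: Iterable[str], ref_taxa: Iterable[str], trees: List[GeneTree]) -> float:
    """Percent of decisive gene trees that contain the reference branch ref_side|rest.

    Assumptions: every gene tree's taxa are a subset of ref_taxa, and a gene tree
    is decisive when both halves of the pruned reference split keep >= 2 taxa.
    """
    side = frozenset(ref_side)
    rest = frozenset(ref_taxa) - side
    concordant = decisive = 0
    for taxa, splits in trees:
        a, b = side & taxa, rest & taxa
        if len(a) < 2 or len(b) < 2:
            continue  # the branch cannot be tested on this gene tree
        decisive += 1
        if canonical(a, taxa) in splits:
            concordant += 1
    return 100.0 * concordant / decisive if decisive else float("nan")


def scf_quartet(columns: Iterable[str]) -> float:
    """Percent of decisive sites supporting ab|cd for one quartet (a, b, c, d).

    Each column lists the states of a, b, c, d in that order; a site is treated
    as decisive when it shows two states, each exactly twice (xxyy, xyxy, xyyx).
    """
    concordant = decisive = 0
    for a, b, c, d in columns:
        if sorted(Counter((a, b, c, d)).values()) != [2, 2]:
            continue
        decisive += 1
        if a == b:  # in a 2+2 pattern, a == b forces c == d, i.e. ab|cd
            concordant += 1
    return 100.0 * concordant / decisive if decisive else float("nan")


trees = [
    make_gene_tree("ABCD", ["AB"]),  # contains AB|CD: concordant
    make_gene_tree("ABCD", ["AC"]),  # contains AC|BD: discordant
    make_gene_tree("ACDE", ["AC"]),  # only A on the AB side: not decisive
]
print(gcf("AB", "ABCDE", trees))                             # 50.0
print(scf_quartet(["AACC", "ACAC", "ACCA", "AAAC", "GGTT"]))  # 50.0
```

The third gene tree illustrates why variable taxon coverage matters: it lacks B, so it is excluded from the denominator rather than counted as discordant.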


Journal Article
TL;DR: RASP is software for reconstructing ancestral states on phylogenetic trees. It can apply generalized statistical ancestral reconstruction methods to phylogenies, explore the phylogenetic signal of characters for particular trees, calculate distances between trees, and cluster trees into groups.
Abstract: With the continual progress of sequencing techniques, genome-scale data are increasingly used in phylogenetic studies. With more data from throughout the genome, the relationship between genes and different kinds of characters is receiving more attention. Here, we present version 4 of RASP, a software to reconstruct ancestral states through phylogenetic trees. RASP can apply generalized statistical ancestral reconstruction methods to phylogenies, explore the phylogenetic signal of characters to particular trees, calculate distances between trees, and cluster trees into groups. RASP 4 has an improved graphic user interface and is freely available from http://mnh.scu.edu.cn/soft/blog/RASP (program) and https://github.com/sculab/RASP (source code).

259 citations


Journal Article
TL;DR: The 2.5 release of HyPhy includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.
Abstract: HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.

252 citations



Journal Article
Xuhua Xia
TL;DR: SARS-CoV-2 is shown to have the most extreme CpG deficiency of all known betacoronavirus genomes, and viral surveys focused on decreasing CpG in viral RNA genomes may provide important clues about the selective environments and viral defenses in the original hosts.
Abstract: Wild mammalian species, including bats, constitute the natural reservoir of betacoronavirus (including SARS, MERS, and the deadly SARS-CoV-2). Different hosts or host tissues provide different cellular environments, especially different antiviral and RNA modification activities that can alter RNA modification signatures observed in the viral RNA genome. The zinc finger antiviral protein (ZAP) binds specifically to CpG dinucleotides and recruits other proteins to degrade a variety of viral RNA genomes. Many mammalian RNA viruses have evolved CpG deficiency. Increasing CpG dinucleotides in these low-CpG viral genomes in the presence of ZAP consistently leads to decreased viral replication and virulence. Because ZAP exhibits tissue-specific expression, viruses infecting different tissues are expected to have different CpG signatures, suggesting a means to identify viral tissue-switching events. The author shows that SARS-CoV-2 has the most extreme CpG deficiency in all known betacoronavirus genomes. This suggests that SARS-CoV-2 may have evolved in a new host (or new host tissue) with high ZAP expression. A survey of CpG deficiency in viral genomes identified a virulent canine coronavirus (alphacoronavirus) as possessing the most extreme CpG deficiency, comparable with that observed in SARS-CoV-2. This suggests that the canine tissue infected by the canine coronavirus may provide a cellular environment strongly selecting against CpG. Thus, viral surveys focused on decreasing CpG in viral RNA genomes may provide important clues about the selective environments and viral defenses in the original hosts.

145 citations


Journal Article
TL;DR: The results suggest that the origin of SLC may predate domestication, and that many traits considered typical of cultivated tomatoes arose in South American SLC, but were lost or diminished once these partially domesticated forms spread northward.
Abstract: The process of plant domestication is often protracted, involving underexplored intermediate stages with important implications for the evolutionary trajectories of domestication traits. Previously, tomato domestication history has been thought to involve two major transitions: one from wild Solanum pimpinellifolium L. to a semidomesticated intermediate, S. lycopersicum L. var. cerasiforme (SLC) in South America, and a second transition from SLC to fully domesticated S. lycopersicum L. var. lycopersicum in Mesoamerica. In this study, we employ population genomic methods to reconstruct tomato domestication history, focusing on the evolutionary changes occurring in the intermediate stages. Our results suggest that the origin of SLC may predate domestication, and that many traits considered typical of cultivated tomatoes arose in South American SLC, but were lost or diminished once these partially domesticated forms spread northward. These traits were then likely reselected in a convergent fashion in the common cultivated tomato, prior to its expansion around the world. Based on these findings, we reveal complexities in the intermediate stage of tomato domestication and provide insight on trajectories of genes and phenotypes involved in tomato domestication syndrome. Our results also allow us to identify underexplored germplasm that harbors useful alleles for crop improvement.

106 citations


Journal Article
TL;DR: This work proposes a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy, and introduces a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) that uses dynamic programming to find the species tree optimizing the measure.
Abstract: Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programming. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.
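
For orientation, the single-copy case that ASTRAL-Pro generalizes can be sketched as a quartet agreement count: for every set of four taxa, check whether two trees induce the same resolved topology. This Python is an illustration under stated assumptions, not ASTRAL-Pro's multicopy measure (which additionally accounts for orthology and paralogy) and not its dynamic-programming search; trees are given as one side of each internal bipartition, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Sequence

Quartet = FrozenSet[FrozenSet[str]]


def quartet_topology(q: Sequence[str], splits: List[FrozenSet[str]]) -> Optional[Quartet]:
    """Return {{x, y}, {z, w}} if a branch of the tree separates q into two pairs."""
    for side in splits:
        inside = frozenset(t for t in q if t in side)
        if len(inside) == 2:
            return frozenset({inside, frozenset(q) - inside})
    return None  # q is unresolved (star) in this tree


def quartet_score(tree1: Iterable[Iterable[str]], tree2: Iterable[Iterable[str]],
                  taxa: Iterable[str]) -> int:
    """Number of 4-taxon subsets on which both trees induce the same resolved topology.

    Each tree is given as one side of each of its internal branches (bipartitions).
    """
    s1 = [frozenset(s) for s in tree1]
    s2 = [frozenset(s) for s in tree2]
    score = 0
    for q in combinations(sorted(set(taxa)), 4):
        t1 = quartet_topology(q, s1)
        if t1 is not None and t1 == quartet_topology(q, s2):
            score += 1
    return score


# ((A,B),C,(D,E)) versus ((A,C),B,(D,E)); five taxa give five quartets.
t_ab = ["AB", "ABC"]
t_ac = ["AC", "ABC"]
print(quartet_score(t_ab, t_ab, "ABCDE"))  # 5
print(quartet_score(t_ab, t_ac, "ABCDE"))  # 3
```

Summing such scores between a candidate species tree and every gene tree gives the objective that ASTRAL-style methods maximize; enumerating quartets explicitly, as here, is only practical for small examples.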

Journal Article
TL;DR: The multispecies-coalescent-with-introgression model accommodates deep coalescence and introgression and provides a natural framework for inference using genomic sequence data, and computer simulation confirms the good statistical properties of the method.
Abstract: Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex; this probability varies considerably across the genome, likely driven by differential selection against introgressed alleles.

Journal Article
TL;DR: A theoretical and computational framework is developed to infer the demographic history of a population within the past 100 generations from the observed spectrum of linkage disequilibrium (LD) between pairs of loci over a wide range of recombination rates in a sample of contemporary individuals.
Abstract: Inferring changes in effective population size (Ne) in the recent past is of special interest for conservation of endangered species and for human history research. Current methods for estimating the very recent historical Ne are unable to detect complex demographic trajectories involving multiple episodes of bottlenecks, drops, and expansions. We develop a theoretical and computational framework to infer the demographic history of a population within the past 100 generations from the observed spectrum of linkage disequilibrium (LD) of pairs of loci over a wide range of recombination rates in a sample of contemporary individuals. The cumulative contributions of all of the previous generations to the observed LD are included in our model, and a genetic algorithm is used to search for the sequence of historical Ne values that best explains the observed LD spectrum. The method can be applied from large samples to samples of fewer than ten individuals using a variety of genotyping and DNA sequencing data: haploid, diploid with phased or unphased genotypes and pseudohaploid data from low-coverage sequencing. The method was tested by computer simulation for sensitivity to genotyping errors, temporal heterogeneity of samples, population admixture, and structural division into subpopulations, showing high tolerance to deviations from the assumptions of the model. Computer simulations also show that the proposed method outperforms other leading approaches when the inference concerns recent timeframes. Analysis of data from a variety of human and animal populations gave results in agreement with previous estimations by other methods or with records of historical events.
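
Why LD carries information about Ne can be illustrated with the classic equilibrium approximation E[r²] ≈ 1/(1 + 4Ne·c) + 1/n (Sved's expectation plus a 1/n sampling term), inverted to solve for Ne. This is not the paper's model, which accumulates contributions from all previous generations and fits a trajectory of Ne values with a genetic algorithm; it is a single-Ne sketch under that textbook approximation. A widely used heuristic links pairs of loci with recombination rate c to Ne roughly 1/(2c) generations ago, which is why spanning many recombination rates gives a view of Ne through time. Function names are hypothetical.

```python
def expected_r2(ne: float, c: float, n: int) -> float:
    """Approximate E[r^2] at equilibrium: 1/(1 + 4 Ne c) plus a 1/n sampling term."""
    return 1.0 / (1.0 + 4.0 * ne * c) + 1.0 / n


def ne_from_r2(r2: float, c: float, n: int) -> float:
    """Invert expected_r2 for Ne, given mean r^2 for loci with recombination rate c."""
    if c <= 0:
        raise ValueError("recombination rate must be positive")
    signal = r2 - 1.0 / n  # remove the contribution of finite sampling
    if signal <= 0:
        raise ValueError("r^2 does not exceed its sampling expectation 1/n")
    return (1.0 / signal - 1.0) / (4.0 * c)


# Round trip with illustrative values: Ne = 100, c = 0.01, sample size n = 50.
r2 = expected_r2(100, 0.01, 50)
print(round(ne_from_r2(r2, 0.01, 50), 6))  # 100.0
```

The sampling term matters for the small samples the abstract mentions: with few individuals, 1/n can dominate the observed r², and subtracting it is what keeps the estimate finite.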

Journal Article
TL;DR: The results reinforce the role of ancestral hybridization in explosive diversification by demonstrating its significance in one of the largest recent vertebrate adaptive radiations.
Abstract: The adaptive radiation of cichlid fishes in East African Lake Malawi encompasses over 500 species that are believed to have evolved within the last 800,000 years from a common founder population. It has been proposed that hybridization between ancestral lineages can provide the genetic raw material to fuel such exceptionally high diversification rates, and evidence for this has recently been presented for the Lake Victoria region cichlid superflock. Here, we report that Lake Malawi cichlid genomes also show evidence of hybridization between two lineages that split 3-4 Ma, today represented by Lake Victoria cichlids and the riverine Astatotilapia sp. "ruaha blue." The two ancestries in Malawi cichlid genomes are present in large blocks of several kilobases, but there is little variation in this pattern between Malawi cichlid species, suggesting that the large-scale mosaic structure of the genomes was largely established prior to the radiation. Nevertheless, tens of thousands of polymorphic variants apparently derived from the hybridization are interspersed in the genomes. These loci show a striking excess of differentiation across ecological subgroups in the Lake Malawi cichlid assemblage, and parental alleles sort differentially into benthic and pelagic Malawi cichlid lineages, consistent with strong differential selection on these loci during species divergence. Furthermore, these loci are enriched for genes involved in immune response and vision, including opsin genes previously identified as important for speciation. Our results reinforce the role of ancestral hybridization in explosive diversification by demonstrating its significance in one of the largest recent vertebrate adaptive radiations.

Journal Article
TL;DR: pcadapt is an R package for performing genome scans for local adaptation; version 4 substantially improves computational efficiency by using a different format for storing genotypes and a different algorithm for computing principal components of the genotype matrix.
Abstract: pcadapt is a user-friendly R package for performing genome scans for local adaptation. Here, we present version 4 of pcadapt, which substantially improves computational efficiency while providing similar results. This improvement is made possible by using a different format for storing genotypes and a different algorithm for computing principal components of the genotype matrix, which is the most computationally demanding step in the pcadapt method. These changes are seamlessly integrated into the existing pcadapt package, and users will experience a large reduction in computation time (by a factor of 20-60 in our analyses) compared with previous versions.

Journal Article
TL;DR: These analyses uncover longitudinal population structure, provide evidence for continent-wide selective sweeps, identify candidate genes for local climate adaptation, and document clines in chromosomal inversion and transposable element frequencies in European Drosophila melanogaster.
Abstract: Genetic variation is the fuel of evolution, with standing genetic variation especially important for short-term evolution and local adaptation. To date, studies of spatiotemporal patterns of genetic variation in natural populations have been challenging, as comprehensive sampling is logistically difficult, and sequencing of entire populations costly. Here, we address these issues using a collaborative approach, sequencing 48 pooled population samples from 32 locations, and perform the first continent-wide genomic analysis of genetic variation in European Drosophila melanogaster. Our analyses uncover longitudinal population structure, provide evidence for continent-wide selective sweeps, identify candidate genes for local climate adaptation, and document clines in chromosomal inversion and transposable element frequencies. We also characterize variation among populations in the composition of the fly microbiome, and identify five new DNA viruses in our samples.

Journal Article
TL;DR: ReLERNN is described, a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility.
Abstract: Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here, we describe recombination landscape estimation using recurrent neural networks (ReLERNN), a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, although largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.

Journal Article
TL;DR: GeneRax is the first maximum likelihood species-tree-aware phylogenetic inference software; relying on established maximum likelihood optimization algorithms, it simultaneously accounts for substitutions at the sequence level and for gene-level events such as duplication, transfer, and loss.
Abstract: Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene-level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that, compared with competing tools on simulated data, GeneRax infers the trees closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
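
Accuracy above is reported as relative Robinson-Foulds (RF) distance. For unrooted binary trees on n taxa, RF counts the bipartitions present in exactly one of the two trees, and a common normalization divides by its maximum, 2(n − 3). The Python below is a minimal sketch of that unrooted definition; the paper's exact normalization (for example, for rooted gene trees) is not stated here and may differ. Trees are given as one side of each nontrivial bipartition, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable


def canonical(side: Iterable[str], taxa: FrozenSet[str]) -> FrozenSet[str]:
    """Encode bipartition side|rest by the half that excludes the smallest taxon."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def relative_rf(tree1: Iterable[Iterable[str]], tree2: Iterable[Iterable[str]],
                taxa: Iterable[str]) -> float:
    """RF distance divided by its maximum 2(n - 3) for unrooted binary trees.

    Each tree is given as one side of each internal (nontrivial) bipartition.
    """
    taxa = frozenset(taxa)
    n = len(taxa)
    if n < 4:
        raise ValueError("need at least four taxa")
    s1 = {canonical(s, taxa) for s in tree1}
    s2 = {canonical(s, taxa) for s in tree2}
    return len(s1 ^ s2) / (2 * (n - 3))


# ((A,B),C,(D,E)) versus ((A,C),B,(D,E)): they share only the DE|ABC split.
print(relative_rf(["AB", "ABC"], ["AC", "ABC"], "ABCDE"))  # 0.5
```

Canonicalizing each bipartition to the half without a fixed reference taxon makes AB|CDE and CDE|AB compare equal, so the set symmetric difference counts exactly the unshared branches.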

Journal Article
TL;DR: The complexity of TE responsiveness to stress across genetic backgrounds and genomic locations demonstrates substantial intraspecific genetic variation in the control of TEs, with consequences for virulence.
Abstract: Transposable elements (TEs) are drivers of genome evolution and affect the expression landscape of the host genome. Stress is a major factor inducing TE activity; however, the regulatory mechanisms underlying de-repression are poorly understood. Plant pathogens are excellent models to dissect the impact of stress on TEs. The process of plant infection induces stress for the pathogen, and virulence factors (i.e., effectors) located in TE-rich regions become expressed. To dissect TE de-repression dynamics and contributions to virulence, we analyzed the TE expression landscape of four strains of the major wheat pathogen Zymoseptoria tritici. We experimentally exposed strains to nutrient starvation and host infection stress. Contrary to expectations, we show that the two distinct conditions induce the expression of different sets of TEs. In particular, the most highly expressed TEs, including miniature inverted-repeat transposable elements and long terminal repeat-Gypsy elements, show highly distinct de-repression across stress conditions. Both the genomic context of TEs and the genetic background (i.e., different strains harboring the same TEs) were major predictors of de-repression under stress. Gene expression profiles under stress varied significantly depending on the proximity to the closest TEs, and genomic defenses against TEs were largely ineffective at preventing de-repression. Next, we analyzed the locus encoding the Avr3D1 effector. We show that the insertion and subsequent silencing of TEs in close proximity likely contributed to reduced expression and virulence on a specific wheat cultivar. The complexity of TE responsiveness to stress across genetic backgrounds and genomic locations demonstrates substantial intraspecific genetic variation to control TEs with consequences for virulence.

Journal Article
TL;DR: Divergence time estimates support an Aptian (Early Cretaceous) origin of asterids and the origin of all orders before the K-Pg boundary, and ancestral state reconstruction at the family level suggests that the asterid ancestor was a woody terrestrial plant with simple leaves and bisexual, actinomorphic flowers with free petals and free anthers.
Abstract: Asterids are one of the most successful angiosperm lineages, exhibiting extensive morphological diversity and including a number of important crops. Despite their biological prominence and value to humans, the deep asterid phylogeny has not been fully resolved, and the evolutionary landscape underlying their radiation remains unknown. To resolve the asterid phylogeny, we sequenced 213 transcriptomes/genomes and combined them with other data sets, representing all accepted orders and nearly all families of asterids. We show fully supported monophyly of asterids, Berberidopsidales as sister to asterids, monophyly of all orders except Icacinales, Aquifoliales, and Bruniales, and monophyly of all families except Icacinaceae and Ehretiaceae. Novel taxon placements benefited from the expanded sampling with living collections from botanical gardens, resolving hitherto uncertain relationships. The remaining ambiguous placements here are likely due to limited sampling and could be addressed in the future with relevant additional taxa. Using our well-resolved phylogeny as reference, divergence time estimates support an Aptian (Early Cretaceous) origin of asterids and the origin of all orders before the Cretaceous-Paleogene boundary. Ancestral state reconstruction at the family level suggests that the asterid ancestor was a woody terrestrial plant with simple leaves, bisexual and actinomorphic flowers with free petals and free anthers, a superior ovary with a style, and drupaceous fruits. Whole-genome duplication (WGD) analyses provide strong evidence for 33 WGDs in asterids and one in Berberidopsidales, including four suprafamilial and seven familial/subfamilial WGDs. Our results advance the understanding of asterid phylogeny and provide numerous novel evolutionary insights into their diversification and morphological evolution.

Journal Article
TL;DR: An integrated analysis of full genome sequence data from 21 newly sequenced viruses, along with comprehensive epidemiological surveillance data collected globally over the last 15 years, found four distinct phylogenetic lineages of PDCoV that differ in their geographic circulation patterns.
Abstract: The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has shown once again that coronaviruses (CoVs) in animals are potential sources for epidemics in humans. Porcine deltacoronavirus (PDCoV) is an emerging enteropathogen of swine with a worldwide distribution. Here, we implemented and described an approach to analyze the epidemiology of PDCoV following its emergence in the pig population. We performed an integrated analysis of full genome sequence data from 21 newly sequenced viruses, along with comprehensive epidemiological surveillance data collected globally over the last 15 years. We found four distinct phylogenetic lineages of PDCoV, which differ in their geographic circulation patterns. Interestingly, we identified more frequent intra- and interlineage recombination and higher virus genetic diversity in the Chinese lineages compared with the USA lineage where pigs are raised in different farming systems and ecological environments. Most recombination breakpoints are located in the ORF1ab gene rather than in genes encoding structural proteins. We also identified five amino acids under positive selection in the spike protein, suggesting a role for adaptive evolution. According to structural mapping, three positively selected sites are located in the N-terminal domain of the S1 subunit, which is most likely involved in binding to a carbohydrate receptor, whereas the other two are located in or near the fusion peptide of the S2 subunit and thus might affect membrane fusion. Finally, our phylogeographic investigations highlighted notable South-North transmission as well as frequent long-distance dispersal events in China that could implicate human-mediated transmission. Our findings provide new insights into the evolution and dispersal of PDCoV that contribute to our understanding of the critical factors involved in CoV emergence.

Journal Article
TL;DR: After a WGD, genes that returned to single copies show the highest levels and breadth of expression, gene body methylation, and intron numbers, whereas the long-retained duplicates exhibit the highest degrees of protein–protein interactions and protein lengths and the lowest methylation in gene flanking regions.
Abstract: For most sequenced flowering plants, multiple whole-genome duplications (WGDs) are found. Duplicated genes following WGD often have different fates: they can quickly disappear again, be retained for long(er) periods, or subsequently undergo small-scale duplications. However, how differences in expression, epigenetic regulation, and functional constraints are associated with these different gene fates following a WGD still requires further investigation, because successive WGDs in angiosperms complicate the gene trajectories. In this study, we investigate lotus (Nelumbo nucifera), an angiosperm with a single WGD at the K-Pg boundary. Based on improved intraspecific-synteny identification by a chromosome-level assembly, transcriptome, and bisulfite sequencing, we explore not only the fundamental distinctions in genomic features, expression, and methylation patterns of genes with different fates after a WGD but also the factors that shape post-WGD expression divergence and expression bias between duplicates. We found that after a WGD, genes that returned to single copies show the highest levels and breadth of expression, gene body methylation, and intron numbers, whereas the long-retained duplicates exhibit the highest degrees of protein-protein interactions and protein lengths and the lowest methylation in gene flanking regions. For those long-retained duplicate pairs, the degree of expression divergence correlates with their sequence divergence, degree in protein-protein interactions, and expression level, whereas their biases in expression level reflecting subgenome dominance are associated with the bias of subgenome fractionation. Overall, our study on the paleopolyploid nature of lotus highlights the impact of different functional constraints on gene fate and duplicate divergence following a single WGD in a plant.

Journal ArticleDOI
TL;DR: It is found that Vietnamese ethnolinguistic groups harbor multiple sources of genetic diversity that likely reflect different sources for the ancestry associated with each language family, including Austro-Asiatic groups that shifted to Austronesian languages during the past 2,500 years.
Abstract: Vietnam features extensive ethnolinguistic diversity and occupies a key position in Mainland Southeast Asia. Yet, the genetic diversity of Vietnam remains relatively unexplored, especially with genome-wide data, because previous studies have focused mainly on the majority Kinh group. Here, we analyze newly generated genome-wide single-nucleotide polymorphism data for the Kinh and 21 additional ethnic groups in Vietnam, encompassing all five major language families in Mainland Southeast Asia. In addition to analyzing the allele and haplotype sharing within the Vietnamese groups, we incorporate published data from both nearby modern populations and ancient samples for comparison. In contrast to previous studies that suggested a largely indigenous origin for Vietnamese genetic diversity, we find that Vietnamese ethnolinguistic groups harbor multiple sources of genetic diversity that likely reflect different sources for the ancestry associated with each language family. However, linguistic diversity does not completely match genetic diversity: There have been extensive interactions between the Hmong-Mien and Tai-Kadai groups; different Austro-Asiatic groups show different affinities with other ethnolinguistic groups; and we identified a likely case of cultural diffusion in which some Austro-Asiatic groups shifted to Austronesian languages during the past 2,500 years. Overall, our results highlight the importance of genome-wide data from dense sampling of ethnolinguistic groups in providing new insights into the genetic diversity and history of an ethnolinguistically diverse region, such as Vietnam.

Journal ArticleDOI
TL;DR: The results indicate that BETS is an effective alternative to other tests of temporal signal, which has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior, which are essential aspects of Bayesian phylodynamic analyses.
Abstract: Phylogenetic methods can use the sampling times of molecular sequence data to calibrate the molecular clock, enabling the estimation of evolutionary rates and timescales for rapidly evolving pathogens and data sets containing ancient DNA samples. A key aspect of such calibrations is whether a sufficient amount of molecular evolution has occurred over the sampling time window, that is, whether the data can be treated as having come from a measurably evolving population. Here, we investigate the performance of a fully Bayesian evaluation of temporal signal (BETS) in sequence data. The method involves comparing the fit to the data of two models: a model in which the data are accompanied by the actual (heterochronous) sampling times, and a model in which the samples are constrained to be contemporaneous (isochronous). We conducted simulations under a wide range of conditions to demonstrate that BETS accurately classifies data sets according to whether they contain temporal signal or not, even when there is substantial among-lineage rate variation. We explore the behavior of this classification in analyses of five empirical data sets: modern samples of A/H1N1 influenza virus, the bacterium Bordetella pertussis, coronaviruses from mammalian hosts, ancient DNA from hepatitis B virus, and mitochondrial genomes of dog species. Our results indicate that BETS is an effective alternative to other tests of temporal signal. In particular, this method has the key advantage of allowing a coherent assessment of the entire model, including the molecular clock and tree prior, which are essential aspects of Bayesian phylodynamic analyses.
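The model comparison described in the BETS abstract amounts to a Bayes factor between the heterochronous and isochronous models. The sketch below assumes the two log marginal likelihoods have already been estimated elsewhere (for example, by path or stepping-stone sampling in the phylogenetic software); the function names, the example numbers, and the default log-BF threshold are hypothetical illustrations, not values taken from the paper.

```python
def log_bayes_factor(log_ml_heterochronous: float, log_ml_isochronous: float) -> float:
    """Log Bayes factor in favour of the model that uses the real sampling times."""
    return log_ml_heterochronous - log_ml_isochronous


def has_temporal_signal(log_ml_heterochronous: float,
                        log_ml_isochronous: float,
                        threshold: float = 1.0) -> bool:
    """Classify a data set as measurably evolving when the log BF exceeds `threshold`.

    The default threshold is an illustrative assumption, not a cutoff from the BETS paper.
    """
    return log_bayes_factor(log_ml_heterochronous, log_ml_isochronous) > threshold


# Hypothetical log marginal likelihoods for one data set
het, iso = -12450.25, -12461.75
print(log_bayes_factor(het, iso))     # 11.5
print(has_temporal_signal(het, iso))  # True
```

Because both runs score the same sequence data and differ only in whether sampling times are used, the sign and size of the log Bayes factor directly express how much the dates improve the fit.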

Journal ArticleDOI
TL;DR: Structural variants were discovered across a population sample of 347 high-coverage, resequenced genomes of Asian rice and its wild ancestor, and hundreds of genes gained and lost during domestication were detected, some of which were enriched for traits of agronomic interest.
Abstract: Structural variants (SVs) are a largely unstudied feature of plant genome evolution, despite the fact that SVs contribute substantially to phenotypes. In this study, we discovered SVs across a population sample of 347 high-coverage, resequenced genomes of Asian rice (Oryza sativa) and its wild ancestor (O. rufipogon). In addition to this short-read data set, we also inferred SVs from whole-genome assemblies and long-read data. Comparisons among data sets revealed different features of genome variability. For example, genome alignment identified a large (∼4.3 Mb) inversion in indica rice varieties relative to japonica varieties, and long-read analyses suggest that ∼9% of genes from the outgroup (O. longistaminata) are hemizygous. We focused, however, on the resequencing sample to investigate the population genomics of SVs. Clustering analyses with SVs recapitulated the rice cultivar groups that were also inferred from SNPs. However, the site-frequency spectrum of each SV type (inversions, duplications, deletions, translocations, and mobile element insertions) was skewed toward lower frequency variants than that of synonymous SNPs, suggesting that SVs may be predominantly deleterious. Among transposable elements, SINE and mariner insertions were found at especially low frequency. We also used SVs to study domestication by contrasting rice with O. rufipogon. Cultivated genomes contained ∼25% more derived SVs and mobile element insertions than O. rufipogon, indicating that SVs contribute to the cost of domestication in rice. Peaks of SV divergence were enriched for known domestication genes, but we also detected hundreds of genes gained and lost during domestication, some of which were enriched for traits of agronomic interest.

Journal ArticleDOI
TL;DR: A comprehensive phylogenetic analysis of insect InR sequences in 118 species from 23 orders is presented and the role of three InRs identified in the linden bug, Pyrrhocoris apterus, in wing polymorphism control is investigated, suggesting an independent establishment of insulin/insulin-like growth factor signaling control over wing development.
Abstract: Evidence is accumulating that the functional plasticity of insulin and insulin-like growth factor signaling in insects could stem, among other factors, from the multiplicity of insulin receptors (InRs). Their multiple variants may be involved in the control of insect polyphenism, such as wing or caste polyphenism. Here, we present a comprehensive phylogenetic analysis of insect InR sequences in 118 species from 23 orders and investigate the role of three InRs identified in the linden bug, Pyrrhocoris apterus, in wing polymorphism control. We identified two gene clusters (Clusters I and II) resulting from an ancestral duplication in a late ancestor of winged insects; these clusters remained conserved in most lineages and underwent further duplications or losses in only some of them. One remarkable yet neglected feature of InR evolution is the loss of the tyrosine kinase catalytic domain, giving rise to decoys of InR in both clusters. Within Cluster I, we confirmed the presence of the secreted decoy of insulin receptor in all studied Muscomorpha. More importantly, we described a new tyrosine kinase-less gene (DR2) in Cluster II, conserved in apical Holometabola for ∼300 My. We differentially silenced the three P. apterus InRs and confirmed their participation in wing polymorphism control. We observed a pattern of Cluster I and Cluster II InR effects on wing development that differed from the one postulated in planthoppers, suggesting an independent establishment of insulin/insulin-like growth factor signaling control over wing development, leading to idiosyncrasies in the co-option of multiple InRs in polyphenism control in different taxa.

Journal ArticleDOI
TL;DR: The worst-case scenario for an invasive species is documented, in which there are now two pest species instead of one, and the native species has acquired resistance to pyrethroid insecticides through introgression.
Abstract: Hybridization between invasive and native species has raised global concern, given the dramatic increase in species range shifts and pest outbreaks due to anthropogenic dispersal. Nevertheless, secondary contact between sister lineages of local and invasive species provides a natural laboratory to understand the factors that determine introgression and the maintenance or loss of species barriers. Here, we characterize the early evolutionary outcomes following secondary contact between invasive Helicoverpa armigera and native H. zea in Brazil. We carried out whole-genome resequencing of Helicoverpa moths from Brazil in two temporal samples: one during the outbreak of H. armigera in 2013, and one in 2017. We found evidence for a burst of hybridization and widespread introgression from local H. zea into invasive H. armigera coinciding with H. armigera expansion in 2013. However, in H. armigera, the admixture proportion and the length of introgressed blocks were significantly reduced between 2013 and 2017, suggesting selection against admixture. In contrast to the genome-wide pattern, there was striking evidence for adaptive introgression of a single region from the invasive H. armigera into local H. zea, including an insecticide resistance allele that increased in frequency over time. In summary, despite extensive gene flow after secondary contact, the species boundaries are largely maintained except for the single introgressed region containing the insecticide resistance locus. We document the worst-case scenario for an invasive species, in which there are now two pest species instead of one, and the native species has acquired resistance to pyrethroid insecticides through introgression.

Journal ArticleDOI
TL;DR: HLA heterozygotes were more likely to carry certain HLA alleles, including the highly protective HLA-B*57:01 variant, indicating that HLA heterozygote advantage ultimately results from a combination of quantitative and qualitative effects in antigen presentation.
Abstract: Pathogen-mediated balancing selection is regarded as a key driver of host immunogenetic diversity. A hallmark of balancing selection in humans is the heterozygote advantage at genes of the human leukocyte antigen (HLA), resulting in improved HIV-1 control. However, the actual mechanism of the observed heterozygote advantage is still elusive. HLA heterozygotes may present a broader array of antigenic viral peptides to immune cells, possibly resulting in a more efficient cytotoxic T-cell response. Alternatively, heterozygosity may simply increase the chance of carrying the most protective HLA alleles, as individual HLA alleles are known to differ substantially in their association with HIV-1 control. Here, we used data from 6,311 HIV-1-infected individuals to explore the relative contribution of quantitative and qualitative aspects of peptide presentation to HLA heterozygote advantage against HIV. Screening the entire HIV-1 proteome, we observed that heterozygous individuals exhibited a broader array of HIV-1 peptides presented by their HLA class I alleles. In addition, viral load was negatively correlated with the breadth of the HIV-1 peptide repertoire bound by an individual's HLA variants, particularly at HLA-B. This suggests that heterozygote advantage at HLA-B is at least in part mediated by quantitative peptide presentation. We also observed higher HIV-1 sequence diversity among HLA-B heterozygous individuals, suggesting stronger evolutionary pressure from HLA heterozygosity. However, HLA heterozygotes were also more likely to carry certain HLA alleles, including the highly protective HLA-B*57:01 variant, indicating that HLA heterozygote advantage ultimately results from a combination of quantitative and qualitative effects in antigen presentation.

Journal ArticleDOI
TL;DR: It is shown that the McdAB system is widespread among β-cyanobacteria, often clustering with carboxysome-related components, and is absent in α-cyanobacteria.
Abstract: Carboxysomes are protein-based organelles that are essential for allowing cyanobacteria to fix CO2. Previously, we identified a two-component system, McdAB, responsible for equidistantly positioning carboxysomes in the model cyanobacterium Synechococcus elongatus PCC 7942 (MacCready JS, Hakim P, Young EJ, Hu L, Liu J, Osteryoung KW, Vecchiarelli AG, Ducat DC. 2018. Protein gradients on the nucleoid position the carbon-fixing organelles of cyanobacteria. eLife 7:pii:e39723). McdA, a ParA-type ATPase, nonspecifically binds the nucleoid in the presence of ATP. McdB, a novel factor that directly binds carboxysomes, displaces McdA from the nucleoid. Removal of McdA from the nucleoid in the vicinity of carboxysomes by McdB causes a global break in McdA symmetry, and carboxysome motion occurs via a Brownian-ratchet-based mechanism toward the highest concentration of McdA. Despite the importance of proper carboxysome positioning for cyanobacteria, whether the McdAB system is widespread among cyanobacteria remains an open question. Here, we show that the McdAB system is widespread among β-cyanobacteria, often clustering with carboxysome-related components, and is absent in α-cyanobacteria. Moreover, we show that two distinct McdAB systems exist in β-cyanobacteria, with Type 2 systems being the most ancestral and abundant, and Type 1 systems, like that of S. elongatus, possibly being acquired more recently. Lastly, all McdB proteins share the sequence signatures of a protein capable of undergoing liquid-liquid phase separation. Indeed, we find that representatives of both McdB types undergo liquid-liquid phase separation in vitro, the first example of a ParA-type ATPase partner protein to exhibit this behavior. Our results have broader implications for understanding carboxysome evolution, biogenesis, homeostasis, and positioning in cyanobacteria.

Journal ArticleDOI
TL;DR: The genome response of the spotted wing drosophila Drosophila suzukii during the worldwide invasion of this pest insect species is characterized by conducting a genome-wide association study to identify genes involved in adaptive processes during invasion.
Abstract: Evidence is accumulating that evolutionary changes are not only common during biological invasions but may also contribute directly to invasion success. The genomic basis of such changes is still largely unexplored. Yet, understanding the genomic response to invasion may help to predict the conditions under which invasiveness can be enhanced or suppressed. Here, we characterized the genome response of the spotted wing drosophila Drosophila suzukii during the worldwide invasion of this pest insect species by conducting a genome-wide association study to identify genes involved in adaptive processes during invasion. Genomic data from 22 population samples were analyzed to detect genetic variants associated with the status (invasive versus native) of the sampled populations, using a newly developed statistic, which we call C2, that contrasts allele frequencies corrected for population structure. We evaluated this new statistical framework using simulated data sets and implemented it in an upgraded version of the program BayPass. We identified a relatively small set of single nucleotide polymorphisms (SNPs) that show a highly significant association with the invasive status of D. suzukii populations. In particular, two genes, RhoGEF64C and cpo, contained SNPs significantly associated with the invasive status in the two separate main invasion routes of D. suzukii. Our methodological approaches can be applied to any other invasive species, and more generally to any evolutionary model for species characterized by non-equilibrium demographic conditions for which binary covariables of interest can be defined at the population level.
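As a rough illustration of contrasting population allele frequencies while accounting for population structure, the sketch below computes a generic squared, variance-standardized contrast between invasive and native populations. It assumes that per-population standardized allele frequencies `y` and a population covariance matrix `omega` (analogous in spirit to the scaled covariance matrix BayPass estimates) are already available. This is a simplified stand-in written for illustration, not the C2 statistic as defined or implemented in BayPass; all function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def contrast_statistic(y: Sequence[float],
                       omega: Sequence[Sequence[float]],
                       invasive: Sequence[bool]) -> float:
    """Hypothetical squared, standardized contrast (not BayPass's C2).

    Contrast weights: c_j = +1/n_inv for invasive populations and -1/n_nat for
    native ones, so the weights sum to zero and any common mean cancels.
    Returns (c^T y)^2 / (c^T Omega c), which follows a chi-square distribution
    with 1 df if y is multivariate normal with a common mean and covariance Omega.
    """
    n_inv = sum(invasive)
    n_nat = len(invasive) - n_inv
    if n_inv == 0 or n_nat == 0:
        raise ValueError("need at least one invasive and one native population")
    c = [1.0 / n_inv if flag else -1.0 / n_nat for flag in invasive]
    numerator = sum(cj * yj for cj, yj in zip(c, y)) ** 2
    # Variance of the contrast under the assumed covariance structure
    variance = sum(c[i] * omega[i][j] * c[j]
                   for i in range(len(c)) for j in range(len(c)))
    return numerator / variance


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical example: four populations, no structure (identity covariance)
y = [1.0, 1.0, -1.0, -1.0]
omega = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
invasive = [True, True, False, False]
stat = contrast_statistic(y, omega, invasive)
print(stat)                             # 4.0
print(round(chi2_1df_pvalue(stat), 4))  # 0.0455
```

Dividing by the contrast variance under `omega` is what makes the statistic comparable across SNPs: populations that share ancestry contribute correlated frequencies, and ignoring that correlation would inflate apparent associations with invasive status.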