Showing papers in "Genome Biology and Evolution in 2019"


Journal ArticleDOI
TL;DR: A data set of 1,311 high-quality genomes from the human pathogen Pseudomonas aeruginosa is used to show that a pan-genomic approach can greatly refine the population structure of bacterial species, provide new insights to define species boundaries, and generate hypotheses on the evolution of pathogenicity.
Abstract: The huge increase in the availability of bacterial genomes led us to a point in which we can investigate and query pan-genomes, for example, the full set of genes of a given bacterial species or clade. Here, we used a data set of 1,311 high-quality genomes from the human pathogen Pseudomonas aeruginosa, 619 of which were newly sequenced, to show that a pan-genomic approach can greatly refine the population structure of bacterial species, provide new insights to define species boundaries, and generate hypotheses on the evolution of pathogenicity. The 665-gene P. aeruginosa core genome presented here, which constitutes only 1% of the entire pan-genome, is the first to be in the same order of magnitude as the minimal bacterial genome and represents a conservative estimate of the actual core genome. Moreover, the phylogeny based on this core genome provides strong evidence for a five-group population structure that includes two previously undescribed groups of isolates. Comparative genomics focusing on antimicrobial resistance and virulence genes showed that variation among isolates was partly linked to this population structure. Finally, we hypothesized that horizontal gene transfer had an important role in this respect, and found a total of 3,010 putative complete and fragmented plasmids, 5% and 12% of which contained resistance or virulence genes, respectively. This work provides data and strategies to study the evolutionary trajectories of resistance and virulence in P. aeruginosa.

178 citations


Journal ArticleDOI
TL;DR: This review synthesizes key insights into the spontaneous mutation process that are rapidly emerging from the partnering of classical MA experiments with high-throughput sequencing, with particular emphasis on the spontaneous rates and molecular properties of different mutational classes in nuclear and mitochondrial genomes of diverse taxa.
Abstract: Mutations spawn genetic variation which, in turn, fuels evolution. Hence, experimental investigations into the rate and fitness effects of spontaneous mutations are central to the study of evolution. Mutation accumulation (MA) experiments have served as a cornerstone for furthering our understanding of spontaneous mutations for four decades. In the pregenomic era, phenotypic measurements of fitness-related traits in MA lines were used to indirectly estimate key mutational parameters, such as the genomic mutation rate, new mutational variance per generation, and the average fitness effect of mutations. Rapidly emerging next-generation sequencing technology has supplanted this phenotype-dependent approach, enabling direct empirical estimates of the mutation rate and a more nuanced understanding of the relative contributions of different classes of mutations to the standing genetic variation. Whole-genome sequencing of MA lines bears immense potential to provide a unified account of the evolutionary process at multiple levels: the genetic basis of variation, and the evolutionary dynamics of mutations under the forces of selection and drift. In this review, we have attempted to synthesize key insights into the spontaneous mutation process that are rapidly emerging from the partnering of classical MA experiments with high-throughput sequencing, with particular emphasis on the spontaneous rates and molecular properties of different mutational classes in nuclear and mitochondrial genomes of diverse taxa, the contribution of mutations to the evolution of gene expression, and the rate and stability of transgenerational epigenetic modifications. Future advances in sequencing technologies will enable greater species representation to further refine our understanding of mutational parameters and their functional consequences.

97 citations


Journal ArticleDOI
TL;DR: The maximal matched-pairs tests of homogeneity are introduced and applied to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets and suggest that the extent and effects of model violation in phylogenetics may be substantial.
Abstract: In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that rejects the SRH assumptions, to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available in https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).

91 citations


Journal ArticleDOI
TL;DR: Mapping key crustacean tagmosis patterns and developmental characters across the revised phylogeny suggests that the ancestral pancrustacean was relatively short-bodied, with extreme body elongation and anamorphic development emerging later in pancrustacean evolution.
Abstract: The relationships of crustaceans and hexapods (Pancrustacea) have been much discussed and partially elucidated following the emergence of phylogenomic data sets. However, major uncertainties still remain regarding the position of iconic taxa such as Branchiopoda, Copepoda, Remipedia, and Cephalocarida, and the sister group relationship of hexapods. We assembled the most taxon-rich phylogenomic pancrustacean data set to date and analyzed it using a variety of methodological approaches. We prioritized low levels of missing data and found that some clades were consistently recovered independently of the analytical approach used. These include, for example, Oligostraca and Altocrustacea. Substantial support was also found for Allotriocarida, with Remipedia as the sister of Hexapoda (i.e., Labiocarida), and Branchiopoda as the sister of Labiocarida, a clade that we name Athalassocarida (="nonmarine shrimps"). Within Allotriocarida, Cephalocarida was found as the sister of Athalassocarida. Finally, moderate support was found for Hexanauplia (Copepoda as sister to Thecostraca) in alliance with Malacostraca. Mapping key crustacean tagmosis patterns and developmental characters across the revised phylogeny suggests that the ancestral pancrustacean was relatively short-bodied, with extreme body elongation and anamorphic development emerging later in pancrustacean evolution.

69 citations


Journal ArticleDOI
TL;DR: The data provide new insight into the TPS family expansion and evolution and suggest that TPSs might have originated from isoprenyl diphosphate synthase genes.
Abstract: Terpenes are organic compounds and play important roles in plant growth and development as well as in mediating interactions of plants with the environment. Terpene synthases (TPSs) are the key enzymes responsible for the biosynthesis of terpenes. Although some species were employed for the genome-wide identification and characterization of the TPS family, limited information is available regarding the evolution, expansion, and retention mechanisms occurring in this gene family. We performed a genome-wide identification of the TPS family members in 50 sequenced genomes. Additionally, we also characterized the TPS family from aromatic spearmint and basil plants using RNA-Seq data. No TPSs were identified in algae genomes but the remaining plant species encoded various numbers of the family members ranging from 2 to 79 full-length TPSs. Some species showed lineage-specific expansion of certain subfamilies, which might have contributed toward species or ecotype divergence or environmental adaptation. A large-scale family expansion was observed mainly in dicot and monocot plants, which was accompanied by frequent domain loss. Both tandem and segmental duplication significantly contributed toward family expansion and expression divergence and played important roles in the survival of these expanded genes. Our data provide new insight into the TPS family expansion and evolution and suggest that TPSs might have originated from isoprenyl diphosphate synthase genes.

64 citations


Journal ArticleDOI
TL;DR: The authors performed comprehensive synteny and phylogenetic analyses of the eight LEA gene families across 60 complete plant genomes and revealed that synteny conservation and diversification contributed to LEA family expansion and functional diversification in plants.
Abstract: Late embryogenesis abundant (LEA) proteins include eight multigene families that are expressed in response to water loss during seed maturation and in vegetative tissues of desiccation-tolerant species. To elucidate LEA protein evolution and diversification, we performed comprehensive synteny and phylogenetic analyses of the eight gene families across 60 complete plant genomes. Our integrated comparative genomic approach revealed that synteny conservation and diversification contributed to LEA family expansion and functional diversification in plants. We provide examples that: 1) the genomic diversification of the Dehydrin family contributed to differential evolution of amino acid sequences, protein biochemical properties, and gene expression patterns, and led to the appearance of a novel functional motif in angiosperms; 2) ancient genomic diversification contributed to the evolution of distinct intrinsically disordered regions of LEA_1 proteins; 3) recurrent tandem-duplications contributed to the large expansion of LEA_2; and 4) dynamic synteny diversification played a role in the evolution of LEA_4 and its function in plant desiccation tolerance. Taken together, these results show that multiple evolutionary mechanisms have not only led to genomic diversification but also to structural and functional plasticity among LEA proteins, which have jointly contributed to the adaptation of plants to water-limiting environments.

62 citations


Journal ArticleDOI
TL;DR: Estimates of copy number at the MHC for over 250 bird species from 68 families suggest that MHC copy number evolution in birds has been driven by life histories and differences in exposure to intra- and extracellular pathogens.
Abstract: The evolution of the major histocompatibility complex (MHC) is shaped by frequent gene duplications and deletions, which generate extensive variation in the number of loci (gene copies) between different taxa. Here, we collected estimates of copy number at the MHC for over 250 bird species from 68 families. We found contrasting patterns of copy number evolution between MHC class I and class IIB, which encode receptors for intra- and extracellular pathogens, respectively. Across the avian evolutionary tree, there was evidence of accelerated evolution and stabilizing selection acting on copy number at class I, while copy number at class IIB was primarily influenced by fluctuating selection and drift. Reconstruction of MHC copy number variation showed ancestrally low numbers of MHC loci in nonpasserines and evolution toward larger numbers of loci in passerines. Different passerine lineages had the highest duplication rates for MHC class I (Sylvioidea) and class IIB (Muscicapoidea and Passeroidea). We also found support for the correlated evolution of MHC copy number and life-history traits such as lifespan and migratory behavior. These results suggest that MHC copy number evolution in birds has been driven by life histories and differences in exposure to intra- and extracellular pathogens.

62 citations


Journal ArticleDOI
TL;DR: A comparison of this genome assembly with that of Ciona savignyi, a different species in the same genus, revealed many chromosomal inversions between these two Ciona species, suggesting that such inversions have occurred frequently and have contributed to chromosomal evolution of Ciona species.
Abstract: Since its initial publication in 2002, the genome of Ciona intestinalis type A (Ciona robusta), the first genome sequence of an invertebrate chordate, has provided a valuable resource for a wide range of biological studies, including developmental biology, evolutionary biology, and neuroscience. The genome assembly was updated in 2008, and it included 68% of the sequence information in 14 pairs of chromosomes. However, a more contiguous genome is required for analyses of higher order genomic structure and of chromosomal evolution. Here, we provide a new genome assembly for an inbred line of this animal, constructed with short and long sequencing reads and Hi-C data. In this latest assembly, over 95% of the 123 Mb of sequence data was included in the chromosomes. Short sequencing reads predicted a genome size of 114-120 Mb; therefore, it is likely that the current assembly contains almost the entire genome, although this estimate of genome size was smaller than previous estimates. Remapping of the Hi-C data onto the new assembly revealed a large inversion in the genome of the inbred line. Moreover, a comparison of this genome assembly with that of Ciona savignyi, a different species in the same genus, revealed many chromosomal inversions between these two Ciona species, suggesting that such inversions have occurred frequently and have contributed to chromosomal evolution of Ciona species. Thus, the present assembly greatly improves an essential resource for genome-wide studies of ascidians.

62 citations


Journal ArticleDOI
TL;DR: Completion of plastome sequencing and assembly for 19 Medicago species and Trigonella foenum-graecum and comparative analysis with other IR-lacking clade taxa revealed modest divergence with regard to structural organization overall; however, one clade contained unique variation suggesting an ancestor had experienced repeat-mediated changes in plastid structure.
Abstract: The plant genome comprises a coevolving, integrated genetic system housed in three subcellular compartments: the nucleus, mitochondrion, and the plastid. The typical land plant plastid genome (plastome) comprises the sum of repeating units of 130-160 kb in length. The plastome inverted repeat (IR) divides each plastome monomer into large and small single copy regions, an architecture highly conserved across land plants. There have been varying degrees of expansion or contraction of the IR, and in a few distinct lineages, including the IR-lacking clade of papilionoid legumes, one copy of the IR has been lost. Completion of plastome sequencing and assembly for 19 Medicago species and Trigonella foenum-graecum and comparative analysis with other IR-lacking clade taxa revealed modest divergence with regard to structural organization overall. However, one clade contained unique variation suggesting an ancestor had experienced repeat-mediated changes in plastome structure. In Medicago minima, a novel IR of ∼9 kb was confirmed and the role of repeat-mediated, recombination-dependent replication in IR reemergence is discussed.

58 citations


Journal ArticleDOI
TL;DR: The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.
Abstract: The Gasterosteidae fish family hosts several species that are important models for eco-evolutionary, genetic, and genomic research. In particular, a wealth of genetic and genomic data has been generated for the three-spined stickleback (Gasterosteus aculeatus), the "ecology's supermodel," whereas the genomic resources for the nine-spined stickleback (Pungitius pungitius) have remained relatively scarce. Here, we report a high-quality chromosome-level genome assembly of P. pungitius consisting of 5,303 contigs (N50 = 1.2 Mbp) with a total size of 521 Mbp. These contigs were mapped to 21 linkage groups using a high-density linkage map, yielding a final assembly with 98.5% BUSCO completeness. A total of 25,062 protein-coding genes were annotated, and about 23% of the assembly was found to consist of repetitive elements. A comprehensive analysis of repetitive elements uncovered centromere-specific tandem repeats and provided insights into the evolution of retrotransposons. A multigene phylogenetic analysis inferred a divergence time of about 26 million years ago (Ma) between nine- and three-spined sticklebacks, which is far older than the commonly assumed estimate of 13 Ma. Compared with the three-spined stickleback, we identified an additional duplication of several genes in the hemoglobin cluster. Sequencing data from populations adapted to different environments indicated potential copy number variations in hemoglobin genes. Furthermore, genome-wide synteny comparisons between three- and nine-spined sticklebacks identified chromosomal rearrangements underlying the karyotypic differences between the two species. The high-quality chromosome-scale assembly of the nine-spined stickleback genome obtained with long-read sequencing technology provides a crucial resource for comparative and population genomic investigations of stickleback fishes and teleosts.

55 citations


Journal ArticleDOI
TL;DR: Results suggest that the major P. aeruginosa groups defined in part by the exoS and exoU genes are divergent from each other, and that these groups are genetically isolated and may be ecologically distinct.
Abstract: The diversification of microbial populations may be driven by many factors including adaptation to distinct ecological niches and barriers to recombination. We examined the population structure of the bacterial pathogen Pseudomonas aeruginosa by analyzing whole-genome sequences of 739 isolates from diverse sources. We confirmed that the population structure of P. aeruginosa consists of two major groups (referred to as Groups A and B) and at least two minor groups (Groups C1 and C2). Evidence for frequent intragroup but limited intergroup recombination in the core genome was observed, consistent with sexual isolation of the groups. Likewise, accessory genome analysis demonstrated more gene flow within Groups A and B than between these groups, and a few accessory genomic elements were nearly specific to one or the other group. In particular, the exoS gene was highly overrepresented in Group A compared with Group B isolates (99.4% vs. 1.1%) and the exoU gene was highly overrepresented in Group B compared with Group A isolates (95.2% vs. 1.8%). The exoS and exoU genes encode effector proteins secreted by the P. aeruginosa type III secretion system. Together these results suggest that the major P. aeruginosa groups defined in part by the exoS and exoU genes are divergent from each other, and that these groups are genetically isolated and may be ecologically distinct. Although both groups were globally distributed and caused human infections, certain groups predominated in some clinical contexts.

Journal ArticleDOI
TL;DR: The first estimation of the spontaneous mutation rate in a model unicellular eukaryote from the Stramenopile kingdom, the diatom Phaeodactylum tricornutum, is reported, enabling us to infer the effective population size of P. tricornutum to be Ne ∼ 8.72 × 10⁶.
Abstract: Mutations are the origin of genetic diversity, and the mutation rate is a fundamental parameter to understand all aspects of molecular evolution. The combination of mutation-accumulation experiments and high-throughput sequencing enabled the estimation of mutation rates in most model organisms, but several major eukaryotic lineages remain unexplored. Here, we report the first estimation of the spontaneous mutation rate in a model unicellular eukaryote from the Stramenopile kingdom, the diatom Phaeodactylum tricornutum (strain RCC2967). We sequenced 36 mutation accumulation lines for an average of 181 generations per line and identified 156 de novo mutations. The base substitution mutation rate per site per generation is μbs = 4.77 × 10⁻¹⁰ and the insertion-deletion mutation rate is μid = 1.58 × 10⁻¹¹. The mutation rate varies as a function of the nucleotide context and is biased toward an excess of mutations from GC to AT, consistent with previous observations in other species. Interestingly, the mutation rates between the genomes of organelles and the nucleus differ, with a significantly higher mutation rate in the mitochondria. This confirms previous claims based on indirect estimations of the mutation rate in mitochondria of photosynthetic eukaryotes that acquired their plastid through a secondary endosymbiosis. This novel estimate enables us to infer the effective population size of P. tricornutum to be Ne ∼ 8.72 × 10⁶.

Journal ArticleDOI
TL;DR: The authors used whole-genome data from five populations (Africa, North America, Europe, Central Asia, and the South Pacific) to carry out demographic inferences, with particular attention to the inclusion of migration and admixture.
Abstract: The cohabitation of Drosophila melanogaster with humans is nearly ubiquitous. Though it has been well established that this fly species originated in sub-Saharan Africa, and only recently has spread globally, many details of its swift expansion remain unclear. Elucidating the demographic history of D. melanogaster provides a unique opportunity to investigate how human movement might have impacted patterns of genetic diversity in a commensal species, as well as providing neutral null models for studies aimed at identifying genomic signatures of local adaptation. Here, we use whole-genome data from five populations (Africa, North America, Europe, Central Asia, and the South Pacific) to carry out demographic inferences, with particular attention to the inclusion of migration and admixture. We demonstrate the importance of these parameters for model fitting and show that previous estimates of divergence times are likely to be significantly underestimated as a result of not including them. Finally, we discuss how human movement along early shipping routes might have shaped the present-day population structure of D. melanogaster.

Journal ArticleDOI
TL;DR: The analysis shows that the current taxonomic classification of Clostridium species hinders the prediction of functions and traits, suggests a new classification for this fascinating class of bacteria, and highlights the importance of phylogenomics for taxonomic studies.
Abstract: Clostridium is a large genus of obligate anaerobes belonging to the Firmicutes phylum of bacteria, most of which have a Gram-positive cell wall structure. The genus includes significant human and animal pathogens, causative of potentially deadly diseases such as tetanus and botulism. Despite their relevance and many studies suggesting that they are not a monophyletic group, the taxonomy of the group has largely been neglected. Currently, species belonging to the genus are placed in the unnatural order defined as Clostridiales, which includes the class Clostridia. Here, we used genomic data from 779 strains to study the taxonomy and evolution of the group. This analysis allowed us to 1) confirm that the group is composed of more than one genus, 2) detect major differences between pathogens classified as a single species within the group of authentic Clostridium spp. (sensu stricto), 3) identify inconsistencies between taxonomy and toxin evolution that reflect on the pervasive misclassification of strains, and 4) identify differential traits within central metabolism of members of what has been defined earlier and confirmed by us as cluster I. Our analysis shows that the current taxonomic classification of Clostridium species hinders the prediction of functions and traits, suggests a new classification for this fascinating class of bacteria, and highlights the importance of phylogenomics for taxonomic studies.

Journal ArticleDOI
TL;DR: This study established the most up-to-date view of the evolutionary relationships within this genus and highlighted several cases of poor classification, especially for the very closely related species within the Acinetobacter calcoaceticus–Acinetobacter baumannii complex (Acb complex).
Abstract: The Gram-negative Acinetobacter genus has several species of clear medical relevance. Many fully sequenced genomes belonging to the genus have been published in recent years; however, there has not been a recent attempt to infer the evolutionary history of Acinetobacter with that vast amount of information. Here, through a phylogenomic approach, we established the most up-to-date view of the evolutionary relationships within this genus and highlighted several cases of poor classification, especially for the very closely related species within the Acinetobacter calcoaceticus-Acinetobacter baumannii complex (Acb complex). Furthermore, we determined appropriate phylogenetic markers for this genus and showed that concatenation of the top 13 gives a very decent reflection of the evolutionary relationships for the genus Acinetobacter. The intersection between our top markers and previously defined universal markers is very small. In general, our study shows that, although there seems to be hardly any universal markers, bespoke phylogenomic approaches can be used to infer the phylogeny of different bacterial genera. We expect that ad hoc phylogenomic approaches will be the standard in the years to come and will provide enough information to resolve intricate evolutionary relationships like those observed in the Acb complex.

Journal ArticleDOI
TL;DR: The whole-genome sequence of a second Acropora species, A. millepora, is reported, which has been the most extensively studied Acropora species at the molecular level by virtue of its wide distribution and the ease with which it can be identified in what is a highly speciose genus.
Abstract: [Excerpt] Reef-building corals are iconic animals that are in global decline as a consequence of increasing anthropogenic pressure, but the development of strategies to ensure their conservation is constrained by our limited understanding of the molecular bases of many aspects of coral biology. Some coral genera are particularly sensitive to stress and, among these, Acropora is of particular significance because this is the dominant genus of reef-building corals in the Indo-Pacific. These factors have led to members of this genus often being the subjects of investigation into coral responses to various physical and biological stressors. Fittingly, the first coral genome to be sequenced was Acropora digitifera; the availability of this whole-genome sequence (Shinzato et al. 2011) allowed substantial progress in several areas of coral biology, including the molecular underpinnings of symbiosis and calcification (Hamada et al. 2013; Ramos-Silva et al. 2013). Here we report the whole-genome sequence of a second Acropora species, A. millepora, which has been the most extensively studied Acropora species at the molecular level (reviewed in Miller et al. 2011) by virtue of its wide distribution (Carpenter et al. 2008; Madin et al. 2016) and the ease with which it can be identified in what is a highly speciose genus.

Journal ArticleDOI
TL;DR: Threespine stickleback fish are an excellent model to examine the evolution of recombination over short evolutionary timescales and recombination rates indeed varied at a fine-scale across the genome, with many regions organized into narrow hotspots.
Abstract: Meiotic recombination is a highly conserved process that has profound effects on genome evolution. At a fine-scale, recombination rates can vary drastically across genomes, often localized into small recombination "hotspots" with highly elevated rates, surrounded by regions with little recombination. In most species studied, the location of hotspots within genomes is highly conserved across broad evolutionary timescales. The main exception to this pattern is in mammals, where hotspot location can evolve rapidly among closely related species and even among populations within a species. Hotspot position in mammals is controlled by the gene, Prdm9, whereas in species with conserved hotspots, a functional Prdm9 is typically absent. Due to a limited number of species where recombination rates have been estimated at a fine-scale, it remains unclear whether hotspot conservation is always associated with the absence of a functional Prdm9. Threespine stickleback fish (Gasterosteus aculeatus) are an excellent model to examine the evolution of recombination over short evolutionary timescales. Using a linkage disequilibrium-based approach, we found recombination rates indeed varied at a fine-scale across the genome, with many regions organized into narrow hotspots. Hotspots had highly divergent landscapes between stickleback populations, where only ∼15% of these hotspots were shared. Our results indicate that fine-scale recombination rates may be diverging between closely related populations of threespine stickleback fish. Interestingly, we found only a weak association of a PRDM9 binding motif within hotspots, which suggests that threespine stickleback fish may possess a novel mechanism for targeting recombination hotspots at a fine-scale.

Journal ArticleDOI
TL;DR: How improved assembly of genomes from metagenomes would be the most straight-forward approach for improving the inference of gene gain and loss events is discussed.
Abstract: High-throughput shotgun metagenomics sequencing has enabled the profiling of myriad natural communities. These data are commonly used to identify gene families and pathways that were potentially gained or lost in an environment and which may be involved in microbial adaptation. Despite the widespread interest in these events, there are no established best practices for identifying gene gain and loss in metagenomics data. Horizontal gene transfer (HGT) represents several mechanisms of gene gain that are especially of interest in clinical microbiology due to the rapid spread of antibiotic resistance genes in natural communities. Several additional mechanisms of gene gain and loss, including gene duplication, gene loss-of-function events, and de novo gene birth are also important to consider in the context of metagenomes but have been less studied. This review is largely focused on detecting HGT in prokaryotic metagenomes, but methods for detecting these other mechanisms are first discussed. For this article to be self-contained, we provide a general background on HGT and the different possible signatures of this process. Lastly, we discuss how improved assembly of genomes from metagenomes would be the most straight-forward approach for improving the inference of gene gain and loss events. Several recent technological advances could help improve metagenome assemblies: long-read sequencing, determining the physical proximity of contigs, optical mapping of short sequences along chromosomes, and single-cell metagenomics. The benefits and limitations of these advances are discussed and open questions in this area are highlighted.

Journal ArticleDOI
TL;DR: Investigating the extent to which intergenic mutations contribute to the evolutionary response of a clinically important bacterial pathogen, Pseudomonas aeruginosa, to the host environment, and whether intergenic mutations have distinct roles in host adaptation, this study finds that intergenic mutations facilitate essential genes to become targets of evolution.
Abstract: Bacterial pathogens evolve during the course of infection as they adapt to the selective pressures that confront them inside the host. Identification of adaptive mutations and their contributions to pathogen fitness remains a central challenge. Although mutations can target either intergenic or coding regions in the pathogen genome, studies of host adaptation have focused predominantly on molecular evolution within coding regions, whereas the role of intergenic mutations remains unclear. Here, we address this issue and investigate the extent to which intergenic mutations contribute to the evolutionary response of a clinically important bacterial pathogen, Pseudomonas aeruginosa, to the host environment, and whether intergenic mutations have distinct roles in host adaptation. We characterize intergenic evolution in 44 clonal lineages of P. aeruginosa and identify 77 intergenic regions in which parallel evolution occurs. At the genetic level, we find that mutations in regions under selection are located primarily within regulatory elements upstream of transcriptional start sites. At the functional level, we show that some of these mutations either increase or decrease transcription of genes and are directly responsible for evolution of important pathogenic phenotypes including antibiotic sensitivity. Importantly, we find that intergenic mutations facilitate essential genes to become targets of evolution. In summary, our results highlight the evolutionary significance of intergenic mutations in creating host-adapted strains, and that intergenic and coding regions have different qualitative contributions to this process.

Journal ArticleDOI
TL;DR: The analysis of Z chromosome evolution and gene expression across 12 paleognaths shows that paleognath Z chromosomes are atypical at the genomic level, but the evolutionary forces maintaining largely homomorphic sex chromosomes in these species remain elusive.
Abstract: Standard models of sex chromosome evolution propose that recombination suppression leads to the degeneration of the heterogametic chromosome, as is seen for the Y chromosome in mammals and the W chromosome in most birds. Unlike other birds, paleognaths (ratites and tinamous) possess large nondegenerate regions on their sex chromosomes (PARs or pseudoautosomal regions). It remains unclear why these large PARs are retained over >100 Myr, and how this retention impacts the evolution of sex chromosomes within this system. To address this puzzle, we analyzed Z chromosome evolution and gene expression across 12 paleognaths, several of whose genomes have recently been sequenced. We confirm at the genomic level that most paleognaths retain large PARs. As in other birds, we find that all paleognaths have incomplete dosage compensation on the regions of the Z chromosome homologous to degenerated portions of the W (differentiated regions), but we find no evidence for enrichments of male-biased genes in PARs. We find limited evidence for increased evolutionary rates (faster-Z) either across the chromosome or in differentiated regions for most paleognaths with large PARs, but do recover signals of faster-Z evolution in tinamou species with mostly degenerated W chromosomes, similar to the pattern seen in neognaths. Unexpectedly, in some species, PAR-linked genes evolve faster on average than genes on autosomes, suggested by diverse genomic features to be due to reduced efficacy of selection in paleognath PARs. Our analysis shows that paleognath Z chromosomes are atypical at the genomic level, but the evolutionary forces maintaining largely homomorphic sex chromosomes in these species remain elusive.

Journal ArticleDOI
TL;DR: Reduced representation bisulfite sequencing on red blood cell derived DNA showed genome-wide temporal changes in more than 40,000 out of the 522,643 CpG sites examined, and sites that showed a temporal and treatment-specific response in DNA methylation are candidate sites of interest for future studies trying to understand the link between DNA methylation patterns and timing of reproduction.
Abstract: In seasonal environments, timing of reproduction is a trait with important fitness consequences, but we know little about the molecular mechanisms that underlie the variation in this trait. Recentl ...

Journal ArticleDOI
TL;DR: The first draft genome of H. pluvialis is reported, which provides a solid foundation for the discovery of the genetic basis for theoretical and commercial astaxanthin enrichment.
Abstract: Haematococcus pluvialis is a freshwater species of Chlorophyta, family Haematococcaceae. It is well known for its capacity to synthesize high amounts of astaxanthin, which is a strong antioxidant that has been utilized in aquaculture and cosmetics. To improve astaxanthin yield and to establish genetic resources for H. pluvialis, we performed whole-genome sequencing, assembly, and annotation of this green microalga. A total of 83.1 Gb of raw reads were sequenced. After filtering the raw reads, we subsequently generated a draft assembly with a genome size of 669.0 Mb, a scaffold N50 of 288.6 kb, and predicted 18,545 genes. We also established a robust phylogenetic tree from 14 representative algae species. With additional transcriptome data, we revealed some novel potential genes that are involved in the synthesis, accumulation, and regulation of astaxanthin production. In addition, we generated an isoform-level reference transcriptome set of 18,483 transcripts with high confidence. Alternative splicing analysis demonstrated that intron retention is the most frequent mode. In summary, we report the first draft genome of H. pluvialis. These genomic resources along with transcriptomic data provide a solid foundation for the discovery of the genetic basis for theoretical and commercial astaxanthin enrichment.
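
The scaffold N50 quoted above is a standard contiguity statistic: the length of the scaffold at which the cumulative sum of scaffold lengths, sorted from longest to shortest, first reaches half of the total assembly size. As a minimal sketch, it could be computed as follows; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not data from the paper.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A higher N50 means more of the assembly sits in long scaffolds, but the statistic says nothing about correctness, so it is usually read alongside completeness estimates.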

Journal ArticleDOI
TL;DR: The chromosome-scale de novo assembly and genome annotation of Rhododendron williamsianum is reported as a basis for continued study of this large genus of >1,000 species and finds evidence for two shared, ancient WGDs in Rhododendron and Vaccinium members that predate the Ericaceae family and, in one case, the Ericales order.
Abstract: The genus Rhododendron (Ericaceae), which includes horticulturally important plants such as azaleas, is a highly diverse and widely distributed genus of >1,000 species. Here, we report the chromosome-scale de novo assembly and genome annotation of Rhododendron williamsianum as a basis for continued study of this large genus. We created multiple short fragment genomic libraries, which were assembled using ALLPATHS-LG. This was followed by contiguity preserving transposase sequencing (CPT-seq) and fragScaff scaffolding of a large fragment library, which improved the assembly by decreasing the number of scaffolds and increasing scaffold length. Chromosome-scale scaffolding was performed by proximity-guided assembly (LACHESIS) using chromatin conformation capture (Hi-C) data. Chromosome-scale scaffolding was further refined and linkage groups defined by restriction-site associated DNA (RAD) sequencing of the parents and progeny of a genetic cross. The resulting linkage map confirmed the LACHESIS clustering and ordering of scaffolds onto chromosomes and rectified large-scale inversions. Assessments of the R. williamsianum genome assembly and gene annotation estimate them to be 89% and 79% complete, respectively. Predicted coding sequences from genome annotation were used in syntenic analyses and for generating age distributions of synonymous substitutions/site between paralogous gene pairs, which identified whole-genome duplications (WGDs) in R. williamsianum. We then analyzed other publicly available Ericaceae genomes for shared WGDs. Based on our spatial and temporal analyses of paralogous gene pairs, we find evidence for two shared, ancient WGDs in Rhododendron and Vaccinium (cranberry/blueberry) members that predate the Ericaceae family and, in one case, the Ericales order.

Journal ArticleDOI
TL;DR: The complete circular genome of wAlbB from the Aa23 cell line is assembled using long-read PacBio sequencing at 500× median coverage and KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia.
Abstract: Wolbachia, an alpha-proteobacterium closely related to Rickettsia, is a maternally transmitted, intracellular symbiont of arthropods and nematodes. Aedes albopictus mosquitoes are naturally infected with Wolbachia strains wAlbA and wAlbB. Cell line Aa23 established from Ae. albopictus embryos retains only wAlbB and is a key model to study host-endosymbiont interactions. We have assembled the complete circular genome of wAlbB from the Aa23 cell line using long-read PacBio sequencing at 500× median coverage. The assembled circular chromosome is 1.48 megabases in size, an increase of more than 300 kb over the published draft wAlbB genome. The annotation of the genome identified 1,205 protein coding genes, 34 tRNA, 3 rRNA, 1 tmRNA, and 3 other ncRNA loci. The long reads enabled sequencing over complex repeat regions which are difficult to resolve with short-read sequencing. Thirteen percent of the genome comprised insertion sequence elements distributed throughout the genome, some of which cause pseudogenization. Prophage WO genes encoding some essential components of phage particle assembly are missing, while the remainder are found in five prophage regions/WO-like islands or scattered around the genome. Orthology analysis identified a core proteome of 535 orthogroups across all completed Wolbachia genomes. The majority of proteins could be annotated using Pfam and eggNOG analyses, including ankyrins and components of the Type IV secretion system. KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia. The availability of a complete circular chromosome from wAlbB will enable further biochemical, molecular, and genetic analyses on this strain and related Wolbachia.

Journal ArticleDOI
TL;DR: It is found that water availability is the main climatic variable shaping local adaptation of the species, and 821 SNPs showing significant associations with climatic variables or combinations of them are found based on the consistent results of three different genotype–environment association methods.
Abstract: Understanding the genomic basis of local adaptation is crucial to determine the potential of long-lived woody species to withstand changes in their natural environment. In the past, efforts to dissect the genomic architecture in gymnosperm species have been limited due to the absence of reference genomes. Recently, the genomes of some commercially important conifers, such as loblolly pine, have become available, allowing whole-genome studies of these species. In this study, we test for associations between 87k SNPs, obtained from whole-genome resequencing of loblolly pine individuals, and 270 environmental variables and combinations of them. We determine the geographic location of significant loci and identify their genomic location using our newly constructed ultradense 26k SNP linkage map. We found that water availability is the main climatic variable shaping local adaptation of the species, and found 821 SNPs showing significant associations with climatic variables or combinations of them based on the consistent results of three different genotype-environment association methods. Our results suggest that adaptation to climate in the species might have occurred by many changes in the frequency of alleles with moderate to small effect sizes, and by the smaller contribution of large effect alleles in genes related to moisture deficit, temperature and precipitation. Genomic regions of low recombination and high population differentiation harbored SNPs associated with groups of environmental variables, suggesting climate adaptation might have evolved as a result of different selection pressures acting on groups of genes associated with an aspect of climate rather than on individual environmental variables.

Journal ArticleDOI
TL;DR: This work focuses on such oversized eukaryotic TEs, including retrotransposons and DNA transposons, and outlines their complex and often combinatorial nature and closely intertwined relationship with viruses, and discusses their potential for participating in transfer of long stretches of DNA in eukaryotes.
Abstract: Transposable elements (TEs) are ubiquitous in both prokaryotes and eukaryotes, and the dynamic character of their interaction with host genomes brings about numerous evolutionary innovations and shapes genome structure and function in a multitude of ways. In traditional classification systems, TEs are often depicted in simplistic ways, based primarily on the key enzymes required for transposition, such as transposases/recombinases and reverse transcriptases. Recent progress in whole-genome sequencing and long-read assembly, combined with expansion of the familiar range of model organisms, has resulted in identification of unprecedentedly long transposable units spanning dozens or even hundreds of kilobases, initially in prokaryotic and more recently in eukaryotic systems. Here, we focus on such oversized eukaryotic TEs, including retrotransposons and DNA transposons, outline their complex and often combinatorial nature and closely intertwined relationship with viruses, and discuss their potential for participating in transfer of long stretches of DNA in eukaryotes.

Journal ArticleDOI
TL;DR: A new method, Cp-hap, is developed to detect all possible structural haplotypes of chloroplast genomes of quadripartite structure using long-read sequencing data, suggesting that flip-flop recombination mediates chloroplast structural heteroplasmy.
Abstract: The chloroplast genome usually has a quadripartite structure consisting of a large single copy region and a small single copy region separated by two long inverted repeats. It has been known for some time that a single cell may contain at least two structural haplotypes of this structure, which differ in the relative orientation of the single copy regions. However, the methods required to detect and measure the abundance of the structural haplotypes are labor-intensive, and this phenomenon remains understudied. Here, we develop a new method, Cp-hap, to detect all possible structural haplotypes of chloroplast genomes of quadripartite structure using long-read sequencing data. We use this method to conduct a systematic analysis and quantification of chloroplast structural haplotypes in 61 land plant species across 19 orders of Angiosperms, Gymnosperms, and Pteridophytes. Our results show that there are two chloroplast structural haplotypes which occur with equal frequency in most land plant individuals. Nevertheless, species whose chloroplast genomes lack inverted repeats or have short inverted repeats have just a single structural haplotype. We also show that the relative abundance of the two structural haplotypes remains constant across multiple samples from a single individual plant, suggesting that the process which maintains equal frequency of the two haplotypes operates rapidly, consistent with the hypothesis that flip-flop recombination mediates chloroplast structural heteroplasmy. Our results suggest that previous claims of differences in chloroplast genome structure between species may need to be revisited.
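
To make the two structural haplotypes concrete, the sketch below builds both arrangements of a quadripartite genome from its parts: a large single copy region (LSC), one inverted repeat (IR), and a small single copy region (SSC). It only illustrates the geometry described in the abstract and is not the Cp-hap method itself; the function names and the toy sequences are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def structural_haplotypes(lsc, ir, ssc):
    """Build the two quadripartite arrangements LSC-IRa-SSC-IRb.

    The second inverted repeat copy is the reverse complement of the
    first. The two haplotypes differ only in the orientation of the
    SSC relative to the LSC.
    """
    ir_b = reverse_complement(ir)
    hap_a = lsc + ir + ssc + ir_b
    hap_b = lsc + ir + reverse_complement(ssc) + ir_b
    return hap_a, hap_b


# Toy sequences, far shorter than real chloroplast regions.
hap_a, hap_b = structural_haplotypes(lsc="AAAC", ir="GGT", ssc="CAT")
print(hap_a)
print(hap_b)
```

Because the two arrangements differ only across the repeat boundaries, a read has to span an entire inverted repeat plus flanking single copy sequence on both sides to tell them apart, which is presumably why the abstract relies on long-read data.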

Journal ArticleDOI
TL;DR: The normalized index COUSIN (for COdon Usage Similarity INdex), that compares the CUPrefs of a query against those of a reference and normalizes the output over a Null Hypothesis of random codon usage is introduced.
Abstract: Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, chromosome, or genome levels. Numerous indices have been developed to evaluate CUPrefs, either in absolute terms or with respect to a reference. We introduce the normalized index COUSIN (for COdon Usage Similarity INdex), that compares the CUPrefs of a query against those of a reference and normalizes the output over a Null Hypothesis of random codon usage. The added value of COUSIN is to be easily interpreted, both quantitatively and qualitatively. An eponymous software written in Python3 is available for local or online use (http://cousin.ird.fr). This software allows for an easy and complete analysis of CUPrefs via COUSIN, includes seven other indices, and provides additional features such as statistical analyses, clustering, and CUPrefs optimization for gene expression. We illustrate the flexibility of COUSIN and highlight its advantages by analyzing the complete coding sequences of eight divergent genomes. Strikingly, COUSIN captures a bimodal distribution in the CUPrefs of human and chicken genes hitherto unreported with such precision. COUSIN opens new perspectives to uncover CUPrefs specificities in genomes in a practical, informative, and user-friendly way.
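
As background for what a codon usage index measures, here is a minimal sketch of relative synonymous codon usage (RSCU), a classic measure of how unevenly synonymous codons are used. This is not the COUSIN formula, which the abstract does not spell out; RSCU is swapped in purely for illustration, and the function name `rscu` and the toy sequence are assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence.

    For each amino acid observed in the sequence, a codon's RSCU is its
    count divided by the mean count over that amino acid's synonymous
    codons. A value of 1.0 means no preference; stop codons are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(
        c for c in codons if CODON_TABLE.get(c, "*") != "*"
    )
    values = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence: four alanine codons, three GCT and one GCC.
print(rscu("GCTGCTGCTGCC"))
```

An index that compares a query against a reference, as the abstract describes for COUSIN, would start from per-codon frequencies like these computed for both sequence sets.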

Journal ArticleDOI
TL;DR: The results support dosage balance constraint as a specific property of genes involved in biological interactions, including physical PPIs, and suggest that additional factors may be differently influencing the evolution of genes following duplication, depending on the species, time, and mechanism of origin.
Abstract: Gene duplicates, generated through either whole genome duplication (WGD) or small-scale duplication (SSD), are prominent in angiosperms and are believed to play an important role in adaptation and in generating evolutionary novelty. Previous studies reported contrasting evolutionary and functional dynamics of duplicate genes depending on the mechanism of origin, a behavior that is hypothesized to stem from constraints to maintain the relative dosage balance between the genes concerned and their interaction context. However, the mechanisms ultimately influencing loss and retention of gene duplicates over evolutionary time are not yet fully elucidated. Here, by using a robust classification of gene duplicates in Arabidopsis thaliana, Solanum lycopersicum, and Zea mays, large RNAseq expression compendia and an extensive protein-protein interaction (PPI) network from Arabidopsis, we investigated the impact of PPIs on the differential evolutionary and functional fate of WGD and SSD duplicates. In all three species, retained WGD duplicates show stronger constraints to diverge at the sequence and expression level than SSD ones, a pattern that is also observed for shared PPI partners between Arabidopsis duplicates. PPIs are preferentially distributed among WGD duplicates and specific functional categories. Furthermore, duplicates with PPIs tend to be under stronger constraints to evolve than their counterparts without PPIs regardless of their mechanism of origin. Our results support dosage balance constraint as a specific property of genes involved in biological interactions, including physical PPIs, and suggest that additional factors may be differently influencing the evolution of genes following duplication, depending on the species, time, and mechanism of origin.

Journal ArticleDOI
TL;DR: In this paper, the authors sequenced and assembled the genome of a Colletotrichum higginsianum (Ch) strain, resulting in a highly contiguous genome assembly, which was compared with the chromosome-level genome assembly of another strain to identify genomic variations between strains.
Abstract: Phytopathogen genomes are under constant pressure to change, as pathogens are locked in an evolutionary arms race with their hosts, where pathogens evolve effector genes to manipulate their hosts, whereas the hosts evolve immune components to recognize the products of these genes. Colletotrichum higginsianum (Ch), a fungal pathogen with no known sexual morph, infects Brassicaceae plants including Arabidopsis thaliana. Previous studies revealed that Ch differs in its virulence toward various Arabidopsis thaliana ecotypes, indicating the existence of coevolutionary selective pressures. However, between-strain genomic variations in Ch have not been studied. Here, we sequenced and assembled the genome of a Ch strain, resulting in a highly contiguous genome assembly, which was compared with the chromosome-level genome assembly of another strain to identify genomic variations between strains. We found that the two closely related strains vary in terms of large-scale rearrangements, the existence of strain-specific regions, and effector candidate gene sets and that these variations are frequently associated with transposable elements (TEs). Ch has a compartmentalized genome consisting of gene-sparse, TE-dense regions with more effector candidate genes and gene-dense, TE-sparse regions harboring conserved genes. Additionally, analysis of the conservation patterns and syntenic regions of effector candidate genes indicated that the two strains vary in their effector candidate gene sets because of de novo evolution, horizontal gene transfer, or gene loss after divergence. Our results reveal mechanisms for generating genomic diversity in this asexual pathogen, which are important for understanding its adaptation to hosts.