
Papers in Molecular Biology and Evolution, 2019


Journal Article
TL;DR: SLiM 3 is introduced, which contains two key advancements aimed at abolishing limitations in the Wright–Fisher model, and adds support for continuous space, including spatial interactions and spatial maps of environmental variables.
Abstract: With the desire to model population genetic processes under increasingly realistic scenarios, forward genetic simulations have become a critical part of the toolbox of modern evolutionary biology. The SLiM forward genetic simulation framework is one of the most powerful and widely used tools in this area. However, its foundation in the Wright-Fisher model has been found to pose an obstacle to implementing many types of models; it is difficult to adapt the Wright-Fisher model, with its many assumptions, to modeling ecologically realistic scenarios such as explicit space, overlapping generations, individual variation in reproduction, density-dependent population regulation, individual variation in dispersal or migration, local extinction and recolonization, mating between subpopulations, age structure, fitness-based survival and hard selection, emergent sex ratios, and so forth. In response to this need, we here introduce SLiM 3, which contains two key advancements aimed at abolishing these limitations. First, the new non-Wright-Fisher or "nonWF" model type provides a much more flexible foundation that allows the easy implementation of all of the above scenarios and many more. Second, SLiM 3 adds support for continuous space, including spatial interactions and spatial maps of environmental variables. We provide a conceptual overview of these new features, and present several example models to illustrate their use.

583 citations
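
The SLiM 3 abstract contrasts the Wright-Fisher model, where population size is fixed and generations are discrete and non-overlapping, with the new nonWF model type, where population size emerges from individual reproduction and survival. The Python sketch below is only a toy illustration of that contrast. It is not SLiM or Eidos code, and the carrying capacity `K`, the one-offspring-per-individual rule, and the survival probability `min(1, K / N)` are illustrative assumptions.

```python
import random


def wright_fisher_step(pop, rng):
    """One Wright-Fisher generation: exactly N offspring are drawn from the
    parents with replacement and all parents die, so N never changes and
    generations never overlap."""
    return [dict(allele=rng.choice(pop)["allele"], age=0) for _ in pop]


def nonwf_step(pop, K, rng):
    """One tick of a toy nonWF-style model (illustrative assumptions):
    every individual produces one offspring, parents age by one tick, and then
    each individual, old or new, survives with probability min(1, K / N).
    Size is regulated by density rather than fixed, and parents can coexist
    with their offspring (overlapping generations)."""
    parents = [dict(ind, age=ind["age"] + 1) for ind in pop]
    offspring = [dict(allele=ind["allele"], age=0) for ind in pop]
    everyone = parents + offspring
    survival = min(1.0, K / len(everyone)) if everyone else 0.0
    return [ind for ind in everyone if rng.random() < survival]


rng = random.Random(42)
pop = [dict(allele=i % 2, age=0) for i in range(50)]
for _ in range(30):
    pop = nonwf_step(pop, K=200, rng=rng)
print(len(pop), max(ind["age"] for ind in pop))
```

In this toy model the population grows from 50 and then fluctuates around `K`, and older individuals persist alongside newborns. Neither behavior is possible in the Wright-Fisher step.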


Journal Article
TL;DR: CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists, and are shown to perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments.
Abstract: Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.

139 citations
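
The CNN study represents DNA sequence alignments as images. The sketch below shows only a plausible encoding step and assumes biallelic sites with a known ancestral base. Each row is a haplotype, each column a site, and a cell is 1 where the haplotype carries the derived base. Sorting rows into a canonical order is one common preprocessing choice and is not necessarily the authors'. The CNN that consumes the matrix is not shown.

```python
def alignment_to_image(haplotypes, ancestral):
    """Encode an alignment as a 0/1 matrix ("image"): one row per haplotype,
    one column per site, 1 where the base differs from the ancestral base."""
    if any(len(hap) != len(ancestral) for hap in haplotypes):
        raise ValueError("all sequences must be aligned to the same length")
    return [[0 if base == anc else 1 for base, anc in zip(hap, ancestral)]
            for hap in haplotypes]


def canonical_row_order(image):
    """Haplotypes are exchangeable, so sort rows into a canonical order:
    most derived alleles first, ties broken lexicographically."""
    return sorted(image, key=lambda row: (-sum(row), row))


image = alignment_to_image(["ACGT", "ACGA", "TCGA"], ancestral="ACGT")
for row in canonical_row_order(image):
    print(row)
```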


Journal Article
TL;DR: A least-squares estimation approach for confounder estimation that provides a unique framework for several categories of genomic data, not restricted to genotypes, and outperforms other fast approaches based on principal component or surrogate variable analysis.
Abstract: Gene-environment association (GEA) studies are essential to understand the past and ongoing adaptations of organisms to their environment, but those studies are complicated by confounding due to unobserved demographic factors. Although the confounding problem has recently received considerable attention, the proposed approaches do not scale with the high dimensionality of genomic data. Here, we present a new estimation method for latent factor mixed models (LFMMs) implemented in an upgraded version of the corresponding computer program. We developed a least-squares estimation approach for confounder estimation that provides a unique framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several orders of magnitude faster than existing GEA approaches and than our previous version of the LFMM program. In addition, the new method outperforms other fast approaches based on principal component or surrogate variable analysis. We illustrate the use of the program with analyses of the 1000 Genomes Project data set, leading to new findings on adaptation of humans to their environment, and with analyses of DNA methylation profiles providing insights on how tobacco consumption could affect DNA methylation in patients with rheumatoid arthritis. Software availability: Software is available in the R package lfmm at https://bcm-uga.github.io/lfmm/.

127 citations


Journal Article
TL;DR: PastML was applied to the phylogeography of Dengue serotype 2 (DENV2) and to the evolution of drug resistance in a large HIV data set, and showed the uncertainty of the human-sylvatic DENV2 geographic origin.
Abstract: The reconstruction of ancestral scenarios is widely used to study the evolution of characters along phylogenetic trees. One commonly uses the marginal posterior probabilities of the character states, or the joint reconstruction of the most likely scenario. However, marginal reconstructions provide users with state probabilities, which are difficult to interpret and visualize, whereas joint reconstructions select a unique state for every tree node and thus do not reflect the uncertainty of inferences. We propose a simple and fast approach, which is in between these two extremes. We use decision-theory concepts (namely, the Brier score) to associate each node in the tree with a set of likely states. A unique state is predicted in tree regions with low uncertainty, whereas several states are predicted in uncertain regions, typically around the tree root. To visualize the results, we cluster the neighboring nodes associated with the same states and use graph visualization tools. The method is implemented in the PastML program and web server. The results on simulated data demonstrate the accuracy and robustness of the approach. PastML was applied to the phylogeography of Dengue serotype 2 (DENV2) and to the evolution of drug resistance in a large HIV data set. These analyses took a few minutes and provided convincing results. PastML retrieved the main transmission routes of human DENV2 and showed the uncertainty of the human-sylvatic DENV2 geographic origin. With HIV, the results show that resistance mutations mostly emerge independently under treatment pressure, but resistance clusters are found, corresponding to transmissions among untreated patients.

126 citations
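
The PastML abstract uses the Brier score to turn marginal state probabilities at a node into a set of likely states. One way to make this concrete is sketched below, under stated assumptions. The sketch predicts the k most probable states with weight 1/k each and picks the k that minimizes the expected Brier score when the true state follows the marginal probabilities. This illustrates the idea and may differ in detail from PastML's implementation.

```python
def expected_brier(k, ranked_probs):
    """Expected Brier score of predicting the k most probable states with
    weight 1/k each, when the true state follows the marginal probabilities:
    sum_i q_i^2 - 2 * sum_i q_i p_i + 1 = 1/k - (2/k) * (top-k mass) + 1."""
    top_mass = sum(ranked_probs[:k])
    return 1.0 / k - 2.0 * top_mass / k + 1.0


def likely_states(marginals):
    """Return the states kept at one node: the top-k states for the k that
    minimises the expected Brier score (ties go to the smaller set)."""
    ranked = sorted(marginals.items(), key=lambda kv: kv[1], reverse=True)
    probs = [p for _, p in ranked]
    best_k = min(range(1, len(probs) + 1), key=lambda k: expected_brier(k, probs))
    return [state for state, _ in ranked[:best_k]]


print(likely_states({"X": 0.9, "Y": 0.05, "Z": 0.05}))  # ['X']
print(likely_states({"X": 0.45, "Y": 0.4, "Z": 0.15}))  # ['X', 'Y']
```

A confident node keeps a single state, while an uncertain node keeps several. This mirrors the behavior the abstract describes for low- and high-uncertainty tree regions.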


Journal Article
TL;DR: Sequenceserver is a tool for running BLAST and visually inspecting BLAST results for biological interpretation and uses simple algorithms to prevent potential analysis errors and provides flexible text-based and visual outputs to support researcher productivity.
Abstract: Comparing newly obtained and previously known nucleotide and amino-acid sequences underpins modern biological research. BLAST is a well-established tool for such comparisons but is challenging to use on new data sets. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver, a tool for running BLAST and visually inspecting BLAST results for biological interpretation. Sequenceserver uses simple algorithms to prevent potential analysis errors and provides flexible text-based and visual outputs to support researcher productivity. Our software can be rapidly installed for use by individuals or on shared servers.

124 citations


Journal Article
TL;DR: Simulation results indicate that the transmission-based method is better at identifying direct transmissions than a SNP threshold, with lower dissimilarity between clusterings, and that it is likely to outperform the SNP-threshold method where clock rates are variable and sample collection times are spread out.
Abstract: Whole-genome sequencing (WGS) is increasingly used to aid the understanding of pathogen transmission. A first step in analyzing WGS data is usually to define "transmission clusters," sets of cases that are potentially linked by direct transmission. This is often done by including two cases in the same cluster if they are separated by fewer single-nucleotide polymorphisms (SNPs) than a specified threshold. However, there is little agreement as to what an appropriate threshold should be. We propose a probabilistic alternative, suggesting that the key inferential target for transmission clusters is the number of transmissions separating cases. We characterize this by combining the number of SNP differences and the length of time over which those differences have accumulated, using information about case timing, molecular clock, and transmission processes. Our framework has the advantage of allowing for variable mutation rates across the genome and can incorporate other epidemiological data. We use two tuberculosis studies to illustrate the impact of our approach: with British Columbia data by using spatial divisions; with Republic of Moldova data by incorporating antibiotic resistance. Simulation results indicate that our transmission-based method is better at identifying direct transmissions than a SNP threshold, with dissimilarity between clusterings of on average 0.27 bits compared with 0.37 bits for the SNP-threshold method and 0.84 bits for randomly permuted data. These results show that it is likely to outperform the SNP-threshold method where clock rates are variable and sample collection times are spread out. We implement the method in the R package transcluster.

102 citations
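
The transcluster abstract replaces a fixed SNP cut-off with the probability that two cases are separated by few transmissions, combining SNP distance with the time over which the differences accumulated. The sketch below illustrates that idea under a deliberately simplified, hypothetical model. It assumes Poisson SNP accumulation, a fixed generation interval per intermediate host, and a uniform prior on the number of intermediates. It is not the likelihood implemented in the R package transcluster, and the parameter values are arbitrary.

```python
import math


def poisson_pmf(n, mean):
    """P(N = n) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def transmission_posterior(snps, dt_years, clock_rate, gen_interval, k_max=10):
    """Toy posterior over k, the number of intermediate hosts between two cases.

    Assumptions (illustrative only, not the transcluster model): SNPs between
    the two genomes accumulate as a Poisson process at `clock_rate` SNPs/year
    over dt_years + k * gen_interval years, and k is uniform on 0..k_max."""
    weights = [poisson_pmf(snps, clock_rate * (dt_years + k * gen_interval))
               for k in range(k_max + 1)]
    total = sum(weights)
    if total == 0:
        raise ValueError("observed SNP count is impossible under every k")
    return [w / total for w in weights]


def prob_within(posterior, k_threshold):
    """Probability that the cases are separated by at most k_threshold intermediates."""
    return sum(posterior[: k_threshold + 1])


# Same SNP distance, different sampling gap.
close_in_time = transmission_posterior(snps=3, dt_years=0.1, clock_rate=0.5, gen_interval=1.0)
far_in_time = transmission_posterior(snps=3, dt_years=6.0, clock_rate=0.5, gen_interval=1.0)
print(prob_within(close_in_time, 0), prob_within(far_in_time, 0))
```

With identical SNP distances, the pair sampled six years apart gets a much higher probability of a direct link than the pair sampled about five weeks apart. A single SNP threshold cannot make that distinction.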


Journal Article
TL;DR: All phylogenetic reconstructions, based on 248 genes and using site-heterogeneous mixture models, robustly resolve the evolutionary origin of Telonemia as sister to the Sar supergroup, and propose the moniker “TSAR” to accommodate this new mega-assemblage in the phylogeny of eukaryotes.
Abstract: The resolution of the broad-scale tree of eukaryotes is constantly improving, but the evolutionary origin of several major groups remains unknown. Resolving the phylogenetic position of these "orphan" groups is important, especially those that originated early in evolution, because they represent missing evolutionary links between established groups. Telonemia is one such orphan taxon for which little is known. The group is composed of molecularly diverse biflagellated protists, often prevalent although not abundant in aquatic environments. Telonemia has been hypothesized to represent a deeply diverging eukaryotic phylum but no consensus exists as to where it is placed in the tree. Here, we established cultures and report the phylogenomic analyses of three new transcriptome data sets for divergent telonemid lineages. All our phylogenetic reconstructions, based on 248 genes and using site-heterogeneous mixture models, robustly resolve the evolutionary origin of Telonemia as sister to the Sar supergroup. This grouping remains well supported when as few as 60% of the genes are randomly subsampled, and is thus not sensitive to the sets of genes used, but it requires a minimal alignment length to recover enough phylogenetic signal. Telonemia occupies a crucial position in the tree to examine the origin of Sar, one of the most lineage-rich eukaryote supergroups. We propose the moniker "TSAR" to accommodate this new mega-assemblage in the phylogeny of eukaryotes.

80 citations
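
The TSAR abstract reports that the Telonemia + Sar grouping stays supported when as few as 60% of the genes are randomly subsampled. The sketch below builds one such gene-jackknife replicate by concatenating a random subset of per-gene alignments into a supermatrix. Tree inference on each replicate, for example with a site-heterogeneous mixture model, is out of scope. The input format and the gap padding for taxa missing from a gene are assumptions.

```python
import random


def jackknife_supermatrix(gene_alignments, fraction, rng):
    """Build one gene-jackknife replicate.

    gene_alignments: list of dicts mapping taxon -> aligned sequence, one dict
    per gene. A random `fraction` of genes is kept (in original order) and
    concatenated per taxon; taxa missing from a gene are padded with gaps."""
    n_keep = max(1, round(fraction * len(gene_alignments)))
    kept = sorted(rng.sample(range(len(gene_alignments)), n_keep))
    taxa = sorted(set().union(*(gene.keys() for gene in gene_alignments)))
    supermatrix = {taxon: [] for taxon in taxa}
    for i in kept:
        gene = gene_alignments[i]
        length = len(next(iter(gene.values())))
        for taxon in taxa:
            supermatrix[taxon].append(gene.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


rng = random.Random(0)
genes = [{"taxonA": "ACGT" * 3, "taxonB": "ACGA" * 3} for _ in range(10)]
replicates = [jackknife_supermatrix(genes, 0.6, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]["taxonA"]))
```

Repeating the tree search over many replicates and counting how often a clade appears gives a jackknife support value for that clade.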


Journal Article
TL;DR: This protocol shows how to use TempEst, BEAUti, and BEAST 1.10 (http://beast.community/; last accessed July 29, 2019), LogCombiner as well as Tracer in a complete workflow.
Abstract: Inferring past population dynamics over time from heterochronous molecular sequence data is often achieved using the Bayesian Skygrid model, a non-parametric coalescent model that estimates the effective population size over time. Available in BEAST, a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo, this coalescent model is often estimated in conjunction with a molecular clock model to produce time-stamped phylogenetic trees. We here provide a practical guide to using BEAST and its accompanying applications for the purpose of drawing inference under these models. We focus on best practices, potential pitfalls and recommendations that can be generalized to other software packages for Bayesian inference. This protocol shows how to use TempEst, BEAUti and BEAST 1.10 (http://beast.community/), LogCombiner as well as Tracer in a complete workflow.

79 citations
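
The protocol's workflow begins with TempEst, which checks heterochronous data for temporal signal by regressing root-to-tip distances on sampling dates before any BEAST run. The minimal sketch below shows that regression and assumes the tree is already rooted and the distances already measured. TempEst can also search for the best-fitting root position, which is not reproduced here, and the example numbers are made up.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date, r_squared): the slope estimates the substitution
    rate (substitutions/site/year if distances are in substitutions/site), the
    x-intercept estimates the date of the root, and R^2 indicates how much of
    the variation in distance is explained by sampling time."""
    n = len(dates)
    if n < 2:
        raise ValueError("need at least two dated tips")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must differ (heterochronous data)")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate if rate != 0 else float("nan")
    ss_tot = sum((y - mean_y) ** 2 for y in distances)
    ss_res = sum((y - (intercept + rate * x)) ** 2 for x, y in zip(dates, distances))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rate, root_date, r_squared


print(root_to_tip_regression([2000, 2005, 2010], [0.010, 0.015, 0.020]))
```

A clearly positive slope with reasonable R^2 suggests enough temporal signal to calibrate a molecular clock. A flat or negative slope is a warning sign before fitting the Skygrid and clock models in BEAST.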


Journal Article
TL;DR: It is concluded that existing phylogenomic approaches to infer the Tree of Life may be highly misleading without considering the genomic architecture of phylogenetic signal relative to recombination rate and its interplay with historical hybridization.
Abstract: Current phylogenomic approaches implicitly assume that the predominant phylogenetic signal within a genome reflects the true evolutionary history of organisms, without assessing the confounding effects of postspeciation gene flow that can produce a mosaic of phylogenetic signals that interact with recombinational variation. Here, we tested the validity of this assumption with a phylogenomic analysis of 27 species of the cat family, assessing local effects of recombination rate on species tree inference and divergence time estimation across their genomes. We found that the prevailing phylogenetic signal within the autosomes is not always representative of the most probable speciation history, due to ancient hybridization throughout felid evolution. Instead, phylogenetic signal was concentrated within regions of low recombination, and notably enriched within large X chromosome recombination cold spots that exhibited recurrent patterns of strong genetic differentiation and selective sweeps across mammalian orders. By contrast, regions of high recombination were enriched for signatures of ancient gene flow, and these sequences inflated crown-lineage divergence times by ∼40%. We conclude that existing phylogenomic approaches to infer the Tree of Life may be highly misleading without considering the genomic architecture of phylogenetic signal relative to recombination rate and its interplay with historical hybridization.

77 citations
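
The felid study reports that phylogenetic signal is concentrated in low-recombination regions, while high-recombination regions carry more signatures of ancient gene flow. The minimal sketch below shows that style of comparison. It assumes each genomic window already has a recombination-rate estimate and an inferred topology stored as a canonical string; both are hypothetical inputs here. Windows are sorted by recombination rate and split into equal-count bins, and the function reports how often each bin recovers a given topology.

```python
def topology_support_by_recombination(windows, species_topology, n_bins=3):
    """windows: list of (recombination_rate, topology) pairs, where topology is
    a canonical string such as a normalised Newick tree (assumed input).
    Windows are sorted by recombination rate and split into n_bins bins of
    (nearly) equal size; for each bin, return the fraction of windows whose
    topology equals species_topology, from lowest to highest recombination."""
    ordered = sorted(windows, key=lambda w: w[0])
    n = len(ordered)
    fractions = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        matches = sum(1 for _, topology in chunk if topology == species_topology)
        fractions.append(matches / len(chunk) if chunk else float("nan"))
    return fractions


windows = [(0.1, "((A,B),C);"), (0.2, "((A,B),C);"), (0.3, "((A,B),C);"),
           (1.0, "((A,B),C);"), (1.1, "((A,C),B);"), (1.2, "((A,C),B);")]
print(topology_support_by_recombination(windows, "((A,B),C);", n_bins=2))
```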


Journal Article
TL;DR: The mechanisms by which Candida albicans acquires the ability to survive both chemotherapeutic agents and antifungal drugs are analyzed, highlighting the potential clinical consequences for the management of cancer chemotherapy patients at risk of fungal infections.
Abstract: Aneuploidy is common both in tumor cells responding to chemotherapeutic agents and in fungal cells adapting to antifungal drugs. Because aneuploidy simultaneously affects many genes, it has the potential to confer multiple phenotypes to the same cells. Here, we analyzed the mechanisms by which Candida albicans, the most prevalent human fungal pathogen, acquires the ability to survive both chemotherapeutic agents and antifungal drugs. Strikingly, adaptation to both types of drugs was accompanied by the acquisition of specific whole-chromosome aneuploidies, with some aneuploid karyotypes recovered independently and repeatedly from very different drug conditions. Specifically, strains selected for survival in hydroxyurea, an anticancer drug, acquired cross-adaptation to caspofungin, a first-line antifungal drug, and both acquired traits were attributable to trisomy of the same chromosome: loss of trisomy was accompanied by loss of adaptation to both drugs. Mechanistically, aneuploidy simultaneously altered the copy number of most genes on chromosome 2, yet survival in hydroxyurea or caspofungin required different genes and stress response pathways. Similarly, chromosome 5 monosomy conferred increased tolerance to both fluconazole and caspofungin, antifungals with different mechanisms of action. Thus, the potential for cross-adaptation is not a feature of aneuploidy per se; rather, it is dependent on specific genes harbored on given aneuploid chromosomes. Furthermore, pre-exposure to hydroxyurea increased the frequency of appearance of caspofungin survivors, and hydroxyurea-adapted C. albicans cells were refractory to antifungal drug treatment in a mouse model of systemic candidiasis. This highlights the potential clinical consequences for the management of cancer chemotherapy patients at risk of fungal infections.

75 citations


Journal Article
TL;DR: Genomic analyses shed light on the molecular mechanisms underlying cetacean traits, including gigantism, and will contribute to the development of future targets for human cancer therapies.
Abstract: Cetaceans are a clade of highly specialized aquatic mammals that include the largest animals that have ever lived. The largest whales can have ∼1,000× more cells than a human, with long lifespans, leaving them theoretically susceptible to cancer. However, large-bodied and long-lived animals do not suffer higher risks of cancer mortality than humans, an observation known as Peto's Paradox. To investigate the genomic bases of gigantism and other cetacean adaptations, we generated a de novo genome assembly for the humpback whale (Megaptera novaeangliae) and incorporated the genomes of ten cetacean species in a comparative analysis. We found further evidence that rorquals (family Balaenopteridae) radiated during the Miocene or earlier, and inferred that perturbations in abundance and/or the interocean connectivity of North Atlantic humpback whale populations likely occurred throughout the Pleistocene. Our comparative genomic results suggest that the evolution of cetacean gigantism was accompanied by strong selection on pathways that are directly linked to cancer. Large segmental duplications in whale genomes contained genes controlling the apoptotic pathway, and genes inferred to be under accelerated evolution and positive selection in cetaceans were enriched for biological processes such as cell cycle checkpoint, cell signaling, and proliferation. We also inferred positive selection on genes controlling the mammalian appendicular and cranial skeletal elements in the cetacean lineage, which are relevant to extensive anatomical changes during cetacean evolution. Genomic analyses shed light on the molecular mechanisms underlying cetacean traits, including gigantism, and will contribute to the development of future targets for human cancer therapies.

Journal Article
TL;DR: A graph-based clustering method to address MSA uncertainty and error in the software Divvier, which uses a probabilistic model to identify clusters of characters that have strong statistical evidence of shared homology and substantially outperforms existing filtering software.
Abstract: Multiple sequence alignment (MSA) is ubiquitous in evolution and bioinformatics. MSAs are usually taken to be a known and fixed quantity on which to perform downstream analysis despite extensive ev ...

Journal Article
TL;DR: The recent advances and outstanding challenges in orthology inference are reviewed, as revealed at a symposium and meeting held at the University of Southern California in 2017.
Abstract: Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs) and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.
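
The review distinguishes orthologs, created by speciation, from paralogs, created by duplication. One widely used heuristic for labelling the internal nodes of a rooted gene tree is species overlap. A node is a duplication if the species sets of its child subtrees overlap, and a speciation otherwise. The minimal sketch below assumes gene trees given as nested tuples and leaf names of the form `<species>_<gene>`, a hypothetical convention.

```python
from itertools import combinations


def species_of(leaf):
    """Hypothetical naming convention: '<species>_<gene id>'."""
    return leaf.split("_", 1)[0]


def label_events(tree, events):
    """Post-order traversal of a rooted gene tree given as nested tuples.
    Appends (node, 'duplication' | 'speciation') to `events` for every internal
    node and returns the set of species found below `tree`."""
    if isinstance(tree, str):
        return {species_of(tree)}
    child_species = [label_events(child, events) for child in tree]
    overlap = any(a & b for a, b in combinations(child_species, 2))
    events.append((tree, "duplication" if overlap else "speciation"))
    return set().union(*child_species)


gene_tree = (("human_a1", "mouse_a1"), ("human_a2", "mouse_a2"))
events = []
label_events(gene_tree, events)
for node, event in events:
    print(event, node)
```

Two genes are then predicted to be orthologs when their most recent common ancestor in the gene tree is labelled a speciation node. Here, human_a1 and mouse_a1 are orthologs, while human_a1 and human_a2 are paralogs.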

Journal Article
TL;DR: These results suggest that genomic methods for reconstructing species’ effective population size histories can be applied to nonmodel organisms without highly contiguous reference genomes, and are capable of detecting independently documented effects of historical geological events.
Abstract: Reconstructing species' demographic histories is a central focus of molecular ecology and evolution. Recently, an expanding suite of methods leveraging either the sequentially Markovian coalescent (SMC) or the site-frequency spectrum has been developed to reconstruct population size histories from genomic sequence data. However, few studies have investigated the robustness of these methods to genome assemblies of varying quality. In this study, we first present an improved genome assembly for the Tasmanian devil using the Chicago library method. Compared with the original reference genome, our new assembly reduces the number of scaffolds (from 35,975 to 10,010) and increases the scaffold N90 (from 0.101 to 2.164 Mb). Second, we assess the performance of four contemporary genomic methods for inferring population size history (PSMC, MSMC, SMC++, Stairway Plot), using the two devil genome assemblies as well as simulated, artificially fragmented genomes that approximate the hypothesized demographic history of Tasmanian devils. We demonstrate that each method is robust to assembly quality, producing similar estimates of Ne when simulated genomes were fragmented into up to 5,000 scaffolds. Overall, methods reliant on the SMC are most reliable between ∼300 generations before present (gbp) and 100 kgbp, whereas methods exclusively reliant on the site-frequency spectrum are most reliable between the present and 30 gbp. Our results suggest that when used in concert, genomic methods for reconstructing species' effective population size histories 1) can be applied to nonmodel organisms without highly contiguous reference genomes, and 2) are capable of detecting independently documented effects of historical geological events.
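
The Tasmanian devil assembly is summarized by its scaffold N90. The Nx statistic is the length L such that scaffolds of length at least L together contain at least x% of the assembly. N90 therefore rises when most of the sequence is held in long scaffolds. A minimal sketch follows; the example lengths are made up.

```python
def nx(lengths, x):
    """Return the Nx statistic of an assembly, for x in (0, 100]."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # equivalent to running / total >= x / 100, without dividing
        if running * 100 >= x * total:
            return length
    raise ValueError("empty assembly")


scaffolds = [10, 8, 5, 4, 3]  # made-up scaffold lengths (e.g., in Mb)
print(nx(scaffolds, 50), nx(scaffolds, 90))  # 8 4
```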

Journal Article
TL;DR: The mitogenomic phylogeny also partly resolved the brood diversification process, and showed that none of the 13- and 17-year species within the species groups was monophyletic, possibly due to gene flow between them.
Abstract: The mass application of whole mitogenome (MG) sequencing has great potential for resolving complex phylogeographic patterns that cannot be resolved by partial mitogenomic sequences or nuclear markers. North American periodical cicadas (Magicicada) are well known for their periodical mass emergence at 17- and 13-year intervals in the north and south, respectively. Magicicada comprises three species groups, each containing one 17-year species and one or two 13-year species. Within each life cycle, single-aged cohorts, called broods, of periodical cicadas emerge in different years, and most broods contain members of all three species groups. There are 12 and three extant broods of 17- and 13-year cicadas, respectively. The phylogeographic relationships among the populations and broods within the species groups have not been clearly resolved. We analyzed 125 whole MG sequences from all broods and seven species within three species groups to ascertain the divergence history of the geographic and allochronic populations and their life cycles. Our mitogenomic phylogenetic analysis clearly revealed that each of the three species groups had largely similar phylogeographic subdivisions (east, middle, and west) and demographic histories (rapid population expansion after the last glacial period). The mitogenomic phylogeny also partly resolved the brood diversification process, which could be explained by hypothetical temporary life cycle shifts, and showed that none of the 13- and 17-year species within the species groups was monophyletic, possibly due to gene flow between them. Our findings clearly reveal phylogeographic structures in the three Magicicada species groups, demonstrating the advantage of whole MG sequence data in phylogeographic studies.
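
The cicada abstract reports that none of the 13- and 17-year species within a species group was monophyletic. On a rooted tree, a set of taxa is monophyletic when it is exactly the leaf set of some node. The minimal check below represents the tree as nested tuples and uses hypothetical tip names. The answer depends on where the tree is rooted.

```python
def collect_clades(tree, found):
    """Append the leaf set of every node of a rooted tree (nested tuples,
    leaves as strings) to `found`, in post-order; return this node's leaf set."""
    if isinstance(tree, str):
        leaf_set = frozenset([tree])
    else:
        leaf_set = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.append(leaf_set)
    return leaf_set


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the leaf set of some node of the rooted tree."""
    found = []
    collect_clades(tree, found)
    return frozenset(taxa) in found


tree = ((("17yr_a", "17yr_b"), "13yr_a"), ("13yr_b", "13yr_c"))
print(is_monophyletic(tree, {"17yr_a", "17yr_b"}))
print(is_monophyletic(tree, {"13yr_a", "13yr_b", "13yr_c"}))
```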

Journal Article
TL;DR: The genomic variation in Tibetan sheep is investigated using whole-genome sequences, single nucleotide polymorphism arrays, mitochondrial DNA, and Y-chromosomal variants in 986 samples throughout their distribution range, contributing to a deeper understanding of early pastoralism and the local adaptation of Tibetan sheep as well as the late-Holocene human occupation of the QTP.
Abstract: Tibetan sheep are the most common and widespread domesticated animals on the Qinghai-Tibetan Plateau (QTP) and have played an essential role in the permanent human occupation of this high-altitude region. However, the precise timing, route, and process of sheep pastoralism in the QTP region remain poorly established, and little is known about the underlying genomic changes that occurred during the process. Here, we investigate the genomic variation in Tibetan sheep using whole-genome sequences, single nucleotide polymorphism arrays, mitochondrial DNA, and Y-chromosomal variants in 986 samples throughout their distribution range. We detect strong signatures of selection in genes involved in the hypoxia and ultraviolet signaling pathways (e.g., HIF-1 pathway and HBB and MITF genes) and in genes associated with morphological traits such as horn size and shape (e.g., RXFP2). We identify clear signals of argali (Ovis ammon) introgression into sympatric Tibetan sheep, covering 5.23-5.79% of their genomes. The introgressed genomic regions are enriched in genes related to the oxygen transport system, sensory perception, and morphological phenotypes, in particular the genes HBB and RXFP2 with strong signs of adaptive introgression. The spatial distribution of genomic diversity and demographic reconstruction of the history of Tibetan sheep show a stepwise pattern of colonization with their initial spread onto the QTP from its northeastern part ∼3,100 years ago, followed by further southwest expansion to the central QTP ∼1,300 years ago. Together with archeological evidence, the date and route reveal the history of human expansions on the QTP by the Tang-Bo Ancient Road during the late Holocene. Our findings contribute to a deeper understanding of early pastoralism and the local adaptation of Tibetan sheep as well as the late-Holocene human occupation of the QTP.
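
The sheep abstract reports argali introgression covering 5.23-5.79% of the genomes. A figure like this is the fraction of the genome covered by introgressed tracts. Overlapping tracts must be merged first, or they would be double-counted. The minimal sketch below assumes the tracts have already been called by some method; the tract coordinates and genome length are made up.

```python
from collections import defaultdict


def introgressed_fraction(tracts, genome_length):
    """Fraction of the genome covered by introgressed tracts.

    tracts: iterable of (chromosome, start, end) half-open intervals, possibly
    overlapping. Overlaps are merged per chromosome so no base is counted twice."""
    by_chrom = defaultdict(list)
    for chrom, start, end in tracts:
        by_chrom[chrom].append((start, end))
    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or touching: extend the block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length


tracts = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 300, 400), ("chr2", 0, 50)]
print(introgressed_fraction(tracts, genome_length=1000))  # 0.3
```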

Journal Article
TL;DR: An improved version of the A. pisum genome is presented, based on two long-range proximity ligation methods; it illuminates the mode of gene family evolution by providing proximity information between paralogs and shows that long-range scaffolding methods can substantially improve assemblies of repetitive genomes and facilitate the study of gene family evolution and structural variation.
Abstract: Genome structural variations, including duplications, deletions, insertions, and inversions, are central in the evolution of eukaryotic genomes. However, structural variations present challenges for high-quality genome assembly, hampering efforts to understand the evolution of gene families and genome architecture. An example is the genome of the pea aphid (Acyrthosiphon pisum) for which the current assembly is composed of thousands of short scaffolds, many of which are known to be misassembled. Here, we present an improved version of the A. pisum genome based on the use of two long-range proximity ligation methods. The new assembly contains four long scaffolds (40-170 Mb), corresponding to the three autosomes and the X chromosome of A. pisum, and encompassing 86% of the new assembly. Assembly accuracy is supported by several quality assessments. Using this assembly, we identify the chromosomal locations and relative ages of duplication events, and the locations of horizontally acquired genes. The improved assembly illuminates the mode of gene family evolution by providing proximity information between paralogs. By estimating nucleotide polymorphism and coverage depth from resequencing data, we determined that many short scaffolds not assembling to chromosomes represent hemizygous regions, which are especially frequent on the highly repetitive X chromosome. Aligning the X-linked aphicarus region, responsible for male wing dimorphism, to the new assembly revealed a 50-kb deletion that cosegregates with the winged male phenotype in some clones. These results show that long-range scaffolding methods can substantially improve assemblies of repetitive genomes and facilitate study of gene family evolution and structural variation.
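
The pea aphid study used nucleotide polymorphism and coverage depth from resequencing data to recognize hemizygous scaffolds. The idea can be sketched as a filter. A region present on only one haplotype should show roughly half the typical read depth and almost no heterozygosity. The thresholds, the unweighted per-scaffold median, and the input format below are hypothetical choices, not those of the study.

```python
from statistics import median


def flag_hemizygous(scaffolds, depth_ratio=(0.35, 0.65), max_het=5e-4):
    """Return names of scaffolds that look hemizygous.

    scaffolds: dict name -> (mean read depth, heterozygous sites per bp).
    A scaffold is flagged when its depth is roughly half the median scaffold
    depth and it carries almost no heterozygosity. All thresholds are
    hypothetical; the median is unweighted by scaffold length for simplicity."""
    typical_depth = median(depth for depth, _ in scaffolds.values())
    low, high = depth_ratio
    return sorted(
        name
        for name, (depth, het) in scaffolds.items()
        if low <= depth / typical_depth <= high and het <= max_het
    )


scaffolds = {"chr1": (40, 0.002), "chr2": (42, 0.0018), "chr3": (38, 0.0015),
             "s101": (20, 0.0), "s102": (21, 0.0001), "s103": (40, 0.0)}
print(flag_hemizygous(scaffolds))
```

If many scaffolds are hemizygous, the median itself is pulled down. A length-weighted median, or one computed from chromosome-scale scaffolds only, would then be a more robust reference.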

Journal Article
TL;DR: Persian walnut (J. regia, with its landrace J. sigillata) is shown to have arisen as a hybrid between the American and the Asian lineages, and the American butternut J. cinerea to have resulted from massive introgression from an immigrating Asian butternut into the genome of an American black walnut.
Abstract: Persian walnut (Juglans regia) is cultivated worldwide for its high-quality wood and nuts, but its origin has remained mysterious because in phylogenies it occupies an unresolved position between American black walnuts and Asian butternuts. Equally unclear is the origin of the only American butternut, J. cinerea. We resequenced the whole genomes of 80 individuals from 19 of the 22 species of Juglans and assembled the genomes of its relatives Pterocarya stenoptera and Platycarya strobilacea. Using phylogenetic-network analysis of single-copy nuclear genes, genome-wide site pattern probabilities, and Approximate Bayesian Computation, we discovered that J. regia (and its landrace J. sigillata) arose as a hybrid between the American and the Asian lineages and that J. cinerea resulted from massive introgression from an immigrating Asian butternut into the genome of an American black walnut. Approximate Bayesian Computation modeling placed the hybrid origin in the late Pliocene, ∼3.45 My, with both parental lineages since having gone extinct in Europe.
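
The walnut study draws on genome-wide site pattern probabilities. A familiar example of site-pattern reasoning is Patterson's D, the ABBA-BABA test, for four taxa (((P1,P2),P3),O). The abstract does not say that this exact statistic was used, so the sketch below is a swapped-in illustration of the logic. The sequences are made up, only biallelic sites are used, and the outgroup base is taken as ancestral.

```python
def patterson_d(p1, p2, p3, outgroup):
    """ABBA-BABA (Patterson's D) for aligned sequences of taxa (((P1,P2),P3),O).

    Only biallelic sites are used, and the outgroup base is taken as ancestral.
    ABBA: P2 and P3 share the derived base; BABA: P1 and P3 share it.
    D > 0 points to excess allele sharing between P2 and P3, D < 0 to P1 and P3."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # skip invariant and multiallelic sites
        if a == o and b == c != o:
            abba += 1
        elif b == o and a == c != o:
            baba += 1
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)


print(patterson_d("AAAAT", "TTAAT", "TTTAA", "AAAAA"))
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers. A strong imbalance is the kind of signal that points to hybridization or introgression.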

Journal Article
TL;DR: Results indicate that extensive genome reduction occurred in the ancestral Buchnera prior to aphid diversification and that reduction has continued since, with losses greater in some lineages and for some loci.
Abstract: An evolutionary consequence of uniparentally transmitted symbiosis is degradation of symbiont genomes. We use the system of aphids and their maternally inherited obligate endosymbiont, Buchnera aphidicola, to explore the evolutionary process of genome degradation. We compared complete genome sequences for 39 Buchnera strains, including 23 newly sequenced symbiont genomes from diverse aphid hosts. We reconstructed the genome of the most recent shared Buchnera ancestor, which contained 616 protein-coding genes and 39 RNA genes. The extent of subsequent gene loss varied across lineages, resulting in modern genomes ranging from 412 to 646 kb and containing 354-587 protein-coding genes. Loss events were highly nonrandom across loci. Genes involved in replication, transcription, translation, and amino acid biosynthesis are largely retained, whereas genes underlying ornithine biosynthesis, stress responses, and transcriptional regulation were lost repeatedly. Aside from losses, gene order is almost completely stable. The main exceptions involve movement between plasmid and chromosome locations of genes that underlie tryptophan and leucine biosynthesis and support the nutrition of aphid hosts. This set of complete genomes enabled tests for signatures of positive diversifying selection. Of 371 Buchnera genes tested, 29 genes show strong support for ongoing positive selection. These include genes encoding outer membrane porins that are expected to be involved in direct interactions with hosts. Collectively, these results indicate that extensive genome reduction occurred in the ancestral Buchnera prior to aphid diversification and that reduction has continued since, with losses greater in some lineages and for some loci.
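
The Buchnera study reconstructs the gene content of the most recent shared ancestor and then tracks losses. The abstract does not specify the reconstruction method. One simple approach suited to lineages where genes are lost but not regained is Dollo parsimony, sketched below. Under this rule, a gene present in strains from at least two of the root's child lineages must have been present at the root. Without outgroup information, the rule cannot place genes that are confined to one side of the root split. The tree, strain names, and gene sets are hypothetical.

```python
def leaf_names(tree):
    """Tip names of a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def dollo_root_genes(tree, presence):
    """Genes inferred at the root under Dollo parsimony (no regain after loss).

    presence: dict strain -> set of genes present in that strain. A gene found in
    strains from at least two of the root's child lineages must have been present
    at the root; a gene confined to one child lineage is instead assigned to an
    ancestor inside that lineage."""
    root_lineages = [set(leaf_names(child)) for child in tree]
    root_genes = set()
    for gene in set().union(*presence.values()):
        carriers = {strain for strain, genes in presence.items() if gene in genes}
        if sum(1 for lineage in root_lineages if lineage & carriers) >= 2:
            root_genes.add(gene)
    return root_genes


def losses_per_strain(root_genes, presence):
    """Root genes missing from each strain, i.e. losses since the root."""
    return {strain: root_genes - genes for strain, genes in presence.items()}


tree = (("s1", "s2"), ("s3", "s4"))
presence = {"s1": {"geneA", "geneB", "geneC"}, "s2": {"geneA"},
            "s3": {"geneA", "geneB"}, "s4": {"geneA"}}
root = dollo_root_genes(tree, presence)
print(sorted(root), losses_per_strain(root, presence))
```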

Journal ArticleDOI
TL;DR: This work uses locally adapted and phenotypically differentiated Arabidopsis lyrata populations from two altitudinal gradients in Norway to detect signatures of selection for local adaptation, and estimates patterns of lineage-specific differentiation among these populations.
Abstract: Short-scale local adaptation is a complex process involving selection, migration and drift. The expected effects on the genome are well grounded in theory, but examining them empirically has proven difficult, as it requires information about local selection, demographic history and recombination rate variation. Here, we use locally adapted and phenotypically differentiated Arabidopsis lyrata populations from two altitudinal gradients in Norway to test these expectations at the whole-genome level. Demographic modelling indicates that populations within the gradients diverged less than 2 kya and that the sites are connected by gene flow. The gene flow estimates are, however, highly asymmetric, with migration from high to low altitudes being several times more frequent than vice versa. To detect signatures of selection for local adaptation, we estimate patterns of lineage-specific differentiation among these populations. Theory predicts that gene flow leads to concentration of adaptive loci in areas of low recombination, a pattern we observe in both lowland-alpine comparisons. Although most selected loci display patterns of conditional neutrality, we found indications of genetic trade-offs, with one locus in particular showing high differentiation and signs of selection in both populations. Our results further suggest that resistance to solar radiation is an important adaptation to alpine environments, while vegetative growth and bacterial defense are indicated as selected traits in the lowland habitats. These results provide insights into genetic architectures and evolutionary processes driving local adaptation under gene flow. We also contribute to understanding of traits and biological processes underlying alpine adaptation in northern latitudes.

Journal ArticleDOI
TL;DR: This study provides strong evidence of 'ghost introgression' as the cause of DMD, and it is suggested that 'ghost introgression' may be a widely overlooked phenomenon in nature.
Abstract: In the absence of nuclear-genomic differentiation between two populations, deep mitochondrial divergence (DMD) is a form of mito-nuclear discordance. Such instances of DMD are rare and might variably be explained by unusual cases of female-linked selection, by male-biased dispersal, by 'speciation reversal' or by mitochondrial capture through genetic introgression. Here we analyze DMD in an Asian Phylloscopus leaf warbler (Aves: Phylloscopidae) complex. Bioacoustic, morphological and genomic data demonstrate close similarity between the taxa affinis and occisinensis, even though DMD previously led to their classification as two distinct species. Using population genomic and comparative genomic methods on 45 whole genomes, including historical reconstructions of effective population size, genomic peaks of differentiation and genomic linkage, we infer that the form affinis is likely the product of a westward expansion in which it replaced a now-extinct congener that was the donor of its mtDNA and small portions of its nuclear genome. This study provides strong evidence of 'ghost introgression' as the cause of DMD, and we suggest that 'ghost introgression' may be a widely overlooked phenomenon in nature.

Journal ArticleDOI
TL;DR: A full probabilistic approach for phylogenomic reconciliation-based WGD inference is developed, accounting for both gene tree and reconciliation uncertainty using a method based on the principle of amalgamated likelihood estimation.
Abstract: Gene tree-species tree reconciliation methods have been employed for studying ancient whole-genome duplication (WGD) events across the eukaryotic tree of life. Most approaches have relied on using maximum likelihood trees and the maximum parsimony reconciliation thereof to count duplication events on specific branches of interest in a reference species tree. Such approaches do not account for uncertainty in the gene tree and reconciliation, or do so only heuristically. The effects of these simplifications on the inference of ancient WGDs are unclear. In particular, the effects of variation in gene duplication and loss rates across the species tree have not been considered. Here, we developed a full probabilistic approach for phylogenomic reconciliation-based WGD inference, accounting for both gene tree and reconciliation uncertainty using a method based on the principle of amalgamated likelihood estimation. The model and methods are implemented in a maximum likelihood and Bayesian setting and account for variation of duplication and loss rates across the species tree, using methods inspired by phylogenetic divergence time estimation. We applied our newly developed framework to ancient WGDs in land plants and investigated the effects of duplication and loss rate variation on reconciliation-based and gene-count-based assessment of these previously proposed WGDs.

Journal ArticleDOI
TL;DR: The main contribution of this version of OrthoMaM is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes.
Abstract: We present version 10 of OrthoMaM, a database of orthologous mammalian markers. OrthoMaM is already 11 years old and since the outset it has kept on improving, providing high-quality alignments and phylogenetic trees computed with state-of-the-art methods on up-to-date data. The main contribution of this version is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes. This has been made possible by combining genomic data deposited in Ensembl with additional good-quality genomes only available in NCBI. Version 10 users will benefit from pipeline improvements and a completely redesigned web interface.

Journal ArticleDOI
TL;DR: The genomes of four ampullariids spanning the Old World and New World have conserved ancient bilaterial karyotype features and a novel Hox gene cluster rearrangement, making them valuable in comparative genomic studies.
Abstract: The family Ampullariidae includes both aquatic and amphibious apple snails. They are an emerging model for evolutionary studies due to their high diversity, ancient history, and wide geographical distribution. Insight into drivers of ampullariid evolution is hampered, however, by the lack of genomic resources. Here, we report the genomes of four ampullariids spanning the Old World (Lanistes nyassanus) and New World (Pomacea canaliculata, P. maculata, and Marisa cornuarietis) clades. The ampullariid genomes have conserved ancient bilaterian karyotype features and a novel Hox gene cluster rearrangement, making them valuable in comparative genomic studies. They have expanded gene families related to environmental sensing and cellulose digestion, which may have enabled some ampullariids to become notorious invasive pests. In the amphibious Pomacea, novel acquisition of an egg neurotoxin and a protein for making the calcareous eggshell may have been key adaptations enabling their transition from underwater to terrestrial egg deposition.

Journal ArticleDOI
TL;DR: It is argued that the initial stages of domestication include dynamic alterations in DNA methylation of developmental genes that affect the neural crest, and a conserved molecular process to explain Darwin’s domestication syndrome across vertebrates is suggested.
Abstract: Domestication of wild animals induces a set of phenotypic characteristics collectively known as the domestication syndrome. However, how this syndrome emerges is still not clear. Recently, the neural crest cell deficit hypothesis proposed that it is generated by a mildly disrupted neural crest cell developmental program, but clear support is lacking due to the difficulty of distinguishing pure domestication effects from preexisting genetic differences between farmed and wild mammals and birds. Here, we use a farmed fish as a model to investigate the role of persistent changes in DNA methylation (epimutations) in the process of domestication. We show that early domesticates of sea bass, with no genetic differences from wild counterparts, contain epimutations in tissues with different embryonic origins. About one fifth of epimutations that persist into adulthood are established by the time of gastrulation and affect genes involved in developmental processes that are expressed in embryonic structures, including the neural crest. Some of these genes are differentially expressed in sea bass with lower jaw malformations, a key feature of domestication syndrome. Interestingly, these epimutations significantly overlap with cytosine-to-thymine polymorphisms after 25 years of selective breeding. Furthermore, epimutated genes coincide with genes under positive selection in other domesticates. We argue that the initial stages of domestication include dynamic alterations in DNA methylation of developmental genes that affect the neural crest. Our results indicate a role for epimutations during the beginning of domestication that could be fixed as genetic variants and suggest a conserved molecular process to explain Darwin's domestication syndrome across vertebrates.

Journal ArticleDOI
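TL;DR: The results suggest that S. bovis/S. haematobium hybridization occurs rarely but demonstrate profound consequences of ancient introgression from a livestock parasite into the genome of S. haematobium.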
Abstract: Introgression among parasite species has the potential to transfer traits of biomedical importance across species boundaries. The parasitic blood fluke Schistosoma haematobium causes urogenital schistosomiasis in humans across sub-Saharan Africa. Hybridization with other schistosome species is assumed to occur commonly, because genetic crosses between S. haematobium and livestock schistosomes, including S. bovis, can be staged in the laboratory, and sequencing of mtDNA and rDNA amplified from microscopic miracidia larvae frequently reveals markers from different species. However, the frequency, direction, age, and genomic consequences of hybridization are unknown. We hatched miracidia from eggs and sequenced the exomes from 96 individual S. haematobium miracidia from infected patients from Niger and the Zanzibar archipelago. These data revealed no evidence for contemporary hybridization between S. bovis and S. haematobium in our samples. However, all Nigerien S. haematobium genomes sampled show hybrid ancestry, with 3.3-8.2% of their nuclear genomes derived from S. bovis, providing evidence of an ancient introgression event that occurred at least 108-613 generations ago. Some S. bovis-derived alleles have spread to high frequency or reached fixation and show strong signatures of directional selection; the strongest signal spans a single gene in the invadolysin gene family (Chr. 4). Our results suggest that S. bovis/S. haematobium hybridization occurs rarely but demonstrate profound consequences of ancient introgression from a livestock parasite into the genome of S. haematobium, the most prevalent schistosome species infecting humans.

Journal ArticleDOI
TL;DR: It is found that the WGD events were typically associated with shifts in climatic niche, but no direct association was found between WGDs and diversification rate shifts; evidence was found for significant gene family expansion in genes associated with stress adaptation and in clades found in extreme environments.
Abstract: Several plant lineages have evolved adaptations that allow survival in extreme and harsh environments, including many families within the plant clade Portulacineae (Caryophyllales) such as the Cactaceae, Didiereaceae, and Montiaceae. Here, using newly generated transcriptomic data, we reconstructed the phylogeny of Portulacineae and examined potential correlations between molecular evolution and adaptation to harsh environments. Our phylogenetic results were largely congruent with previous analyses, but we identified several early diverging nodes characterized by extensive gene tree conflict. For particularly contentious nodes, we present detailed information about the phylogenetic signal for alternative relationships. We also analyzed the frequency of gene duplications, confirmed previously identified whole-genome duplications (WGD), and proposed a previously unidentified WGD event within the Didiereaceae. We found that the WGD events were typically associated with shifts in climatic niche but did not find a direct association between WGDs and diversification rate shifts. Diversification shifts occurred within the Portulacaceae, Cactaceae, and Anacampserotaceae, and whereas these did not experience WGDs, the Cactaceae experienced extensive gene duplications. We examined gene family expansion and molecular evolutionary patterns with a focus on genes associated with environmental stress responses and found evidence for significant gene family expansion in genes associated with stress adaptation and in clades found in extreme environments. These results provide important directions for further and deeper examination of the potential links between molecular evolutionary patterns and adaptation to harsh environments.

Journal ArticleDOI
TL;DR: It is concluded that differentiated sex chromosomes were already present in the common ancestor of Anguimorpha living in the early Cretaceous or even in the Jurassic Period, placing anguimorphan sex chromosomes among the oldest known in vertebrates.
Abstract: Sex determination in varanids, Gila monsters, beaded lizards, and other anguimorphan lizards is still poorly understood. Sex chromosomes had been reported in only a few species, based solely on cytogenetics, which precluded assessment of their homology. We uncovered Z-chromosome-specific genes in varanids from their transcriptomes. Comparison of differences in gene copy numbers between sexes across anguimorphan lizards and outgroups revealed that homologous differentiated ZZ/ZW sex chromosomes are present in Gila monsters, beaded lizards, alligator lizards, and a wide phylogenetic spectrum of varanids. However, these sex chromosomes are not homologous to those known in other amniotes. We conclude that differentiated sex chromosomes were already present in the common ancestor of Anguimorpha living in the early Cretaceous or even in the Jurassic Period, 115-180 Ma, placing anguimorphan sex chromosomes among the oldest known in vertebrates. The analysis of transcriptomes of the Komodo dragon (Varanus komodoensis) showed that the expression levels of genes linked to anguimorphan sex chromosomes are not balanced between sexes. Besides expanding our knowledge of vertebrate sex chromosome evolution, our study has important practical relevance for breeding and ecological studies. We introduce the first widely applicable technique of molecular sexing in varanids, Gila monsters, and beaded lizards, where reliable determination of sex based on external morphology is dubious even in adults.

Journal ArticleDOI
TL;DR: A map of the acquisition of structural variation is generated and the fundamental stages that shaped the evolution of the mitoribosomal large subunit are reconstructed, suggesting a critical role for ablation and expansion of rapidly evolving mt-rRNA.
Abstract: Mitochondrial ribosomes (mitoribosomes) are essential components of all mitochondria that synthesize proteins encoded by the mitochondrial genome. Unlike other ribosomes, mitoribosomes are highly variable across species. The basis for this diversity is not known. Here, we examine the composition and evolutionary history of mitoribosomes across the phylogenetic tree by combining three-dimensional structural information with a comparative analysis of the secondary structures of mitochondrial rRNAs (mt-rRNAs) and available proteomic data. We generate a map of the acquisition of structural variation and reconstruct the fundamental stages that shaped the evolution of the mitoribosomal large subunit and led to this diversity. Our analysis suggests a critical role for ablation and expansion of rapidly evolving mt-rRNA. These changes cause structural instabilities that are "patched" by the acquisition of pre-existing compensatory elements, thus providing opportunities for rapid evolution. This mechanism underlies the incorporation of mt-tRNA into the central protuberance of the mammalian mitoribosome, and the altered path of the polypeptide exit tunnel of the yeast mitoribosome. We propose that, because the toolkits of elements used for structural patching differ between mitochondria of different species, this fosters the growing divergence of mitoribosomes.

Journal ArticleDOI
TL;DR: It is demonstrated that missense mutations in the EPAS1 gene provided key evolutionary molecular adaptation to Tibetan horses living in high-altitude hypoxic environments, revealing possible targets for genomic selection programs aimed at increasing hypoxia tolerance in livestock and providing a textbook example of evolutionary convergence across independent mammal lineages.
Abstract: High altitude represents some of the most extreme environments worldwide. The genetic changes underlying adaptation to such environments have recently been identified in multiple animals but remain unknown in horses. Here, we sequence the complete genomes of 138 domestic horses encompassing the whole altitudinal range across China to uncover the genetic basis for adaptation to high-altitude hypoxia. Our genome dataset includes 65 lowland animals across ten Chinese native breeds, 61 horses living at least 3,300 meters above sea level across seven locations along the Qinghai-Tibet Plateau, as well as 7 Thoroughbred and 5 Przewalski's horses added for comparison. We find that Tibetan horses do not descend from Przewalski's horses but were most likely introduced from a distinct horse lineage, following the emergence of pastoral nomadism in Northwestern China ∼3,700 years ago. We find that the endothelial PAS domain protein 1 gene (EPAS1, also HIF2A) shows the strongest signature of positive selection in the Tibetan horse genome. Two missense mutations at this locus appear strongly associated with blood physiological parameters facilitating blood circulation as well as oxygen transportation and consumption in hypoxic conditions. Functional validation through protein mutagenesis shows that these mutations increase EPAS1 stability and its heterodimerization affinity to ARNT (HIF1B). Our study demonstrates that missense mutations in the EPAS1 gene provided key evolutionary molecular adaptation to Tibetan horses living in high-altitude hypoxic environments. It reveals possible targets for genomic selection programs aimed at increasing hypoxia tolerance in livestock and provides a textbook example of evolutionary convergence across independent mammal lineages.