Showing papers in "Molecular Biology and Evolution in 2017"


Journal ArticleDOI
TL;DR: DnaSP 6 is a major upgrade of the DNA Sequence Polymorphism (DnaSP) software, a popular tool for performing exhaustive population genetic analyses on multiple sequence alignments; it adds functionality for large high-throughput sequencing data sets and modules for single- and multi-locus coalescent simulations under a wide range of demographic scenarios.
Abstract: We present version 6 of the DNA Sequence Polymorphism (DnaSP) software, a new version of the popular tool for performing exhaustive population genetic analyses on multiple sequence alignments. This major upgrade incorporates novel functionalities to analyze large data sets, such as those generated by high-throughput sequencing technologies. Among other features, DnaSP 6 implements: 1) modules for reading and analyzing data from genomic partitioning methods, such as RADseq or hybrid enrichment approaches, 2) faster methods scalable for high-throughput sequencing data, and 3) summary statistics for the analysis of multi-locus population genetics data. Furthermore, DnaSP 6 includes novel modules to perform single- and multi-locus coalescent simulations under a wide range of demographic scenarios. The DnaSP 6 program, with extensive documentation, is freely available at http://www.ub.edu/dnasp.

3,277 citations


Journal ArticleDOI
TL;DR: A major expansion of the TimeTree resource is reported, which more than triples the number of species and more than triples the number of studies assembled, which will lead to broader and better understanding of the interplay of the change in the biosphere with the diversity of species on Earth.
Abstract: Evolutionary information on species divergence times is fundamental to studies of biodiversity, development, and disease. Molecular dating has enhanced our understanding of the temporal patterns of species divergences over the last five decades, and the number of studies is increasing quickly due to an exponential growth in the available collection of molecular sequences from diverse species and large number of genes. Our TimeTree resource is a public knowledge-base with the primary focus to make available all species divergence times derived using molecular sequence data to scientists, educators, and the general public in a consistent and accessible format. Here, we report a major expansion of the TimeTree resource, which more than triples the number of species (>97,000) and more than triples the number of studies assembled (>3,000). Furthermore, scientists can access not only the divergence time between two species or higher taxa, but also a timetree of a group of species and a timeline that traces a species' evolution through time. The new timetree and timeline visualizations are integrated with display of events on earth and environmental history over geological time, which will lead to broader and better understanding of the interplay of the change in the biosphere with the diversity of species on Earth. The next generation TimeTree resource is publicly available online at http://www.timetree.org.

1,880 citations


Journal ArticleDOI
TL;DR: EggNOG-mapper is developed, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database, and scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark.
Abstract: Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

1,756 citations


Journal ArticleDOI
TL;DR: The software, “Smart Model Selection” (SMS), is implemented in the PhyML environment and available using two interfaces: command-line (to be integrated in pipelines) and a web server (http://www.atgc-montpellier.fr/phyml-sms/).
Abstract: Model selection using likelihood-based criteria (e.g., AIC) is one of the first steps in phylogenetic analysis. One must select both a substitution matrix and a model for rates across sites. A simple method is to test all combinations and select the best one. We describe heuristics to avoid these extensive calculations. Runtime is divided by ∼2 with results remaining nearly the same, and the method performs well compared with ProtTest and jModelTest2. Our software, "Smart Model Selection" (SMS), is implemented in the PhyML environment and available using two interfaces: command-line (to be integrated in pipelines) and a web server (http://www.atgc-montpellier.fr/phyml-sms/).

1,323 citations
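
The "simple method" described in the abstract above (test every combination of substitution matrix and rates-across-sites model, then keep the best by AIC) can be sketched as follows. This is a minimal illustration of the exhaustive baseline only, not of the SMS heuristics, which the abstract does not detail. The function names, the `fit` callback, and the toy log-likelihoods are all hypothetical and invented for illustration; they are not results from the paper or PhyML output.

```python
from itertools import product


def aic(log_likelihood, n_params):
    """Akaike information criterion, 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def exhaustive_selection(matrices, rate_models, fit):
    """Score every (matrix, rate model) pair and return the best pair plus all scores.

    `fit` is a hypothetical callback returning (log-likelihood, free parameters)
    for one combination; a real pipeline would call a phylogenetics program here.
    """
    scores = {}
    for matrix, rates in product(matrices, rate_models):
        log_l, k = fit(matrix, rates)
        scores[(matrix, rates)] = aic(log_l, k)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up log-likelihoods and parameter counts, for illustration only.
TOY_FITS = {
    ("LG", "uniform"): (-1050.0, 0),
    ("LG", "gamma"): (-1000.0, 1),
    ("WAG", "uniform"): (-1060.0, 0),
    ("WAG", "gamma"): (-1004.0, 1),
}


def toy_fit(matrix, rates):
    """Stand-in for a real likelihood optimisation."""
    return TOY_FITS[(matrix, rates)]


best, scores = exhaustive_selection(["LG", "WAG"], ["uniform", "gamma"], toy_fit)
print(best, scores[best])
```

The cost of this baseline grows with the product of the two lists, which is the expense the heuristics in the abstract aim to avoid.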


Journal ArticleDOI
TL;DR: To enable fuller use of available data and more accurate inference of species tree topologies, divergence times, and substitution rates, a new version of *BEAST is developed called StarBEAST2, and species tree relaxed clocks are introduced to enable accurate estimates of per-species substitution rates.
Abstract: Fully Bayesian multispecies coalescent (MSC) methods like *BEAST estimate species trees from multiple sequence alignments. Today thousands of genes can be sequenced for a given study, but using that many genes with *BEAST is intractably slow. An alternative is to use heuristic methods which compromise accuracy or completeness in return for speed. A common heuristic is concatenation, which assumes that the evolutionary history of each gene tree is identical to the species tree. This is an inconsistent estimator of species tree topology, a worse estimator of divergence times, and induces spurious substitution rate variation when incomplete lineage sorting is present. Another class of heuristics directly motivated by the MSC avoids many of the pitfalls of concatenation but cannot be used to estimate divergence times. To enable fuller use of available data and more accurate inference of species tree topologies, divergence times, and substitution rates, we have developed a new version of *BEAST called StarBEAST2. To improve convergence rates we add analytical integration of population sizes, novel MCMC operators and other optimizations. Computational performance improved by 13.5× and 13.8× respectively when analyzing two empirical data sets, and an average of 33.1× across 30 simulated data sets. To enable accurate estimates of per-species substitution rates, we introduce species tree relaxed clocks, and show that StarBEAST2 is a more powerful and robust estimator of rate variation than concatenation. StarBEAST2 is available through the BEAUTi package manager in BEAST 2.4 and above.

346 citations


Journal ArticleDOI
TL;DR: PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment and is the first software providing tools to summarize a set of networks with measures of tree edge support, hybrid edge support, and hybrid node support.
Abstract: PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or do a phylogenetic regression. The software is available in open source and with documentation at https://github.com/crsl4/PhyloNetworks.jl.

230 citations


Journal ArticleDOI
TL;DR: SLiM 2 is presented: an evolutionary simulation framework that combines a powerful, fast engine for forward population genetic simulations with the capability of modeling a wide variety of complex evolutionary scenarios, and achieves this flexibility through scriptability.
Abstract: Modern population genomic datasets hold immense promise for revealing the evolutionary processes operating in natural populations, but a crucial prerequisite for this goal is the ability to model realistic evolutionary scenarios and predict their expected patterns in genomic data. To that end, we present SLiM 2: an evolutionary simulation framework that combines a powerful, fast engine for forward population genetic simulations with the capability of modeling a wide variety of complex evolutionary scenarios. SLiM achieves this flexibility through scriptability, which provides control over most aspects of the simulated evolutionary scenarios with a simple R-like scripting language called Eidos. An example SLiM simulation is presented to illustrate the power of this approach. SLiM 2 also includes a graphical user interface for simulation construction, interactive runtime control, and dynamic visualization of simulation output, facilitating easy and fast model development with quick prototyping and visual debugging. We conclude with a performance comparison between SLiM and two other popular forward genetic simulation packages.

227 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the ratio of mitochondrial over nuclear mutation rate is highly variable among animal taxa, and the among-phyla homogeneity in within-species mtDNA diversity is due to a negative correlation between mtDNA per-generation mutation rate and effective population size, irrespective of the action of natural selection.
Abstract: It is commonly assumed that mitochondrial DNA (mtDNA) evolves at a faster rate than nuclear DNA (nuDNA) in animals. This has contributed to the popularity of mtDNA as a molecular marker in evolutionary studies. Analyzing 121 multilocus data sets and four phylogenomic data sets encompassing 4,676 species of animals, we demonstrate that the ratio of mitochondrial over nuclear mutation rate is highly variable among animal taxa. In nonvertebrates, such as insects and arachnids, the ratio of mtDNA over nuDNA mutation rate varies between 2 and 6, whereas it is above 20, on average, in vertebrates such as scaled reptiles and birds. Interestingly, this variation is sufficient to explain the previous report of a similar level of mitochondrial polymorphism, on average, between vertebrates and nonvertebrates, which was originally interpreted as reflecting the effect of pervasive positive selection. Our analysis rather indicates that the among-phyla homogeneity in within-species mtDNA diversity is due to a negative correlation between mtDNA per-generation mutation rate and effective population size, irrespective of the action of natural selection. Finally, we explore the variation in the absolute per-year mutation rate of both mtDNA and nuDNA using a reduced data set for which fossil calibration is available, and discuss the potential determinants of mutation rate variation across genomes and taxa. This study has important implications regarding DNA-based identification methods in predicting that mtDNA barcoding should be less reliable in nonvertebrates than in vertebrates.

217 citations


Journal ArticleDOI
TL;DR: RWTY is an R package that implements established and new methods for diagnosing phylogenetic MCMC convergence in a single convenient interface, addressing the inconvenience of many popular methods on large data sets.
Abstract: Bayesian inference using Markov chain Monte Carlo (MCMC) has become one of the primary methods used to infer phylogenies from sequence data. Assessing convergence is a crucial component of these analyses, as it establishes the reliability of the posterior distribution estimates of the tree topology and model parameters sampled from the MCMC. Numerous tests and visualizations have been developed for this purpose, but many of the most popular methods are implemented in ways that make them inconvenient to use for large data sets. RWTY is an R package that implements established and new methods for diagnosing phylogenetic MCMC convergence in a single convenient interface.

179 citations


Journal ArticleDOI
TL;DR: The results provide support for a model in which different rice subspecies had separate origins, but de novo domestication occurred only once, in O. sativa ssp. japonica.
Abstract: The origin of domesticated Asian rice (Oryza sativa) has been a contentious topic, with conflicting evidence for either single or multiple domestication of this key crop species. We examined the evolutionary history of domesticated rice by analyzing de novo assembled genomes from domesticated rice and its wild progenitors. Our results indicate multiple origins, where each domesticated rice subpopulation (japonica, indica, and aus) arose separately from progenitor O. rufipogon and/or O. nivara. Coalescence-based modeling of demographic parameters estimates that the first domesticated rice population to split off from O. rufipogon was O. sativa ssp. japonica, occurring at ∼13.1-24.1 ka, which is an order of magnitude older than the earliest archeological date of domestication. This date is consistent, however, with the expansion of O. rufipogon populations after the Last Glacial Maximum ∼18 ka and archeological evidence for early wild rice management in China. We also show that there is significant gene flow from japonica to both indica (∼17%) and aus (∼15%), which led to the transfer of domestication alleles from early-domesticated japonica to proto-indica and proto-aus populations. Our results provide support for a model in which different rice subspecies had separate origins, but de novo domestication occurred only once, in O. sativa ssp. japonica, and introgressive hybridization from early japonica to proto-indica and proto-aus led to domesticated indica and aus rice.

178 citations


Journal ArticleDOI
TL;DR: The transmission tree inference methodology is uniquely suited to use in a public health environment during real-time outbreak investigations by accounting for unsampled cases and an outbreak which may not have reached its end.
Abstract: Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom: a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by coloring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch coloring approach can incorporate a variable number of unique colors to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte-Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo.

Journal ArticleDOI
TL;DR: Evidence that soft sweeps are widespread and account for the vast majority of recent human adaptation is found, and insights into the role of sexual selection, cancer risk, and central nervous system development in recent human evolution are revealed.
Abstract: The degree to which adaptation in recent human evolution shapes genetic variation remains controversial. This is in part due to the limited evidence in humans for classic "hard selective sweeps", wherein a novel beneficial mutation rapidly sweeps through a population to fixation. However, positive selection may often proceed via "soft sweeps" acting on mutations already present within a population. Here, we examine recent positive selection across six human populations using a powerful machine learning approach that is sensitive to both hard and soft sweeps. We found evidence that soft sweeps are widespread and account for the vast majority of recent human adaptation. Surprisingly, our results also suggest that linked positive selection affects patterns of variation across much of the genome, and may increase the frequencies of deleterious mutations. Our results also reveal insights into the role of sexual selection, cancer risk, and central nervous system development in recent human evolution.

Journal ArticleDOI
TL;DR: The results support the hypothesis that the last common ancestor of Amoebozoa was sexual and flagellated, and it also may have had the ability to disperse propagules from a sporocarp-type fruiting body.
Abstract: Amoebozoa is the eukaryotic supergroup sister to Obazoa, the lineage that contains the animals and Fungi, as well as their protistan relatives, and the breviate and apusomonad flagellates. Amoebozoa is extraordinarily diverse, encompassing important model organisms and significant pathogens. Although amoebozoans are integral to global nutrient cycles and present in nearly all environments, they remain vastly understudied. We present a robust phylogeny of Amoebozoa based on a broad representative set of taxa in a phylogenomic framework (325 genes). By sampling 61 taxa using culture-based and single-cell transcriptomics, our analyses show two major clades of Amoebozoa, Discosea and Tevosa. This phylogeny refutes previous studies in major respects. Our results support the hypothesis that the last common ancestor of Amoebozoa was sexual and flagellated; it also may have had the ability to disperse propagules from a sporocarp-type fruiting body. Overall, the main macroevolutionary patterns in Amoebozoa appear to result from the parallel losses of homologous characters of a multiphase life cycle that included flagella, sex, and sporocarps rather than independent acquisition of convergent features.

Journal ArticleDOI
TL;DR: STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups, and the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited.
Abstract: The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes.

Journal ArticleDOI
TL;DR: It is found that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii, and a new, more efficient, ABC method, ABC random forest (ABC-RF) is used, which out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets.
Abstract: Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes, but can be computationally expensive. We use our study to illustrate the use of a new, more efficient, ABC method, ABC random forest (ABC-RF) and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of populations in western North America, which then in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved between different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets but that ABC-RF out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.

Journal ArticleDOI
TL;DR: The results suggest that angiosperms and gymnosperms differ considerably in their rates of molecular evolution per unit time, with gymnosperm rates being, on average, seven times lower than those of angiosperm species.
Abstract: The majority of variation in rates of molecular evolution among seed plants remains both unexplored and unexplained. Although some attention has been given to flowering plants, reports of molecular evolutionary rates for their sister plant clade (gymnosperms) are scarce, and to our knowledge differences in molecular evolution among seed plant clades have never been tested in a phylogenetic framework. Angiosperms and gymnosperms differ in a number of features, of which contrasting reproductive biology, life spans, and population sizes are the most prominent. The highly conserved morphology of gymnosperms evidenced by similarity of extant species to fossil records and the high levels of macrosynteny at the genomic level have led scientists to believe that gymnosperms are slow-evolving plants, although some studies have offered contradictory results. Here, we used 31,968 nucleotide sites obtained from orthologous genes across a wide taxonomic sampling that includes representatives of most conifers, cycads, ginkgo, and many angiosperms with a sequenced genome. Our results suggest that angiosperms and gymnosperms differ considerably in their rates of molecular evolution per unit time, with gymnosperm rates being, on average, seven times lower than angiosperm species. Longer generation times and larger genome sizes are some of the factors explaining the slow rates of molecular evolution found in gymnosperms. In contrast to their slow rates of molecular evolution, gymnosperms possess higher substitution rate ratios than angiosperm taxa. Finally, our study suggests stronger and more efficient purifying and diversifying selection in gymnosperm than in angiosperm species, probably in relation to larger effective population sizes.

Journal ArticleDOI
TL;DR: Odorant receptors and odorant binding proteins are present only in hexapods (insects) and absent from all other arthropod lineages, indicating that they are not universal adaptations to land.
Abstract: Chemosensory-related gene (CRG) families have been studied extensively in insects, but their evolutionary history across the Arthropoda had remained relatively unexplored. Here, we address current hypotheses and prior conclusions on CRG family evolution using a more comprehensive data set. In particular, odorant receptors were hypothesized to have proliferated during terrestrial colonization by insects (hexapods), but their association with other pancrustacean clades and with independent terrestrial colonizations in other arthropod subphyla have been unclear. We also examine hypotheses on which arthropod CRG family is most ancient. Thus, we reconstructed phylogenies of CRGs, including those from new arthropod genomes and transcriptomes, and mapped CRG gains and losses across arthropod lineages. Our analysis was strengthened by including crustaceans, especially copepods, which reside outside the hexapod/branchiopod clade within the subphylum Pancrustacea. We generated the first high-resolution genome sequence of the copepod Eurytemora affinis and annotated its CRGs. We found odorant receptors and odorant binding proteins present only in hexapods (insects) and absent from all other arthropod lineages, indicating that they are not universal adaptations to land. Gustatory receptors likely represent the oldest chemosensory receptors among CRGs, dating back to the Placozoa. We also clarified and confirmed the evolutionary history of antennal ionotropic receptors across the Arthropoda. All antennal ionotropic receptors in E. affinis were expressed more highly in males than in females, suggestive of an association with male mate-recognition behavior. This study is the most comprehensive comparative analysis to date of CRG family evolution across the largest and most speciose metazoan phylum Arthropoda.

Journal ArticleDOI
TL;DR: In animals, the data confirm that longevity and propagule size are the variables that best explain the variation in πS among species, and in plants longevity also plays a major role as well as mating system.
Abstract: A central question in evolutionary biology is why some species have more genetic diversity than others and a no less important question is why selection efficacy varies among species. Although these questions have started to be tackled in animals, they have not been addressed to the same extent in plants. Here, we estimated nucleotide diversity at synonymous, πS, and nonsynonymous sites, πN, and a measure of the efficacy of selection, the ratio πN/πS, in 34 animal and 28 plant species using full genome data. We then evaluated the relationship of nucleotide diversity and selection efficacy with effective population size, the distribution of fitness effect and life history traits. In animals, our data confirm that longevity and propagule size are the variables that best explain the variation in πS among species. In plants longevity also plays a major role as well as mating system. As predicted by the nearly neutral theory of molecular evolution, the log of πN/πS decreased linearly with the log of πS but the slope was weaker in plants than in animals. This appears to be due to a higher mutation rate in long lived plants, and the difference disappears when πS is rescaled by the mutation rate. Differences in the distribution of fitness effect of new mutations also contributed to variation in πN/πS among species.
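
The quantities named in the abstract above can be illustrated with a small worked example. The sketch below computes nucleotide diversity π as the average proportion of pairwise differences per site, then forms a πN/πS ratio. It assumes, loudly, that sites have already been split into nonsynonymous and synonymous columns upstream; real analyses classify sites from the codon table, often fractionally, and this is not the authors' pipeline. All function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site across aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and equally long")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every pair of sequences.
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)


def pn_ps_ratio(nonsyn_columns, syn_columns):
    """Ratio piN/piS, assuming sites were pre-classified (a simplifying assumption)."""
    pi_s = nucleotide_diversity(syn_columns)
    if pi_s == 0:
        raise ValueError("piS is zero; the ratio is undefined")
    return nucleotide_diversity(nonsyn_columns) / pi_s


# Toy alignments invented for illustration only.
syn = ["AAAA", "AAAT", "AATT"]      # pairwise differences: 1, 2, 1
nonsyn = ["CCCC", "CCCC", "CCCG"]   # pairwise differences: 0, 1, 1

pi_s = nucleotide_diversity(syn)
pi_n = nucleotide_diversity(nonsyn)
print(pi_s, pi_n, pn_ps_ratio(nonsyn, syn))
```

A lower πN/πS for a given πS indicates that a smaller share of segregating variation is nonsynonymous, which is how the abstract uses the ratio as a measure of selection efficacy.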

Journal ArticleDOI
TL;DR: A novel algorithm called fastGEAR is introduced which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins, and provides insight into recombinations affecting deep branches of the phylogenetic tree.
Abstract: Prokaryotic evolution is affected by horizontal transfer of genetic material through recombination. Inference of an evolutionary tree of bacteria thus relies on accurate identification of the population genetic structure and recombination-derived mosaicism. Rapidly growing databases represent a challenge for computational methods to detect recombinations in bacterial genomes. We introduce a novel algorithm called fastGEAR which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins. The algorithm detects both recent recombinations (affecting a few isolates) and ancestral recombinations between detected lineages (affecting entire lineages), thus providing insight into recombinations affecting deep branches of the phylogenetic tree. In simulations, fastGEAR had comparable power to detect recent recombinations and outstanding power to detect the ancestral ones, compared with state-of-the-art methods, often with a fraction of computational cost. We demonstrate the utility of the method by analyzing a collection of 616 whole-genomes of a recombinogenic pathogen Streptococcus pneumoniae, for which the method provided a high-resolution view of recombination across the genome. We examined in detail the penicillin-binding genes across the Streptococcus genus, demonstrating previously undetected genetic exchanges between different species at these three loci. Hence, fastGEAR can be readily applied to investigate mosaicism in bacterial genes across multiple species. Finally, fastGEAR correctly identified many known recombination hotspots and pointed to potential new ones. Matlab code and Linux/Windows executables are available at https://users.ics.aalto.fi/~pemartti/fastGEAR/ (last accessed February 6, 2017).

Journal ArticleDOI
TL;DR: Collateral sensitivity can result from resistance mutations in regulatory genes such as nalC or mexZ, which mediate aminoglycoside sensitivity in β-lactam-adapted populations, or the two-component regulatory system gene pmrB, which enhances penicillin sensitivity in gentamicin-resistant populations; the evolved collateral effects vary substantially among replicates, which determines their potential in antibiotic therapy.
Abstract: When bacteria evolve resistance against a particular antibiotic, they may simultaneously gain increased sensitivity against a second one. Such collateral sensitivity may be exploited to develop novel, sustainable antibiotic treatment strategies aimed at containing the current, dramatic spread of drug resistance. To date, the presence and molecular basis of collateral sensitivity has only been studied in a few bacterial species and is unknown for opportunistic human pathogens such as Pseudomonas aeruginosa. In the present study, we assessed patterns of collateral effects by experimentally evolving 160 independent populations of P. aeruginosa to high levels of resistance against eight commonly used antibiotics. The bacteria evolved resistance rapidly and expressed both collateral sensitivity and cross-resistance. The pattern of such collateral effects differed from those previously reported for other bacterial species, suggesting interspecific differences in the underlying evolutionary trade-offs. Intriguingly, we also identified contrasting patterns of collateral sensitivity and cross-resistance among the replicate populations adapted to the same drug. Whole-genome sequencing of 81 independently evolved populations revealed distinct evolutionary paths of resistance to the selective drug, which determined whether bacteria became cross-resistant or collaterally sensitive towards others. Based on genomic and functional genetic analysis, we demonstrate that collateral sensitivity can result from resistance mutations in regulatory genes such as nalC or mexZ, which mediate aminoglycoside sensitivity in β-lactam-adapted populations, or the two-component regulatory system gene pmrB, which enhances penicillin sensitivity in gentamicin-resistant populations. Our findings highlight substantial variation in the evolved collateral effects among replicates, which in turn determine their potential in antibiotic therapy.

Journal ArticleDOI
TL;DR: A new summary statistic, β, is proposed, which detects clusters of alleles at similar frequencies; applied to 1000 Genomes Project data, it identifies loci potentially subjected to long-term balancing selection in humans, including two balanced haplotypes, localized to the genes WFS1 and CADM2, that are strongly linked to association signals for complex traits.
Abstract: Balancing selection occurs when multiple alleles are maintained in a population, which can result in their preservation over long evolutionary time periods. A characteristic signature of this long-term balancing selection is an excess number of intermediate frequency polymorphisms near the balanced variant. However, the expected distribution of allele frequencies at these loci has not been extensively detailed, and therefore existing summary statistic methods do not explicitly take it into account. Using simulations, we show that new mutations which arise in close proximity to a site targeted by balancing selection accumulate at frequencies nearly identical to that of the balanced allele. In order to scan the genome for balancing selection, we propose a new summary statistic, β, which detects these clusters of alleles at similar frequencies. Simulation studies show that compared with existing summary statistics, our measure has improved power to detect balancing selection, and is reasonably powered in non-equilibrium demographic models and under a range of recombination and mutation rates. We compute β on 1000 Genomes Project data to identify loci potentially subjected to long-term balancing selection in humans. We report two balanced haplotypes, localized to the genes WFS1 and CADM2, that are strongly linked to association signals for complex traits. Our approach is computationally efficient and applicable to species that lack appropriate outgroup sequences, allowing for well-powered analysis of selection in the wide variety of species for which population data are rapidly being generated.

Journal ArticleDOI
TL;DR: It is concluded that selection is a major contributor to the transition:transversion substitution bias in viruses and that this effect is only partially explained by the greater likelihood of transversion mutations to cause radical as opposed to conservative amino acid changes.
Abstract: The substitution rates of transitions are higher than expected by chance relative to those of transversions. Many have argued that selection disfavors transversions, as nonsynonymous transversions are less likely to conserve biochemical properties of the original amino acid. Only recently has it become feasible to directly test this selective hypothesis by comparing the fitness effects of a large number of transition and transversion mutations. For example, a recent study of six viruses and one beta-lactamase gene did not find evidence supporting the selective hypothesis. Here, we analyze the relative fitness effects of transition and transversion mutations from our recently published genome-wide study of mutational fitness effects in influenza virus. In contrast to prior work, we find that transversions are significantly more detrimental than transitions. Using what we believe to be an improved statistical framework, we also identify a similar trend in two HIV data sets. We further demonstrate a fitness difference in transition and transversion mutations using four deep mutational scanning data sets of influenza virus and HIV, which provided adequate statistical power. We find that three of the most commonly cited radical/conservative amino acid categories are predictive of fitness, supporting their utility in studies of positive selection and codon usage bias. We conclude that selection is a major contributor to the transition:transversion substitution bias in viruses and that this effect is only partially explained by the greater likelihood of transversion mutations to cause radical as opposed to conservative amino acid changes.
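
The transition/transversion distinction used in this abstract can be made concrete with a short sketch. It relies only on the standard definition (a transition swaps a purine for a purine or a pyrimidine for a pyrimidine; a transversion swaps between the two classes). The function names `classify_substitution` and `ts_tv_ratio` are illustrative and are not taken from the paper, and the sketch does not reproduce the authors' statistical framework.

```python
from collections import Counter

# Standard nucleotide classes; U is treated as T so RNA viruses can be handled.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-nucleotide change."""
    ref = ref.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"not a nucleotide substitution: {ref}>{alt}")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(substitutions) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    counts = Counter(classify_substitution(r, a) for r, a in substitutions)
    if counts["transversion"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["transition"] / counts["transversion"]
```

Grouping measured fitness effects by the label returned from `classify_substitution` would be one way to set up the kind of comparison the abstract describes.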

Journal ArticleDOI
TL;DR: In this study, the shell matrix proteins of four highly divergent bivalves were analyzed and a significant number of the identified SMPs contained domains related to immune functions, implying their involvement not only in immunity, but also environmental adaptation.
Abstract: Bivalves have evolved a range of complex shell forming mechanisms that are reflected by their incredible diversity in shell mineralogy and microstructures. A suite of proteins exported to the shell matrix space plays a significant role in controlling these features, in addition to underpinning some of the physical properties of the shell itself. Although there is a general consensus that a minimum basic protein tool kit is required for shell construction, to date, this remains undefined. In this study the shell matrix proteins (SMPs) of four highly divergent bivalves (the Pacific oyster, Crassostrea gigas; the blue mussel, Mytilus edulis; the clam, Mya truncata; and the king scallop, Pecten maximus) were analyzed in an identical fashion using a proteomics pipeline. This enabled us to identify the critical elements of a “basic tool kit” for calcification processes, which were conserved across the taxa irrespective of the shell morphology and arrangement of the crystal surfaces. In addition, protein domains controlling the crystal layers specific to aragonite and calcite were also identified. Intriguingly, a significant number of the identified SMPs contained domains related to immune functions. These were often unique to each species, implying their involvement not only in immunity, but also environmental adaptation. This suggests that the SMPs are selectively exported in a complex mix to endow the shell with both mechanical protection and biochemical defense.

Journal ArticleDOI
TL;DR: This is an example where hybrid genome resolution is driven by positive selection on existing heterozygosity and demonstrates that even infrequent outcrossing may have lasting impacts on adaptation.
Abstract: Hybridization is often considered maladaptive, but sometimes hybrids can invade new ecological niches and adapt to novel or stressful environments better than their parents. The genomic changes that occur following hybridization that facilitate genome resolution and/or adaptation are not well understood. Here, we examine hybrid genome evolution using experimental evolution of de novo interspecific hybrid yeast Saccharomyces cerevisiae × Saccharomyces uvarum and their parentals. We evolved these strains in nutrient-limited conditions for hundreds of generations and sequenced the resulting cultures identifying numerous point mutations, copy number changes, and loss of heterozygosity (LOH) events, including species-biased amplification of nutrient transporters. We focused on a particularly interesting example, in which we saw repeated LOH at the high-affinity phosphate transporter gene PHO84 in both intra- and interspecific hybrids. Using allele replacement methods, we tested the fitness of different alleles in hybrid and S. cerevisiae strain backgrounds and found that the LOH is indeed the result of selection on one allele over the other in both S. cerevisiae and the hybrids. This is an example where hybrid genome resolution is driven by positive selection on existing heterozygosity and demonstrates that even infrequent outcrossing may have lasting impacts on adaptation.

Journal ArticleDOI
TL;DR: It is demonstrated that genes encoding ion channels KCND3, CACNA1FB, and ATP4A were differentially methylated between the marine and the freshwater populations, suggesting that an immediate epigenetic response to freshwater conditions can be maintained in the freshwater population.
Abstract: The three-spined stickleback (Gasterosteus aculeatus) represents a convenient model to study microevolution: adaptation to a freshwater environment. Although genetic adaptations to freshwater environments are well-studied, epigenetic adaptations have attracted little attention. In this work, we investigated the role of DNA methylation in the adaptation of the marine stickleback population to freshwater conditions. DNA methylation profiling was performed in marine and freshwater populations of sticklebacks, as well as in marine sticklebacks placed into a freshwater environment and freshwater sticklebacks placed into seawater. We showed that the DNA methylation profile after placing a marine stickleback into fresh water partially converged to that of a freshwater stickleback. For six genes including ATP4A ion pump and NELL1, believed to be involved in skeletal ossification, we demonstrated similar changes in DNA methylation in both evolutionary and short-term adaptation. This suggested that an immediate epigenetic response to freshwater conditions can be maintained in the freshwater population. Interestingly, we observed enhanced epigenetic plasticity in freshwater sticklebacks that may serve as a compensatory regulatory mechanism for the lack of genetic variation in the freshwater population. For the first time, we demonstrated that genes encoding ion channels KCND3, CACNA1FB, and ATP4A were differentially methylated between the marine and the freshwater populations. Other genes encoding ion channels were previously reported to be under selection in freshwater populations. Nevertheless, the genes that harbor genetic and epigenetic changes were not the same, suggesting that epigenetic adaptation is a complementary mechanism to selection of genetic variants favorable for the freshwater environment.

Journal ArticleDOI
TL;DR: An exact numerical solution to the structured coalescent that does not require the inference of migration histories is presented; although computationally unfeasible for large data sets, it clarifies the assumptions of previously developed approximate methods and yields an improved approximation to the structured coalescent.
Abstract: Phylogeographic methods can help reveal the movement of genes between populations of organisms. This has been widely done to quantify pathogen movement between different host populations, the migration history of humans, and the geographic spread of languages or gene flow between species using the location or state of samples alongside sequence data. Phylogenies therefore offer insights into migration processes not available from classic epidemiological or occurrence data alone. Phylogeographic methods have however several known shortcomings. In particular, one of the most widely used methods treats migration the same as mutation, and therefore does not incorporate information about population demography. This may lead to severe biases in estimated migration rates for data sets where sampling is biased across populations. The structured coalescent on the other hand allows us to coherently model the migration and coalescent process, but current implementations struggle with complex data sets due to the need to infer ancestral migration histories. Thus, approximations to the structured coalescent, which integrate over all ancestral migration histories, have been developed. However, the validity and robustness of these approximations remain unclear. We present an exact numerical solution to the structured coalescent that does not require the inference of migration histories. Although this solution is computationally unfeasible for large data sets, it clarifies the assumptions of previously developed approximate methods and allows us to provide an improved approximation to the structured coalescent. We have implemented these methods in BEAST2, and we show how these methods compare under different scenarios.

Journal ArticleDOI
TL;DR: A direct estimate of the mutation rate in the bumblebee (Bombus terrestris), a close relative of the honeybee with a much lower recombination rate, shows no significant difference from the honeybee rate; evidence for a direct coupling between recombination and mutation is found, but the effect is weak.
Abstract: Accurate knowledge of the mutation rate provides a base line for inferring expected rates of evolution, for testing evolutionary hypotheses and for estimation of key parameters. Advances in sequencing technology now permit direct estimates of the mutation rate from sequencing of close relatives. Within insects there have been three prior such estimates, two in nonsocial insects (Drosophila: 2.8 × 10⁻⁹ per bp per haploid genome per generation; Heliconius: 2.9 × 10⁻⁹) and one in a social species, the honeybee (3.4 × 10⁻⁹). Might the honeybee's rate be ∼20% higher because it has an exceptionally high recombination rate and recombination may be directly or indirectly mutagenic? To address this possibility, we provide a direct estimate of the mutation rate in the bumblebee (Bombus terrestris), this being a close relative of the honeybee but with a much lower recombination rate. We confirm that the crossover rate of the bumblebee is indeed much lower than that of honeybees (8.7 cM/Mb vs. 37 cM/Mb). Importantly, we find no significant difference in the mutation rates: we estimate for bumblebees a rate of 3.6 × 10⁻⁹ per haploid genome per generation (95% confidence interval 2.38 × 10⁻⁹ to 5.37 × 10⁻⁹), which is just 5% higher than the estimate for honeybees. Both genomes have approximately one new mutation per haploid genome per generation. While we find evidence for a direct coupling between recombination and mutation (also seen in honeybees), the effect is so weak as to leave almost no footprint on any between-species differences. The similarity in mutation rates suggests an approximate constancy of the mutation rate in insects.
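
The rate comparisons in this abstract are simple relative differences, and a small sketch makes the arithmetic explicit. It uses only the rounded point estimates and the confidence interval quoted above; the helper names `percent_higher` and `within_interval` are illustrative. With rounded inputs the bumblebee estimate comes out near 6% above the honeybee estimate rather than the quoted 5%, which presumably reflects unrounded values in the paper (an assumption).

```python
# Rounded point estimates quoted in the abstract (mutations per bp per generation).
RATES = {
    "Drosophila": 2.8e-9,
    "Heliconius": 2.9e-9,
    "honeybee": 3.4e-9,
    "bumblebee": 3.6e-9,
}

# 95% confidence interval reported for the bumblebee estimate.
BUMBLEBEE_CI = (2.38e-9, 5.37e-9)


def percent_higher(rate: float, baseline: float) -> float:
    """Percentage by which `rate` exceeds `baseline`."""
    return 100.0 * (rate - baseline) / baseline


def within_interval(value: float, interval: tuple) -> bool:
    """True if `value` lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


if __name__ == "__main__":
    # Honeybee relative to Drosophila: the "~20% higher" comparison.
    print(round(percent_higher(RATES["honeybee"], RATES["Drosophila"]), 1))
    # Bumblebee relative to honeybee, using rounded inputs.
    print(round(percent_higher(RATES["bumblebee"], RATES["honeybee"]), 1))
    # The honeybee point estimate falls inside the bumblebee interval.
    print(within_interval(RATES["honeybee"], BUMBLEBEE_CI))
```

That the honeybee point estimate lies inside the bumblebee interval is only an informal check, not a substitute for the significance test reported in the paper.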

Journal ArticleDOI
TL;DR: The central conclusions are that retrospective studies may underestimate the complexity of selective events and the Ne relevant for adaptation for malaria is considerably higher than previously estimated.
Abstract: Multiple kelch13 alleles conferring artemisinin resistance (ART-R) are currently spreading through Southeast Asian malaria parasite populations, providing a unique opportunity to observe an ongoing soft selective sweep, investigate why resistance alleles have evolved multiple times and determine fundamental population genetic parameters for Plasmodium. We sequenced kelch13 (n = 1,876), genotyped 75 flanking SNPs, and measured clearance rate (n = 3,552) in parasite infections from Western Thailand (2001-2014). We describe 32 independent coding mutations including common mutations outside the kelch13 propeller associated with significant reductions in clearance rate. Mutations were first observed in 2003 and rose to 90% by 2014, consistent with a selection coefficient of ∼0.079. ART-R allele diversity rose until 2012 and then dropped as one allele (C580Y) spread to high frequency. The frequency with which adaptive alleles arise is determined by the rate of mutation and the population size. Two factors drive this soft sweep: (1) multiple kelch13 amino-acid mutations confer resistance, providing a large mutational target (we estimate the target is 87-163 bp); (2) the population mutation parameter (Θ = 2Neμ) can be estimated from the frequency distribution of ART-R alleles and is ∼5.69, suggesting that short term effective population size is 88 thousand to 1.2 million. This is 52-705 times greater than Ne estimated from fluctuation in allele frequencies, suggesting that we have previously underestimated the capacity for adaptive evolution in Plasmodium. Our central conclusions are that retrospective studies may underestimate the complexity of selective events and the Ne relevant for adaptation for malaria is considerably higher than previously estimated.
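
The relation Θ = 2Neμ quoted in this abstract can be rearranged in either direction, and a brief sketch shows both. Here μ is read as the per-generation mutation rate summed over the whole mutational target, which is an assumption about the abstract's notation. The function names are illustrative, and the abstract does not state the mutation rate the authors used, so the demo only back-calculates the per-target rate implied by the quoted Θ and Ne range.

```python
def ne_from_theta(theta: float, mu_target: float) -> float:
    """Effective population size from theta = 2 * Ne * mu.

    `mu_target` is the per-generation mutation rate over the whole
    mutational target (assumed meaning of mu in the abstract).
    """
    if mu_target <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (2.0 * mu_target)


def mu_from_theta(theta: float, ne: float) -> float:
    """Per-target mutation rate implied by theta = 2 * Ne * mu."""
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    return theta / (2.0 * ne)


if __name__ == "__main__":
    THETA = 5.69                        # population mutation parameter from the abstract
    NE_LOW, NE_HIGH = 88_000, 1_200_000  # short-term Ne range from the abstract
    # Per-target mutation rates implied by the two ends of the Ne range.
    print(mu_from_theta(THETA, NE_LOW))
    print(mu_from_theta(THETA, NE_HIGH))
```

Dividing an implied per-target rate by a target size in the 87-163 bp range would give a per-bp rate, but which target size pairs with which end of the Ne range is not stated in the abstract, so that step is left out.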

Journal ArticleDOI
TL;DR: It is proposed that the down-regulation of EPAS1 contributes to the molecular basis of Tibetans’ adaption to high-altitude hypoxia.
Abstract: Tibetans are well adapted to the hypoxic environments at high altitude, yet the molecular mechanism of this adaptation remains elusive. We reported comprehensive genetic and functional analyses of EPAS1, a gene encoding hypoxia inducible factor 2α (HIF-2α) with the strongest signal of selection in previous genome-wide scans of Tibetans. We showed that the Tibetan-enriched EPAS1 variants down-regulate expression in human umbilical endothelial cells and placentas. Heterozygous EPAS1 knockout mice display blunted physiological responses to chronic hypoxia, mirroring the situation in Tibetans. Furthermore, we found that the Tibetan version of EPAS1 is not only associated with the relatively low hemoglobin level as a polycythemia protectant, but also is associated with a low pulmonary vasoconstriction response in Tibetans. We propose that the down-regulation of EPAS1 contributes to the molecular basis of Tibetans' adaption to high-altitude hypoxia.

Journal ArticleDOI
TL;DR: It is shown that the vast majority of the genes missing from avian genome assemblies are actually present in most species of birds, and a positive and significant correlation between the ratio of nonsynonymous to synonymous substitution rate (dN/dS) and life-history traits in Neoaves is uncovered.
Abstract: According to current assemblies, avian genomes differ from those of the other lineages of amniotes in 1) containing a lower number of genes; 2) displaying a high stability of karyotype and recombination map; and 3) lacking any correlation between evolutionary rates (dN/dS) and life-history traits, unlike mammals and nonavian reptiles. We question the reality of the bird missing genes and investigate whether insufficient representation of bird gene content might have biased previous evolutionary analyses. Mining RNAseq data, we show that the vast majority of the genes missing from avian genome assemblies are actually present in most species of birds. These mainly correspond to the GC-rich fraction of the bird genome, which is the most difficult to sequence, assemble and annotate. With the inclusion of these genes in a phylogenomic analysis of high-quality alignments, we uncover a positive and significant correlation between the ratio of nonsynonymous to synonymous substitution rate (dN/dS) and life-history traits in Neoaves. We report a strong effect of GC-biased gene conversion on the dN/dS ratio in birds and a peculiar behavior of Palaeognathae (ostrich and allies) and Galloanserae (chickens, ducks and allies). Avian genomes do not contain fewer genes than mammals or nonavian reptiles. Previous analyses have overlooked ∼15% of the bird gene complement. GC-rich regions, which are the most difficult to access, are a key component of amniote genomes. They experience peculiar molecular processes and must be included for unbiased functional and comparative genomic analyses in birds.