Showing papers on "Phylogenetic tree" published in 2021


Journal ArticleDOI
TL;DR: PANTHER is a publicly available knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline, which includes computational and manual processes and quality control steps, and can be used in a variety of applications.
Abstract: Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetic analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. PANTHER (http://pantherdb.org) is a publicly available, user-focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of "reference" protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing "enrichment analysis" of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
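The "enrichment analysis" of gene lists mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice for over-representation analysis; the abstract does not say which test PANTHER's tools apply, and the function name and arguments below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """One-sided over-representation p-value, P(X >= hits).

    population: number of genes in the reference set
    annotated:  reference genes carrying the functional term
    sample:     number of genes in the user's list
    hits:       genes in the user's list carrying the term
    (All names are illustrative assumptions, not PANTHER API fields.)
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

In practice a multiple-testing correction would be applied across all tested terms; that step is omitted here.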

146 citations


Journal ArticleDOI
TL;DR: Results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
Abstract: Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.

120 citations


Journal ArticleDOI
TL;DR: It is found that phylogenetic signal deep in the amphibian phylogeny varies greatly across loci in a manner that is consistent with incomplete lineage sorting in the ancestral lineage of extant amphibians, and a surprisingly younger timescale for crown and ordinal amphibian diversification than previously reported.
Abstract: Molecular phylogenies have yielded strong support for many parts of the amphibian Tree of Life, but poor support for the resolution of deeper nodes, including relationships among families and orders. To clarify these relationships, we provide a phylogenomic perspective on amphibian relationships by developing a taxon-specific Anchored Hybrid Enrichment protocol targeting hundreds of conserved exons which are effective across the class. After obtaining data from 220 loci for 286 species (representing 94% of the families and 44% of the genera), we estimate a phylogeny for extant amphibians and identify gene tree-species tree conflict across the deepest branches of the amphibian phylogeny. We perform locus-by-locus genealogical interrogation of alternative topological hypotheses for amphibian monophyly, focusing on interordinal relationships. We find that phylogenetic signal deep in the amphibian phylogeny varies greatly across loci in a manner that is consistent with incomplete lineage sorting in the ancestral lineage of extant amphibians. Our results overwhelmingly support amphibian monophyly and a sister relationship between frogs and salamanders, consistent with the Batrachia hypothesis. Species tree analyses converge on a small set of topological hypotheses for the relationships among extant amphibian families. These results clarify several contentious portions of the amphibian Tree of Life, which in conjunction with a set of vetted fossil calibrations, support a surprisingly younger timescale for crown and ordinal amphibian diversification than previously reported. More broadly, our study provides insight into the sources, magnitudes, and heterogeneity of support across loci in phylogenomic data sets.[AIC; Amphibia; Batrachia; Phylogeny; gene tree-species tree discordance; genomics; information theory.].

108 citations


Journal ArticleDOI
TL;DR: A robust phylogenomic framework to explore the tempo and mode of fungal evolution and offer directions for future fungal phylogenetic and taxonomic studies is provided.

104 citations


Posted ContentDOI
09 Jan 2021-bioRxiv
TL;DR: This work leverages 155 genome assemblies, from 149 species, to generate a fossil-calibrated phylogeny and conduct multilocus tests for introgression across nine monophyletic radiations within the genus Drosophila, providing the first evidence of introgressive events occurring across the evolutionary history of this genus.
Abstract: Genome-scale sequence data has invigorated the study of hybridization and introgression, particularly in animals. However, outside of a few notable cases, we lack systematic tests for introgression at a larger phylogenetic scale across entire clades. Here we leverage 155 genome assemblies, from 149 species, to generate a fossil-calibrated phylogeny and conduct multilocus tests for introgression across 9 monophyletic radiations within the genus Drosophila. Using complementary phylogenomic approaches, we identify widespread introgression across the evolutionary history of Drosophila. Mapping gene-tree discordance onto the phylogeny revealed that both ancient and recent introgression has occurred, with introgression at the base of species radiations being particularly common. Our results provide the first evidence of introgression occurring across the evolutionary history of Drosophila and highlight the need to continue to study the evolutionary consequences of hybridization and introgression in this genus and across the Tree of Life.

90 citations


Journal ArticleDOI
TL;DR: The authors performed phylogenetic analyses of Lamiaceae to infer relationships at the tribal level using 79 protein-coding plastid genes from 175 accessions representing 170 taxa, 79 genera, and all 12 subfamilies.
Abstract: A robust molecular phylogeny is fundamental for developing a stable classification and providing a solid framework to understand patterns of diversification, historical biogeography, and character evolution. As the sixth largest angiosperm family, Lamiaceae, or the mint family, constitutes a major source of aromatic oil, wood, ornamentals, and culinary and medicinal herbs, making it an exceptionally important group ecologically, ethnobotanically, and floristically. The lack of a reliable phylogenetic framework for this family has thus far hindered broad-scale biogeographic studies and our comprehension of diversification. Although significant progress has been made towards clarifying Lamiaceae relationships during the past three decades, the resolution of a phylogenetic backbone at the tribal level has remained one of the greatest challenges due to limited availability of genetic data. We performed phylogenetic analyses of Lamiaceae to infer relationships at the tribal level using 79 protein-coding plastid genes from 175 accessions representing 170 taxa, 79 genera, and all 12 subfamilies. Both maximum likelihood and Bayesian analyses yielded a more robust phylogenetic hypothesis relative to previous studies and supported the monophyly of all 12 subfamilies, and a classification for 22 tribes, three of which are newly recognized in this study. As a consequence, we propose an updated phylogenetically informed tribal classification for Lamiaceae that is supplemented with a detailed summary of taxonomic history, generic and species diversity, morphology, synapomorphies, and distribution for each subfamily and tribe. Increased taxon sampling conjoined with phylogenetic analyses based on plastome sequences has provided robust support at both deep and shallow nodes and offers new insights into the phylogenetic relationships among tribes and subfamilies of Lamiaceae. This robust phylogenetic backbone of Lamiaceae will serve as a framework for future studies on mint classification, biogeography, character evolution, and diversification.

89 citations


Journal ArticleDOI
TL;DR: The ggtreeExtra package is presented for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout, using the grammar of graphics syntax to present data on a tree with richly annotated layers.
Abstract: We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. ggtreeExtra is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data available to visualize and interpret in an evolutionary context.

84 citations


Journal ArticleDOI
TL;DR: In this paper, the authors leverage 155 genome assemblies from 149 species to generate a fossil-calibrated phylogeny and conduct multilocus tests for introgression across 9 monophyletic radiations within the genus Drosophila.

73 citations


Posted ContentDOI
27 May 2021-bioRxiv
TL;DR: It is concluded that there has been relatively recent geographic movement and co-circulation of these viruses’ ancestors, extending across their bat host ranges in China and Southeast Asia over the last 100 years or so.
Abstract: The lack of an identifiable intermediate host species for the proximal animal ancestor of SARS-CoV-2, and the large geographical distance between Wuhan and where the closest evolutionarily related coronaviruses circulating in horseshoe bats (Sarbecoviruses) have been identified, are fuelling speculation on the natural origins of SARS-CoV-2. We have comprehensively analysed phylogenetic relations between SARS-CoV-2, and the related bat and pangolin Sarbecoviruses sampled so far. Determining the likely recombination events reveals a highly reticulate evolutionary history within this group of coronaviruses. Clustering of the inferred recombination events is non-random with evidence that Spike, the main target for humoral immunity, is beside a recombination hotspot likely driving antigenic shift in the ancestry of bat Sarbecoviruses. Coupled with the geographic ranges of their hosts and the sampling locations, across southern China, and into Southeast Asia, we confirm horseshoe bats, Rhinolophus, are the likely SARS-CoV-2 progenitor reservoir species. By tracing the recombinant sequence patterns, we conclude that there has been relatively recent geographic movement and co-circulation of these viruses’ ancestors, extending across their bat host ranges in China and Southeast Asia over the last 100 years or so. We confirm that a direct proximal ancestor to SARS-CoV-2 is yet to be sampled, since the closest relative shared a common ancestor with SARS-CoV-2 approximately 40 years ago. Our analysis highlights the need for more wildlife sampling to (i) pinpoint the exact origins of SARS-CoV-2’s animal progenitor, and (ii) survey the extent of the diversity in the related Sarbecoviruses’ phylogeny that present high risk for future spillover.
Highlights: The origin of SARS-CoV-2 can be traced to horseshoe bats, genus Rhinolophus, with ranges in both China and Southeast Asia. The closest known relatives of SARS-CoV-2 exhibit frequent transmission among their Rhinolophus host species. Sarbecoviruses have undergone extensive recombination throughout their evolutionary history. Accounting for the mosaic patterns of these recombinants is important when inferring relatedness to SARS-CoV-2. Breakpoint patterns are consistent with recombination hotspots in the coronavirus genome, particularly upstream of the Spike open reading frame with a coldspot in S1.

70 citations


Journal ArticleDOI
19 Jan 2021
TL;DR: The most complete database of microorganisms identified as being capable of degrading plastics to date is provided in this article, where the authors collated data on genes and enzymes related to the degradation of all types of plastic to identify 16,170 putative plastic degradation orthologs by mining publicly available microbial genomes.
Abstract: The number of plastic-degrading microorganisms reported is rapidly increasing, making it possible to explore the conservation and distribution of presumed plastic-degrading traits across the diverse microbial tree of life. Putative degraders of conventional high-molecular-weight polymers, including polyamide, polystyrene, polyvinylchloride, and polypropylene, are spread widely across bacterial and fungal branches of the tree of life, although evidence for plastic degradation by a majority of these taxa appears limited. In contrast, we found strong degradation evidence for the synthetic polymer polylactic acid (PLA), and the microbial species related to its degradation are phylogenetically conserved among the bacterial family Pseudonocardiaceae. We collated data on genes and enzymes related to the degradation of all types of plastic to identify 16,170 putative plastic degradation orthologs by mining publicly available microbial genomes. The plastic with the largest number of putative orthologs, 10,969, was the natural polymer polyhydroxybutyrate (PHB), followed by the synthetic polymers polyethylene terephthalate (PET) and polycaprolactone (PCL), with 8,233 and 6,809 orthologs, respectively. These orthologous genes were discovered in the genomes of 6,000 microbial species, and most of them are as yet not identified as plastic degraders. Furthermore, all these species belong to 12 different microbial phyla, of which just 7 phyla have reported degraders to date. We have centralized information on reported plastic-degrading microorganisms within an interactive and updatable phylogenetic tree and database to confirm the global and phylogenetic diversity of putative plastic-degrading taxa and provide new insights into the evolution of microbial plastic-degrading capabilities and avenues for future discovery.
Importance: We have collated the most complete database of microorganisms identified as being capable of degrading plastics to date. These data allow us to explore the phylogenetic distribution of these organisms and their enzymes, showing that traits for plastic degradation are predominantly not phylogenetically conserved. We found 16,170 putative plastic degradation orthologs in the genomes of 12 different phyla, which suggests a vast potential for the exploration of these traits in other taxa. Besides making the database available to the scientific community, we also created an interactive phylogenetic tree that can display all of the collated information, facilitating visualization and exploration of the data. Both the database and the tree are regularly updated to keep up with new scientific reports. We expect that our work will contribute to the field by increasing the understanding of the genetic diversity and evolution of microbial plastic-degrading traits.

60 citations


Journal ArticleDOI
TL;DR: This study suggests that using species-level phylogenies resolved at the genus level with species being attached to their genera as polytomies is appropriate in studies exploring patterns of phylogenetic structure of species in ecological communities across geographical and ecological gradients.
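The tree construction this summary refers to, a genus-level backbone with each species attached to its genus as a polytomy, can be sketched as follows. This is a toy illustration and not the study's method: it assumes a Newick backbone without branch lengths whose tip labels are purely alphabetic genus names, and the function name is hypothetical.

```python
import re


def expand_genus_tips(backbone_newick: str, genus_to_species: dict) -> str:
    """Replace each genus tip in a Newick backbone with a polytomy of its species.

    Assumes no branch lengths and alphabetic-only genus labels (illustrative
    simplifications). Genera without listed species are left unchanged.
    """
    def expand(match):
        genus = match.group(0)
        species = sorted(genus_to_species.get(genus, []))
        if not species:
            return genus
        if len(species) == 1:
            # A single species simply takes the place of its genus tip.
            return species[0]
        # Several species form an unresolved clade labelled with the genus.
        return "(" + ",".join(species) + ")" + genus

    return re.sub(r"[A-Za-z]+", expand, backbone_newick)


tree = expand_genus_tips(
    "((Quercus,Fagus),Pinus);",
    {"Quercus": ["Quercus_rubra", "Quercus_alba"], "Pinus": ["Pinus_taeda"]},
)
```

Relationships within each genus stay unresolved in the result, which is the simplification the summary describes as acceptable for community-level analyses.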

Journal ArticleDOI
TL;DR: Furin cleavage sites arose independently multiple times in the evolution of the coronavirus family, supporting the hypothesis of a natural origin of SARS-CoV-2.

Journal ArticleDOI
TL;DR: In this article, the authors assembled the largest plastid dataset of angiosperms to date, composed of 80 genes from 4792 plastomes of 4660 species in 2024 genera representing all currently recognized families.
Abstract: Background: Flowering plants (angiosperms) are dominant components of global terrestrial ecosystems, but phylogenetic relationships at the familial level and above remain only partially resolved, greatly impeding our full understanding of their evolution and early diversification. The plastome, typically mapped as a circular genome, has been the most important molecular data source for plant phylogeny reconstruction for decades. Results: Here, we assembled by far the largest plastid dataset of angiosperms, composed of 80 genes from 4792 plastomes of 4660 species in 2024 genera representing all currently recognized families. Our phylogenetic tree (PPA II) is essentially congruent with those of previous plastid phylogenomic analyses but generally provides greater clade support. In the PPA II tree, 75% of nodes at or above the ordinal level and 78% at or above the familial level were resolved with high bootstrap support (BP ≥ 90). We obtained strong support for many interordinal and interfamilial relationships that were poorly resolved previously within the core eudicots, such as Dilleniales, Saxifragales, and Vitales being resolved as successive sisters to the remaining rosids, and Santalales, Berberidopsidales, and Caryophyllales as successive sisters to the asterids. However, the placement of magnoliids, although resolved as sister to all other Mesangiospermae, is not well supported and disagrees with topologies inferred from nuclear data. Relationships among the five major clades of Mesangiospermae remain intractable despite increased sampling, probably due to an ancient rapid radiation. Conclusions: We provide the most comprehensive dataset of plastomes to date and a well-resolved phylogenetic tree, which together provide a strong foundation for future evolutionary studies of flowering plants.

Journal ArticleDOI
TL;DR: It is shown that coalescent approaches and multi-locus phylogeny are crucial to establish species boundaries in Colletotrichum and no single marker could discriminate between species in all complexes.
Abstract: Colletotrichum is one of the most important plant pathogenic genera that is responsible for numerous diseases which can have a profound impact on the agricultural sector. Species delineation is difficult due to a lack of distinctive phenotypic variation. Therefore, in this study three different genomic approaches based on phylogenetic, evolutionary and coalescent-based methods are applied to establish robust species boundaries. The reliability of five different DNA barcodes was also assessed to provide further insights into species delineation. The ITS region can resolve the placement of taxa up to the species complex level. The GAPDH and TUB2 markers are determined to be the most informative for most complexes. However, no single marker could discriminate between species in all complexes; therefore, different molecular approaches based on multi-locus datasets are recommended. This is the first study to provide an estimated divergence time for all species complexes in Colletotrichum. The estimated divergence times for species complexes ranged from 4.8 to 32.2 MYA. Based on the high level of congruent results obtained from the different molecular approaches, a new species complex, the Colletotrichum agaves complex, is introduced. This complex consists of five taxa which are characterised by the presence of straight or slightly curved conidia with obtuse apices. This study shows that coalescent approaches and multi-locus phylogeny are crucial to establish species boundaries in Colletotrichum. The taxonomic placement of three singleton taxa, Colletotrichum axonopodi, C. cariniferi and C. parallelophorum, is revised. We accept 248 species and provide recommendations regarding species boundaries in the graminicola–caudatum complex.

Journal ArticleDOI
TL;DR: The results suggest that multiple processes have been involved in the evolutionary history of Polemonium and that the plastid genome does not accurately reflect species relationships.
Abstract: Phylogenomic data from a rapidly increasing number of studies provide new evidence for resolving relationships in recently radiated clades, but they also pose new challenges for inferring evolutionary histories. Most existing methods for reconstructing phylogenetic hypotheses rely solely on algorithms that only consider incomplete lineage sorting (ILS) as a cause of intra- or intergenomic discordance. Here, we utilize a variety of methods, including those to infer phylogenetic networks, to account for both ILS and introgression as a cause for nuclear and cytoplasmic-nuclear discordance using phylogenomic data from the recently radiated flowering plant genus Polemonium (Polemoniaceae), an ecologically diverse genus in Western North America with known and suspected gene flow between species. We find evidence for widespread discordance among nuclear loci that can be explained by both ILS and reticulate evolution in the evolutionary history of Polemonium. Furthermore, the histories of organellar genomes show strong discordance with the inferred species tree from the nuclear genome. Discordance between the nuclear and plastid genome is not completely explained by ILS, and only one case of discordance is explained by detected introgression events. Our results suggest that multiple processes have been involved in the evolutionary history of Polemonium and that the plastid genome does not accurately reflect species relationships. We discuss several potential causes for this cytoplasmic-nuclear discordance, which emerging evidence suggests is more widespread across the Tree of Life than previously thought. [Cyto-nuclear discordance, genomic discordance, phylogenetic networks, plastid capture, Polemoniaceae, Polemonium, reticulations.].

Journal ArticleDOI
TL;DR: Genome sequencing of thirteen representatives of the Hypoxylaceae and one member of the Xylariaceae yields a backbone phylogenomic tree based on 4912 core genes, providing an important starting point for the establishment of a stable phylogeny of the Xylariales.
Abstract: The Hypoxylaceae (Xylariales, Ascomycota) is a diverse family of mainly saprotrophic fungi, which commonly occur in angiosperm-dominated forests around the world. Despite their importance in forest and plant ecology as well as a prolific source of secondary metabolites and enzymes, genome sequences of related taxa are scarce and usually derived from environmental isolates. To address this lack of knowledge thirteen taxonomically well-defined representatives of the family and one member of the closely related Xylariaceae were genome sequenced using combinations of Illumina and Oxford nanopore technologies or PacBio sequencing. The workflow leads to high quality draft genome sequences with an average N50 of 3.0 Mbp. A backbone phylogenomic tree was calculated based on the amino acid sequences of 4912 core genes reflecting the current accepted taxonomic concept of the Hypoxylaceae. A Percentage of Conserved Proteins (POCP) analysis revealed that 70% of the proteins are conserved within the family, a value with potential application for the definition of family boundaries within the order Xylariales. Also, Hypomontagnella spongiphila is proposed as a new marine derived lineage of Hypom. monticulosa based on in-depth genomic comparison and morphological differences of the cultures. The results showed that both species share 95% of their genes corresponding to more than 700 strain-specific proteins. This difference is not reflected by standard taxonomic assessments (morphology of sexual and asexual morph, chemotaxonomy, phylogeny), preventing species delimitation based on traditional concepts. Genetic changes are likely to be the result of environmental adaptations and selective pressure, the driving force of speciation. 
These data provide an important starting point for the establishment of a stable phylogeny of the Xylariales; they enable studies on evolution, ecological behavior and biosynthesis of natural products; and they significantly advance the taxonomy of fungi.
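Two of the summary statistics named in this abstract, N50 and the Percentage of Conserved Proteins (POCP), are simple to compute once their inputs are known. The sketch below uses the usual definition of N50 and the commonly cited POCP formula, 100 × (C1 + C2) / (T1 + T2); the criteria for calling a protein "conserved" are not given in the abstract and are taken as precomputed here. Function and parameter names are illustrative.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


def pocp(conserved_1, conserved_2, total_1, total_2):
    """POCP between two genomes, in percent.

    conserved_i: proteins of genome i with a conserved match in the other
                 genome (assumed to be counted upstream, e.g. from BLASTP hits)
    total_i:     total number of proteins in genome i
    """
    return 100 * (conserved_1 + conserved_2) / (total_1 + total_2)
```

For example, contigs of lengths 2, 3, 4, 5 and 6 sum to 20; the two longest already cover 11, so the N50 is 5.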

Journal ArticleDOI
26 Feb 2021
TL;DR: A phylogenetic classification or cladonomy of the extant amphibians derived from a supermatrix-based phylogenetic analysis using 4060 amphibian species is presented, which allows a bijective or isomorphic relationship between the phylogenetic hypothesis and the classification through a rigorous use of suprageneric ranks.
Abstract: Although currently most taxonomists claim to adhere to the concept of ‘phylogenetic taxonomy’, in fact most of the zoological classifications currently published are only in part ‘phylogenetic’ but include also phenetic or gradist approaches, in their arbitrary choices of the nodes formally recognised as taxa and in their attribution of ranks to these taxa. We here propose a new approach to ‘phylogenetic taxonomy and nomenclature’, exemplified by a phylogenetic classification or cladonomy of the extant amphibians (subclass Lissamphibia of the class Amphibia) derived from a supermatrix-based phylogenetic analysis using 4060 amphibian species, i.e. about half of the 8235 species recognised on 31 October 2020. These taxa were represented by a mean of 3029 bp (range: 197–13849 bp) of DNA sequence data from a mean of 4 genes (range: 1‒15). The cladistic tree thus generated was transferred into a classification according to a new taxonomic and nomenclatural methodology presented here, which allows a bijective or isomorphic relationship between the phylogenetic hypothesis and the classification through a rigorous use of suprageneric ranks, in which their hierarchy mirrors the structure of the tree. 
Our methodology differs from all previous ones in several particulars: [1] whereas the current International Code of Zoological Nomenclature uses only three ‘groups of names’ (species, genus and family), we recognise four nominal-series (species, genus, family and class); [2] we strictly follow the Code for the establishment of the valid nomen (scientific name) of taxa in the three lower nominal-series (however, in a few situations, we suggest improvements to the current Rules of the Code); [3] we provide precise and unambiguous Criteria for the assignment of suprageneric nomina to either the family- or the class-series, excluding nomina proposed expressly under unranked or pseudoranked nomenclatural systems; [4] in the class-series, for which the Code provides only incomplete Rules concerning availability, we provide precise, complete and unambiguous Criteria for the nomenclatural availability, taxonomic allocation and nomenclatural validity and correctness of nomina; [5] we stress the fact that nomenclatural ranks do not have biological definitions or meanings and that they should never be used in an ‘absolute’ way (e.g., to express degrees of genetic or phenetic divergence between taxa or hypothesised ages of cladogeneses) but in a ‘relative’ way: two taxa which are considered phylogenetically as sister-taxa should always be attributed to the same nomenclatural rank, but taxa bearing the same rank in different ‘clades’ are by no means ‘equivalent’, as the number of ranks depends largely on the number of terminal taxa (species) and on the degree of phylogenetic resolution of the tree; [6] because of this lack of ‘equivalence’, some arbitrary criteria are necessary to fix a starting point for assigning a given suprageneric rank to some taxa, from which the ranks of all other taxa will automatically derive through a simple implementation of the hierarchy of ranks: for this purpose we chose the rank family and we propose a ‘Ten Criteria Procedure’ allowing to fix 
the position of this rank in any zoological classification. As a result of the implementation of this set of Criteria, we obtained a new ranked classification of extant lissamphibians using 25 suprageneric ranks below the rank class (11 class-series and 14 family-series ranks), and including 34 class-series and 573 family-series taxa, and where the 575 genera we recognise are referred to 69 families and 87 subfamilies. We provide new nomina and diagnoses for 10 class-series taxa, 171 family-series taxa, 14 genus-series taxa and 1 species. As many new species of amphibians are continually described, this classification and its nomenclature will certainly have to change many times in the future but, using the clear, explicit, complete, automatic and unambiguous methodology presented here, these changes will be easy to implement, and will not depend on subjective and arbitrary choices as has too often been the case in the last decades. We suggest that applying this methodology in other zoological groups would considerably improve the homogeneity, clarity and usefulness of zoological taxonomy and nomenclature.

Journal ArticleDOI
TL;DR: Zhang et al. sequenced the chloroplast genomes of all species of Atractylodes using high-throughput sequencing; the results indicate that the chloroplast genome has a typical quadripartite structure and ranges from 152,294 bp (A. carlinoides) to 153,261 bp (A. macrocephala) in size.
Abstract: Atractylodes DC is the source plant of the widely used herbal medicines “Baizhu” and “Cangzhu” and an endemic genus in East Asia. Species within the genus have minor morphological differences, and the universal DNA barcodes cannot clearly resolve the systematic relationships or identify the species of the genus. To address these questions, we sequenced the chloroplast genomes of all species of Atractylodes using high-throughput sequencing. The results indicate that the chloroplast genome of Atractylodes has a typical quadripartite structure and ranges from 152,294 bp (A. carlinoides) to 153,261 bp (A. macrocephala) in size. The genome of all species contains 113 genes, including 79 protein-coding genes, 30 transfer RNA genes and four ribosomal RNA genes. Four hotspots, rpl22-rps19-rpl2, psbM-trnD, trnR-trnT(GGU), and trnT(UGU)-trnL, and a total of 42–47 simple sequence repeats (SSR) were identified as the most promising potentially variable markers for species delimitation and population genetic studies. Phylogenetic analyses of the whole chloroplast genomes indicate that Atractylodes is a clade within the tribe Cynareae; Atractylodes species form a monophyletic group that clearly reflects the relationships within the genus. Our study included investigations of the sequences and structural genomic variations, phylogenetics and mutation dynamics of Atractylodes chloroplast genomes and will facilitate future studies in population genetics, taxonomy and species identification.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the effect of taxonomic sampling via sequential deletion of basally branching pseudoscorpion superfamilies, as well as varying gene occupancy thresholds in supermatrices.
Abstract: Long-branch attraction is a systematic artifact that results in erroneous groupings of fast-evolving taxa. The combination of short, deep internodes in tandem with long-branch attraction artifacts has produced empirically intractable parts of the Tree of Life. One such group is the arthropod subphylum Chelicerata, whose backbone phylogeny has remained unstable despite improvements in phylogenetic methods and genome-scale data sets. Pseudoscorpion placement is particularly variable across data sets and analytical frameworks, with this group either clustering with other long-branch orders or with Arachnopulmonata (scorpions and tetrapulmonates). To surmount long-branch attraction, we investigated the effect of taxonomic sampling via sequential deletion of basally branching pseudoscorpion superfamilies, as well as varying gene occupancy thresholds in supermatrices. We show that concatenated supermatrices and coalescent-based summary species tree approaches support a sister group relationship of pseudoscorpions and scorpions, when more of the basally branching taxa are sampled. Matrix completeness had demonstrably less influence on tree topology. As an external arbiter of phylogenetic placement, we leveraged the recent discovery of an ancient genome duplication in the common ancestor of Arachnopulmonata as a litmus test for competing hypotheses of pseudoscorpion relationships. We generated a high-quality developmental transcriptome and the first genome for pseudoscorpions to assess the incidence of arachnopulmonate-specific duplications (e.g., homeobox genes and miRNAs). Our results support the inclusion of pseudoscorpions in Arachnopulmonata (new definition), as the sister group of scorpions. Panscorpiones (new name) is proposed for the clade uniting Scorpiones and Pseudoscorpiones.
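
The abstract above mentions varying gene occupancy thresholds in supermatrices. As a minimal sketch of what such a threshold does, assuming occupancy means the fraction of sampled taxa in which a locus is present, the following Python filters loci before concatenation. The function name, the toy locus names, and the taxon labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def filter_loci_by_occupancy(
    loci: Dict[str, Set[str]], n_taxa: int, min_occupancy: float
) -> Dict[str, Set[str]]:
    """Keep loci present in at least `min_occupancy` (0 to 1) of the taxa.

    `loci` maps a locus name to the set of taxa in which it was recovered.
    """
    if n_taxa <= 0:
        raise ValueError("n_taxa must be positive")
    return {
        name: taxa
        for name, taxa in loci.items()
        if len(taxa) / n_taxa >= min_occupancy
    }


# Hypothetical toy data: three loci sampled across four taxa.
loci = {
    "locus_A": {"t1", "t2", "t3", "t4"},  # occupancy 1.00
    "locus_B": {"t1", "t2"},              # occupancy 0.50
    "locus_C": {"t1", "t2", "t3"},        # occupancy 0.75
}

kept_50 = filter_loci_by_occupancy(loci, n_taxa=4, min_occupancy=0.5)
kept_75 = filter_loci_by_occupancy(loci, n_taxa=4, min_occupancy=0.75)
kept_100 = filter_loci_by_occupancy(loci, n_taxa=4, min_occupancy=1.0)
```

Raising the threshold trades the number of loci for matrix completeness, which is the trade-off the study varies.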

Journal ArticleDOI
TL;DR: How synteny-based phylogeny can be complementary to traditional methods and could provide additional insights into some long-standing controversial phylogenetic relationships is discussed.
Abstract: Plant genomes vary greatly in size, organization, and architecture. Such structural differences may be highly relevant for inference of genome evolution dynamics and phylogeny. Indeed, microsynteny (the conservation of local gene content and order) is recognized as a valuable source of phylogenetic information, but its use for the inference of large phylogenies has been limited. Here, by combining synteny network analysis, matrix representation, and maximum likelihood phylogenetic inference, we provide a way to reconstruct phylogenies based on microsynteny information. Both simulations and use of empirical data sets show our method to be accurate, consistent, and widely applicable. As an example, we focus on the analysis of a large-scale whole-genome data set for angiosperms, including more than 120 available high-quality genomes, representing more than 50 different plant families and 30 orders. Our 'microsynteny-based' tree is largely congruent with phylogenies proposed based on more traditional sequence alignment-based methods and current phylogenetic classifications but differs for some long-contested and controversial relationships. For instance, our synteny-based tree finds Vitales as early diverging eudicots, Saxifragales within superasterids, and magnoliids as sister to monocots. We discuss how synteny-based phylogenetic inference can complement traditional methods and could provide additional insights into some long-standing controversial phylogenetic relationships.
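
The "matrix representation" step mentioned above can be pictured as a presence/absence coding. The abstract does not give the exact encoding, so the following Python is only a sketch under the assumption that each synteny cluster becomes one binary character scored per genome. The function name and toy data are hypothetical.

```python
from typing import Dict, List, Set


def synteny_presence_matrix(
    clusters: List[Set[str]], genomes: List[str]
) -> Dict[str, str]:
    """Encode each genome as a binary string, one character per cluster.

    A "1" means the genome has a member in that synteny cluster and a "0"
    means it does not.
    """
    return {
        genome: "".join("1" if genome in members else "0" for members in clusters)
        for genome in genomes
    }


# Hypothetical toy input: three genomes and three synteny clusters.
genomes = ["A", "B", "C"]
clusters = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}]

matrix = synteny_presence_matrix(clusters, genomes)
for genome in genomes:
    print(genome, matrix[genome])
```

A matrix of this shape could then be passed to a maximum likelihood program that supports binary character models; which program and model the authors used is not stated in the abstract.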

Journal ArticleDOI
TL;DR: Fossils provide our only direct window into evolutionary events in the distant past, and incorporating them into phylogenetic hypotheses of living clades can help time-calibrate divergences, as well as...
Abstract: Fossils provide our only direct window into evolutionary events in the distant past. Incorporating them into phylogenetic hypotheses of living clades can help time-calibrate divergences, as well as...

Journal ArticleDOI
TL;DR: Betula L. (birch) is a pioneer hardwood tree species with ecological, economic, and evolutionary importance in the Northern Hemisphere. In this article, the authors sequenced the B. platyphylla genome and assembled the sequences into 14 chromosomes.
Abstract: Betula L. (birch) is a pioneer hardwood tree species with ecological, economic, and evolutionary importance in the Northern Hemisphere. We sequenced the Betula platyphylla genome and assembled the sequences into 14 chromosomes. The Betula genome lacks evidence of recent whole-genome duplication and has the same paleoploidy level as Vitis vinifera and Prunus mume. Phylogenetic analysis of lignin pathway genes coupled with tissue-specific expression patterns provided clues for understanding the formation of higher ratios of syringyl to guaiacyl lignin observed in Betula species. Our transcriptome analysis of leaf tissues under a time-series cold stress experiment revealed the presence of the MEKK1-MKK2-MPK4 cascade and six additional mitogen-activated protein kinases that can be linked to a gene regulatory network involving many transcription factors and cold tolerance genes. Our genomic and transcriptome analyses provide insight into the structures, features, and evolution of the B. platyphylla genome. The chromosome-level genome and gene resources of B. platyphylla obtained in this study will facilitate the identification of important and essential genes governing important traits of trees and genetic improvement of B. platyphylla.

Journal ArticleDOI
TL;DR: It is found that low-occupancy data sets analyzed as nucleotides can result in more congruent relationships than high-occupancy data sets analyzed as amino acids (as in phylotranscriptomics), and that omitting data, through amino acid translation or via retention of only high-occupancy loci, may have a deleterious effect on phylogenetic reconstruction.
Abstract: Genome-scale data sets are converging on robust, stable phylogenetic hypotheses for many lineages; however, some nodes have shown disagreement across classes of data. We use spiders (Araneae) as a system to identify the causes of incongruence in phylogenetic signal between three classes of data: exons (as in phylotranscriptomics), noncoding regions (included in ultraconserved elements [UCE] analyses), and a combination of both (as in UCE analyses). Gene orthologs, coded as amino acids and nucleotides (with and without third codon positions), were generated by querying published transcriptomes for UCEs, recovering 1,931 UCE loci (codingUCEs). We expected that congeners represented in the codingUCE and UCEs data would form clades in the presence of phylogenetic signal. Noncoding regions derived from UCE sequences were recovered to test the stability of relationships. Phylogenetic relationships resulting from all analyses were largely congruent. All nucleotide data sets from transcriptomes, UCEs, or a combination of both recovered similar topologies in contrast with results from transcriptomes analyzed as amino acids. Most relationships inferred from low-occupancy data sets, containing several hundreds of loci, were congruent across Araneae, as opposed to high occupancy data matrices with fewer loci, which showed more variation. Furthermore, we found that low-occupancy data sets analyzed as nucleotides (as is typical of UCE data sets) can result in more congruent relationships than high occupancy data sets analyzed as amino acids (as in phylotranscriptomics). Thus, omitting data, through amino acid translation or via retention of only high occupancy loci, may have a deleterious effect in phylogenetic reconstruction.

Journal ArticleDOI
TL;DR: In this paper, the authors used whole nuclear, plastid, and organellar genomes from 12 species in the rapidly radiated, ecologically diverse, and actively hybridizing genus of peatmoss (Sphagnum) to reconstruct the species phylogeny and quantify introgression using a suite of phylogenomic methods.
Abstract: The relative importance of introgression for diversification has long been a highly disputed topic in speciation research and remains an open question despite the great attention it has received over the past decade. Gene flow leaves traces in the genome similar to those created by incomplete lineage sorting (ILS), and identification and quantification of gene flow in the presence of ILS is challenging and requires knowledge about the true phylogenetic relationship among the species. We use whole nuclear, plastid, and organellar genomes from 12 species in the rapidly radiated, ecologically diverse, actively hybridizing genus of peatmoss (Sphagnum) to reconstruct the species phylogeny and quantify introgression using a suite of phylogenomic methods. We found extensive phylogenetic discordance among nuclear and organellar phylogenies, as well as across the nuclear genome and the nodes in the species tree, best explained by extensive ILS following the rapid radiation of the genus rather than by postspeciation introgression. Our analyses support the idea of ancient introgression among the ancestral lineages followed by ILS, whereas recent gene flow among the species is highly restricted despite widespread interspecific hybridization known in the group. Our results contribute to phylogenomic understanding of how speciation proceeds in rapidly radiated, actively hybridizing species groups, and demonstrate that employing a combination of diverse phylogenomic methods can facilitate untangling complex phylogenetic patterns created by ILS and introgression.

Journal ArticleDOI
TL;DR: In this paper, the authors used genome-wide capture of nuclear gene sequences, plus skimming of organellar sequences, to investigate the phylogenomics of monkeyflowers in Mimulus section Erythranthe (27 accessions from seven species).
Abstract: Inferences about past processes of adaptation and speciation require a gene-scale and genome-wide understanding of the evolutionary history of diverging taxa. In this study, we use genome-wide capture of nuclear gene sequences, plus skimming of organellar sequences, to investigate the phylogenomics of monkeyflowers in Mimulus section Erythranthe (27 accessions from seven species). Taxa within Erythranthe, particularly the parapatric and putatively sister species M. lewisii (bee-pollinated) and M. cardinalis (hummingbird-pollinated), have been a model system for investigating the ecological genetics of speciation and adaptation for over five decades. Across >8000 nuclear loci, multiple methods resolve a predominant species tree in which M. cardinalis groups with other hummingbird-pollinated taxa (37% of gene trees), rather than being sister to M. lewisii (32% of gene trees). We independently corroborate a single evolution of hummingbird pollination syndrome in Erythranthe by demonstrating functional redundancy in genetic complementation tests of floral traits in hybrids; together, these analyses overturn a textbook case of pollination-syndrome convergence. Strong asymmetries in allele sharing (Patterson’s D-statistic and related tests) indicate that gene tree discordance reflects ancient and recent introgression rather than incomplete lineage sorting. Consistent with abundant introgression blurring the history of divergence, low-recombination and adaptation-associated regions support the new species tree, while high-recombination regions generate phylogenetic evidence for sister status for M. lewisii and M. cardinalis. Population-level sampling of core taxa also revealed two instances of chloroplast capture, with Sierran M. lewisii and Southern Californian M. parishii each carrying organelle genomes nested within respective sympatric M. cardinalis clades. A recent organellar transfer from M. cardinalis, an outcrosser where selfish cytonuclear dynamics are more likely, may account for the unexpected cytoplasmic male sterility effects of selfer M. parishii organelles in hybrids with M. lewisii. Overall, our phylogenomic results reveal extensive reticulation throughout the evolutionary history of a classic monkeyflower radiation, suggesting that natural selection (re-)assembles and maintains species-diagnostic traits and barriers in the face of gene flow. Our findings further underline the challenges, even in reproductively isolated species, in distinguishing re-use of adaptive alleles from true convergence and emphasize the value of a phylogenomic framework for reconstructing the evolutionary genetics of adaptation and speciation.
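
The abstract above relies on Patterson's D-statistic to detect asymmetric allele sharing. As a hedged illustration, for a four-taxon arrangement (((P1, P2), P3), outgroup) the statistic is commonly written as D = (nABBA - nBABA) / (nABBA + nBABA), where A is the outgroup allele and B the derived allele. The Python below is a minimal sketch that counts the two site patterns in four aligned sequences. The function name and the toy sequences are hypothetical, and the significance testing used in real analyses (for example a block jackknife) is omitted.

```python
from typing import Tuple


def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int, float]:
    """Return (n_ABBA, n_BABA, D) for four aligned sequences.

    ABBA sites: P1 matches the outgroup, while P2 and P3 share a derived allele.
    BABA sites: P2 matches the outgroup, while P1 and P3 share a derived allele.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must have equal length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if any(base not in "ACGT" for base in (a, b, c, o)):
            continue  # skip gaps and ambiguous bases
        if c == o:
            continue  # P3 must carry the derived allele
        if a == o and b == c:
            abba += 1
        elif b == o and a == c:
            baba += 1
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return abba, baba, d


# Hypothetical toy alignment (7 sites), not real Mimulus data.
n_abba, n_baba, d_stat = patterson_d(
    p1="AAGTCAT",
    p2="ATGTCGC",
    p3="ATGACGT",
    outgroup="AAGACAC",
)
print(n_abba, n_baba, round(d_stat, 3))
```

Under incomplete lineage sorting alone the two patterns are expected in roughly equal numbers, so D near zero is the null expectation; a consistent excess of one pattern is what the abstract calls an asymmetry in allele sharing.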

Journal ArticleDOI
TL;DR: Using phylogenetics and “high order genomic structures” including trimer spectrums, codon usage and dinucleotide suppression, distinct clustering is observed for all human coronaviruses, which form phylogenetic clades with their closest animal relatives, indicating that they have had long evolutionary histories within specific ecological niches before jumping the species barrier to infect humans.

Journal ArticleDOI
TL;DR: In this article, the authors applied the genome skimming approach of next-generation sequencing to address whether the lack of resolution at the tip of the Apioideae phylogenetic tree is due to a limited number of informative loci or the footprint of an ancient radiation.

Journal ArticleDOI
TL;DR: The identification of a small number of putative recombinants within the first year of SARS-CoV-2 circulation underscores the need to sustain efforts to monitor the emergence of new genotypes generated through recombination.
Abstract: Viral recombination can generate novel genotypes with unique phenotypic characteristics, including transmissibility and virulence. Although the capacity for recombination among betacoronaviruses is well documented, recombination between strains of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has not been characterized in detail. Here, we present a lightweight approach for detecting genomes that are potentially recombinant. This approach relies on identifying the mutations that primarily determine SARS-CoV-2 clade structure and then screening genomes for ones that contain multiple mutational markers from distinct clades. Among the more than 537,000 genomes queried that were deposited on GISAID.org prior to 16 February 2021, we detected 1,175 potential recombinant sequences. Using highly conservative criteria to exclude sequences that may have originated through de novo mutation, we find that at least 30 per cent (n = 358) are likely of recombinant origin. An analysis of deep-sequencing data for these putative recombinants, where available, indicated that the majority are high quality. Additional phylogenetic analysis and the observed co-circulation of predicted parent clades in the geographic regions of exposure further support the feasibility of recombination in this subset of potential recombinants. An analysis of these genomes did not reveal evidence for recombination hotspots in the SARS-CoV-2 genome. While most of the putative recombinant sequences we detected were genetic singletons, a small number of genetically identical or highly similar recombinant sequences were identified in the same geographic region, indicative of locally circulating lineages. Recombinant genomes were also found to have originated from parental lineages with substitutions of concern, including D614G, N501Y, E484K, and L452R. Adjusting for an unequal probability of detecting recombinants derived from different parent clades and for geographic variation in clade abundance, we estimate that at most 0.2–2.5 per cent of circulating viruses in the USA and UK are recombinant. Our identification of a small number of putative recombinants within the first year of SARS-CoV-2 circulation underscores the need to sustain efforts to monitor the emergence of new genotypes generated through recombination.
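
The screening idea described above (flag genomes that carry multiple mutational markers from distinct clades) can be sketched in a few lines. The Python below is an illustration only and is not the authors' implementation. The clade names, marker labels, the `min_markers` cutoff, and the assumption that marker sets do not overlap between clades are all hypothetical.

```python
from typing import Dict, List, Set


def clades_supported(
    genome_mutations: Set[str],
    clade_markers: Dict[str, Set[str]],
    min_markers: int = 2,
) -> List[str]:
    """List clades for which the genome carries at least `min_markers` markers."""
    return sorted(
        clade
        for clade, markers in clade_markers.items()
        if len(markers & genome_mutations) >= min_markers
    )


def is_potential_recombinant(
    genome_mutations: Set[str],
    clade_markers: Dict[str, Set[str]],
    min_markers: int = 2,
) -> bool:
    """Flag a genome that carries marker sets from two or more distinct clades."""
    return len(clades_supported(genome_mutations, clade_markers, min_markers)) >= 2


# Hypothetical, non-overlapping clade-defining marker sets.
clade_markers = {
    "clade_X": {"x1", "x2", "x3"},
    "clade_Y": {"y1", "y2", "y3"},
}

single_clade = {"x1", "x2", "x3"}
mixed = {"x1", "x2", "y2", "y3"}
weak_mix = {"x1", "x2", "y1"}  # only one clade_Y marker, below the cutoff

flags = [
    is_potential_recombinant(g, clade_markers)
    for g in (single_clade, mixed, weak_mix)
]
print(flags)
```

A screen like this only nominates candidates. As the abstract notes, the authors apply further criteria to exclude sequences that could have arisen through de novo mutation.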

Journal ArticleDOI
TL;DR: A highly resolved phylogeny of Arundinarieae is provided, new light is shed on the radiation and reticulate evolutionary history of this tribe, and an empirical example for the study of recalcitrant plant radiations is provided.
Abstract: Rapid evolutionary radiations are among the most challenging phylogenetic problems, wherein different types of data (e.g., morphology and molecular) or genetic markers (e.g., nuclear and organelle) often yield inconsistent results. The tribe Arundinarieae, that is, the temperate bamboos, is a tetraploid clade that originated 22 Ma and subsequently radiated in East Asia. Previous studies of Arundinarieae have found conflicting relationships and/or low support. Here, we obtain nuclear markers from ddRAD data for 213 Arundinarieae taxa and parallel sampling of chloroplast genomes from genome skimming for 147 taxa. We first assess the feasibility of using ddRAD-seq data for phylogenetic estimates of paleopolyploid and rapidly radiated lineages, and optimize the clustering thresholds and analysis workflow for orthology identification. Reference-based ddRAD data assembly approaches perform well and yield strongly supported relationships that are generally concordant with morphology-based taxonomy. We recover five major lineages, two of which are notable (the pachymorph and leptomorph lineages), in that they correspond with distinct rhizome morphologies. By contrast, the phylogeny from chloroplast genomes differed significantly. Based on multiple lines of evidence, the ddRAD tree is favored as the best species tree estimation for temperate bamboos. Using a time-calibrated ddRAD tree, we find that Arundinarieae diversified rapidly around the mid-Miocene corresponding with intensification of the East Asian monsoon and the evolution of key innovations including the leptomorph rhizomes. Our results provide a highly resolved phylogeny of Arundinarieae, shed new light on the radiation and reticulate evolutionary history of this tribe, and provide an empirical example for the study of recalcitrant plant radiations. [Arundinarieae; ddRAD; paleopolyploid; genome skimming; rapid diversification; incongruence.]

Journal ArticleDOI
TL;DR: In this paper, a comprehensive genome-wide analysis of the LACS gene family across green plants was performed, followed by phylogenetic clustering analysis, gene structure determination, detection of conserved motifs, gene expression in tissues and subcellular localization.