
Showing papers on "Phylogenetic tree" published in 2009


Journal Article
TL;DR: This complete mtDNA tree includes previously published as well as newly identified haplogroups, is easily navigable, will be continuously and regularly updated in the future, and is available online at http://www.phylotree.org.
Abstract: Human mitochondrial DNA is widely used as a tool in many fields including evolutionary anthropology and population history, medical genetics, genetic genealogy, and forensic science. Many applications require detailed knowledge about the phylogenetic relationship of mtDNA variants. Although the phylogenetic resolution of global human mtDNA diversity has greatly improved as a result of increasing sequencing efforts of complete mtDNA genomes, an updated overall mtDNA tree is currently not available. In order to facilitate a better use of known mtDNA variation, we have constructed an updated comprehensive phylogeny of global human mtDNA variation, based on both coding- and control-region mutations. This complete mtDNA tree includes previously published as well as newly identified haplogroups, is easily navigable, will be continuously and regularly updated in the future, and is available online at http://www.phylotree.org. © 2008 Wiley-Liss, Inc.

1,628 citations


Journal Article
TL;DR: The complexities of genealogical discordance are discussed and the issues that new methods for multilocus species tree inference will need to address are reviewed to account successfully for naturally occurring genomic variability in evolutionary histories.
Abstract: The field of phylogenetics is entering a new era in which trees of historical relationships between species are increasingly inferred from multilocus and genomic data. A major challenge for incorporating such large amounts of data into inference of species trees is that conflicting genealogical histories often exist in different genes throughout the genome. Recent advances in genealogical modeling suggest that resolving close species relationships is not quite as simple as applying more data to the problem. Here we discuss the complexities of genealogical discordance and review the issues that new methods for multilocus species tree inference will need to address to account successfully for naturally occurring genomic variability in evolutionary histories.

1,593 citations


Journal Article
TL;DR: An important adaptive role for metabolism diversification within group B2 and Shigella strains is found, but few or no extraintestinal virulence-specific genes are identified, which could render difficult the development of a vaccine against extraintestinal infections.
Abstract: The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-) annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli related species), including seven that we sequenced to completion. Within the approximately 18,000 families of orthologous genes, we found approximately 2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could render difficult the development of a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. 
Overall, despite a very high gene flow, genes co-exist in an organised genome.

1,213 citations


Journal Article
TL;DR: A comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families, is developed.
Abstract: The use of phylogenetic trees to describe the evolution of biological processes was established in the 1950s (Hennig 1952) and remains a fundamental approach to understanding the evolution of individual genes through to complete genomes; for example, in the mouse (Mouse Genome Sequencing Consortium 2002), rat (Gibbs et al. 2004), chicken (International Chicken Genome Sequencing Consortium 2004), and monodelphis (Mikkelsen et al. 2007) genome papers, and numerous papers on individual sequences. Now routine, the determination of vertebrate genome sequences provides a rich data source to understand evolution, and using phylogenetic trees of the genes is one of the best ways to organize these data. However, the increased set of genomes makes the compute and engineering tasks to form all the gene trees progressively more complex and harder for individual groups to use. The Ensembl project provides an accurate and consistent protein-coding gene set for all vertebrate genomes (International Human Genome Sequencing Consortium 2001; Dehal et al. 2002; Mouse Genome Sequencing Consortium 2002; Gibbs et al. 2004; Xie et al. 2005; Mikkelsen et al. 2007; Rhesus Macaque Genome Sequencing and Analysis Consortium 2007). Previously (until April 2006), Ensembl provided a basic method for tracing orthologs via the Best Reciprocal BLAST method, similar to approaches used in other genome analyses, such as Drosophila melanogaster (Adams et al. 2000) or human (International Human Genome Sequencing Consortium 2001). In June 2006 (Hubbard et al. 2007), we replaced this system with a phylogenetically sound, gene tree-based approach, providing a complete set of phylogenetic trees spanning 91% of genes across vertebrates. In addition to the vertebrates we have included a few important non-vertebrate species (fly, worm, and yeast) to act both as out groups and provide links to these model organisms. 
In this paper we provide the motivation, implementation, and benchmarking of this method and document the display and access methods for these trees. There have been a number of methods proposed for routine generation of genome-wide orthology descriptions, including Inparanoid (Remm et al. 2001), MSOAR (Fu et al. 2007), OrthoMCL (Li et al. 2003), HomoloGene (Wheeler et al. 2008), TreeFam (Li et al. 2006), PhyOP (Goodstadt and Ponting 2006), and PhiGs (Dehal and Boore 2006). The first four, Inparanoid, MSOAR, OrthoMCL, and HomoloGene, focus on providing clusters (or linked clusters) of genes, without an explicit tree topology. PhyOP (Goodstadt and Ponting 2006) uses a tree-based method, but between pairs of closely related species, resolving paralogs accurately by using neutral substitution (as measured by dS, the synonymous substitution rate). TreeFam provides an explicit gene tree across multiple species, using dS, dN (the nonsynonymous substitution rate), and nucleotide and protein distance measures, together with the standard species tree to balance duplications vs. deletions to inform the tree construction, using the program TreeBeST (http://treesoft.sourceforge.net/treebest.shtml; L. Heng, A.J. Vilella, E. Birney, and R. Durbin, in prep.). The PhiGs method (Dehal and Boore 2006) is a leading phylogenetic-based method that produced a comprehensive phylogenetic resource for the genomes at the time it was run, and the basic outline of its analysis, which was clustering of protein sequences, followed by phylogenetic trees, is similar to the method presented here. However, the PhiGs resource covered a smaller number of species (23 vs. 45) and has been difficult to keep up to date with the advances in gene sets and genomes. Another major difference between PhiGs-based phylogenetic trees and the phylogenetic trees presented here is that the former were calculated using a single maximum likelihood method based on protein evolution.
In contrast, the Ensembl gene trees are calculated using a new method, TreeBeST, which integrates multiple tree topologies, in particular from both DNA-level and protein-level models, and combines this with a species-tree-aware penalization of topologies that are inconsistent with known species relationships. We show in this paper that this method produces trees that are more consistent with synteny relationships and have fewer anomalous topologies than single protein-based phylogenetic methods. There are also many single phylogenetic tree-building approaches, many of them based on maximum likelihood methods; one leading method is PhyML (Guindon and Gascuel 2003). It is unclear which method is best, in particular in the context of genome-wide tree building with constraints on computational costs and the need to robustly handle many complex scenarios usually involving large families with heterogeneous phylogenetic depths. In this paper, we benchmark in vertebrates the tree programs TreeBeST and PhyML, and compare the resulting trees to basic best reciprocal hit (BRH) methods and cluster frameworks, in particular Inparanoid and HomoloGene. We also benchmark against a recent PhyOP data set. The PhyOP pipeline has recently switched to use the same tree-building program (TreeBeST) that we use, but differs in its input clusters. Although we adopted this same tree-building method, we describe here considerable novel engineering in the deployment of these methods across all vertebrates. Similar to the PhiGs resource, we have used the dense coverage of genomes to provide topologically based timings (i.e., the standard use of outgroups vs. subsequent lineages to bracket a duplication), in order to label duplication events.
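
The best reciprocal hit (BRH) baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the Ensembl pipeline: the gene names and similarity scores below are invented toy values, and the helper names `best_hits` and `best_reciprocal_hits` are hypothetical. Two genes are paired only when each is the other's top-scoring hit.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, target) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for every query gene, its single highest-scoring target."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def best_reciprocal_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores (assumed values, e.g. alignment bit scores between two genomes).
a_vs_b = {
    ("geneA1", "geneB1"): 250.0,
    ("geneA1", "geneB2"): 90.0,
    ("geneA2", "geneB2"): 180.0,
    ("geneA3", "geneB2"): 200.0,
}
b_vs_a = {
    ("geneB1", "geneA1"): 245.0,
    ("geneB2", "geneA3"): 198.0,
    ("geneB2", "geneA2"): 175.0,
}

pairs = best_reciprocal_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy case geneA2 is left unpaired because geneB2 prefers geneA3. A pairwise rule of this kind reports at most one partner per gene, which is one reason a tree-based approach can describe duplications that BRH cannot.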

1,135 citations


Journal Article
TL;DR: It is argued that to better deal with the large multilocus datasets brought on by phylogenomics, and to better align the fields of phylogeography and phylogenetics, the primacy of species trees should be embraced.
Abstract: The advent and maturation of algorithms for estimating species trees (phylogenetic trees that allow gene tree heterogeneity and whose tips represent lineages, populations and species, as opposed to genes) represent an exciting confluence of phylogenetics, phylogeography, and population genetics, and usher in a new generation of concepts and challenges for the molecular systematist. In this essay I argue that to better deal with the large multilocus datasets brought on by phylogenomics, and to better align the fields of phylogeography and phylogenetics, we should embrace the primacy of species trees, not only as a new and useful practical tool for systematics, but also as a long-standing conceptual goal of systematics that, largely due to the lack of appropriate computational tools, has been eclipsed in the past few decades. I suggest that phylogenies as gene trees are a “local optimum” for systematics, and review recent advances that will bring us to the broader optimum inherent in species trees. In addition to adopting new methods of phylogenetic analysis (and ideally reserving the term “phylogeny” for species trees rather than gene trees), the new paradigm suggests shifts in a number of practices, such as sampling data to maximize not only the number of accumulated sites but also the number of independently segregating genes; routinely using coalescent or other models in computer simulations to allow gene tree heterogeneity; and understanding better the role of concatenation in influencing topologies and confidence in phylogenies. By building on the foundation laid by concepts of gene trees and coalescent theory, and by taking cues from recent trends in multilocus phylogeography, molecular systematics stands to be enriched.
Many of the challenges and lessons learned for estimating gene trees will carry over to the challenge of estimating species trees, although adopting the species tree paradigm will clarify many issues (such as the nature of polytomies and the star tree paradox), raise conceptually new challenges, or provide new answers to old questions.

1,028 citations


Journal Article
24 Dec 2009-Nature
TL;DR: The results strongly support the need for systematic ‘phylogenomic’ efforts to compile a phylogeny-driven ‘Genomic Encyclopedia of Bacteria and Archaea’ in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.
Abstract: Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms. There are now nearly 1,000 completed bacterial and archaeal genomes available, most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage. Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic phylogenomic efforts to compile a phylogeny-driven Genomic Encyclopedia of Bacteria and Archaea in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come. © 2009 Macmillan Publishers Limited. All rights reserved.

928 citations


Journal Article
20 Aug 2009-PLOS ONE
TL;DR: This is the deepest sequencing of single gastrointestinal samples reported to date, but microbial richness levels have still not leveled out, and correlations of sequence abundance and hybridization signal intensities were very high for lower-order ranks, but lower at family-level, which was probably due to ambiguous taxonomic groupings.
Abstract: Background: Variations in the composition of the human intestinal microbiota are linked to diverse health conditions. High-throughput molecular technologies have recently elucidated microbial community structure at much higher resolution than was previously possible. Here we compare two such methods, pyrosequencing and a phylogenetic array, and evaluate classifications based on two variable 16S rRNA gene regions. Methods and Findings: Over 1.75 million amplicon sequences were generated from the V4 and V6 regions of 16S rRNA genes in bacterial DNA extracted from four fecal samples of elderly individuals. The phylotype richness, for individual samples, was 1,400–1,800 for V4 reads and 12,500 for V6 reads, and 5,200 unique phylotypes when combining V4 reads from all samples. The RDP-classifier was more efficient for the V4 than for the far less conserved and shorter V6 region, but differences in community structure also affected efficiency. Even when analyzing only 20% of the reads, the majority of the microbial diversity was captured in two samples tested. DNA from the four samples was hybridized against the Human Intestinal Tract Chip (HITChip), a phylogenetic microarray for community profiling. Comparison of clustering of genus counts from pyrosequencing and HITChip data revealed highly similar profiles. Furthermore, correlations of sequence abundance and hybridization signal intensities were very high for lower-order ranks, but lower at family level, which was probably due to ambiguous taxonomic groupings. Conclusions: The RDP-classifier consistently assigned most V4 sequences from human intestinal samples down to genus level with good accuracy and speed. This is the deepest sequencing of single gastrointestinal samples reported to date, but microbial richness levels have still not leveled out. Most of this diversity can also be captured at one-fifth of the sampling depth.
HITChip hybridizations and resulting community profiles correlate well with pyrosequencing-based compositions, especially for lower-order ranks, indicating high robustness of both approaches. However, incompatible grouping schemes make exact comparison difficult.

782 citations


Journal Article
TL;DR: A modified GMYC model is developed that allows for a variable transition from coalescent to speciation among lineages and provides a method of species discovery and biodiversity assessment using single-locus data from mixed or environmental samples while building a globally available taxonomic database for future identifications.
Abstract: High-throughput DNA sequencing has the potential to accelerate species discovery if it is able to recognize evolutionary entities from sequence data that are comparable to species. The general mixed Yule-coalescent (GMYC) model estimates the species boundary from DNA surveys by identifying independently evolving lineages as a transition from coalescent to speciation branching patterns on a phylogenetic tree. Applied here to 12 families from 4 orders of insects in Madagascar, we used the model to delineate 370 putative species from mitochondrial DNA sequence variation among 1614 individuals. These were compared with data from the nuclear genome and morphological identification and found to be highly congruent (98% and 94%). We developed a modified GMYC that allows for a variable transition from coalescent to speciation among lineages. This revised model increased the congruence with morphology (97%), suggesting that a variable threshold better reflects the clustering of sequence data into biological species. Local endemism was pronounced in all 5 insect groups. Most species (60-91%) and haplotypes (88-99%) were found at only 1 of the 5 study sites (40-1000 km apart). This pronounced endemism resulted in a 37% increase in species numbers using diagnostic nucleotides in a population aggregation analysis. Sample sizes between 7 and 10 individuals represented a threshold above which there was minimal increase in genetic diversity, broadly agreeing with coalescent theory and other empirical studies. Our results from >1.4 Mb of empirical data suggest that the GMYC model captures species boundaries comparable to those from traditional methods without the need for prior hypotheses of population coherence. This provides a method of species discovery and biodiversity assessment using single-locus data from mixed or environmental samples while building a globally available taxonomic database for future identifications. 
(Biodiversity; coalescent; DNA barcoding; DNA taxonomy; endemism; GMYC; Madagascar; turnover.)

652 citations


Journal Article
TL;DR: A 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi, is presented along with a phylogenetic informativeness analysis of all 6 genes and a series of ancestral character state reconstructions, which support a terrestrial, saprobic ecology as ancestral.
Abstract: We present a 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi. This analysis is the most taxonomically complete to date with species sampled from all 15 currently circumscribed classes. A number of superclass-level nodes that have previously evaded resolution and were unnamed in classifications of the Fungi are resolved for the first time. Based on the 6-gene phylogeny we conducted a phylogenetic informativeness analysis of all 6 genes and a series of ancestral character state reconstructions that focused on morphology of sporocarps, ascus dehiscence, and evolution of nutritional modes and ecologies. A gene-by-gene assessment of phylogenetic informativeness yielded higher levels of informativeness for protein genes (RPB1, RPB2, and TEF1) as compared with the ribosomal genes, which have been the standard bearer in fungal systematics. Our reconstruction of sporocarp characters is consistent with 2 origins for multicellular sexual reproductive structures in Ascomycota, once in the common ancestor of Pezizomycotina and once in the common ancestor of Neolectomycetes. This first report of dual origins of ascomycete sporocarps highlights the complicated nature of assessing homology of morphological traits across Fungi. Furthermore, ancestral reconstruction supports an open sporocarp with an exposed hymenium (apothecium) as the primitive morphology for Pezizomycotina with multiple derivations of the partially (perithecia) or completely enclosed (cleistothecia) sporocarps. Ascus dehiscence is most informative at the class level within Pezizomycotina with most superclass nodes reconstructed equivocally. Character-state reconstructions support a terrestrial, saprobic ecology as ancestral. In contrast to previous studies, these analyses support multiple origins of lichenization events with the loss of lichenization as less frequent and limited to terminal, closely related species.

592 citations


Journal Article
TL;DR: This work reviews studies of the phylogenetic structure of communities of different major taxa and trophic levels, across different spatial and phylogenetic scales, and using different metrics and null models, and discusses the relationship between metrics of phylogenetic clustering and tree balance.
Abstract: The analysis of the phylogenetic structure of communities can help reveal contemporary ecological interactions, as well as link community ecology with biogeography and the study of character evolution. The number of studies employing this broad approach has increased to the point where comparison of their results can now be used to highlight successes and deficiencies in the approach, and to detect emerging patterns in community organization. We review studies of the phylogenetic structure of communities of different major taxa and trophic levels, across different spatial and phylogenetic scales, and using different metrics and null models. Twenty-three of 39 studies (59%) find evidence for phylogenetic clustering in contemporary communities, but terrestrial and/or plant systems are heavily over-represented among published studies. Experimental investigations, although uncommon at present, hold promise for unravelling mechanisms underlying the phylogenetic community structure patterns observed in community surveys. We discuss the relationship between metrics of phylogenetic clustering and tree balance and explore the various emerging biases in taxonomy and pitfalls of scale. Finally, we look beyond one-dimensional metrics of phylogenetic structure towards multivariate descriptors that better capture the variety of ecological behaviours likely to be exhibited in communities of species with hundreds of millions of years of independent evolution.
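
As a rough illustration of what "a metric of phylogenetic clustering tested against a null model" can look like, the sketch below computes the mean pairwise phylogenetic distance (MPD) of a community and compares it with random draws from a species pool. MPD and the random-draw null are chosen here as assumptions for illustration, not as the metrics the review endorses; the six species, their clade labels, and the distances are invented toy values.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

# Toy species pool: two clades of three species each (assumed data).
CLADE = {"sp1": "X", "sp2": "X", "sp3": "X", "sp4": "Y", "sp5": "Y", "sp6": "Y"}


def distance(a: str, b: str) -> float:
    """Toy phylogenetic distance: short within a clade, long between clades."""
    return 2.0 if CLADE[a] == CLADE[b] else 10.0


def mean_pairwise_distance(community) -> float:
    """Average distance over all pairs of species in the community."""
    return mean(distance(a, b) for a, b in combinations(sorted(community), 2))


def standardized_effect_size(community, pool, n_null=999, seed=0):
    """Compare observed MPD with MPD of random communities of equal size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean_pairwise_distance(community)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(community)))
        for _ in range(n_null)
    ]
    return observed, (observed - mean(null)) / pstdev(null)


observed, ses = standardized_effect_size(["sp1", "sp2", "sp3"], sorted(CLADE))
print(observed, ses)
```

A negative standardized effect size means the community's members are more closely related than random draws from the pool, which is the pattern described as phylogenetic clustering. The choice of species pool and null model changes the result, which is one of the scale pitfalls the abstract raises.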

580 citations


Journal Article
TL;DR: PhyloXML is an XML language defined by a complete schema in XSD that allows storing and exchanging the structures of evolutionary trees as well as associated data.
Abstract: Background: Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and oftentimes support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data, such as gene names and functional annotations, as well as events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacities to incorporate such annotations of different data types.
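
To make the idea of an annotated tree concrete, here is a small sketch that reads a hand-written phyloXML-style fragment with Python's standard library. The fragment is an illustrative assumption: it uses element names from the phyloXML schema (`clade`, `branch_length`, `confidence`, `taxonomy`, `events`) but has not been validated against the XSD, and the gene names and values are invented.

```python
import xml.etree.ElementTree as ET

# Hand-written toy document (assumed content, not validated against the schema).
PHYLOXML = """
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <name>toy gene tree</name>
    <clade>
      <events><duplications>1</duplications></events>
      <clade>
        <name>geneX_1</name>
        <branch_length>0.12</branch_length>
        <confidence type="bootstrap">95</confidence>
        <taxonomy><scientific_name>Homo sapiens</scientific_name></taxonomy>
      </clade>
      <clade>
        <name>geneX_2</name>
        <branch_length>0.30</branch_length>
        <confidence type="bootstrap">95</confidence>
        <taxonomy><scientific_name>Homo sapiens</scientific_name></taxonomy>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""

NS = {"p": "http://www.phyloxml.org"}
CLADE_TAG = "{http://www.phyloxml.org}clade"


def leaf_annotations(root):
    """Collect name, branch length, and species for every leaf clade."""
    leaves = []
    for clade in root.iter(CLADE_TAG):
        if clade.find("p:clade", NS) is not None:
            continue  # internal node: it has child clades
        leaves.append({
            "name": clade.findtext("p:name", namespaces=NS),
            "branch_length": float(clade.findtext("p:branch_length", namespaces=NS)),
            "species": clade.findtext("p:taxonomy/p:scientific_name", namespaces=NS),
        })
    return leaves


root = ET.fromstring(PHYLOXML)
leaves = leaf_annotations(root)
duplications = root.findtext(".//p:events/p:duplications", namespaces=NS)
print(leaves, duplications)
```

The point of the sketch is that names, branch lengths, support values, taxonomy, and duplication events each sit in their own typed element, so they can be read back without parsing free-text node labels.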

Journal Article
TL;DR: Two likelihood methods are developed that can be used to infer the effect of a trait on speciation and extinction without complete phylogenetic information, generalizing the recent binary-state speciation and extinction method.
Abstract: Species traits may influence rates of speciation and extinction, affecting both the patterns of diversification among lineages and the distribution of traits among species. Existing likelihood approaches for detecting differential diversification require complete phylogenies; that is, every extant species must be present in a well-resolved phylogeny. We developed 2 likelihood methods that can be used to infer the effect of a trait on speciation and extinction without complete phylogenetic information, generalizing the recent binary-state speciation and extinction method. Our approaches can be used where a phylogeny can be reasonably assumed to be a random sample of extant species or where all extant species are included but some are assigned only to terminal unresolved clades. We explored the effects of decreasing phylogenetic resolution on the ability of our approach to detect differential diversification within a Bayesian framework using simulated phylogenies. Differential diversification caused by an asymmetry in speciation rates was nearly as well detected with only 50% of extant species phylogenetically resolved as with complete phylogenetic knowledge. We demonstrate our unresolved clade method with an analysis of sexual dimorphism and diversification in shorebirds (Charadriiformes). Our methods allow for the direct estimation of the effect of a trait on speciation and extinction rates using incompletely resolved phylogenies.

Journal Article
TL;DR: The phylogenetic and in some cases morphological evidence supports the monophyly of nine terminal taxa in the M. anisopliae complex, and it is proposed to recognize at species rank M. guizhouense, M. pingshaense and M. robertsii.
Abstract: Metarhizium anisopliae, the type species of the anamorph entomopathogenic genus Metarhizium, is currently composed of four varieties, including the type variety, and had been demonstrated to be closely related to M. taii, M. pingshaense and M. guizhouense. In this study we evaluate phylogenetic relationships within the M. anisopliae complex, identify monophyletic lineages and clarify the species taxonomy. To this end we have employed a multigene phylogenetic approach using near-complete sequences from nuclear encoded EF-1alpha, RPB1, RPB2 and beta-tubulin gene regions and evaluated the morphology of these taxa, including ex-type isolates whenever possible. The phylogenetic and in some cases morphological evidence supports the monophyly of nine terminal taxa in the M. anisopliae complex that we recognize as species. We propose to recognize at species rank M. anisopliae, M. guizhouense, M. pingshaense, M. acridum stat. nov., M. lepidiotae stat. nov. and M. majus stat. nov. In addition we describe the new species M. globosum and M. robertsii, resurrect the name M. brunneum and show that M. taii is a later synonym of M. guizhouense.

Journal Article
19 Jun 2009-Science
TL;DR: SATé (simultaneous alignment and tree estimation), an automated method to quickly and accurately estimate both DNA alignments and trees with the maximum likelihood criterion, is presented, showing that coestimation can be both rapid and accurate in phylogenetic studies.
Abstract: Inferring an accurate evolutionary tree of life requires high-quality alignments of molecular sequence data sets from large numbers of species. However, this task is often difficult, slow, and idiosyncratic, especially when the sequences are highly diverged or include high rates of insertions and deletions (collectively known as indels). We present SATé (simultaneous alignment and tree estimation), an automated method to quickly and accurately estimate both DNA alignments and trees with the maximum likelihood criterion. In our study, it improved tree and alignment accuracy compared to the best two-phase methods currently available for data sets of up to 1000 sequences, showing that coestimation can be both rapid and accurate in phylogenetic studies.

Journal Article
TL;DR: New 16S rRNA signature nucleotide patterns of taxa above the family level are presented, the affiliation of genera to families is indicated, and it is noted that the phylogenetic relationships of Actinobacteria at higher levels may need to be reconstructed.
Abstract: The higher ranks of the class Actinobacteria were proposed and described in 1997. At each rank, the taxa were delineated from each other solely on the basis of 16S rRNA gene sequence phylogenetic clustering and taxon-specific 16S rRNA signature nucleotides. In the past 10 years, many novel members have been assigned to this class while, at the same time, some members have been reclassified. The new 16S rRNA gene sequence information and the changes in phylogenetic positions of some taxa influence decisions about which 16S rRNA nucleotides to define as taxon-specific. As a consequence, the phylogenetic relationships of Actinobacteria at higher levels may need to be reconstructed. Here, we present new 16S rRNA signature nucleotide patterns of taxa above the family level and indicate the affiliation of genera to families. These sets replace the signatures published in 1997. In addition, Actinopolysporineae subord. nov. and Actinopolysporaceae fam. nov. are proposed to accommodate the genus Actinopolyspora, Kineosporiineae subord. nov. and Kineosporiaceae fam. nov. are proposed to accommodate the genera Kineococcus, Kineosporia and Quadrisphaera, Beutenbergiaceae fam. nov. is proposed to accommodate the genera Beutenbergia, Georgenia and Salana and Cryptosporangiaceae fam. nov. is proposed to accommodate the genus Cryptosporangium. The families Nocardiaceae and Gordoniaceae are proposed to be combined in an emended family Nocardiaceae. Emended descriptions are also proposed for most of the other higher taxa.

Journal Article
TL;DR: The core of T6SS is composed of 13 proteins, conserved in both pathogenic and non-pathogenic bacteria, suggesting that T6SS has evolved to adapt to various microenvironments and specialized functions.
Abstract: The availability of hundreds of bacterial genomes allowed a comparative genomic study of the Type VI Secretion System (T6SS), recently discovered as being involved in pathogenesis. By combining comparative and phylogenetic approaches using more than 500 prokaryotic genomes, we characterized the global T6SS genetic structure in terms of conservation, evolution and genomic organization. This genome-wide analysis allowed the identification of a set of 13 proteins constituting the T6SS protein core and a set of conserved accessory proteins. In total, 176 T6SS loci (encompassing 92 different bacteria) were identified, and their comparison revealed that T6SS-encoded genes have a specific conserved genetic organization. Phylogenetic reconstruction based on the core genes showed that lateral transfer of the T6SS is probably its major way of dissemination among pathogenic and non-pathogenic bacteria. Furthermore, the sequence analysis of the VgrG proteins, proposed to be exported in a T6SS-dependent way, confirmed that some C-terminal regions possess domains showing similarities with adhesins or proteins with enzymatic functions. The core of T6SS is composed of 13 proteins, conserved in both pathogenic and non-pathogenic bacteria. Subclasses of T6SS differ in regulatory and accessory protein content, suggesting that T6SS has evolved to adapt to various microenvironments and specialized functions. Based on these results, new functional hypotheses concerning the assembly and function of T6SS proteins are proposed.

Journal ArticleDOI
TL;DR: The extensiveness of convergent evolution is one of the most striking phenomena observed in the phylogenetic tree presented here – it is hard to find a morphological, ecological or biological characteristic that has not arisen at least twice during nematode evolution.
Abstract: As a result of the scarcity of informative morphological and anatomical characters, nematode systematics have always been volatile. Differences in the appreciation of these characters have resulted in numerous classifications and this greatly confuses scientific communication. An advantage of the use of molecular data is that it allows for an enormous expansion of the number of characters. Here we present a phylogenetic tree based on 1215 small subunit ribosomal DNA sequences (ca 1700 bp each) covering a wide range of nematode taxa. Of the 19 nematode orders mentioned by De Ley et al. (2006) 15 are represented here. Compared with Holterman et al. (2006) the number of taxa analysed has been tripled. This did not result in major changes in the clade subdivision of the phylum, although a decrease in the number of well supported nodes was observed. Especially at the family level and below we observed a considerable congruence between morphology and ribosomal DNA-based nematode systematics and, in case of discrepancies, morphological or anatomical support could be found for the alternative grouping in most instances. The extensiveness of convergent evolution is one of the most striking phenomena observed in the phylogenetic tree presented here - it is hard to find a morphological, ecological or biological characteristic that has not arisen at least twice during nematode evolution. Convergent evolution appears to be an important additional explanation for the seemingly persistent volatility of nematode systematics.

Journal ArticleDOI
TL;DR: Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses, and with continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling.
Abstract: Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. 
With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling, such as phylogeographic analyses and species-level DNA barcoding.

Journal ArticleDOI
TL;DR: Two summary-statistic methods, STAR and STEAC, are proposed and shown to be statistically consistent under the multispecies coalescent model, and a simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when the substitution rates among lineages are highly variable.
Abstract: The estimation of species trees (phylogenies) is one of the most important problems in evolutionary biology, and recently, there has been greater appreciation of the need to estimate species trees directly rather than using gene trees as a surrogate. A Bayesian method constructed under the multispecies coalescent model can consistently estimate species trees but involves intensive computation, which can hinder its application to the phylogenetic analysis of large-scale genomic data. Many summary statistics-based approaches, such as shallowest coalescences (SC) and Global LAteSt Split (GLASS), have been developed to infer species phylogenies for multilocus data sets. In this paper, we propose 2 methods, species tree estimation using average ranks of coalescences (STAR) and species tree estimation using average coalescence times (STEAC), based on the summary statistics of coalescence times. It can be shown that the 2 methods are statistically consistent under the multispecies coalescent model. STAR uses the ranks of coalescences and is thus resistant to variable substitution rates along the branches in gene trees. A simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when the substitution rates among lineages are highly variable. Two real genomic data sets were analyzed by the 2 methods and produced species trees that are consistent with previous results.
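The rank-averaging idea behind STAR can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: gene trees are written as nested tuples of taxon names, the root of each gene tree is given a rank equal to the number of taxa, and each internal node ranks one below its parent. The function names are hypothetical, and the final tree-building step from the averaged matrix is not shown.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def pairwise_ranks(tree, n_taxa):
    """Map each taxon pair to the rank of its most recent common ancestor.

    Assumed convention: the root has rank n_taxa and every internal node
    has the rank of its parent minus one.
    """
    ranks = {}

    def visit(node, rank):
        if isinstance(node, str):
            return
        child_sets = [leaves(child) for child in node]
        # Taxa sitting in different children coalesce at this node.
        for a_set, b_set in combinations(child_sets, 2):
            for a in a_set:
                for b in b_set:
                    ranks[frozenset((a, b))] = rank
        for child in node:
            visit(child, rank - 1)

    visit(tree, n_taxa)
    return ranks

def average_ranks(gene_trees):
    """Average the pairwise coalescence ranks over all gene trees."""
    taxa = sorted(leaves(gene_trees[0]))
    totals = {frozenset(p): 0.0 for p in combinations(taxa, 2)}
    for tree in gene_trees:
        for pair, rank in pairwise_ranks(tree, len(taxa)).items():
            totals[pair] += rank
    return {pair: total / len(gene_trees) for pair, total in totals.items()}

# Three toy gene trees; the third disagrees with the other two.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
avg = average_ranks(gene_trees)
```

Because only ranks enter the average, rescaling the branch lengths of any single gene tree would leave the matrix unchanged, which is the property the abstract credits for STAR's resistance to variable substitution rates. In this toy input the pair A and B has the lowest average rank, so a distance-based tree builder would be expected to join them first.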

Journal ArticleDOI
12 Nov 2009-PLOS ONE
TL;DR: Strain lineages in MTBC should be defined based on phylogenetically robust markers such as single nucleotide polymorphisms or large sequence polymorphisms, and for epidemiological purposes, MIRU-VNTR loci should be used in a lineage-dependent manner.
Abstract: Because genetically monomorphic bacterial pathogens harbour little DNA sequence diversity, most current genotyping techniques used to study the epidemiology of these organisms are based on mobile or repetitive genetic elements. Molecular markers commonly used in these bacteria include Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and Variable Number Tandem Repeats (VNTR). These methods are also increasingly being applied to phylogenetic and population genetic studies. Using the Mycobacterium tuberculosis complex (MTBC) as a model, we evaluated the phylogenetic accuracy of CRISPR- and VNTR-based genotyping, which in MTBC are known as spoligotyping and Mycobacterial Interspersed Repetitive Units (MIRU)-VNTR-typing, respectively. We used as a gold standard the complete DNA sequences of 89 coding genes from a global strain collection. Our results showed that phylogenetic trees derived from these multilocus sequence data were highly congruent and statistically robust, irrespective of the phylogenetic methods used. By contrast, corresponding phylogenies inferred from spoligotyping or 15-loci-MIRU-VNTR were incongruent with respect to the sequence-based trees. Although 24-loci-MIRU-VNTR performed better, it was still unable to detect all strain lineages. The DNA sequence data showed virtually no homoplasy, but the opposite was true for spoligotyping and MIRU-VNTR, which was consistent with high rates of convergent evolution and the low statistical support obtained for phylogenetic groupings defined by these markers. Our results also revealed that the discriminatory power of the standard 24 MIRU-VNTR loci varied by strain lineage. Taken together, our findings suggest strain lineages in MTBC should be defined based on phylogenetically robust markers such as single nucleotide polymorphisms or large sequence polymorphisms, and that for epidemiological purposes, MIRU-VNTR loci should be used in a lineage-dependent manner.
Our findings have implications for strain typing in other genetically monomorphic bacteria.

Journal ArticleDOI
TL;DR: A review of recent models for estimating phylogenetic trees under the multispecies coalescent shows that species tree approaches are an appropriate goal for systematics and appear to work well in some cases where concatenation can be misleading, and suggests that sampling many independent loci will be paramount.

Journal ArticleDOI
TL;DR: DNA sequences obtained from a nearly complete taxon sampling of known species from Europe, Central America and North America demonstrate that Calochroi is an exclusively northern hemispheric lineage, where species follow their host trees throughout their natural ranges within and across continents.
Abstract: Section Calochroi is one of the most species-rich lineages in the genus Cortinarius (Agaricales, Basidiomycota) and is widely distributed across boreo-nemoral areas, with some extensions into meridional zones. Previous phylogenetic studies of Calochroi (incl. section Fulvi) have been geographically restricted; therefore, phylogenetic and biogeographic relationships within this lineage at a global scale have been largely unknown. In this study, we obtained DNA sequences from a nearly complete taxon sampling of known species from Europe, Central America and North America. We inferred intra- and interspecific phylogenetic relationships as well as major morphological evolutionary trends within section Calochroi based on 576 ITS sequences, 230 ITS + 5.8S + D1/D2 sequences, and a combined dataset of ITS + 5.8S + D1/D2 and RPB1 sequences of a representative subsampling of 58 species. More than 100 species were identified by integrating DNA sequences with morphological, macrochemical and ecological data. Cortinarius section Calochroi was consistently resolved with high branch support into at least seven major lineages: Calochroi, Caroviolacei, Dibaphi, Elegantiores, Napi, Pseudoglaucopodes and Splendentes; whereas Rufoolivacei and Sulfurini appeared polyphyletic. A close relationship between Dibaphi, Elegantiores, Napi and Splendentes was consistently supported. Combinations of specific morphological, pigmentation and molecular characters appear useful in circumscribing clades. Our analyses demonstrate that Calochroi is an exclusively northern hemispheric lineage, where species follow their host trees throughout their natural ranges within and across continents. Results of this study contribute substantially to defining European species in this group and will help to either identify or to name new species occurring across the northern hemisphere. 
Major groupings are in partial agreement with earlier morphology-based and molecular phylogenetic hypotheses, but some relationships were unexpected, based on external morphology. In such cases, their true affinities appear to have been obscured by the repeated appearance of similar features among distantly related species. Therefore, further taxonomic studies are needed to evaluate the consistency of species concepts and interpretations of morphological features in a more global context. Reconstruction of ancestral states yielded two major evolutionary trends within section Calochroi: (1) the development of bright pigments evolved independently multiple times, and (2) the evolution of abruptly marginate to flattened stipe bulbs represents an autapomorphy of the Calochroi clade.

Journal ArticleDOI
TL;DR: A new, broadly applicable measure of the spatial restriction of phylogenetic diversity, termed phylogenetic endemism (PE), is presented; it builds on previous phylogenetic analyses of endemism but provides a more general solution for mapping endemism of lineages.
Abstract: We present a new, broadly applicable measure of the spatial restriction of phylogenetic diversity, termed phylogenetic endemism (PE). PE combines the widely used phylogenetic diversity and weighted endemism measures to identify areas where substantial components of phylogenetic diversity are restricted. Such areas are likely to be of considerable importance for conservation. PE has a number of desirable properties not combined in previous approaches. It assesses endemism consistently, independent of taxonomic status or level, and independent of previously defined political or biological regions. The results can be directly compared between areas because they are based on equivalent spatial units. PE builds on previous phylogenetic analyses of endemism, but provides a more general solution for mapping endemism of lineages. We illustrate the broad applicability of PE using examples of Australian organisms having contrasting life histories: pea-flowered shrubs of the genus Daviesia (Fabaceae) and the Australian species of the Australo-Papuan tree frog radiation within the family Hylidae.
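A minimal sketch of how such a measure might be computed follows. It assumes one reading of combining phylogenetic diversity with weighted endemism: each branch length is divided by the number of grid cells occupied by the branch's descendants, and each cell sums the shares of the branches present in it. The data layout and the function name are hypothetical, and the toy tree and occurrences are invented for illustration.

```python
def phylogenetic_endemism(branches, occurrences):
    """Return {cell: PE score}.

    branches: list of (branch_length, set_of_descendant_tips)
    occurrences: {tip: set of grid cells where the tip is recorded}
    """
    cells = set().union(*occurrences.values())
    pe = {cell: 0.0 for cell in cells}
    for length, tips in branches:
        # The range of a branch is the union of its descendants' ranges.
        clade_range = set().union(*(occurrences[tip] for tip in tips))
        for cell in clade_range:
            # Each occupied cell receives an equal share of the branch.
            pe[cell] += length / len(clade_range)
    return pe

# Toy tree: three terminal branches plus the branch leading to (sp1, sp2).
branches = [
    (1.0, {"sp1"}),
    (1.0, {"sp2"}),
    (2.0, {"sp3"}),
    (1.0, {"sp1", "sp2"}),
]
occurrences = {
    "sp1": {"c1"},
    "sp2": {"c1", "c2"},
    "sp3": {"c2", "c3"},
}
pe = phylogenetic_endemism(branches, occurrences)
```

Under this formulation every branch is fully apportioned among the cells it occupies, so the scores across all cells sum to the total branch length of the tree. That makes scores from equal-sized cells directly comparable, in line with the abstract's point about equivalent spatial units.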

Journal ArticleDOI
TL;DR: Assessment of the performance of different tests used to measure community phylogenetic structure found that methods that were most sensitive to the effects of niche-based processes on community structure were also more likely to find non-random patterns of community phylogenetic structure under dispersal assembly.
Abstract: Patterns of phylogenetic relatedness within communities have been widely used to infer the importance of different ecological and evolutionary processes during community assembly, but little is known about the relative ability of community phylogenetics methods and null models to detect the signature of processes such as dispersal, competition and filtering under different models of trait evolution. Using a metacommunity simulation incorporating quantitative models of trait evolution and community assembly, I assessed the performance of different tests that have been used to measure community phylogenetic structure. All tests were sensitive to the relative phylogenetic signal in species metacommunity abundances and traits; methods that were most sensitive to the effects of niche-based processes on community structure were also more likely to find non-random patterns of community phylogenetic structure under dispersal assembly. When used with a null model that maintained species occurrence frequency in random communities, several metrics could detect niche-based assembly when there was strong phylogenetic signal in species traits, when multiple traits were involved in community assembly, and in the presence of environmental heterogeneity. Interpretations of the causes of community phylogenetic structure should be modified to account for the influence of dispersal.
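To make the idea of testing community phylogenetic structure against a null model concrete, here is a minimal sketch using mean pairwise phylogenetic distance (MPD), one commonly used metric. The abstract does not say which metrics or null models were compared, so both choices here are assumptions. In particular, the null model draws species from the pool with equal probability, which is simpler than the occurrence-frequency-preserving null the abstract mentions. All names and the toy distances are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def mpd(community, dist):
    """Mean pairwise phylogenetic distance among co-occurring species."""
    pairs = combinations(sorted(community), 2)
    return mean(dist[frozenset(pair)] for pair in pairs)

def ses_mpd(community, pool, dist, n_random=999, seed=0):
    """Standardized effect size of MPD against equal-probability random draws."""
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(pool, len(community)), dist) for _ in range(n_random)]
    sd = pstdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0

# Toy species pool with two clades: within-clade distance 2, between-clade 10.
pool = ["A", "B", "C", "D", "E"]
clade = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
dist = {
    frozenset((a, b)): (2.0 if clade[a] == clade[b] else 10.0)
    for a, b in combinations(pool, 2)
}
community = ["A", "B", "C"]
observed = mpd(community, dist)
z = ses_mpd(community, pool, dist)
```

A negative standardized effect size indicates that co-occurring species are more closely related than random draws (phylogenetic clustering), and a positive one indicates overdispersion. As the abstract cautions, such a pattern is not by itself evidence of niche-based assembly, because dispersal can produce non-random structure too.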

Journal Article
TL;DR: This paper proposed a polyphasic approach to the recognition and identification of species within Colletotrichum, matching genetic distinctness with informative morphological and biological characters, including morphology, pathogenicity, physiology, phylogenetics and secondary metabolite production.
Abstract: Colletotrichum is the causal agent of anthracnose and other diseases on leaves, stems and fruits of numerous plant species, including several important crops. Accurate species identification is critical to understand the epidemiology and to develop effective control of these diseases. Morphologically-based identification of Colletotrichum species has always been problematic, because there are few reliable characters and many of these characters are plastic, dependent upon methods and experimental conditions. Rapid progress in molecular phylogenetic methods is now making it possible to recognise stable and well-resolved clades within Colletotrichum. How these should be reflected in a classification system remains to be resolved. An important step in providing a stable taxonomy for the genus is to epitypify existing names, and in so doing link them to genetically defined clades. We recommend a polyphasic approach to the recognition and identification of species within Colletotrichum, matching genetic distinctness with informative morphological and biological characters. This paper reviews various approaches in the study of Colletotrichum complexes including morphology, pathogenicity, physiology, phylogenetics and secondary metabolite production. A backbone phylogenetic tree using ITS sequence data from 42 ex-type specimens has been generated. Phylogenetic analysis using ITS sequence data is a useful tool to give a preliminary identification for Colletotrichum species or place them in species complexes. However, caution must be taken here as the majority of the ITS sequences deposited in GenBank are wrongly named. Multi-gene phylogenetic data provide a much better understanding of the relationships within Colletotrichum and should be employed where possible.
We propose that an ideal approach for Colletotrichum systematics should be based on a multi-gene phylogeny, with comparison made with type specimens, and a well-defined phylogenetic lineage should be in conjunction with recognisable polyphasic characters, such as morphology, physiology, pathogenicity, cultural characteristics and secondary metabolites. Finally a set of protocols and methodologies is provided as a guideline for future studies, epitypification and the description of new species.

Journal ArticleDOI
TL;DR: This comprehensive, time-calibrated tree provides a powerful evolutionary tool for broad-scale comparative studies of Cetacea.

Journal ArticleDOI
TL;DR: Analysis of the large-scale effects of plant extinctions and introductions on taxonomic and phylogenetic diversity of floras across Europe reveals that plant invasions since AD 1500 exceeded extinctions, resulting in increased taxonomic diversity but decreased phylogenetic diversity within European regions.
Abstract: Human activities have altered the composition of biotas through two fundamental processes: native extinctions and alien introductions. Both processes affect the taxonomic (i.e., species identity) and phylogenetic (i.e., species evolutionary history) structure of species assemblages. However, it is not known what the relative magnitude of these effects is at large spatial scales. Here we analyze the large-scale effects of plant extinctions and introductions on taxonomic and phylogenetic diversity of floras across Europe, using data from 23 regions. Considering both native losses and alien additions in concert reveals that plant invasions since AD 1500 exceeded extinctions, resulting in (i) increased taxonomic diversity (i.e., species richness) but decreased phylogenetic diversity within European regions, and (ii) increased taxonomic and phylogenetic similarity among European regions. Those extinct species were phylogenetically and taxonomically unique and typical of individual regions, and extinctions usually were not continent-wide and therefore led to differentiation. By contrast, because introduced alien species tended to be closely related to native species, the floristic differentiation due to species extinction was lessened by taxonomic and phylogenetic homogenization effects. This was especially due to species that are alien to a region but native to other parts of Europe. As a result, floras of many European regions have partly lost and will continue to lose their uniqueness. The results suggest that biodiversity needs to be assessed in terms of both species taxonomic and phylogenetic identity, but the latter is rarely used as a metric of the biodiversity dynamics.

Journal ArticleDOI
01 Aug 2009-Ecology
TL;DR: It is argued that while phylogenetic relatedness may be a good general multivariate proxy for ecological similarity, it may have a reduced capacity to depict the functional mechanisms behind species coexistence when coexisting species simultaneously converge and diverge in function.
Abstract: Species diversity is promoted and maintained by ecological and evolutionary processes operating on species attributes through space and time. The degree to which variability in species function regulates distribution and promotes coexistence of species has been debated. Previous work has attempted to quantify the relative importance of species function by using phylogenetic relatedness as a proxy for functional similarity. The key assumption of this approach is that function is phylogenetically conserved. If this assumption is supported, then the phylogenetic dispersion in a community should mirror the functional dispersion. Here we quantify functional trait dispersion along several key axes of tree life-history variation and on multiple spatial scales in a Neotropical dry-forest community. We next compare these results to previously reported patterns of phylogenetic dispersion in this same forest. We find that, at small spatial scales, coexisting species are typically more functionally clustered than expected, but traits related to adult and regeneration niches are overdispersed. This outcome was repeated when the analyses were stratified by size class. Some of the trait dispersion results stand in contrast to the previously reported phylogenetic dispersion results. In order to address this inconsistency we examined the strength of phylogenetic signal in traits at different depths in the phylogeny. We argue that: (1) while phylogenetic relatedness may be a good general multivariate proxy for ecological similarity, it may have a reduced capacity to depict the functional mechanisms behind species coexistence when coexisting species simultaneously converge and diverge in function; and (2) the previously used metric of phylogenetic signal provided erroneous inferences about trait dispersion when married with patterns of phylogenetic dispersion.

Journal ArticleDOI
01 Apr 2009-Genetica
TL;DR: The maturation of nuclear gene phylogeography and phylogenetics is traced and it is suggested that the abundant instances of gene tree heterogeneity beckon a new generation of phylogenetic methods that focus on estimating species trees as distinct from gene trees.
Abstract: We review recent trends in phylogeography and phylogenetics and argue that these two fields stand to be reunited by the common yardstick provided by sequence and SNP data and by new multilocus methods for phylogenetic analysis. Whereas the modern incarnation of both fields was spawned by PCR approaches applied to mitochondrial DNA in the late 1980s, the two fields diverged during the 1990s largely due to the adoption by phylogeographers of microsatellites, in contrast to the adoption of nuclear sequence data by phylogeneticists. Sequence-based markers possess a number of advantages over microsatellites, even on the recent time scales that are the purview of phylogeography. Using examples primarily from vertebrates, we trace the maturation of nuclear gene phylogeography and phylogenetics and suggest that the abundant instances of gene tree heterogeneity beckon a new generation of phylogenetic methods that focus on estimating species trees as distinct from gene trees. Whole genomes provide a powerful common yardstick on which both phylogeography and phylogenetics can assume their proper place as ends of a continuum.

Journal ArticleDOI
TL;DR: An illustrated discussion of quality control, pseudogenes, and sequence composition in COI-like accessions across Crustacea from published and unpublished studies is presented.
Abstract: The cytochrome c oxidase subunit I (COI) gene plays a pivotal role in a global effort to document biodiversity and continues to be a gene of choice in phylogenetic and phylogeographic studies. Due to increased attention on this gene as a species' barcode, quality control and sequence homology issues are re-emerging. Taylor and Knouft (2006) attempted to examine gonopod morphology in light of the subgeneric classification scheme within the freshwater crayfish genus Orconectes using COI sequences. However, their erroneous analyses were not only based on supposed mitochondrial sequences but also incorporated many questionable sequences due to the possible presence of numts and manual editing or sequencing errors. In fact, 22 of the 86 sequences were flagged as “COI-like” by GenBank due to the presence of stop codons and indels in what should be the open reading frame of a conservative protein-coding gene. A subsequent search of “COI-like” accessions in GenBank turned up a multitude of taxa across Cr...
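The kind of quality check described here, looking for stop codons in what should be an open reading frame, can be sketched as follows. The sketch assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are stop codons and TGA encodes tryptophan. It is not GenBank's actual flagging procedure, it does not detect indels, and the function names and example sequences are hypothetical.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI table 5).
STOP_CODONS = {"TAA", "TAG"}

def stop_codon_positions(seq, frame=0):
    """Return start indices of in-frame stop codons for the given frame."""
    seq = seq.upper()
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]

def looks_coi_like(seq):
    """True if all three forward reading frames contain a stop codon.

    A partial barcode fragment may start in any frame, so a sequence is
    treated as suspicious only when no forward frame is open.
    """
    return all(stop_codon_positions(seq, frame) for frame in range(3))

clean = "ATGGCATTCGGA"       # frame 0 is open
suspect = "TAACTAGCTAAC"     # a stop codon in every forward frame
flags = {name: looks_coi_like(s) for name, s in [("clean", clean), ("suspect", suspect)]}
```

A flagged sequence is only a candidate for closer inspection: a stop codon can come from a nuclear mitochondrial pseudogene (numt), a sequencing error, or a manual editing mistake, and this check alone cannot tell these apart.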