Showing papers on "Phylogenetic tree" published in 2007


Journal Article
TL;DR: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML), which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses.
Abstract: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (dS and dN) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C and runs under Windows, Mac OSX, and UNIX systems. It is available at http://abacus.gene.ucl.ac.uk/software/paml.html.

10,773 citations
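The rates mentioned above are usually summarized as the ratio ω = dN/dS, read as purifying selection when ω < 1, neutral evolution when ω = 1, and positive selection when ω > 1. The sketch below is not PAML code: it assumes dN and dS have already been estimated and only illustrates the conventional interpretation. The function names, the tolerance and the input values are hypothetical.

```python
def omega(d_n: float, d_s: float) -> float:
    """Nonsynonymous/synonymous rate ratio, omega = dN / dS."""
    if d_s <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return d_n / d_s


def interpret_omega(w: float, tol: float = 1e-9) -> str:
    """Conventional reading of omega; `tol` is an arbitrary illustrative tolerance."""
    if w > 1 + tol:
        return "positive selection"
    if w < 1 - tol:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical rate estimates for one pair of protein-coding sequences
w = omega(d_n=0.12, d_s=0.40)
print(f"omega = {w:.2f}: {interpret_omega(w)}")  # prints "omega = 0.30: purifying selection"
```

A single gene-wide ω averages over all sites and lineages, so it can stay below 1 even when a few sites are under positive selection. This is one reason the model-based phylogenetic comparisons described in the abstract are used.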


Journal Article
TL;DR: iTOL is a web-based tool for the display, manipulation and annotation of phylogenetic trees that can be interactively pruned and re-rooted.
Abstract: Summary: Interactive Tree Of Life (iTOL) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. Trees can be interactively pruned and re-rooted. Various types of data such as genome sizes or protein domain repertoires can be mapped onto the tree. Export to several bitmap and vector graphics formats is supported. Availability: iTOL is available at http://itol.embl.de

2,648 citations


Journal Article
TL;DR: Nine newly explored regions of the chloroplast genome offer levels of variation better than the best regions identified in an earlier study and are therefore likely to be the best choices for molecular studies at low taxonomic levels.
Abstract: Although the chloroplast genome contains many noncoding regions, relatively few have been exploited for interspecific phylogenetic and intraspecific phylogeographic studies. In our recent evaluation of the phylogenetic utility of 21 noncoding chloroplast regions, we found the most widely used noncoding regions are among the least variable, but the more variable regions have rarely been employed. That study led us to conclude that there may be unexplored regions of the chloroplast genome that have even higher relative levels of variability. To explore the potential variability of previously unexplored regions, we compared three pairs of single-copy chloroplast genome sequences in three disparate angiosperm lineages: Atropa vs. Nicotiana (asterids); Lotus vs. Medicago (rosids); and Saccharum vs. Oryza (monocots). These three separate sequence alignments highlighted 13 mutational hotspots that may be more variable than the best regions of our former study. These 13 regions were then selected for a more detailed analysis. Here we show that nine of these newly explored regions (rpl32-trnL(UAG), trnQ(UUG)-5'rps16, 3'trnV(UAC)-ndhC, ndhF-rpl32, psbD-trnT(GGU), psbJ-petA, 3'rps16-5'trnK(UUU), atpI-atpH, and petL-psbE) offer levels of variation better than the best regions identified in our earlier study and are therefore likely to be the best choices for molecular studies at low taxonomic levels.

1,840 citations
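The abstract does not describe how the 13 hotspots were located in the three pairwise alignments. One simple way to screen an alignment of two genomes for mutational hotspots is a sliding-window count of differences. The sketch below assumes pre-aligned, equal-length sequences; the window size, step and threshold are arbitrary illustrative values, not the authors' settings, and the sequences are hypothetical.

```python
def variable_windows(seq_a, seq_b, window=100, step=50, min_diffs=5):
    """Return (start, n_diffs) for windows of two aligned sequences with >= min_diffs mismatches.

    Any mismatch, including a gap opposite a base, counts as one difference;
    scoring indels separately would be an equally reasonable choice.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(0, len(seq_a) - window + 1, step):
        end = start + window
        n_diffs = sum(a != b for a, b in zip(seq_a[start:end], seq_b[start:end]))
        if n_diffs >= min_diffs:
            hits.append((start, n_diffs))
    return hits


# Hypothetical 300-bp alignment with a cluster of 10 substitutions at sites 150-159
seq_a = "A" * 300
seq_b = "A" * 150 + "C" * 10 + "A" * 140
print(variable_windows(seq_a, seq_b))  # [(100, 10), (150, 10)]
```

Overlapping flagged windows could then be merged into candidate regions and compared across the three genome pairs.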


Journal Article
TL;DR: The most comprehensive analysis of the environmental distribution of bacteria to date, based on 21,752 16S rRNA sequences compiled from 111 studies of diverse physical environments, is reported in this article.
Abstract: Microbes are difficult to culture. Consequently, the primary source of information about a fundamental evolutionary topic, life's diversity, is the environmental distribution of gene sequences. We report the most comprehensive analysis of the environmental distribution of bacteria to date, based on 21,752 16S rRNA sequences compiled from 111 studies of diverse physical environments. We clustered the samples based on similarities in the phylogenetic lineages that they contain and found that, surprisingly, the major environmental determinant of microbial community composition is salinity rather than extremes of temperature, pH, or other physical and chemical factors represented in our samples. We find that sediments are more phylogenetically diverse than any other environment type. Surprisingly, soil, which has high species-level diversity, has below-average phylogenetic diversity. This work provides a framework for understanding the impact of environmental factors on bacterial evolution and for the direction of future sequencing efforts to discover new lineages.

1,440 citations


Journal Article
TL;DR: Simulation is used to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting.
Abstract: Although multiple gene sequences are becoming increasingly available for molecular phylogenetic inference, the analysis of such data has largely relied on inference methods designed for single genes. One of the common approaches to analyzing data from multiple genes is concatenation of the individual gene data to form a single supergene to which traditional phylogenetic inference procedures, such as maximum parsimony (MP) or maximum likelihood (ML), are applied. Recent empirical studies have demonstrated that concatenation of sequences from multiple genes prior to phylogenetic analysis often results in inference of a single, well-supported phylogeny. Theoretical work, however, has shown that the coalescent can produce substantial variation in single-gene histories. Using simulation, we combine these ideas to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting. Furthermore, use of the bootstrap to measure support for the inferred phylogeny can result in moderate to strong support for an incorrect tree under these conditions. These results highlight the importance of incorporating variation in gene histories into multilocus phylogenetics.

963 citations
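The amount of gene-tree discord the coalescent can generate is easiest to see in the three-species case. For a rooted species tree ((A,B),C) whose internal branch has length T in coalescent units (generations divided by 2N for diploids), a gene tree matches the species tree with probability 1 − (2/3)e^(−T), and each of the two discordant topologies has probability (1/3)e^(−T). The sketch below only evaluates these standard expressions; it is not the simulation used in the paper, and the topology labels are placeholders.

```python
import math


def three_taxon_gene_tree_probs(t: float) -> dict:
    """Gene-tree topology probabilities for species tree ((A,B),C).

    `t` is the internal branch length in coalescent units (generations / 2N).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0  # probability of each mismatching topology
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for t in (0.05, 0.5, 2.0):
    p = three_taxon_gene_tree_probs(t)
    print(f"T = {t}: P(gene tree matches species tree) = {p['((A,B),C)']:.3f}")
```

As T shrinks toward 0 the matching probability falls toward 1/3, so all three topologies become nearly equally frequent among unlinked loci.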


Journal Article
TL;DR: Phylogenetic trees from multiple methods provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids.
Abstract: Angiosperms are the largest and most successful clade of land plants with >250,000 species distributed in nearly every terrestrial habitat. Many phylogenetic studies have been based on DNA sequences of one to several genes, but, despite decades of intensive efforts, relationships among early diverging lineages and several of the major clades remain either incompletely resolved or weakly supported. We performed phylogenetic analyses of 81 plastid genes in 64 sequenced genomes, including 13 new genomes, to estimate relationships among the major angiosperm clades, and the resulting trees are used to examine the evolution of gene and intron content. Phylogenetic trees from multiple methods, including model-based approaches, provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids. Resolution of relationships among the major clades of angiosperms provides the necessary framework for addressing numerous evolutionary questions regarding the rapid diversification of angiosperms. Gene and intron content are highly conserved among the early diverging angiosperms and basal eudicots, but 62 independent gene and intron losses are limited to the more derived monocot and eudicot clades. Moreover, a lineage-specific correlation was detected between rates of nucleotide substitutions, indels, and genomic rearrangements.

943 citations


Journal Article
TL;DR: Mycobacterium tuberculosis has a clonal, geographically constrained population structure, and evidence suggests strain-specific differences in virulence and immunogenicity; a strain selection framework based on robust phylogenetic markers is proposed to allow systematic and comprehensive evaluation of new tools for tuberculosis control.
Abstract: New tools for controlling tuberculosis are urgently needed. Despite our emerging understanding of the biogeography of Mycobacterium tuberculosis, the implications for development of new diagnostics, drugs, and vaccines are unknown. M. tuberculosis has a clonal genetic population structure that is geographically constrained. Evidence suggests strain-specific differences in virulence and immunogenicity in light of this global phylogeography. We propose a strain selection framework, based on robust phylogenetic markers, which will allow for systematic and comprehensive evaluation of new tools for tuberculosis control.

693 citations


Journal Article
TL;DR: Using a simulation framework that models trait evolution, assembles communities (via competition, habitat filtering, or neutral assembly), and tests the phylogenetic pattern of the resulting communities, the authors found that phylogenetic community structure is greatest when traits are highly conserved and when multiple traits influence species membership in communities.
Abstract: Taxa co-occurring in communities often represent a non-random sample, in phenotypic or phylogenetic terms, of the regional species pool. While heuristic arguments have identified processes that create community phylogenetic patterns, further progress hinges on a more comprehensive understanding of the interactions between underlying ecological and evolutionary processes. We created a simulation framework to model trait evolution, assemble communities (via competition, habitat filtering, or neutral assembly), and test the phylogenetic pattern of the resulting communities. We found that phylogenetic community structure is greatest when traits are highly conserved and when multiple traits influence species membership in communities. Habitat filtering produces stronger phylogenetic structure when taxa with derived (as opposed to ancestral) traits are favored in the community. Nearest-relative tests have greater power to detect patterns due to competition, while total community relatedness tests perform better with habitat filtering. The size of the local community relative to the regional pool strongly influences statistical power; in general, power increases with larger pool sizes for communities created by filtering but decreases for communities created by competition. Our results deepen our understanding of processes that contribute to phylogenetic community structure and provide guidance for the design and interpretation of empirical research.

655 citations
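The two families of tests contrasted above are often computed as the mean pairwise phylogenetic distance (MPD, a total community relatedness measure) and the mean nearest-taxon distance (MNTD, a nearest-relative measure), each compared with random communities of the same size drawn from the regional pool. The sketch below assumes a precomputed patristic distance matrix and a simple equal-probability null model; the four-species tree is hypothetical, and this is not the paper's simulation framework. Under the usual convention, the net relatedness index (NRI) and nearest taxon index (NTI) are the negatives of the standardized effect sizes returned here.

```python
import itertools
import random
import statistics


def mpd(community, dist):
    """Mean pairwise phylogenetic distance among the species in `community`."""
    pairs = list(itertools.combinations(community, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mntd(community, dist):
    """Mean distance from each species to its closest relative in `community`."""
    return statistics.mean(
        min(dist[a][b] for b in community if b != a) for a in community
    )


def ses(metric, community, pool, dist, n_null=999, seed=1):
    """Standardized effect size of `metric` versus same-size random draws from `pool`."""
    rng = random.Random(seed)
    observed = metric(community, dist)
    null = [metric(rng.sample(pool, len(community)), dist) for _ in range(n_null)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical ultrametric tree ((A:1,B:1):2,(C:1,D:1):2) as patristic distances
dist = {
    "A": {"A": 0, "B": 2, "C": 6, "D": 6},
    "B": {"A": 2, "B": 0, "C": 6, "D": 6},
    "C": {"A": 6, "B": 6, "C": 0, "D": 2},
    "D": {"A": 6, "B": 6, "C": 2, "D": 0},
}
pool = ["A", "B", "C", "D"]

clustered = ["A", "B"]
print(mpd(clustered, dist), mntd(clustered, dist))
print(round(ses(mpd, clustered, pool, dist), 2))  # negative: closer relatives than expected
```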


Journal Article
TL;DR: A Bayesian model for estimating species trees that accounts for the stochastic variation expected for gene trees from multiple unlinked loci sampled from a single species history after a coalescent process is analyzed.
Abstract: The vast majority of phylogenetic models focus on resolution of gene trees, despite the fact that phylogenies of species in which gene trees are embedded are of primary interest. We analyze a Bayesian model for estimating species trees that accounts for the stochastic variation expected for gene trees from multiple unlinked loci sampled from a single species history after a coalescent process. Application of the model to a 106-gene data set from yeast shows that the set of gene trees recovered by statistically acknowledging the shared but unknown species tree from which gene trees are sampled is much reduced compared with treating the history of each locus independently of an overarching species tree. The analysis also yields a concentrated posterior distribution of the yeast species tree whose mode is congruent with the concatenated gene tree but can do so with less than half the loci required by the concatenation method. Using simulations, we show that, with large numbers of loci, highly resolved species trees can be estimated under conditions in which concatenation of sequence data will positively mislead phylogeny, and when the proportion of gene trees matching the species tree is <10%. However, when gene tree/species tree congruence is high, species trees can be resolved with just two or three loci. These results make accessible an alternative paradigm for combining data in phylogenomics that focuses attention on the singularity of species histories and away from the idiosyncrasies and multiplicities of individual gene histories.

626 citations


Journal Article
TL;DR: The single-copy gene rpoB provided comparable phylogenetic resolution to that of the 16S rRNA gene at all taxonomic levels, except between closely related organisms (species and subspecies levels), for which it provided better resolution.
Abstract: Several characteristics of the 16S rRNA gene, such as its essential function, ubiquity, and evolutionary properties, have allowed it to become the most commonly used molecular marker in microbial ecology. However, one fact that has been overlooked is that multiple copies of this gene are often present in a given bacterium. These intragenomic copies can differ in sequence, leading to identification of multiple ribotypes for a single organism. To evaluate the impact of such intragenomic heterogeneity on the performance of the 16S rRNA gene as a molecular marker, we compared its phylogenetic and evolutionary characteristics to those of the single-copy gene rpoB. Full-length gene sequences and gene fragments commonly used for denaturing gradient gel electrophoresis were compared at various taxonomic levels. Heterogeneity found between intragenomic 16S rRNA gene copies was concentrated in specific regions of rRNA secondary structure. Such “heterogeneity hot spots” occurred within all gene fragments commonly used in molecular microbial ecology. This intragenomic heterogeneity influenced 16S rRNA gene tree topology, phylogenetic resolution, and operational taxonomic unit estimates at the species level or below. rpoB provided comparable phylogenetic resolution to that of the 16S rRNA gene at all taxonomic levels, except between closely related organisms (species and subspecies levels), for which it provided better resolution. This is particularly relevant in the context of a growing number of studies focusing on subspecies diversity, in which single-copy protein-encoding genes such as rpoB could complement the information provided by the 16S rRNA gene.

604 citations
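A toy example shows how intragenomic copies can inflate operational taxonomic unit counts: two copies of the same gene from one genome that differ at 2 of 100 aligned sites are 98% identical, so a 99% identity cutoff places them in separate OTUs while a 97% cutoff merges them. The sketch assumes pre-aligned sequences and uses a simple greedy centroid clustering; both the clustering scheme and the sequences are illustrative assumptions, not the method of the paper.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_otus(seqs, identity):
    """Greedy centroid clustering: a sequence starts a new OTU unless it is at
    least `identity` similar to an existing centroid. Returns the number of OTUs."""
    centroids = []
    for s in seqs:
        if not any(1 - p_distance(s, c) >= identity for c in centroids):
            centroids.append(s)
    return len(centroids)


# Two hypothetical 100-bp intragenomic copies differing at 2 sites
copy1 = "ACGT" * 25
copy2 = list(copy1)
for i in (10, 50):
    copy2[i] = "A" if copy2[i] != "A" else "C"
copy2 = "".join(copy2)

print(count_otus([copy1, copy2], identity=0.99))  # 2 OTUs from one genome
print(count_otus([copy1, copy2], identity=0.97))  # 1 OTU
```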


Journal Article
TL;DR: Parsimony analyses of combined and partitioned data sets varied in the placement of several taxa, particularly Ceratophyllum, whereas maximum-likelihood (ML) trees were more topologically stable, and ML bootstrap and Bayesian support values for these relationships were generally high, although approximately unbiased topology tests could not reject several alternative topologies.
Abstract: Although great progress has been made in clarifying deep-level angiosperm relationships, several early nodes in the angiosperm branch of the Tree of Life have proved difficult to resolve. Perhaps the last great question remaining in basal angiosperm phylogeny involves the branching order among the five major clades of mesangiosperms (Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots). Previous analyses have found no consistent support for relationships among these clades. In an effort to resolve these relationships, we performed phylogenetic analyses of 61 plastid genes (≈42,000 bp) for 45 taxa, including members of all major basal angiosperm lineages. We also report the complete plastid genome sequence of Ceratophyllum demersum. Parsimony analyses of combined and partitioned data sets varied in the placement of several taxa, particularly Ceratophyllum, whereas maximum-likelihood (ML) trees were more topologically stable. Total evidence ML analyses recovered a clade of Chloranthaceae + magnoliids as sister to a well supported clade of monocots + (Ceratophyllum + eudicots). ML bootstrap and Bayesian support values for these relationships were generally high, although approximately unbiased topology tests could not reject several alternative topologies. The extremely short branches separating these five lineages imply a rapid diversification estimated to have occurred between 143.8 ± 4.8 and 140.3 ± 4.8 Mya.

Journal Article
TL;DR: A complete phylogeny for all 271 extant species of the Carnivora is derived, providing a ‘consensus’ estimate of carnivore phylogeny and showing that some lineages within the Mustelinae and Canidae contain significantly more species than expected for their age, illustrating the tree's utility for studies of macroevolution.
Abstract: One way to build larger, more comprehensive phylogenies is to combine the vast amount of phylogenetic information already available. We review the two main strategies for accomplishing this (combining raw data versus combining trees), but employ a relatively new variant of the latter: supertree construction. The utility of one supertree technique, matrix representation using parsimony analysis (MRP), is demonstrated by deriving a complete phylogeny for all 271 extant species of the Carnivora from 177 literature sources. Beyond providing a ‘consensus’ estimate of carnivore phylogeny, the tree also indicates taxa for which the relationships remain controversial (e.g. the red panda; within canids, felids, and hyaenids) or have not been studied in any great detail (e.g. herpestids, viverrids, and intrageneric relationships in the procyonids). Times of divergence throughout the tree were also estimated from 74 literature sources based on both fossil and molecular data. We use the phylogeny to show that some lineages within the Mustelinae and Canidae contain significantly more species than expected for their age, illustrating the tree’s utility for studies of macroevolution. It will also provide a useful foundation for comparative and conservational studies involving the carnivores.
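Matrix representation using parsimony (MRP) recodes each source tree as characters and then analyzes the combined matrix with parsimony. The sketch below shows only the coding step as it is commonly described: each clade of each source tree becomes one binary character, with taxa absent from that source tree scored as missing. Rooted source trees are often also given an added all-zero outgroup row, which is omitted here; the parsimony search itself is not shown, and the taxa and trees are hypothetical placeholders.

```python
def mrp_matrix(source_trees, all_taxa):
    """Matrix representation with parsimony (MRP) coding of rooted source trees.

    Each source tree is (taxa_in_tree, clades), with each clade a set of taxa.
    Every clade becomes one binary character: "1" inside the clade, "0" in the
    source tree but outside the clade, "?" if the taxon is absent from that tree.
    """
    rows = {taxon: [] for taxon in all_taxa}
    for taxa_in_tree, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in taxa_in_tree:
                    rows[taxon].append("?")
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


# Two hypothetical source trees with partly overlapping taxon sets
source_trees = [
    ({"A", "B", "C", "D"}, [{"A", "B"}, {"C", "D"}]),  # ((A,B),(C,D))
    ({"B", "C", "E"}, [{"C", "E"}]),                   # (B,(C,E))
]
for taxon, row in mrp_matrix(source_trees, ["A", "B", "C", "D", "E"]).items():
    print(taxon, row)
```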

Journal Article
TL;DR: A Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data, is proposed and applied to two multilocus data sets of DNA sequences.
Abstract: The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

Journal Article
23 Aug 2007-Nature
TL;DR: This work uses phylogenetic methods to show that the phylogenetic relationships of species predict the number of interactions they exhibit in more than one-third of the networks, and the identity of the species with which they interact in about half of the networks.
Abstract: Plants and their pollinators and seed dispersers form complex networks of interdependences. These networks have a well-defined architecture that strongly affects biodiversity maintenance. Using a phylogenetic approach, Rezende et al. show that past evolutionary history of plants and animals partly explains the network patterns. Closely related species tend to play similar roles in the network. As a result, coextinction cascades following a species extinction affect taxonomically related species, resulting in a non-random pruning of the evolutionary tree. From a conservation standpoint, this means that cascades of coextinction may spread across related species, further increasing the erosion of taxonomic diversity. A phylogenetic approach is used to show that past evolutionary history partly explains network patterns that link plants and their pollinators and seed dispersers. Species close in the phylogeny tend to play similar roles in the network. As a result, co-extinction cascades following the extinction of a species affect taxonomically related species, resulting in a non-random pruning of the evolutionary tree. The interactions between plants and their animal pollinators and seed dispersers have moulded much of Earth’s biodiversity [1–3]. Recently, it has been shown that these mutually beneficial interactions form complex networks with a well-defined architecture that may contribute to biodiversity persistence [4–8]. Little is known, however, about which ecological and evolutionary processes generate these network patterns [3,9]. Here we use phylogenetic methods [10,11] to show that the phylogenetic relationships of species predict the number of interactions they exhibit in more than one-third of the networks, and the identity of the species with which they interact in about half of the networks. As a consequence of the phylogenetic effects on interaction patterns, simulated extinction events tend to trigger coextinction cascades of related species.
This results in a non-random pruning of the evolutionary tree [12,13] and a more pronounced loss of taxonomic diversity than expected in the absence of a phylogenetic signal. Our results emphasize how the simultaneous consideration of phylogenetic information and network architecture can contribute to our understanding of the structure and fate of species-rich communities.
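A common simplification for simulating coextinction on a mutualistic network is a purely topological rule: after a species is removed, any partner left with no remaining interactions is also removed, and this repeats until no further losses occur. The sketch below implements that rule on a hypothetical plant-animal network. It is an assumption about the general kind of simulation involved, not the paper's exact procedure, and it does not model phylogeny; in the study, the phylogenetic effect arises because closely related species tend to play similar roles in the network.

```python
def coextinction_cascade(network, first_loss):
    """Species lost after removing `first_loss` from a mutualistic network.

    `network` maps each species to the set of its partners (links stored in both
    directions). A species goes extinct once it has no partners left.
    """
    partners = {s: set(p) for s, p in network.items()}
    lost = set()
    to_remove = [first_loss]
    while to_remove:
        species = to_remove.pop()
        if species in lost:
            continue
        lost.add(species)
        for partner in partners.pop(species, set()):
            if partner in partners:
                partners[partner].discard(species)
                if not partners[partner]:  # last partner gone: secondary extinction
                    to_remove.append(partner)
    return lost


# Hypothetical network: plants P1, P2; animals A1, A2, A3
network = {
    "P1": {"A1", "A2"}, "P2": {"A2", "A3"},
    "A1": {"P1"}, "A2": {"P1", "P2"}, "A3": {"P2"},
}
print(sorted(coextinction_cascade(network, "P1")))  # ['A1', 'P1']
```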

Journal Article
TL;DR: A theoretical framework based on phylogenetic comparative methods to integrate phylogeny into three measures of biodiversity: species variability, richness, and evenness is developed, which should aid with the incorporation of phylogenetic information into strategies for understanding biodiversity and its conservation.
Abstract: We developed a theoretical framework based on phylogenetic comparative methods to integrate phylogeny into three measures of biodiversity: species variability, richness, and evenness. These metrics can be used in conjunction with permutation procedures to test for phylogenetic community structure. As an illustration, we analyzed data on the composition of 58 lake fish communities in Wisconsin. The fish communities showed phylogenetic underdispersion, with communities more likely to contain closely related species. Using information about differences in environmental characteristics among lakes, we demonstrated that phylogenetic underdispersion in fish communities was associated with environmental factors. For example, lakes with low pH were more likely to contain species in the same clade of acid-tolerant species. Our metrics differ from existing metrics used to calculate phylogenetic community structure, such as net relatedness index and Faith's phylogenetic diversity. Our metrics have the advantage of providing an integrated and easy-to-understand package of phylogenetic measures of species variability, richness, and evenness with well-defined statistical properties. Furthermore, they allow the easy evaluation of contributions of individual species to different aspects of the phylogenetic organization of communities. Therefore, these metrics should aid with the incorporation of phylogenetic information into strategies for understanding biodiversity and its conservation.
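Phylogenetic species variability can be computed from the phylogenetic covariance matrix C of an ultrametric tree scaled to unit height: PSV = (n·tr(C) − ΣC) / (n(n − 1)), where n is the number of species in the community and ΣC is the sum of all entries of C. This formula is our reading of the published definition and should be checked against the paper; the tree and covariance values below are hypothetical. Values near 1 indicate distantly related species, while lower values indicate the kind of phylogenetic underdispersion reported for the lake fish communities.

```python
def psv(community, cov):
    """Phylogenetic species variability for the species in `community`.

    `cov[a][b]` is the branch length shared by a and b from the root on an
    ultrametric tree scaled to height 1, so cov[a][a] == 1.
    """
    n = len(community)
    if n < 2:
        raise ValueError("PSV needs at least two species")
    trace = sum(cov[s][s] for s in community)
    total = sum(cov[a][b] for a in community for b in community)
    return (n * trace - total) / (n * (n - 1))


# Hypothetical tree ((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5), height 1
cov = {
    "A": {"A": 1.0, "B": 0.5, "C": 0.0, "D": 0.0},
    "B": {"A": 0.5, "B": 1.0, "C": 0.0, "D": 0.0},
    "C": {"A": 0.0, "B": 0.0, "C": 1.0, "D": 0.5},
    "D": {"A": 0.0, "B": 0.0, "C": 0.5, "D": 1.0},
}
print(psv(["A", "B"], cov))  # 0.5: close relatives
print(psv(["A", "C"], cov))  # 1.0: share no branch below the root
```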

Journal Article
TL;DR: The genome sequence assemblies of human, armadillo, elephant, and opossum are analyzed to identify informative coding indels that would serve as rare genomic changes to infer early events in placental mammal phylogeny and suggest Afrotheria and Xenarthra diverged from other placental mammals approximately 103 (95-114) million years ago.
Abstract: The phylogeny of placental mammals is a critical framework for choosing future genome sequencing targets and for resolving the ancestral mammalian genome at the nucleotide level. Despite considerable recent progress defining superordinal relationships, several branches remain poorly resolved, including the root of the placental tree. Here we analyzed the genome sequence assemblies of human, armadillo, elephant, and opossum to identify informative coding indels that would serve as rare genomic changes to infer early events in placental mammal phylogeny. We also expanded our species sampling by including sequence data from >30 ongoing genome projects, followed by PCR and sequencing validation of each indel in additional taxa. Our data provide support for a sister-group relationship between Afrotheria and Xenarthra (the Atlantogenata hypothesis), which is in turn the sister-taxon to Boreoeutheria. We failed to recover any indels in support of a basal position for Xenarthra (Epitheria), which is suggested by morphology and a recent retroposon analysis, or a hypothesis with Afrotheria basal (Exafricoplacentalia), which is favored by phylogenetic analysis of large nuclear gene data sets. In addition, we identified two retroposon insertions that also support Atlantogenata and none for the alternative hypotheses. A revised molecular timescale based on these phylogenetic inferences suggests Afrotheria and Xenarthra diverged from other placental mammals ∼103 (95–114) million years ago. We discuss the impacts of this topology on earlier phylogenetic reconstructions and repeat-based inferences of phylogeny.
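The logic of using shared coding indels as rare genomic changes can be sketched as follows: with the opossum as outgroup carrying the ancestral state, a derived indel shared by exactly two of the three placental lineages supports grouping those two. The mapping from groupings to hypotheses follows the abstract (Atlantogenata unites Afrotheria and Xenarthra; Epitheria places Xenarthra basal, uniting Afrotheria with Boreoeutheria; Exafricoplacentalia places Afrotheria basal), with human standing in for Boreoeutheria, armadillo for Xenarthra and elephant for Afrotheria. The indel records are hypothetical, and the sketch ignores the PCR validation and additional taxa used in the study.

```python
from collections import Counter

PLACENTALS = ("human", "armadillo", "elephant")

# Pair of placental lineages sharing a derived indel -> hypothesis it supports
SUPPORTED_BY = {
    frozenset({"armadillo", "elephant"}): "Atlantogenata",
    frozenset({"human", "elephant"}): "Epitheria",
    frozenset({"human", "armadillo"}): "Exafricoplacentalia",
}


def classify(states):
    """`states` maps taxon -> 1 (derived indel present) or 0 (ancestral state)."""
    if states["opossum"] != 0:
        return "uninformative"  # polarizing the change needs the ancestral state in the outgroup
    derived = frozenset(t for t in PLACENTALS if states[t] == 1)
    return SUPPORTED_BY.get(derived, "uninformative")


# Hypothetical indel observations
indels = [
    {"human": 0, "armadillo": 1, "elephant": 1, "opossum": 0},
    {"human": 1, "armadillo": 1, "elephant": 1, "opossum": 0},  # shared by all placentals
    {"human": 1, "armadillo": 0, "elephant": 0, "opossum": 0},  # unique to one lineage
]
print(Counter(classify(s) for s in indels))
```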

Journal Article
TL;DR: Endophytic fungi in asymptomatic foliage of loblolly pine (Pinus taeda) in North Carolina, U.S.A., are examined to evaluate morphotaxa, BLAST matches and groups based on sequence similarity as functional taxonomic units, and the utility of PD relative to traditional ecological indices is investigated.
Abstract: We examined endophytic fungi in asymptomatic foliage of loblolly pine (Pinus taeda) in North Carolina, USA, with four goals: (i) to evaluate morphotaxa, BLAST matches and groups based on sequence similarity as functional taxonomic units; (ii) to explore methods to maximize phylogenetic signal for environmental datasets, which typically contain many taxa but few characters; (iii) to compare culturing vs. culture-free methods (environmental PCR of surface sterilized foliage) for estimating endophyte diversity and species composition; and (iv) to investigate the relationships between traditional ecological indices (e.g. Shannon index) and phylogenetic diversity (PD) in estimating endophyte diversity and spatial heterogeneity. Endophytes were recovered in culture from 87 of 90 P. taeda leaves sampled, yielding 439 isolates that represented 24 morphotaxa. Sequence data from the nuclear ribosomal internal transcribed spacer (ITS) for 150 isolates revealed 59 distinct ITS genotypes that represented 24 and 37 uni...
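The contrast in goal (iv) between traditional ecological indices and phylogenetic diversity can be illustrated with the Shannon index, H' = −Σ p_i ln p_i, and a rooted version of Faith's PD, the total branch length connecting a set of taxa to the root. In the hypothetical example below, two samples have identical Shannon values but different PD because one contains more distantly related taxa. The tree encoding, the counts and the taxon names are assumptions for illustration only.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) from counts per taxon."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def faith_pd(taxa, parent):
    """Rooted Faith's PD: summed length of all branches linking `taxa` to the root.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    branches = set()
    for node in taxa:
        while node in parent:        # walk up until the root is reached
            branches.add(node)       # a branch is identified by its child node
            node = parent[node][0]
    return sum(parent[n][1] for n in branches)


# Hypothetical tree: root -> x (1.0) -> {A (1.0), B (1.0)}; root -> C (2.0)
parent = {"x": ("root", 1.0), "A": ("x", 1.0), "B": ("x", 1.0), "C": ("root", 2.0)}

# Two samples, each with 5 isolates of each of two taxa
print(shannon([5, 5]), faith_pd({"A", "B"}, parent))  # same H', PD = 3.0
print(shannon([5, 5]), faith_pd({"A", "C"}, parent))  # same H', PD = 4.0
```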

Journal Article
TL;DR: The results suggest that CB analyses provide a more consistent estimate of nodal support than PBS and that combining heterogeneous gene partitions, which individually support a limited number of nodes, results in increased support for overall tree topology.

Journal Article
TL;DR: It is shown that incorporating a model of the stochastic loss of gene lineages by genetic drift into the phylogenetic estimation procedure can provide a robust estimate of species relationships, despite widespread incomplete sorting of ancestral polymorphism.
Abstract: Estimating phylogenetic relationships among closely related species can be extremely difficult when there is incongruence among gene trees and between the gene trees and the species tree. Here we show that incorporating a model of the stochastic loss of gene lineages by genetic drift into the phylogenetic estimation procedure can provide a robust estimate of species relationships, despite widespread incomplete sorting of ancestral polymorphism. This approach is applied to a group of montane Melanoplus grasshoppers for which genealogical discordance among loci and incomplete lineage sorting obscures any obvious phylogenetic relationships among species. Unlike traditional treatments where gene trees estimated using standard phylogenetic methods are implicitly equated with the species tree, with the coalescent-based approach the species tree is modeled probabilistically from the estimated gene trees. The estimated species phylogeny (the ESP) is calculated for the grasshoppers from multiple gene trees reconstructed for nuclear loci and a mitochondrial gene. This empirical application is coupled with a simulation study to explore the performance of the coalescent-based approach. Specifically, we test the accuracy of the ESP given the data based on analyses of simulated data matching the multilocus data collected in Melanoplus (i.e., data were simulated for each locus with the same number of base pairs and locus-specific mutational models). The results of the study show that ESPs can be computed using the coalescent-based approach long before reciprocal monophyly has been achieved, and that these statistical estimates are accurate. This contrasts with analyses of the empirical data collected in Melanoplus and simulated data based on concatenation of multiple loci, for which the incomplete lineage sorting of recently diverged species posed significant problems. 
The strengths and potential challenges associated with incorporating an explicit model of gene-lineage coalescence into the phylogenetic procedure to obtain an ESP, as illustrated by application to Melanoplus, versus concatenation and consensus approaches are discussed. This study represents a fundamental shift in how species relationships are estimated—the relationship between the gene trees and the species phylogeny is modeled probabilistically rather than equating gene trees with a species tree. (Coalescent; gene trees; incomplete lineage sorting; species phylogeny.)

Journal ArticleDOI
TL;DR: A practical approach that systematically compares whole genome sequences to identify single-copy nuclear gene markers for inferring phylogeny is presented and is an improvement over traditional approaches because it uses genomic information and automates the process to identify large numbers of candidate markers.
Abstract: Molecular systematics occupies one of the central stages in biology in the genomic era, ushered in by unprecedented progress in DNA technology. The inference of organismal phylogeny is now based on many independent genetic loci, a widely accepted approach to assembling the tree of life. Surprisingly, this approach is hindered by the lack of appropriate nuclear gene markers for many taxonomic groups, especially at higher taxonomic levels, partially due to the lack of tools for efficiently developing new phylogenetic markers. We report here a genome-comparison strategy for identifying nuclear gene markers for phylogenetic inference and apply it to the ray-finned fishes, the largest vertebrate clade in need of phylogenetic resolution. A total of 154 candidate molecular markers (relatively well conserved, putatively single-copy gene fragments with long, uninterrupted exons) were obtained by comparing whole genome sequences of two model organisms, Danio rerio and Takifugu rubripes. Experimental tests of 15 of these (randomly picked) markers on 36 taxa (representing two-thirds of the ray-finned fish orders) demonstrate the feasibility of amplifying by PCR and directly sequencing most of these candidates from whole genomic DNA in a vast diversity of fish species. Preliminary phylogenetic analyses of sequence data obtained for 14 taxa and 10 markers (total of 7,872 bp for each species) are encouraging, suggesting that the markers obtained will make significant contributions to future fish phylogenetic studies. We present a practical approach that systematically compares whole genome sequences to identify single-copy nuclear gene markers for inferring phylogeny. Our method is an improvement over traditional approaches (e.g., manually picking genes for testing) because it uses genomic information and automates the process to identify large numbers of candidate markers. 
This approach is shown here to be successful for fishes, but also could be applied to other groups of organisms for which two or more complete genome sequences exist, which has important implications for assembling the tree of life.
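The screening criteria named in the abstract (putatively single-copy, long uninterrupted exons, relatively well conserved) can be sketched as a filter over precomputed genome-comparison results. This is a minimal sketch assuming each candidate already carries its copy counts in both genomes, its longest exon length, and a pairwise identity; the `Candidate` fields, the `select_markers` name, and all threshold values are hypothetical, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene_id: str
    copies_genome1: int   # homologous hits in the first genome (e.g. Danio rerio)
    copies_genome2: int   # homologous hits in the second genome (e.g. Takifugu rubripes)
    longest_exon_bp: int  # longest uninterrupted exon shared by the pair
    identity: float       # pairwise identity of the aligned exon, 0-1


def select_markers(candidates, min_exon_bp=800, min_identity=0.6, max_identity=0.95):
    """Keep putatively single-copy genes with a long exon and moderate conservation.

    All thresholds are illustrative defaults, not values from the study."""
    selected = []
    for c in candidates:
        single_copy = c.copies_genome1 == 1 and c.copies_genome2 == 1
        long_exon = c.longest_exon_bp >= min_exon_bp
        # Assumed rationale: very divergent genes are hard to amplify across taxa,
        # while nearly invariant ones carry little phylogenetic signal.
        moderately_conserved = min_identity <= c.identity <= max_identity
        if single_copy and long_exon and moderately_conserved:
            selected.append(c.gene_id)
    return selected


candidates = [
    Candidate("g1", 1, 1, 1200, 0.82),  # passes every filter
    Candidate("g2", 2, 1, 1500, 0.80),  # duplicated in genome 1
    Candidate("g3", 1, 1, 300, 0.85),   # exon too short
    Candidate("g4", 1, 1, 1000, 0.99),  # nearly invariant
]
print(select_markers(candidates))  # ['g1']
```

The upper identity bound is a design assumption added for illustration; dropping it (setting `max_identity=1.0`) would also admit `g4`.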

Journal ArticleDOI
TL;DR: This approach is expected to result in superior accuracy compared to single-gene or phylogenomic analyses because the orthology problem is resolved and because a strong determinant that does not depend on technical uncertainties, the class distribution, is incorporated.
Abstract: The evolutionary history of organisms is expressed in phylogenetic trees. The most widely used phylogenetic trees describing the evolution of all organisms have been constructed based on single-gene phylogenies that, however, often produce conflicting results. Incongruence between phylogenetic trees can result from the violation of the orthology assumption and from stochastic and systematic errors. Here, we have reconstructed the tree of eukaryotic life based on the analysis of 2,269 myosin motor domains from 328 organisms. All sequences were manually annotated and verified, and were grouped into 35 myosin classes, of which 16 have not been proposed previously. The resultant phylogenetic tree confirms some accepted relationships of major taxa and resolves disputed and preliminary classifications. We place the Viridiplantae after the separation of Euglenozoa, Alveolata, and Stramenopiles, we suggest a monophyletic origin of Entamoebidae, Acanthamoebidae, and Dictyosteliida, and we provide evidence for the asynchronous evolution of the Mammalia and Fungi. Our analysis of the myosins allowed combining phylogenetic information derived from class-specific trees with the information of myosin class evolution and distribution. This approach is expected to result in superior accuracy compared to single-gene or phylogenomic analyses because the orthology problem is resolved and because a strong determinant that does not depend on technical uncertainties, the class distribution, is incorporated. Combining our analysis of the myosins with high-quality analyses of other protein families, for example, that of the kinesins, could help in resolving still questionable dependencies at the origin of eukaryotic life.

Journal ArticleDOI
TL;DR: This work uses a phylogenetic framework and embryo expression data from Drosophila to show that grouping genes by their phylogenetic origin can uncover footprints of important adaptive events in evolution.
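Grouping genes by phylogenetic origin (often called phylostratigraphy) can be sketched as: order the clades on the focal species' lineage from oldest to youngest, and assign each gene to the oldest clade in which a homolog is detected. This is a minimal sketch assuming homolog hits are already available as sets of species; the strata, reference species, gene names, and `phylostratum` function are hypothetical, and real analyses derive hits from sequence-similarity searches with a significance cutoff.

```python
from collections import Counter

# Hypothetical phylostrata for Drosophila melanogaster, oldest first. Each
# entry lists reference species that share that node with the focal species
# but diverged before the next, younger stratum.
STRATA = [
    ("Cellular organisms", {"Escherichia coli"}),
    ("Eukaryota", {"Arabidopsis thaliana"}),
    ("Metazoa", {"Nematostella vectensis"}),
    ("Bilateria", {"Homo sapiens"}),
    ("D. melanogaster", set()),  # no homolog outside the focal species
]


def phylostratum(hit_species, strata=STRATA):
    """Return the oldest stratum whose reference species include a homolog hit."""
    for name, reference_species in strata:
        if hit_species & reference_species:
            return name
    return strata[-1][0]  # only found in the focal species itself


# Hypothetical hits: gene -> species in which a homolog was found.
hits = {
    "geneA": {"Escherichia coli", "Homo sapiens"},
    "geneB": {"Homo sapiens"},
    "geneC": set(),
    "geneD": {"Nematostella vectensis", "Homo sapiens"},
}
by_stratum = Counter(phylostratum(s) for s in hits.values())
print(by_stratum)
```

Once genes carry a stratum label, the strata composition of genes expressed at each embryonic stage can be compared, which is the kind of combination with expression data the TL;DR describes.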

Journal ArticleDOI
20 Jan 2007-Virology
TL;DR: The cap gene was the more suitable phylogenetic and epidemiological marker for PCV2, despite the fact that the virus can undergo recombination mainly within the first part of the rep region.

Journal ArticleDOI
TL;DR: In this article, a phylogenetic supertree of all species of three communities driven by facilitation showed that nurse species facilitated distantly related species and increased phylogenetic diversity, and the regeneration niches were strongly conserved across evolutionary history.
Abstract: With the advent of molecular phylogenies the assessment of community assembly processes has become a central topic in community ecology. These processes have focused almost exclusively on habitat filtering and competitive exclusion. Recent evidence, however, indicates that facilitation has been important in preserving biodiversity over evolutionary time, with recent lineages conserving the regeneration niches of older, distant lineages. Here we test whether, if facilitation among distantly related species has preserved the regeneration niche of plant lineages, this has increased the phylogenetic diversity of communities. By analyzing a large worldwide database of species, we showed that the regeneration niches were strongly conserved across evolutionary history. Likewise, a phylogenetic supertree of all species of three communities driven by facilitation showed that nurse species facilitated distantly related species and increased phylogenetic diversity.
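Why facilitating a distant relative raises phylogenetic diversity can be seen with Faith's PD, the total branch length of the subtree connecting a set of species. The abstract does not state which diversity metric was used, so this is a minimal sketch of one common choice (the rooted variant, which counts the path to the root), on a hypothetical tree with made-up branch lengths.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above node).
TREE = {
    "root": (None, 0.0),
    "X": ("root", 2.0),
    "Y": ("root", 3.0),
    "nurse": ("X", 1.0),
    "close_relative": ("X", 1.0),
    "distant_1": ("Y", 1.0),
    "distant_2": ("Y", 1.0),
}


def faith_pd(tree, species):
    """Sum of branch lengths on the union of paths from each species to the root."""
    counted = set()
    for sp in species:
        node = sp
        # Walk rootward, stopping at the root or at a node already counted
        # (its ancestors are then already counted too).
        while tree[node][0] is not None and node not in counted:
            counted.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in counted)


print(faith_pd(TREE, {"nurse", "close_relative"}))  # 4.0
print(faith_pd(TREE, {"nurse", "distant_1"}))       # 7.0
```

A nurse plant sharing its patch with a close relative adds only the short terminal branch, whereas a distantly related beneficiary brings in the deep branch `Y` as well.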

Journal ArticleDOI
TL;DR: This measure of phylogenetic informativeness provides a quantitative measure of the capacity of a gene to resolve soft polytomies, conveys the utility of adding characters to a phylogenetic study, and provides a basis for deciding whether appropriate phylogenetic power has been applied to a polytomy that is proposed to be a rapid radiation.
Abstract: The resolution of four controversial topics in phylogenetic experimental design hinges upon the informativeness of characters about the historical relationships among taxa. These controversies regard the power of different classes of phylogenetic character, the relative utility of increased taxonomic versus character sampling, the differentiation between lack of phylogenetic signal and a historical rapid radiation, and the design of taxonomically broad phylogenetic studies optimized by taxonomically sparse genome-scale data. Quantification of the informativeness of characters for resolution of phylogenetic hypotheses during specified historical epochs is key to the resolution of these controversies. Here, such a measure of phylogenetic informativeness is formulated. The optimal rate of evolution of a character to resolve a dated four-taxon polytomy is derived. By scaling the asymptotic informativeness of a character evolving at a nonoptimal rate by the derived asymptotic optimum, and by normalizing so that net phylogenetic informativeness is equivalent for all rates when integrated across all of history, an informativeness profile across history is derived. Calculation of the informativeness per base pair allows estimation of the cost-effectiveness of character sampling. Calculation of the informativeness per million years allows comparison across historical radiations of the utility of a gene for the inference of rapid adaptive radiation. The theory is applied to profile the phylogenetic informativeness of the genes BRCA1, RAG1, GHR, and c-myc from a muroid rodent sequence data set. Bounded integrations of the phylogenetic profile of these genes over four epochs comprising the diversifications of the muroid rodents, the mammals, the lobe-limbed vertebrates, and the early metazoans demonstrate the differential power of these genes to resolve the branching order among ancestral lineages. 
This measure of phylogenetic informativeness yields a new kind of information for evaluation of phylogenetic experiments. It conveys the utility of adding characters to a phylogenetic study, and it provides a basis for deciding whether appropriate phylogenetic power has been applied to a polytomy that is proposed to be a rapid radiation. Moreover, it provides a quantitative measure of the capacity of a gene to resolve soft polytomies.
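The per-site informativeness function is commonly written as PI(λ, T) = 16λ²T·e^(-4λT) for a site evolving at rate λ, evaluated at time T before present. That form is an assumption here rather than a quotation from the abstract, but it matches the normalization the abstract describes: its integral over all T equals 1 for every λ. Summing over sites gives a gene's profile, and integrating over an epoch gives the bounded integrations mentioned above; for a single site the profile peaks at T = 1/(4λ). The rates and function names below are hypothetical.

```python
import math


def site_informativeness(rate, t):
    """Assumed per-site informativeness at time t before present for a site
    evolving at `rate` substitutions per unit time: 16 * rate^2 * t * exp(-4 * rate * t)."""
    return 16.0 * rate**2 * t * math.exp(-4.0 * rate * t)


def informativeness_profile(rates, t):
    """Gene-level profile: sum over sites of per-site informativeness."""
    return sum(site_informativeness(r, t) for r in rates)


def _antiderivative_tail(x):
    """(1 + x) * exp(-x), with its limit 0 at x = infinity."""
    return 0.0 if math.isinf(x) else (1.0 + x) * math.exp(-x)


def epoch_informativeness(rates, t_start, t_end):
    """Integral of the profile over the epoch [t_start, t_end], in closed form.

    Substituting u = 4 * rate * t turns each site's integrand into u * exp(-u) du,
    whose antiderivative is -(1 + u) * exp(-u). Rates must be positive."""
    total = 0.0
    for r in rates:
        total += _antiderivative_tail(4.0 * r * t_start) - _antiderivative_tail(4.0 * r * t_end)
    return total


# Two hypothetical 300-bp genes with uniform per-site rates (per Myr).
fast_gene = [0.05] * 300
slow_gene = [0.002] * 300
for name, rates in [("fast", fast_gene), ("slow", slow_gene)]:
    shallow = epoch_informativeness(rates, 0, 10)   # recent epoch
    deep = epoch_informativeness(rates, 100, 300)   # older epoch
    # Dividing by length gives informativeness per base pair.
    print(name, round(shallow / len(rates), 3), round(deep / len(rates), 3))
```

The fast gene dominates the shallow epoch and the slow gene the deep one, which is the kind of epoch-by-epoch comparison the abstract makes for BRCA1, RAG1, GHR, and c-myc.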

01 Jan 2007
TL;DR: Analysis of a large worldwide database of species showed that the regeneration niches were strongly conserved across evolutionary history and a phylogenetic supertree of all species of three communities driven by facilitation showed that nurse species facilitated distantly related species and increased phylogenetic diversity.
Abstract: With the advent of molecular phylogenies the assessment of community assembly processes has become a central topic in community ecology. These processes have focused almost exclusively on habitat filtering and competitive exclusion. Recent evidence, however, indicates that facilitation has been important in preserving biodiversity over evolutionary time, with recent lineages conserving the regeneration niches of older, distant lineages. Here we test whether, if facilitation among distantly related species has preserved the regeneration niche of plant lineages, this has increased the phylogenetic diversity of communities. By analyzing a large worldwide database of species, we showed that the regeneration niches were strongly conserved across evolutionary history. Likewise, a phylogenetic supertree of all species of three communities driven by facilitation showed that nurse species facilitated distantly related species and increased phylogenetic diversity.

Journal ArticleDOI
08 Aug 2007-PLOS ONE
TL;DR: It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes, aided by the stochastic nature of OR gene expression.
Abstract: Odor perception in mammals is mediated by a large multigene family of olfactory receptor (OR) genes. The number of OR genes varies extensively among different species of mammals, and most species have a substantial number of pseudogenes. To gain some insight into the evolutionary dynamics of mammalian OR genes, we identified the entire set of OR genes in platypuses, opossums, cows, dogs, rats, and macaques and studied the evolutionary change of the genes together with those of humans and mice. We found that platypuses and primates have <400 functional OR genes while the other species have 800–1,200 functional OR genes. We then estimated the numbers of gains and losses of OR genes for each branch of the phylogenetic tree of mammals. This analysis showed that (i) gene expansion occurred in the placental lineage each time after it diverged from monotremes and from marsupials and (ii) hundreds of gains and losses of OR genes have occurred in an order-specific manner, making the gene repertoires highly variable among different orders. It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes. This fluctuation seems to have been aided by the stochastic nature of OR gene expression.
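Per-branch gains and losses can be tallied once each gene at an ancestral node has been traced, by reconciling the gene tree with the species tree, to its surviving copies at the end of the branch. This is a simplified sketch of that bookkeeping, not the authors' exact procedure (which the abstract does not describe): a gene with k > 1 descendant copies counts as k - 1 gains, and a gene with no surviving copy counts as one loss. The gene names and counts are hypothetical.

```python
def branch_gains_losses(descendant_copies):
    """Count gene gains and losses on one branch of the species tree.

    `descendant_copies` maps each gene present at the ancestral node to the
    number of descendant copies it left at the end of the branch (assumed to
    come from gene-tree/species-tree reconciliation). A gene with k > 1 copies
    implies k - 1 duplications (gains); a gene with 0 copies, one loss.
    """
    gains = sum(k - 1 for k in descendant_copies.values() if k > 1)
    losses = sum(1 for k in descendant_copies.values() if k == 0)
    return gains, losses


# Hypothetical branch with five ancestral OR genes.
copies = {"OR_a": 1, "OR_b": 0, "OR_c": 3, "OR_d": 1, "OR_e": 0}
gains, losses = branch_gains_losses(copies)
n_ancestral = len(copies)
n_descendant = sum(copies.values())
# Bookkeeping check: descendant count = ancestral count + gains - losses.
assert n_descendant == n_ancestral + gains - losses
print(gains, losses)  # 2 2
```

As in this toy branch, gene number can stay constant while the repertoire turns over, which is why per-branch gains and losses reveal more than the gene counts alone.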

Journal ArticleDOI
TL;DR: Small subunit ribosomal DNA (SSU rDNA) sequences for 100 previously unsequenced species of nematodes, including 46 marine taxa, are added, and the resulting phylogenies support the re-classification of the Secernentea as the order Rhabditida, which derived from a common ancestor of the chromadorean orders Araeolaimida, Chromadoria, Desmodorida, Desmoscolecida, and Monhysterida.

Journal ArticleDOI
TL;DR: A common evolutionary ancestor is postulated for the family Tymoviridae and the two distinct evolutionary clusters of the Flexiviridae, i.e., a plant virus with a polyadenylated genome, filamentous virions, and a triple gene block of movement proteins, which would have generated a very diverse group of plant and fungal viruses.
Abstract: The plant virus family Flexiviridae includes the definitive genera Potexvirus, Mandarivirus, Allexivirus, Carlavirus, Foveavirus, Capillovirus, Vitivirus, Trichovirus, the putative genus Citrivirus, and some unassigned species. Its establishment was based on similarities in virion morphology, common features in genome type and organization, and strong phylogenetic relationships between replicational and structural proteins. In this review, we provide a brief account of the main biological and molecular properties of the members of the family, with special emphasis on the relationships within and among the genera. In phylogenetic analyses the potexvirus-like replicases were more closely related to tymoviruses than to carlaviruses. We postulate a common evolutionary ancestor for the family Tymoviridae and the two distinct evolutionary clusters of the Flexiviridae, i.e., a plant virus with a polyadenylated genome, filamentous virions, and a triple gene block of movement proteins. Subsequent recombina...

Journal ArticleDOI
TL;DR: A high-resolution map of the origin of the laboratory mouse is developed by generating 25,400 phylogenetic trees at 100-kb intervals spanning the genome, and it is found that most of the genome has intermediate levels of variation of intrasubspecific origin.
Abstract: The genome of the laboratory mouse is thought to be a mosaic of regions with distinct subspecific origins. We have developed a high-resolution map of the origin of the laboratory mouse by generating 25,400 phylogenetic trees at 100-kb intervals spanning the genome. On average, 92% of the genome is of Mus musculus domesticus origin, and the distribution of diversity is markedly nonrandom among the chromosomes. There are large regions of extremely low diversity, which represent blind spots for studies of natural variation and complex traits, and hot spots of diversity. In contrast with the mosaic model, we found that most of the genome has intermediate levels of variation of intrasubspecific origin. Finally, mouse strains derived from the wild that are supposed to represent different mouse subspecies show substantial intersubspecific introgression, which has strong implications for evolutionary studies that assume these are pure representatives of a given subspecies.
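At 100-kb spacing, 25,400 windows cover about 25,400 × 100 kb ≈ 2.54 Gb of sequence. The sketch below shows only the surrounding bookkeeping: tiling chromosomes into consecutive windows and turning per-window origin calls into genome-wide fractions such as the 92% domesticus figure. It assumes each window's subspecific origin has already been assigned from that window's phylogenetic tree; the chromosome lengths, labels, and function names are hypothetical.

```python
from collections import Counter


def tile_windows(chrom_lengths, window_bp=100_000):
    """Yield (chromosome, start, end) for consecutive, non-overlapping windows."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window_bp):
            yield chrom, start, min(start + window_bp, length)


def origin_fractions(window_labels):
    """Fraction of windows assigned to each subspecific origin."""
    counts = Counter(window_labels)
    total = len(window_labels)
    return {origin: n / total for origin, n in counts.items()}


# Toy genome and hypothetical per-window assignments (in practice, one
# assignment per window would come from that window's phylogenetic tree).
toy = {"chr1": 250_000, "chr2": 100_000}
windows = list(tile_windows(toy))
labels = ["domesticus", "domesticus", "domesticus", "musculus"]
print(len(windows), origin_fractions(labels))
```

The last window on a chromosome is truncated rather than padded, so window counts track total sequence length closely.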