
Showing papers in "Systematic Biology in 2009"


Journal Article
TL;DR: The estimation of phylogenetic divergence times from sequence data is an important component of many molecular evolutionary studies, and a variety of local- and relaxed-clock methods have been proposed and implemented.
Abstract: The estimation of phylogenetic divergence times from sequence data is an important component of many molecular evolutionary studies. There is now a general appreciation that the procedure of divergence dating is considerably more complex than that initially described in the 1960s by Zuckerkandl and Pauling (1962, 1965). In particular, there has been much critical attention toward the assumption of a global molecular clock, resulting in the development of increasingly sophisticated techniques for inferring divergence times from sequence data. In response to the documentation of widespread departures from clocklike behavior, a variety of local- and relaxed-clock methods have been proposed and implemented. Local-clock methods permit different molecular clocks in different parts of the phylogenetic tree, thereby retaining the advantages of the classical molecular clock while casting off the restrictive assumption of a single, global rate of substitution (Rambaut and Bromham 1998; Yoder and Yang 2000).

770 citations
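To make the clock assumptions in this abstract concrete: under a single global clock, the distance d between two sequences accrues along both lineages since their split, so d = 2rt and t = d/(2r). A local clock instead lets different branches carry different rates, so a node's age is the sum of branch length divided by branch rate along a path to the present. This is a minimal sketch under those textbook assumptions (distances in substitutions per site, rates per site per unit time). It is not any of the cited methods, and the function names are hypothetical.

```python
def strict_clock_age(distance, rate):
    """Divergence time under one global clock.

    Pairwise distance accrues on both lineages since the split,
    so d = 2 * rate * t and therefore t = d / (2 * rate).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


def path_age(branch_lengths, branch_rates):
    """Age of a node from one path to the present under a local clock.

    Each branch may have its own rate; its duration is length / rate,
    and the node age is the sum of those durations along the path.
    """
    return sum(length / rate for length, rate in zip(branch_lengths, branch_rates))


# Hypothetical values: distance 0.02 at 0.01 substitutions/site/Myr.
print(strict_clock_age(0.02, 0.01))
# Two branches on the same path with different local rates.
print(path_age([0.01, 0.03], [0.01, 0.03]))
```

With a distance of 0.02 and a rate of 0.01 per site per Myr, the global-clock age is 1 Myr; in the local-clock call each branch contributes one time unit despite its different length.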


Journal Article
TL;DR: A modified GMYC model is developed that allows for a variable transition from coalescent to speciation among lineages and provides a method of species discovery and biodiversity assessment using single-locus data from mixed or environmental samples while building a globally available taxonomic database for future identifications.
Abstract: High-throughput DNA sequencing has the potential to accelerate species discovery if it is able to recognize evolutionary entities from sequence data that are comparable to species. The general mixed Yule-coalescent (GMYC) model estimates the species boundary from DNA surveys by identifying independently evolving lineages as a transition from coalescent to speciation branching patterns on a phylogenetic tree. Applying the model to 12 families from 4 orders of insects in Madagascar, we delineated 370 putative species from mitochondrial DNA sequence variation among 1614 individuals. These were compared with data from the nuclear genome and morphological identification and found to be highly congruent (98% and 94%, respectively). We developed a modified GMYC that allows for a variable transition from coalescent to speciation among lineages. This revised model increased the congruence with morphology (97%), suggesting that a variable threshold better reflects the clustering of sequence data into biological species. Local endemism was pronounced in all 5 insect groups. Most species (60-91%) and haplotypes (88-99%) were found at only 1 of the 5 study sites (40-1000 km apart). This pronounced endemism resulted in a 37% increase in species numbers using diagnostic nucleotides in a population aggregation analysis. Sample sizes between 7 and 10 individuals represented a threshold above which there was minimal increase in genetic diversity, broadly agreeing with coalescent theory and other empirical studies. Our results from >1.4 Mb of empirical data suggest that the GMYC model captures species boundaries comparable to those from traditional methods without the need for prior hypotheses of population coherence. This provides a method of species discovery and biodiversity assessment using single-locus data from mixed or environmental samples while building a globally available taxonomic database for future identifications.
(Biodiversity; coalescent; DNA barcoding; DNA taxonomy; endemism; GMYC; Madagascar; turnover.)

652 citations
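The abstract notes that gains in genetic diversity level off at 7 to 10 individuals, "broadly agreeing with coalescent theory". Two standard results for a single panmictic population under the neutral coalescent illustrate why: the expected number of segregating sites grows only as a_n, the sum of 1/i for i from 1 to n - 1, and the probability that a sample of n shares its most recent common ancestor with the whole population is (n - 1)/(n + 1). This is a minimal sketch; the single-population assumption is a simplification of the structured, highly endemic populations studied.

```python
def harmonic(n):
    """a_n = sum of 1/i for i = 1 .. n-1.

    Under the neutral coalescent, E[segregating sites] = theta * a_n,
    so diversity in a sample grows only logarithmically with n.
    """
    return sum(1.0 / i for i in range(1, n))


def prob_sample_contains_root(n):
    """Probability that a sample of n lineages shares its MRCA with the
    whole (large) population under the standard coalescent."""
    return (n - 1) / (n + 1)


# Show how slowly both quantities change beyond roughly 10 individuals.
for n in (2, 5, 10, 20, 50):
    print(n, round(harmonic(n), 3), round(prob_sample_contains_root(n), 3))
```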


Journal Article
TL;DR: A 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi, is presented, together with a phylogenetic informativeness analysis of all 6 genes and a series of ancestral character state reconstructions, which support a terrestrial, saprobic ecology as ancestral.
Abstract: We present a 6-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of Fungi. This analysis is the most taxonomically complete to date with species sampled from all 15 currently circumscribed classes. A number of superclass-level nodes that have previously evaded resolution and were unnamed in classifications of the Fungi are resolved for the first time. Based on the 6-gene phylogeny we conducted a phylogenetic informativeness analysis of all 6 genes and a series of ancestral character state reconstructions that focused on morphology of sporocarps, ascus dehiscence, and evolution of nutritional modes and ecologies. A gene-by-gene assessment of phylogenetic informativeness yielded higher levels of informativeness for protein genes (RPB1, RPB2, and TEF1) as compared with the ribosomal genes, which have been the standard bearer in fungal systematics. Our reconstruction of sporocarp characters is consistent with 2 origins for multicellular sexual reproductive structures in Ascomycota, once in the common ancestor of Pezizomycotina and once in the common ancestor of Neolectomycetes. This first report of dual origins of ascomycete sporocarps highlights the complicated nature of assessing homology of morphological traits across Fungi. Furthermore, ancestral reconstruction supports an open sporocarp with an exposed hymenium (apothecium) as the primitive morphology for Pezizomycotina with multiple derivations of the partially (perithecia) or completely enclosed (cleistothecia) sporocarps. Ascus dehiscence is most informative at the class level within Pezizomycotina with most superclass nodes reconstructed equivocally. Character-state reconstructions support a terrestrial, saprobic ecology as ancestral. In contrast to previous studies, these analyses support multiple origins of lichenization events with the loss of lichenization as less frequent and limited to terminal, closely related species.

592 citations
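The abstract reports ancestral character state reconstructions. As an illustration only, Fitch parsimony (not necessarily the reconstruction method used in the study) assigns each internal node the intersection of its children's state sets when they overlap, and otherwise their union at the cost of one change. This minimal sketch assumes binary trees written as nested tuples; the tree and state labels are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony on a binary nested-tuple tree.

    Returns (state_set, number_of_changes) for the subtree `tree`;
    `states` maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):  # leaf: its set is just the observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree, so no extra change is needed
        return common, left_cost + right_cost
    # children disagree: keep both possibilities and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical ecology character on a 5-taxon tree.
tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "terrestrial", "B": "terrestrial", "C": "aquatic",
          "D": "terrestrial", "E": "terrestrial"}
print(fitch(tree, states))
```

Here the root set is {"terrestrial"} with a single change, placed on the branch leading to C.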


Journal Article
TL;DR: Two likelihood methods are developed that can be used to infer the effect of a trait on speciation and extinction without complete phylogenetic information, generalizing the recent binary-state speciation and extinction method.
Abstract: Species traits may influence rates of speciation and extinction, affecting both the patterns of diversification among lineages and the distribution of traits among species. Existing likelihood approaches for detecting differential diversification require complete phylogenies; that is, every extant species must be present in a well-resolved phylogeny. We developed 2 likelihood methods that can be used to infer the effect of a trait on speciation and extinction without complete phylogenetic information, generalizing the recent binary-state speciation and extinction method. Our approaches can be used where a phylogeny can be reasonably assumed to be a random sample of extant species or where all extant species are included but some are assigned only to terminal unresolved clades. We explored the effects of decreasing phylogenetic resolution on the ability of our approach to detect differential diversification within a Bayesian framework using simulated phylogenies. Differential diversification caused by an asymmetry in speciation rates was nearly as well detected with only 50% of extant species phylogenetically resolved as with complete phylogenetic knowledge. We demonstrate our unresolved clade method with an analysis of sexual dimorphism and diversification in shorebirds (Charadriiformes). Our methods allow for the direct estimation of the effect of a trait on speciation and extinction rates using incompletely resolved phylogenies.

527 citations
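The "random sample" case in this abstract enters the binary-state speciation and extinction (BiSSE) model through its initial conditions. This is a minimal sketch, assuming the standard two-state BiSSE equations for the extinction probabilities E0 and E1 and a single sampling fraction f applied to all tips, so that E_i(0) = 1 - f at the present. It integrates those equations backward in time with a fixed-step fourth-order Runge-Kutta scheme. Parameter values are hypothetical, and the full likelihood (which also tracks branch probabilities D_i, initialised from f at sampled tips) is not shown.

```python
def bisse_extinction(lam, mu, q01, q10, f, t, steps=1000):
    """Integrate BiSSE extinction probabilities (E0, E1) from 0 to t.

    lam, mu: (state 0, state 1) speciation and extinction rates
    q01, q10: transition rates between the two states
    f: sampling fraction, giving E_i(0) = 1 - f at the present
    """
    def deriv(e):
        e0, e1 = e
        d0 = mu[0] - (lam[0] + mu[0] + q01) * e0 + q01 * e1 + lam[0] * e0 ** 2
        d1 = mu[1] - (lam[1] + mu[1] + q10) * e1 + q10 * e0 + lam[1] * e1 ** 2
        return (d0, d1)

    h = t / steps
    e = (1.0 - f, 1.0 - f)
    for _ in range(steps):  # classic RK4 step
        k1 = deriv(e)
        k2 = deriv(tuple(x + 0.5 * h * k for x, k in zip(e, k1)))
        k3 = deriv(tuple(x + 0.5 * h * k for x, k in zip(e, k2)))
        k4 = deriv(tuple(x + h * k for x, k in zip(e, k3)))
        e = tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(e, k1, k2, k3, k4))
    return e


# Hypothetical rates; 80% of extant species sampled.
print(bisse_extinction((0.2, 0.2), (0.1, 0.1), 0.05, 0.05, 0.8, 10.0))
```

Lowering f raises the starting value of E, which is how unsampled species are absorbed into the likelihood without being placed on the tree.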


Journal Article
TL;DR: Two methods, STAR and STEAC, are shown to be statistically consistent under the multispecies coalescent model, and a simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when substitution rates among lineages are highly variable.
Abstract: The estimation of species trees (phylogenies) is one of the most important problems in evolutionary biology, and recently, there has been greater appreciation of the need to estimate species trees directly rather than using gene trees as a surrogate. A Bayesian method constructed under the multispecies coalescent model can consistently estimate species trees but involves intensive computation, which can hinder its application to the phylogenetic analysis of large-scale genomic data. Many summary statistics-based approaches, such as shallowest coalescences (SC) and Global LAteSt Split (GLASS), have been developed to infer species phylogenies for multilocus data sets. In this paper, we propose 2 methods, species tree estimation using average ranks of coalescences (STAR) and species tree estimation using average coalescence times (STEAC), based on the summary statistics of coalescence times. It can be shown that the 2 methods are statistically consistent under the multispecies coalescent model. STAR uses the ranks of coalescences and is thus resistant to variable substitution rates along the branches in gene trees. A simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when the substitution rates among lineages are highly variable. Two real genomic data sets were analyzed by the 2 methods and produced species trees that are consistent with previous results.

440 citations
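STEAC, as summarised above, works from coalescence times. This minimal sketch follows that spirit: it averages each species pair's coalescence time across gene trees and clusters the averaged matrix. The input format (one dict of pairwise coalescence times per gene tree) is hypothetical, and UPGMA is used only for brevity; the published method may use a different tree-building step.

```python
def average_coalescence_times(gene_trees):
    """Average each species pair's coalescence time over all gene trees.

    Each gene tree is a dict mapping frozenset({a, b}) to the time at which
    lineages from species a and b coalesce in that gene tree.
    """
    totals = {}
    for tree in gene_trees:
        for pair, time in tree.items():
            totals[pair] = totals.get(pair, 0.0) + time
    return {pair: total / len(gene_trees) for pair, total in totals.items()}


def upgma(taxa, dist):
    """UPGMA clustering of a pairwise distance dict into a nested-tuple tree."""
    size = {t: 1 for t in taxa}  # current clusters and their sizes
    d = dict(dist)               # frozenset({x, y}) -> distance
    while len(size) > 1:
        # merge the closest pair; sort by str() so output order is stable
        x, y = sorted(min(d, key=d.get), key=str)
        merged = (x, y)
        for z in size:
            if z not in (x, y):
                # size-weighted average distance from the new cluster to z
                d[frozenset({merged, z})] = (
                    size[x] * d[frozenset({x, z})] + size[y] * d[frozenset({y, z})]
                ) / (size[x] + size[y])
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size))


# Two hypothetical gene trees, one of them discordant with ((A,B),C).
g1 = {frozenset("AB"): 1.0, frozenset("AC"): 3.0, frozenset("BC"): 3.0}
g2 = {frozenset("AB"): 3.0, frozenset("AC"): 1.5, frozenset("BC"): 3.0}
print(upgma(["A", "B", "C"], average_coalescence_times([g1, g2])))
```

Averaging across loci lets the concordant signal outweigh the single discordant gene tree.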


Journal Article
TL;DR: The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.
Abstract: Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of 1 mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis. (Ambiguous characters; ambiguous data; Bayesian; bias; maximum likelihood; missing data; model misspecification; phylogenetics; posterior probabilities; prior.)

395 citations


Journal Article
TL;DR: An alternative method that can identify random similarity within multiple sequence alignments (MSAs) based on Monte Carlo resampling within a sliding window is proposed and appears to be a powerful tool to identify possible biases of tree reconstructions or gene identification.
Abstract: Random similarity of sequences or sequence sections can impede phylogenetic analyses or the identification of gene homologies. Additionally, randomly similar sequences or ambiguously aligned sequence sections can negatively interfere with the estimation of substitution model parameters. Phylogenomic studies have shown that biases in model estimation and tree reconstructions do not disappear even with large data sets. In fact, these biases can become pronounced with more data. It is therefore important to identify possible random similarity within sequence alignments in advance of model estimation and tree reconstructions. Different approaches have already been suggested to identify and treat problematic alignment sections. We propose an alternative method that can identify random similarity within multiple sequence alignments (MSAs) based on Monte Carlo resampling within a sliding window. The method infers similarity profiles from pairwise sequence comparisons and subsequently calculates a consensus profile. This consensus profile represents a summary of all calculated single similarity profiles. Consequently, consensus profiles identify dominating patterns of nonrandom similarity or randomness within sections of MSAs. We show that the approach clearly identifies randomness in simulated and real data. After the exclusion of putative random sections, node support drastically improves in tree reconstructions of both data sets. It thus appears to be a powerful tool to identify possible biases of tree reconstructions or gene identification. The method is currently restricted to nucleotide data but will be extended to protein data in the near future.

332 citations
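The method described above relies on Monte Carlo resampling within a sliding window. This is a simplified sketch of that general idea for a single sequence pair, not the published procedure (which builds similarity profiles over all pairs and then a consensus profile). It compares each window's observed identity with identities obtained after shuffling one sequence within the window, and flags windows whose identity is not better than chance. Window size, step, replicate count and alpha are hypothetical defaults.

```python
import random


def identity(s1, s2):
    """Proportion of identical positions between two equal-length sequences."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)


def random_windows(seq1, seq2, window=20, step=10, reps=100, alpha=0.05, seed=1):
    """Flag sliding windows whose pairwise identity looks random.

    The null distribution shuffles seq2 within each window, which destroys
    positional homology but keeps base composition.
    Returns a list of (start, p_value, looks_random) tuples.
    """
    rng = random.Random(seed)
    results = []
    for start in range(0, len(seq1) - window + 1, step):
        w1 = seq1[start:start + window]
        w2 = list(seq2[start:start + window])
        observed = identity(w1, w2)
        exceed = 0
        for _ in range(reps):
            rng.shuffle(w2)  # permute bases within the window
            if identity(w1, w2) >= observed:
                exceed += 1
        p = (exceed + 1) / (reps + 1)  # Monte Carlo p-value
        results.append((start, p, p > alpha))
    return results


seq = "ACGTACGGTCATGCATCGAT" * 2
print(random_windows(seq, seq))
```

Identical sequences should yield no flagged windows, while two unrelated sequences would typically be flagged throughout.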


Journal Article
TL;DR: It is found that the toothed whales are monophyletic, suggesting that echolocation evolved only once early in that lineage some 36–34 Ma, and support is found for increased diversification rates during periods of pronounced physical restructuring of the oceans.
Abstract: The remarkable fossil record of whales and dolphins (Cetacea) has made them an exemplar of macroevolution. Although their overall adaptive transition from terrestrial to fully aquatic organisms is well known, this is not true for the radiation of modern whales. Here, we explore the diversification of extant cetaceans by constructing a robust molecular phylogeny that includes 87 of 89 extant species. The phylogeny and divergence times are derived from nuclear and mitochondrial markers, calibrated with fossils. We find that the toothed whales are monophyletic, suggesting that echolocation evolved only once early in that lineage some 36-34 Ma. The rorqual family (Balaenopteridae) is restored with the exclusion of the gray whale, suggesting that gulp feeding evolved 18-16 Ma. Delphinida, comprising all living dolphins and porpoises other than the Ganges/Indus dolphins, originated about 26 Ma; it contains the taxonomically rich delphinids, which began diversifying less than 11 Ma. We tested 2 hypothesized drivers of the extant cetacean radiation by assessing the tempo of lineage accumulation through time. We find no support for a rapid burst of speciation early in the history of extant whales, contrasting with expectations of an adaptive radiation model. However, we do find support for increased diversification rates during periods of pronounced physical restructuring of the oceans. The results imply that paleogeographic and paleoceanographic changes, such as closure of major seaways, have influenced the dynamics of radiation in extant cetaceans.

329 citations
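The abstract tests for an early burst by examining the tempo of lineage accumulation. One widely used summary for this, though not necessarily the test used in the study, is Pybus and Harvey's gamma statistic, computed from the internode intervals of an ultrametric tree. Strongly negative values mean that nodes are concentrated toward the root, as an early burst predicts. This is a minimal sketch; note that incomplete taxon sampling can itself push gamma toward negative values.

```python
import math


def gamma_statistic(intervals):
    """Pybus and Harvey's gamma from internode intervals.

    intervals[0] is the time during which 2 lineages exist, intervals[1]
    the time with 3 lineages, and so on up to n lineages at the tips.
    """
    n = len(intervals) + 1  # number of tips
    if n < 3:
        raise ValueError("need at least 3 tips")
    weighted = [(k + 2) * g for k, g in enumerate(intervals)]  # lineages * interval
    total = sum(weighted)  # T in the usual notation
    partial = 0.0
    mean_partial = 0.0
    for i in range(n - 2):  # upper summation index 2 .. n-1
        partial += weighted[i]
        mean_partial += partial
    mean_partial /= n - 2
    return (mean_partial - total / 2.0) / (total * math.sqrt(1.0 / (12.0 * (n - 2))))


# Hypothetical tree: short early intervals, long final interval (early burst).
print(gamma_statistic([0.1, 0.1, 0.1, 5.0]))
```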


Journal Article
TL;DR: Phylogenetic analyses that incorporate these newly recovered fungi, together with ancestral state reconstructions that take phylogenetic uncertainty into account, provide the basis for estimating trophic transition networks in the Ascomycota and a first set of hypotheses regarding the evolution of symbiotrophy and saprotrophy in the most species-rich fungal phylum.
Abstract: Fungi associated with photosynthetic organisms are major determinants of terrestrial biomass, nutrient cycling, and ecosystem productivity from the poles to the equator. Whereas most fungi are known because of their fruit bodies (e.g., saprotrophs), symptoms (e.g., pathogens), or emergent properties as symbionts (e.g., lichens), the majority of fungal diversity is thought to occur among species that rarely manifest their presence with visual cues on their substrate (e.g., the apparently hyperdiverse fungal endophytes associated with foliage of plants). Fungal endophytes are ubiquitous among all lineages of land plants and live within overtly healthy tissues without causing disease, but the evolutionary origins of these highly diverse symbionts have not been explored. Here, we show that a key to understanding both the evolution of endophytism and the diversification of the most species-rich phylum of Fungi (Ascomycota) lies in endophyte-like fungi that can be isolated from the interior of apparently healthy lichens. These "endolichenic" fungi are distinct from lichen mycobionts or any other previously recognized fungal associates of lichens, represent the same major lineages of Ascomycota as do endophytes, largely parallel the high diversity of endophytes from the arctic to the tropics, and preferentially associate with green algal photobionts in lichen thalli. Using phylogenetic analyses that incorporate these newly recovered fungi and ancestral state reconstructions that take into account phylogenetic uncertainty, we show that endolichenism is an incubator for the evolution of endophytism. In turn, endophytism is evolutionarily transient, with endophytic lineages frequently transitioning to and from pathogenicity. Although symbiotrophic lineages frequently give rise to free-living saprotrophs, reversions to symbiosis are rare.
Together, these results provide the basis for estimating trophic transition networks in the Ascomycota and provide a first set of hypotheses regarding the evolution of symbiotrophy and saprotrophy in the most species-rich fungal phylum. (Ancestral state reconstruction; Ascomycota; Bayesian analysis; endolichenic fungi; fungal endophytes; lichens; pathogens; phylogeny; saprotrophy; symbiotrophy; trophic transition network.)

320 citations
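A trophic transition network can be summarised as counts of state changes along the edges of a tree. This is a minimal sketch, assuming a single fixed reconstruction in which every node is labelled. The study instead accounts for phylogenetic uncertainty, so in practice such counts might be averaged over a sample of trees and reconstructions. Node and state names are hypothetical.

```python
from collections import Counter


def transition_counts(edges, states):
    """Count state changes along parent -> child edges.

    edges: iterable of (parent, child) node names
    states: dict node name -> reconstructed state (e.g. "endophyte")
    Returns a Counter keyed by (from_state, to_state); unchanged edges are skipped.
    """
    counts = Counter()
    for parent, child in edges:
        a, b = states[parent], states[child]
        if a != b:
            counts[(a, b)] += 1
    return counts


edges = [("root", "n1"), ("root", "t1"), ("n1", "t2"), ("n1", "t3")]
states = {"root": "endolichenic", "n1": "endophyte", "t1": "endolichenic",
          "t2": "pathogen", "t3": "endophyte"}
print(transition_counts(edges, states))
```

Each (from, to) key is a directed edge of the transition network, and its count is that edge's weight.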


Journal Article
TL;DR: This study demonstrates that supertree and supermatrix methods can provide effective, explicit, and complementary mechanisms for synthesizing disjointed phylogenetic evidence while emphasizing the need for further refinement of supertree methods.
Abstract: Supertree and supermatrix methods have great potential in the quest to build the tree of life and yet they remain controversial, with most workers opting for one approach or the other, but rarely both. Here, we employed both methods to construct phylogenetic trees of all genera of palms (Arecaceae/Palmae), an iconic angiosperm family of great economic importance. We assembled a supermatrix consisting of 16 partitions, comprising DNA sequence data, plastid restriction fragment length polymorphism data, and morphological data for all genera, from which a highly resolved and well-supported phylogenetic tree was built despite abundant missing data. To construct supertrees, we used variants of matrix representation with parsimony (MRP) analysis based on input trees generated directly from subsamples of the supermatrix. All supertrees were highly resolved. Standard MRP with bootstrap-weighted matrix elements performed most effectively in this case, generating trees with the greatest congruence with the supermatrix tree and fewest clades unsupported by any input tree. Nonindependence due to input trees based on combinations of data partitions was an acceptable trade-off for improvements in supertree performance. Irreversible MRP and the exclusive use of strictly independent input trees provided no obvious benefits. Contrary to previous claims, we found that unsupported clades are not infrequent under some MRP implementations, with up to 13% of clades lacking support from any input tree in some irreversible MRP supertrees. To build a formal synthesis, we assessed the cross-corroboration between supermatrix trees and the variant supertrees using semistrict consensus, enumerating shared clades and compatible clades. The semistrict consensus of the supermatrix tree and the most congruent supertree contained 160 clades (of a maximum of 204), 137 of which were present in both trees.
The relationships recovered by these trees strongly support the current phylogenetic classification of palms. We evaluate 2 composite supertree support measures (rQS and V) and conclude that it is more informative to report numbers of input trees that support or conflict with a given supertree clade. This study demonstrates that supertree and supermatrix methods can provide effective, explicit, and complementary mechanisms for synthesizing disjointed phylogenetic evidence while emphasizing the need for further refinement of supertree methods.

213 citations
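Matrix representation with parsimony (MRP) recodes each input tree as binary characters, one per clade: taxa inside the clade get 1, other taxa on that tree get 0, and taxa missing from the tree get "?". This is a minimal sketch of that standard coding for rooted trees written as nested tuples. The bootstrap-weighted and irreversible variants compared in the abstract are not shown, and some implementations also add an all-zero outgroup row.

```python
def clades(tree):
    """Return (leaf set, list of non-trivial clades) for a nested-tuple tree.

    Single leaves and the root clade are omitted because they are
    uninformative as MRP characters.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
        if len(child_leaves) > 1:
            found.append(child_leaves)
    return leaves, found


def mrp_matrix(trees, taxa):
    """Baum/Ragan coding: one column per clade of each input tree."""
    rows = {t: [] for t in taxa}
    for tree in trees:
        leaves, tree_clades = clades(tree)
        for clade in tree_clades:
            for t in taxa:
                rows[t].append("1" if t in clade else "0" if t in leaves else "?")
    return {t: "".join(r) for t, r in rows.items()}


# Two overlapping hypothetical input trees.
print(mrp_matrix([(("A", "B"), "C"), (("B", "C"), "D")], ["A", "B", "C", "D"]))
```

The resulting matrix is then analysed with ordinary parsimony to obtain the supertree.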


Journal Article
TL;DR: This model allows examination of the evidence for hybridization in the presence of incomplete lineage sorting due to deep coalescence via model selection using standard information criteria (e.g., Akaike information criterion and Bayesian information criterion) and is evaluated using simulated data.
Abstract: As DNA sequences have become more readily available, it has become increasingly desirable to infer species phylogenies from multigene data sets. Much recent work has centered around the recognition that substantial incongruence in single-gene phylogenies necessitates the development of statistical procedures to estimate species phylogenies that appropriately model the process of evolution at the level of the individual genes. One process that gives rise to variation in the histories of individual genes is incomplete lineage sorting, which is commonly modeled by the coalescent, and thus much current work is focused on proper estimation of species phylogenies under the coalescent model. A second common source of discord in single-gene phylogenies is hybridization, a process that is ubiquitous in many groups of plants and animals. Although methods to incorporate hybridization into phylogenetic estimation have also been developed, only a handful of methods that address both coalescence and hybridization have been proposed. Here, I propose an extension of an existing model that incorporates both of these processes simultaneously by utilizing gene trees for inference in a likelihood framework. The model allows examination of the evidence for hybridization in the presence of incomplete lineage sorting due to deep coalescence via model selection using standard information criteria (e.g., Akaike information criterion and Bayesian information criterion). The potential of the method is evaluated using simulated data.
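Model selection between a coalescent-only model and one that adds hybridization can use the standard criteria named above: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters and lower values are better. This is a minimal sketch; the log-likelihoods and parameter counts in the usage lines are hypothetical, and taking n as the number of gene trees is an assumption.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion, 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion, k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits to 50 gene trees: without and with one hybridization event.
models = {"no_hybridization": (-120.0, 3), "hybridization": (-115.0, 4)}
for name, (lnl, k) in models.items():
    print(name, aic(lnl, k), round(bic(lnl, k, 50), 2))
```

BIC penalises the extra hybridization parameter more heavily than AIC once n exceeds about 7, so the two criteria can disagree on small improvements in fit.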

Journal Article
TL;DR: New phylogenetic approaches explicitly consider the relationships between gene trees and the underlying history of species divergence, providing direct estimates of species trees (Fig. 1).
Abstract: Discord among the gene trees of multilocus data has motivated the development of phylogenetic approaches that account for gene-tree heterogeneity in the estimation procedure. Rather than equating a gene tree with the phylogenetic history, the new approaches explicitly consider the relationships between gene trees and the underlying history of species divergence, providing direct estimates of species trees (Fig. 1). The inherent appeal of these approaches is 2-fold. Incorporating information contained in the distribution of gene trees not only extracts phylogenetic signal; modeling the relationship between the gene trees embedded in a species tree also reveals the biological processes that have influenced the diversification history and shaped organismal genomes. In contrast, ignoring the variance in genealogical histories (e.g., concatenating loci into a single supermatrix) disregards an inescapable biological reality: gene trees differ for a variety of reasons (reviewed in Maddison 1997; Degnan and Rosenberg 2009). As such, when the natural variation in gene trees is not taken into account during phylogenetic estimation, the reliability of inferences from such approaches is drawn into question (Degnan and Rosenberg 2006; Kubatko and Degnan 2007; Huang and Knowles 2009), the analysis of historical scenarios involving recently diverged taxa that have not reached reciprocal monophyly has become intractable (Carstens and Knowles 2007), and interpretation of the support for bipartitions across taxa is problematic (Mossel and Vigoda 2005).
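The point that gene trees differ even without estimation error has a simple quantitative form for three species with one gene copy each. For species tree ((A,B),C) with an internal branch of t coalescent units, the matching gene tree has probability 1 - (2/3)e^(-t) and each of the two discordant trees has probability (1/3)e^(-t). This is a minimal sketch under the standard multispecies coalescent assumptions (no gene flow, neutral loci); the function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Rooted gene-tree topology probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units; one gene copy
    is sampled per species.
    """
    discordant = math.exp(-t) / 3.0
    return {"((A,B),C)": 1.0 - 2.0 * discordant,
            "((A,C),B)": discordant,
            "((B,C),A)": discordant}


# Short internal branches leave most loci discordant with the species tree.
for t in (0.1, 0.5, 1.0, 2.0):
    print(t, three_taxon_gene_tree_probs(t))
```

At t = 0.1, roughly 60% of loci have a gene tree that differs from the species tree, which is why treating any single gene tree as the species history can mislead.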

Journal Article
TL;DR: A likelihood method is proposed that circumvents this "barcoding gap" by testing for clustering in ultrametric trees and can infer the elusive species boundary directly from the transition in branching rate.
Abstract: The question of how DNA barcodes can and should be used in taxonomy has been debated for some time (Lipscomb et al. 2003; Tautz et al. 2003; Blaxter 2004; Vogler and Monaghan 2007; Wiens 2007). Although few doubt that they are a valuable molecular tool for matching unidentified specimens to described taxa, this has little to do with the question of whether barcodes can be used to delimit species in the first place. The most radical turn in this debate has been the plea for a DNA-based taxonomy (Tautz et al. 2003; Blaxter 2004; Pons et al. 2006; Vogler and Monaghan 2007). Its proponents argue that "the vast majority of sequence variation in nature is partitioned into clearly defined clusters" (Vogler and Monaghan 2007, p. 4), which "[...] broadly mirror the species category" (Papadopoulou et al. 2008, p. 1) and could thus serve as basic taxonomic units. Initial attempts to employ this "barcoding gap" have relied on defining cutoff values of sequence divergence a priori (e.g., Blaxter 2004). Considering that the amount of genetic diversity within species can vary by orders of magnitude, it is clear that such an approach is arbitrary at best. Pons et al. (2006) have recently proposed a likelihood method that circumvents this problem by testing for clustering in ultrametric trees. They argue that "these new quantitative approaches can infer the elusive species boundary directly from the transition in branching rate and constitute an exciting possibility to define species from sequence variation [...]" (Vogler and Monaghan 2007, p. 6). Given such claims, it is not surprising that this method enjoys increasing popularity, having been applied to a number of mitochondrial DNA (mtDNA) data sets (e.g., Pons et al. 2006; Ahrens et al. 2007; Fontaneto et al. 2007; Papadopoulou et al. 2008).
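The a priori cutoff approach criticised here can be sketched as single-linkage clustering at a fixed p-distance threshold. This illustrates that criticised approach, not the likelihood method of Pons et al. (2006); the toy sequences and cutoff values are hypothetical.

```python
def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def threshold_clusters(seqs, cutoff):
    """Single-linkage clusters of sequences closer than `cutoff`.

    seqs: dict name -> aligned sequence. Uses a simple union-find.
    """
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) < cutoff:
                parent[find(a)] = find(b)  # merge the two clusters
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


seqs = {"a1": "AAAAAAAAAA", "a2": "AAAAAAAAAT",
        "b1": "CCCCCCCCCC", "b2": "CCCCCCCCCA"}
for cutoff in (0.2, 0.05):
    print(cutoff, threshold_clusters(seqs, cutoff))
```

The same four sequences form 2 "species" at one cutoff and 4 at the other, which is the arbitrariness the passage describes.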

Journal Article
TL;DR: Investigating the impacts of phylogeographic sampling decisions on species tree estimation in the Sceloporus undulatus species group suggests that inaccurate species assignments can result in inferred phylogenetic relationships that are dependent upon which particular populations are used as exemplars to represent species and can lead to increased estimates of effective population size.
Abstract: I investigated the impacts of phylogeographic sampling decisions on species tree estimation in the Sceloporus undulatus species group, a recent radiation of small, insectivorous lizards connected by parapatric and peripatric distribution across North America, using a variety of species tree inference methods (Bayesian estimation of species trees, Bayesian untangling of concordance knots, and minimize deep coalescences). Phylogenetic analyses of 16 specimens representing 4 putative species within S. "undulatus" using complete (8 loci, >5.5 kb) and incomplete (29 loci, >23.6 kb) nuclear data sets result in species trees that share features with the mitochondrial DNA (mtDNA) genealogy at the phylogeographic level but provide new insights into the evolutionary history of the species group. The concatenated nuclear data and mtDNA data both recover 4 major clades connecting populations across North America; however, instances of discordance are localized at the contact zones between adjacent phylogeographic groups. A random sub-sampling experiment designed to vary the phylogeographic samples included across hundreds of replicate species tree inferences suggests that inaccurate species assignments can result in inferred phylogenetic relationships that are dependent upon which particular populations are used as exemplars to represent species and can lead to increased estimates of effective population size (Θ). For the phylogeographic data presented here, reassigning specimens with introgressed mtDNA genomes to their prospective species, or excluding them from the analysis altogether, produces species tree topologies that are distinctly different from analyses that utilize mtDNA-based species assignments.
Evolutionary biologists working at the interface of phylogeography and phylogenetics are likely to encounter multiple processes influencing gene-tree congruence, which increases the relevance of estimating species trees with multilocus nuclear data and models that accommodate deep coalescence. (Bayesian analysis; deep coalescence; gene flow; gene trees; introgression; lineage sorting; species delimitation; taxon sampling.) The growing ease of acquiring genomic scale data sets to study nonmodel organisms enables systematic biologists to assemble the tree of life with increasing detail. However, it is widely recognized that the stochastic process of lineage sorting can cause discordance among gene trees inferred from independent loci, which may
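The random sub-sampling experiment described above repeatedly varies which populations represent each species. This minimal sketch covers only the exemplar-drawing step, assuming a dict from putative species to specimen IDs (all names hypothetical). Each replicate would then be passed to a species-tree program, which is not shown.

```python
import random


def exemplar_replicates(assignments, n_reps, seed=0):
    """Draw one random exemplar specimen per putative species, n_reps times.

    assignments: dict species -> list of specimen IDs
    Returns a list of dicts mapping species -> chosen specimen.
    """
    rng = random.Random(seed)  # seeded so replicates are reproducible
    return [{sp: rng.choice(specimens) for sp, specimens in sorted(assignments.items())}
            for _ in range(n_reps)]


assignments = {"sp1": ["x1", "x2", "x3"], "sp2": ["y1", "y2"], "sp3": ["z1"]}
for rep in exemplar_replicates(assignments, 3):
    print(rep)
```

Changing the species assignments dict (for example, moving a specimen with introgressed mtDNA to another species) and rerunning the replicates is one way to probe the sensitivity the abstract reports.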

Journal Article
TL;DR: The results show that for some combinations of species-tree branch lengths, increasing the number of independent loci can make the rooted majority-rule consensus tree more likely to be at least partially unresolved, and suggest a method for using multiple loci to infer the species-tree topology, even when it is discordant with the most likely gene tree.
Abstract: Consensus methods provide a useful strategy for summarizing information from a collection of gene trees. An important application of consensus methods is to combine gene trees to estimate a species tree. To investigate the theoretical properties of consensus trees that would be obtained from large numbers of loci evolving according to a basic evolutionary model, we construct consensus trees from rooted gene trees that occur in proportion to gene-tree probabilities derived from coalescent theory. We consider majority-rule, rooted triple (R*), and greedy consensus trees obtained from known, rooted gene trees, both in the asymptotic case as numbers of gene trees approach infinity and for finite numbers of genes. Our results show that for some combinations of species-tree branch lengths, increasing the number of independent loci can make the rooted majority-rule consensus tree more likely to be at least partially unresolved. However, the probability that the R* consensus tree has the species-tree topology approaches 1 as the number of gene trees approaches ∞. Although the greedy consensus algorithm can be the quickest to converge on the correct species-tree topology when increasing the number of gene trees, it can also be positively misleading. The majority-rule consensus tree is not a misleading estimator of the species-tree topology, and the R* consensus tree is a statistically consistent estimator of the species-tree topology. Our results therefore suggest a method for using multiple loci to infer the species-tree topology, even when it is discordant with the most likely gene tree. (Anomalous gene tree; coalescence; discordance; lineage sorting; phylogenetics; statistical consistency.)
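The majority-rule consensus discussed above keeps every clade found in more than half of the rooted gene trees; such clades are always mutually compatible, so they define a single tree. This is a minimal sketch for rooted trees written as nested tuples; R* and greedy consensus are not shown. When the three possible rooted 3-taxon trees occur equally often, no clade passes the threshold and the consensus is unresolved, the situation this abstract analyses.

```python
from collections import Counter


def clade_sets(tree):
    """Return (leaf set, set of non-trivial clades) of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clade_sets(child)
        leaves |= child_leaves
        found |= child_found
        if len(child_leaves) > 1:
            found.add(child_leaves)
    return leaves, found


def majority_rule(trees):
    """Clades present in more than half of the rooted input trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clade_sets(tree)[1])
    return {clade for clade, c in counts.items() if c > len(trees) / 2}


t_ab = (("A", "B"), "C")
print(majority_rule([t_ab, t_ab, (("A", "C"), "B")]))
print(majority_rule([t_ab, (("A", "C"), "B"), (("B", "C"), "A")]))
```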

Journal Article
TL;DR: A broad range of phylogenetic methods is applied, including some traditional stationary models of evolution and all the more recent nonstationary models, and some of the newer software packages are found to be more appropriate for data of this nature.
Abstract: Many published phylogenies are based on methods that assume equal nucleotide composition among taxa. Studies have shown, however, that this assumption is often not accurate, particularly in divergent lineages. Nonstationary sequence evolution, when taxa in different lineages evolve in different ways, can lead to unequal nucleotide composition. This can cause inference methods to fail and phylogenies to be inaccurate. Recent advancements in phylogenetic theory have proposed new models of nonstationary sequence evolution; these models often outperform equivalent stationary models. A variety of new phylogenetic software implementing such models has been developed, but the studies employing the new methodology are still few. We discovered convergence of nucleotide composition within mitochondrial genomes of the insect order Coleoptera (beetles). We found variation in base content both among species and among genes in the genome. To this data set, we have applied a broad range of phylogenetic methods, including some traditional stationary models of evolution and all the more recent nonstationary models. We compare 8 inference methods applied to the same data set. Although the more commonly used methods universally fail to recover established clades, we find that some of the newer software packages are more appropriate for data of this nature. The software packages p4, PHASE, and nhPhyML were able to overcome the systematic bias in our data set, but parsimony, MrBayes, NJ, LogDet, and PhyloBayes were not.
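A common first check for the compositional heterogeneity described here is a chi-square statistic on the taxon-by-base count table. This is a minimal sketch that counts only unambiguous bases. The test ignores shared ancestry among taxa, so it is a rough screen rather than a substitute for the nonstationary models compared in the study; taxon names and sequences are hypothetical.

```python
def base_composition_chi2(alignment, alphabet="ACGT"):
    """Chi-square statistic for a taxon x base contingency table.

    Only bases in `alphabet` are counted; a large value suggests
    that base composition differs among taxa.
    """
    table = [[seq.upper().count(b) for b in alphabet] for seq in alignment.values()]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    grand = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            if expected > 0:  # skip bases absent from every taxon
                chi2 += (observed - expected) ** 2 / expected
    return chi2


print(base_composition_chi2({"t1": "ATATATGC", "t2": "ATATATGC", "t3": "GCGCGCAT"}))
```

The statistic would be compared with a chi-square distribution with (taxa - 1) x (bases - 1) degrees of freedom, keeping the non-independence caveat in mind.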

Journal ArticleDOI
TL;DR: A new maximum-likelihood method that incorporates stochastic models of both nucleotide substitution and lineage sorting for species-tree estimation yields more accurate species trees than a summary-statistic-based approach, demonstrating that information contained in discordant gene trees can be effectively extracted using a full probabilistic model.
Abstract: The understanding that gene trees are often in discord with each other and with the species trees that contain them has led researchers to methods that incorporate the inherent stochasticity of genetic processes in the phylogenetic estimation procedure. Recently developed methods for species-tree estimation that not only consider the retention and sorting of ancestral polymorphism but also quantify the actual probabilities of incomplete lineage sorting are expected to provide an improvement over earlier summary-statistic based approaches that discard much of the information content of gene trees. However, these new methods have yet to be tested on truly challenging evolutionary histories such as those marked by recent rapid speciation where high levels of incomplete lineage sorting and discord among gene trees predominate. Here, we test a new maximum-likelihood method that incorporates stochastic models of both nucleotide substitution and lineage sorting for species-tree estimation. Using a simulation approach, we consider a broad range of species-tree topologies under 2 scenarios representing moderate and severe incomplete lineage sorting. We show that the maximum-likelihood method results in more accurate species trees than a summary-statistic based approach, demonstrating that information contained in discordant gene trees can be effectively extracted using a full probabilistic model. Moreover, we demonstrate that the shape of the original species tree (i.e., the relative lengths of internal branches) has a significant impact on whether the species tree is estimated accurately. In the speciation histories explored here, it is not just the recent origin of species that affects the accuracy of the estimates but the variance in relative species divergence times as well. 
Additionally, we show that sampling effort (number of individuals and/or loci) and sampling design (ratio of individuals to loci) are both important factors affecting the accuracy of species-tree estimates, and their effects in turn depend on the relative timing of divergence among species. The inherent difficulties of estimating relationships when species have undergone a recent radiation are discussed, in particular the limitations of maximum-likelihood estimates of species trees that do not consider uncertainty in the estimated gene trees of individual loci. Thus, despite substantial improvements over current summary-statistic based approaches and the increased sophistication of procedures that incorporate the process of gene lineage coalescence, recent radiations still appear to pose daunting challenges for phylogenetics. (Coalescence; gene tree; lineage sorting; phylogenetics; species tree.)

Journal ArticleDOI
TL;DR: The results indicate that introgression characterizes the history of diversification in the E. spectabile species clade and may be relatively common among clades comprising the species-rich North American freshwater fauna.
Abstract: Phylogenies of closely related animal species are often inferred using mitochondrial DNA (mtDNA) gene sequences. The accuracy of mtDNA gene trees is compromised through hybridization that leads to introgression of mitochondrial genomes. Using DNA sequences from 6 single-copy nuclear genes and 2 regions of the mitochondrial genome, we investigated the temporal and geographic signature of mitochondrial and nuclear introgression in the Etheostoma spectabile darter clade. Phylogenetic analyses of the nuclear genes support the monophyly of the E. spectabile clade; however, with respect to sampled specimens of 5 species (Etheostoma fragi, Etheostoma uniporum, Etheostoma pulchellum, Etheostoma burri, and E. spectabile), the mitochondrial phylogeny is inconsistent with E. spectabile clade monophyly. Etheostoma uniporum and E. fragi are both fixed for heterospecific mitochondrial genomes. Limited nuclear introgression is restricted to E. uniporum. Our analyses show that the pattern of introgression is consistently asymmetric, with movement of heterospecific mitochondrial haplotypes and nuclear alleles into E. spectabile clade species; introgressive hybridization spans broad temporal scales; and introgression is restricted to species and populations in the Ozarks. The introgressed mitochondrial genome observed in E. fragi has an obscure phylogenetic placement among darters and an ancient age, and it is possibly a mitochondrial fossil from an Etheostoma species that has subsequently gone extinct. These results indicate that introgression, both ancient and more contemporaneous, characterizes the history of diversification in the E. spectabile species clade and may be relatively common among clades comprising the species-rich North American freshwater fauna. (Gene phylogeny; hybridization; introgression; mtDNA; nuclear; molecular clock; reproductive isolation; species tree.)

Journal ArticleDOI
TL;DR: A recently developed likelihood method is applied to the angiosperm family Simaroubaceae, a geographically widespread and ecologically diverse clade of pantropical and temperate trees and shrubs, which exhibits an early history of range expansion between major continental areas in the Northern Hemisphere.
Abstract: Detailed biogeographic studies of pantropical clades are still relatively few, and those conducted to date typically use parsimony or event-based methods to reconstruct ancestral areas. In this study, a recently developed likelihood method for reconstructing ancestral areas, the dispersal-extinction-cladogenesis (DEC) model, is applied to the angiosperm family Simaroubaceae, a geographically widespread and ecologically diverse clade of pantropical and temperate trees and shrubs. To estimate divergence dates in the family, Bayesian uncorrelated rates analyses and robust fossil calibrations are applied to the well-sampled and strongly supported phylogeny. For biogeographic analyses, the effects of parameter configurations in the DEC model are assessed for different possible ancestral ranges, and the likelihood method is compared with dispersal-vicariance analysis (DIVA). Regardless of the parameters used, likelihood analyses show a common pattern of multiple recent range shifts that overshadow reconstruction of events deeper in the family's history. DIVA produced results similar to the DEC model when ancestral ranges were restricted to two areas, but some improbable ancestral ranges were also observed. Simaroubaceae exhibit an early history of range expansion between major continental areas in the Northern Hemisphere, but reconstructions of ancestral areas for lineages diverging in the early Tertiary are sensitive to the parameters of the model used. A North American origin is suggested for the family, with migration via Beringia by ancestral taxa. In contrast to traditional views, long-distance dispersal events are common, particularly in the Late Oligocene and later. Notable dispersals are inferred to have occurred across the Atlantic Ocean in both directions, as well as between Africa and Asia, and around the Indian Ocean basin and Pacific islands.
(DEC model; divergence date estimation; historical biogeography; long-distance dispersal; range evolution; Simaroubaceae.)

Journal ArticleDOI
TL;DR: It is argued that Emys orbicularis invaded Eurasia about 16 Ma via a trans-Beringian land bridge, and that ancient hybridization and mitochondrial gene capture about 12 million years ago explain the discrepancy between mitochondrial and nuclear gene trees.
Abstract: Understanding the mechanisms by which widely disjunct members of a clade came to occupy their current distribution is one of the fundamental challenges of biogeography. Here, we used data from 7 nuclear and 1 mitochondrial gene to examine the phylogenetic and biogeographic history of Emys, a clade of turtles that is broadly disjunct in western and eastern North America and Europe. We found strong disagreement between mitochondrial and nuclear gene trees, with mitochondrial DNA supporting the monophyly of the North American taxa (marmorata + blandingii) to the exclusion of the European orbicularis, and nuclear genes supporting the monophyly of (blandingii + orbicularis) to the exclusion of marmorata. We used fossil-calibrated molecular chronograms, in combination with supporting evidence from the fossil record and paleoclimatology, to identify a potential example of ancient hybridization and mitochondrial gene capture 12 million years ago, which explains this discrepancy. Based on the weight of evidence, we argue that the invasion of Eurasia by Emys orbicularis occurred about 16 Ma via a trans-Beringian land bridge. The case of Emys emphasizes how single-gene trees can be strongly affected by population processes, including hybridization, and that the effects of these processes can persist through long periods of evolutionary history. Given the chaotic state of the current taxonomy of these turtles, our work also emphasizes the care that should be used in implementing taxonomic changes based on 1 or a few gene trees and the importance of taking a conservative approach in renaming or splitting higher taxa based on apparent nonmonophyly.

Journal ArticleDOI
TL;DR: The observed incongruent relationships among the three major lineages of Heliosperma are better explained by homoploid hybridization than by gene duplication/losses because species branching events exceed gene coalescence times under biologically reasonable population sizes and generation times, making lineage sorting an unlikely explanation.
Abstract: We used four potentially unlinked nuclear DNA regions from the gene family encoding the second largest subunit of the RNA polymerases, as well as the psbE-petG spacer and the rps16 intron from the chloroplast genome, to evaluate the origin of and relationships within Heliosperma (Sileneae, Caryophyllaceae). Relative dates of divergence times are used to discriminate between hybridization and gene duplication/loss as alternative explanations for topological conflicts between gene trees. The observed incongruent relationships among the three major lineages of Heliosperma are better explained by homoploid hybridization than by gene duplication/losses because species branching events exceed gene coalescence times under biologically reasonable population sizes and generation times, making lineage sorting an unlikely explanation. The origin of Heliosperma is complex and the gene trees likely reflect both reticulate evolution and sorting events. At least two lineages have been involved in the origin of Heliosperma, one most closely related to the ancestor of Viscaria and Atocion and the other to Eudianthe and/or Petrocoptis.
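The argument against lineage sorting rests on how likely ancestral polymorphism is to persist through an internal branch. For three species with one sampled lineage each, a standard multispecies-coalescent result is that the gene tree matches the species tree ((A,B),C) with probability 1 - (2/3)e^(-t), where t is the internal branch length in coalescent units (generations divided by 2Ne for diploids). The sketch below applies that result; the population size and branch length are hypothetical, and this is not necessarily the calculation the authors performed.

```python
import math
from typing import Dict


def to_coalescent_units(generations: float, ne: float) -> float:
    """Convert a branch of `generations` into coalescent units (diploid, effective size `ne`)."""
    return generations / (2.0 * ne)


def three_taxon_probs(t: float) -> Dict[str, float]:
    """Gene-tree topology probabilities under species tree ((A,B),C).

    t is the internal branch length in coalescent units. The A and B lineages
    fail to coalesce on that branch with probability exp(-t); if they fail,
    each of the 3 topologies is equally likely in the ancestral population.
    """
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


# Hypothetical values: a 200,000-generation internal branch with Ne = 100,000.
t = to_coalescent_units(200_000, 100_000)
print(t, three_taxon_probs(t))
```

Longer internal branches, or smaller populations, drive the discordant probabilities toward zero, which is the sense in which lineage sorting becomes an unlikely explanation for strongly supported conflict.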

Journal ArticleDOI
TL;DR: Earlier simulation studies showing that maximum-likelihood estimates from concatenated sequences can be inconsistent in and near the anomaly zone were all performed with a molecular clock on rooted gene and species trees, using DNA sequences simulated under the Jukes and Cantor (1969) model and ML with a clock to recover trees from the concatenated data sets.
Abstract: The concatenation method has been widely used as a means of combining data to estimate phylogenetic trees (Huelsenbeck et al. 1996a, 1996b; Glazko and Nei 2003). However, simulation studies have shown that the maximum likelihood (ML) estimate of the species tree for concatenated sequences may be statistically inconsistent if the gene trees are highly heterogeneous (Kolaczkowski and Thornton 2004; Kubatko and Degnan 2007). Recently, Degnan and Rosenberg (2006) defined an "anomaly zone": a set of species trees with short internal branches for which a particular discordant gene tree is more probable than the gene tree that matches the species tree. Kubatko and Degnan (2007) went on to show that when DNA sequences are generated from gene trees simulated from species trees in the anomaly zone, as well as from species trees slightly outside this zone but still with short internal branches, the ML estimate of the species tree for the concatenated sequences can be inconsistent, resulting in increasing certainty in the wrong species tree. These studies all used a molecular clock and rooted gene and species trees, with sequence variation generated by stochastic simulation of DNA sequences under the Jukes and Cantor (1969) model of nucleotide substitution. They applied the ML method with a clock to recover phylogenetic trees from their simulated concatenated data sets.
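The Jukes and Cantor (1969) model named above assumes equal base frequencies and equal rates for all substitutions, which gives a closed-form link between the observed proportion of differing sites p and the expected number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch shows only this distance correction for two aligned, gap-free sequences; it does not reproduce the ML concatenation analyses, and the function names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical pair of sequences differing at 2 of 10 sites (p = 0.2).
print(jc69_distance("ACGTACGTAC", "ACGTACGTTT"))
```

The corrected distance always exceeds p because it accounts for multiple substitutions hitting the same site.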

Journal ArticleDOI
TL;DR: A simulation approach is taken to investigate the prevalence of AGTs among estimated gene trees, thereby characterizing the boundaries of the anomaly zone while taking into account both coalescent and mutational variances; the results show that mutational variance can indeed expand the parameter space where AGTs might be observed in empirical data.
Abstract: The increasing number of observations of gene trees with discordant topologies in phylogenetic studies has raised awareness about the problems of incongruence between species trees and gene trees. Moreover, theoretical treatments focusing on the impact of coalescent variance on phylogenetic study have also identified situations where the most probable gene trees are ones that do not match the underlying species tree (i.e., anomalous gene trees [AGTs]). However, although the theoretical proof of the existence of AGTs is alarming, the actual risk that AGTs pose to empirical phylogenetic study is far from clear. Establishing the conditions (i.e., the branch lengths in a species tree) for which AGTs are possible does not address the critical issue of how prevalent they might be. Furthermore, theoretical characterization of the species trees for which AGTs may pose a problem (i.e., the anomaly zone or the species histories for which AGTs are theoretically possible) is based on consideration of just one source of variance that contributes to species tree and gene tree discord: gene lineage coalescence. Yet, empirical data contain another important stochastic component: mutational variance. Estimated gene trees will differ from the underlying gene trees (i.e., the actual genealogy) because of the random process of mutation. Here, we take a simulation approach to investigate the prevalence of AGTs among estimated gene trees, thereby characterizing the boundaries of the anomaly zone taking into account both coalescent and mutational variances. We also determine the frequency of realized AGTs, which is critical to putting the theoretical work on AGTs into a realistic biological context. Two salient results emerge from this investigation. First, our results show that mutational variance can indeed expand the parameter space (i.e., the relative branch lengths in a species tree) where AGTs might be observed in empirical data.
By exploring the underlying cause for the expanded anomaly zone, we identify aspects of empirical data relevant to avoiding the problems that AGTs pose for species tree inference from multilocus data. Second, for the empirical species histories where AGTs are possible, unresolved trees, not AGTs, predominate in the pool of estimated gene trees. This result suggests that although AGTs exist in theory, the risk they pose may rarely be realized in practice. By considering the biological realities of both mutational and coalescent variances, the study has refined, and redefined, what the actual challenges are for empirical phylogenetic study of recently diverged taxa that have speciated rapidly: AGTs themselves are unlikely to pose a significant danger to empirical phylogenetic study.
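One simple intuition for why unresolved trees dominate: a locus with no substitution on a short internal branch has no site that can support the corresponding clade. Assuming substitutions fall on the branch as a Poisson process across sites (a simplification that ignores rate variation among sites and homoplasy), that probability is exp(-bL) for a branch of b expected substitutions per site and a locus of L sites. This is an illustrative sketch with hypothetical values, not the simulation design used in the study.

```python
import math


def prob_no_substitution(branch_length: float, n_sites: int) -> float:
    """P(no substitution on a branch at any of n_sites) under a Poisson model.

    branch_length: expected substitutions per site along the branch.
    """
    return math.exp(-branch_length * n_sites)


# Hypothetical internal branch of 0.002 substitutions/site, varying locus length.
for n_sites in (250, 500, 1000, 2000):
    print(n_sites, round(prob_no_substitution(0.002, n_sites), 3))
```

Under these assumptions, short loci on short internal branches frequently lack any supporting site, so the clade collapses in the estimated gene tree rather than being replaced by an anomalous resolution.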

Journal ArticleDOI
TL;DR: Two methods that infer multilocus phylogenies by incorporating information from topological incongruence among individual genes are investigated: Bayesian concordance analysis and Bayesian estimation of species trees.
Abstract: Several methods have recently been developed to infer multilocus phylogenies by incorporating information from topological incongruence of the individual genes. In this study, we investigate 2 such methods, Bayesian concordance analysis and Bayesian estimation of species trees. Our test data are a collection of genes from cultivated rice (genus Oryza) and the most closely related wild species, generated using a high-throughput sequencing protocol and bioinformatics pipeline. Trees inferred from independent genes display levels of topological incongruence that far exceed that seen in previous data sets analyzed with these species tree methods. We identify differences in phylogenetic results between inference methods that incorporate gene tree incongruence. Finally, we discuss the challenges of scaling these analyses for data sets with thousands of gene trees and extensive levels of missing data.

Journal ArticleDOI
TL;DR: The hypothesis that most phylogeneticists accept fully computerized tree building but not fully computerized alignment is tested by reviewing current practices for multiple sequence alignment in published phylogenetic analyses, and suggestions are offered as to why phylogeneticists are apparently dissatisfied with computerized sequence alignment and how to deal with it.
Abstract: In phylogenetic analyses, multiple sequence alignment often seems to be the poor cousin to tree building. It has long been recognized that both primary homology assessment (alignment) and secondary homology assessment (tree building) can have important effects on phylogenetic analyses (Morrison and Ellis 1997), and this has been repeatedly demonstrated for both empirical and simulated data (see references cited by Morrison 2008; also Wong et al. 2008). However, most theoretical contributions to phylogenetic analysis continue to involve tree building alone. Indeed, even most review articles continue to treat alignment as being about automated bioinformatics procedures (Wallace et al. 2005; Edgar and Batzoglou 2006; Kumar and Filipski 2007; Notredame 2007) rather than about phylogenetics (Morrison 2006; Phillips 2006). This suggests the possibility that there have been no (or few) theoretical contributions to multiple sequence alignment that practitioners considered to be useful in phylogenetics. In other words, most phylogeneticists are prepared to do tree building in a fully computerized manner but not (yet) alignment. Here, I test this hypothesis by reviewing the current practices for multiple sequence alignment in published phylogenetic analyses, particularly with respect to differences between disciplines in the way that phylogenetic analyses are used. I also provide suggestions as to why phylogeneticists are apparently dissatisfied with computerized sequence alignment and how we might deal with it. My intention is to highlight why phylogeneticists are dissatisfied with similarity-based alignment procedures and what specifically they are doing about it in practice. I conclude that there is currently no bioinformatics approach that is acceptable for phylogeneticists and that there is thus a gaping hole that needs to be filled.
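Similarity-based alignment procedures of the kind criticized here are built on dynamic programming that maximizes a similarity score. A minimal sketch of global pairwise alignment (Needleman-Wunsch) with a linear gap penalty is shown below; the scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, and this is not the algorithm of any particular program reviewed. Note that the result is optimal only for the chosen scores, which is exactly where phylogenetic homology and similarity can part ways.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global pairwise alignment with a linear gap penalty."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```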

Journal ArticleDOI
TL;DR: Variation in the absolute rate of lineage turnover through time, in conjunction with phylogenetically nonrandom extinction, may underlie the apparent diversity-dependent speciation observed in molecular phylogenies.
Abstract: Time-calibrated molecular phylogenies provide a valuable window into the tempo and mode of species diversification, especially for the large number of groups that lack adequate fossil records. Molecular phylogenetic data frequently suggest an initial "explosive speciation" phase, leading to widespread speculation that ecological niche-filling processes might govern the dynamics of species diversification during evolutionary radiations. However, these patterns are difficult to reconcile with the fossil record. The fossil record strongly suggests that extinction rates have been high relative to speciation rates, but such elevated background extinction should erase the signal of early, rapid speciation from molecular phylogenies. For this reason, extinction rates in molecular phylogenies are frequently estimated as zero under the widely used birth-death model. Here, I construct a simple model that combines phylogenetically patterned extinction with pulsed turnover dynamics and constant diversity through time. Using approximate Bayesian methods, I show that heritable extinction can easily explain the phenomenon of explosive early diversification, even when net diversification rates are equal to zero. Several assumptions of the model are more consistent with both the fossil record and neontological data than the standard birth-death model and it may thus represent a viable alternative interpretation of phylogenetic diversification patterns. These results suggest that variation in the absolute rate of lineage turnover through time, in conjunction with phylogenetically nonrandom extinction, may underlie the apparent diversity-dependent speciation observed in molecular phylogenies.
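For contrast with the author's pulsed-turnover model, the sketch below simulates the standard constant-rate birth-death process that the abstract treats as the default. It uses the Gillespie algorithm to track only the number of living lineages; setting speciation equal to extinction gives zero net diversification, so the expected lineage count stays at its starting value even though many replicates go extinct. Parameter values are hypothetical, and neither the author's model nor the approximate Bayesian fitting is implemented here.

```python
import random


def simulate_birth_death(lam: float, mu: float, t_max: float,
                         n0: int = 2, seed: int = 1) -> int:
    """Lineages alive at time t_max under a constant-rate birth-death process.

    Gillespie algorithm: with n lineages, the waiting time to the next event is
    exponential with rate n * (lam + mu); the event is a speciation with
    probability lam / (lam + mu) and an extinction otherwise.
    """
    if lam + mu == 0:
        return n0  # no events can occur
    rng = random.Random(seed)
    n, t = n0, 0.0
    while n > 0:
        t += rng.expovariate(n * (lam + mu))
        if t > t_max:
            break
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1
    return n


# Zero net diversification (lam == mu): count extinct replicates and the mean size.
runs = [simulate_birth_death(0.5, 0.5, 10.0, seed=s) for s in range(200)]
print(sum(r == 0 for r in runs), sum(runs) / len(runs))
```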

Journal ArticleDOI
TL;DR: Hypotheses of species selection are proposed to explain the apparent long-term stability of these life history traits despite a high frequency of character change in hydrozoan cnidarians.
Abstract: Two fundamental life cycle types are recognized among hydrozoan cnidarians, the benthic (generally colonial) polyp stage either producing pelagic sexual medusae or directly releasing gametes elaborated from an attached gonophore. The existence of intermediate forms, with polyps producing simple medusoids, has been classically considered compelling evidence in favor of phyletic gradualism. In order to gain insights about the evolution of hydrozoan life history traits, we inferred phylogenetic relationships of 142 species of Thecata (= Leptothecata, Leptomedusae), the most species-rich hydrozoan group, using 3 different ribosomal RNA markers (16S, 18S, and 28S). In conflict with morphology-derived classifications, most thecate species fell in 2 well-supported clades named here Statocysta and Macrocolonia. We inferred many independent medusa losses among Statocysta. Several instances of secondary regain of medusoids (but not of full medusa) from medusa-less ancestors were supported among Macrocolonia. Furthermore, life cycle character changes were significantly correlated with changes affecting colony shape. For both traits, changes did not reflect graded and progressive loss or gain of complexity. They were concentrated in recent branches, with intermediate character states being relatively short lived at a large evolutionary scale. This punctuational pattern supports the existence of 2 alternative stable evolutionary strategies: simple stolonal colonies with medusae (the ancestral strategy, seen in most Statocysta species) versus large complex colonies with fixed gonophores (the derived strategy, seen in most Macrocolonia species). Hypotheses of species selection are proposed to explain the apparent long-term stability of these life history traits despite a high frequency of character change. Notably, maintenance of the medusa across geological time in Statocysta might be due to higher extinction rates for species that have lost this dispersive stage.
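Counting how many times a life-cycle trait (for example, medusa present or absent) must have changed on a fixed tree is a common first step toward claims of many independent losses. The sketch below uses Fitch parsimony on a rooted binary tree written as nested tuples; the tree, taxon names, and states are hypothetical, and this is not necessarily the reconstruction method used in the study.

```python
from typing import Dict, Set, Tuple, Union

# A tree is a taxon name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony: candidate root states and minimum number of changes."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-species example.
tree = (("sp1", "sp2"), ("sp3", "sp4"))
states = {"sp1": "medusa", "sp2": "none", "sp3": "medusa", "sp4": "none"}
print(fitch(tree, states)[1])  # prints 2
```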

Journal ArticleDOI
TL;DR: A role for Pliocene/Pleistocene climatic fluctuations in species-level diversification of Brookesia is strongly rejected, and the central role of phylogeny is highlighted in any meaningful tests of species- level diversification theories.
Abstract: Madagascar's flora and fauna are remarkable both for their diversity and supraspecific endemism. Moreover, many taxa contain large numbers of species with limited distributions. Several hypotheses have been proposed to explain this high level of microendemism, including 1) riverine barrier, 2) mountain refuge, and 3) watershed contraction hypotheses, the latter 2 of which center on fragmentation due to climatic shifts associated with Pliocene/Pleistocene glaciations. The Malagasy leaf chameleon genus Brookesia is a speciose group with a high proportion of microendemic taxa, thus making it an excellent candidate to test these vicariance scenarios. We used mitochondrial and nuclear sequence data to construct a Brookesia phylogeny, and temporal concordance with Pliocene/Pleistocene speciation scenarios was tested by estimating divergence dates using a relaxed-clock Bayesian method. We strongly reject a role for Pliocene/Pleistocene climatic fluctuations in species-level diversification of Brookesia. We also used simulations to test the spatial predictions of the watershed contraction model in a phylogenetic context, independent of its temporal component, and found no statistical support for this model. The riverine barrier model is likewise a qualitatively poor fit to our data, but some relationships support a more ancient mountain refuge effect. We assessed support for the 3 hypotheses in a nonphylogenetic context by examining altitude and species richness and found a significant positive correlation between these variables. This is consistent with a mountain refuge effect but does not support the watershed contraction or riverine barrier models. Finally, we find repeated higher level east-west divergence patterns 1) between the 2 sister clades comprising the Brookesia minima group and 2) within the clade of larger leaf chameleons, which shows a basal divergence between western and eastern/northern sister clades. 
Our results highlight the central role of phylogeny in any meaningful tests of species-level diversification theories.

Journal ArticleDOI
TL;DR: It is shown how interpretation of contradictory gene trees can lead to conflicting inferences of both morphological evolution and biogeographic history, using the example of the pampas grasses, Cortaderia, and the use of approaches that take multiple gene trees into account is urged.
Abstract: We explore the potential impact of conflicting gene trees on inferences of evolutionary history above the species level. When conflict between gene trees is discovered, it is common practice either to analyze the data separately or to combine the data having excluded the conflicting taxa or data partitions for those taxa (which are then recoded as missing). We demonstrate an alternative approach, which involves duplicating conflicting taxa in the matrix, such that each duplicate is represented by one partition only. This allows the combination of all available data in standard phylogenetic analyses, despite reticulations. We show how interpretation of contradictory gene trees can lead to conflicting inferences of both morphological evolution and biogeographic history, using the example of the pampas grasses, Cortaderia. The characteristic morphological syndrome of Cortaderia can be inferred as having arisen multiple times (chloroplast DNA [cpDNA]) or just once (nuclear ribosomal DNA [nrDNA]). The distributions of species of Cortaderia and related genera in Australia/New Guinea, New Zealand, and South America can be explained by few (nrDNA) or several (cpDNA) dispersals between the southern continents. These contradictions can be explained by past hybridization events, which have linked gains of complex morphologies with unrelated chloroplast lineages and have erased evidence of dispersals from the nuclear genome. Given the discrepancies between inferences based on the gene trees individually, we urge the use of approaches such as ours that take multiple gene trees into account.
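The duplication approach described above can be sketched as a matrix-building step: non-conflicting taxa keep a single concatenated row, while each conflicting taxon becomes one row per partition, scored for that partition and coded as missing ("?") elsewhere. The function name, the "taxon_partition" naming convention, and the toy data are hypothetical, and the sketch assumes every taxon is present in every partition.

```python
from typing import Dict, List


def duplicate_conflicting_taxa(partitions: Dict[str, Dict[str, str]],
                               conflicting: List[str],
                               missing: str = "?") -> Dict[str, str]:
    """Concatenate partitions, splitting each conflicting taxon into per-partition duplicates.

    partitions: partition name -> {taxon: aligned sequence}.
    conflicting: taxa whose placement differs between partitions.
    """
    names = list(partitions)
    lengths = {p: len(next(iter(partitions[p].values()))) for p in names}
    taxa = list(partitions[names[0]])
    matrix: Dict[str, str] = {}
    for taxon in taxa:
        if taxon not in conflicting:
            matrix[taxon] = "".join(partitions[p][taxon] for p in names)
            continue
        for keep in names:  # one duplicate per partition
            matrix[f"{taxon}_{keep}"] = "".join(
                partitions[p][taxon] if p == keep else missing * lengths[p]
                for p in names)
    return matrix


# Hypothetical two-partition data set with one conflicting taxon.
data = {"cpDNA": {"X": "AC", "Y": "AG"}, "nrDNA": {"X": "TTT", "Y": "TTA"}}
for name, row in duplicate_conflicting_taxa(data, ["Y"]).items():
    print(name, row)
```

Each duplicate can then be placed independently by a standard analysis of the combined matrix, so the conflicting signal is retained rather than discarded.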

Journal ArticleDOI
TL;DR: The results suggest that most surveyed polyploids originated via hybridization and that 2 taxonomic species formed recurrently from different progenitors, findings that are congruent with the expectations of speciation via secondary contact.
Abstract: Although polyploidy plays a fundamental role in plant evolution, the elucidation of polyploid origins is fraught with methodological challenges. For example, allopolyploid species may confound phylogenetic reconstruction because commonly used methods are designed to trace divergent rather than reticulate patterns. Recently developed techniques of phylogenetic network estimation allow for a more effective identification of incongruence among trees. However, incongruence can also be caused by incomplete lineage sorting, paralogy, concerted evolution, and recombination. Thus, initial hypotheses of hybridization need to be examined via additional sources of evidence, including the partitioning of infraspecific genetic polymorphisms, morphological characteristics, chromosome numbers, crossing experiments, and distributional patterns. Primula sect. Aleuritia subsect. Aleuritia (Aleuritia) represents an ideal case study to examine reticulation because specific hypotheses have been derived from morphology, karyology, interfertility, and distribution to explain the observed variation of ploidy levels, ranging from diploidy to 14-ploidy. Sequences from 5 chloroplast and 1 nuclear ribosomal DNA (nrDNA) markers were analyzed to generate the respective phylogenies and consensus networks. Furthermore, extensive cloning of the nrDNA marker allowed for the identification of shared nucleotides at polymorphic sites, investigation of infraspecific genetic polymorphisms via principal coordinate analyses (PCoAs), and detection of recombination between putative progenitor sequences. The results suggest that most surveyed polyploids originated via hybridization and that 2 taxonomic species formed recurrently from different progenitors, findings that are congruent with the expectations of speciation via secondary contact. Overall, the study highlights the importance of using multiple experimental and analytical approaches to disentangle complex patterns of reticulation.
(Concerted evolution; consensus network; hybridization; phylogenetic incongruence; Primula; recombination; ribosomal DNA polymorphism; secondary contact model.)
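Scoring cloned nrDNA sequences at the polymorphic sites that distinguish two putative progenitors can be sketched as follows: each diagnostic site is labeled by which progenitor the clone shares a nucleotide with, and a clone carrying nucleotides from both progenitors is flagged as a candidate recombinant. The sequences and function names are hypothetical; formal recombination tests would be needed to confirm any such pattern, and this is not the study's own pipeline.

```python
from typing import List, Tuple


def diagnostic_sites(parent1: str, parent2: str) -> List[int]:
    """Alignment positions at which the two putative progenitors differ."""
    return [i for i, (x, y) in enumerate(zip(parent1, parent2)) if x != y]


def clone_profile(clone: str, parent1: str, parent2: str) -> List[Tuple[int, str]]:
    """Label each diagnostic site: '1' shares parent1's base, '2' parent2's, '?' neither."""
    profile = []
    for i in diagnostic_sites(parent1, parent2):
        if clone[i] == parent1[i]:
            profile.append((i, "1"))
        elif clone[i] == parent2[i]:
            profile.append((i, "2"))
        else:
            profile.append((i, "?"))
    return profile


def looks_recombinant(profile: List[Tuple[int, str]]) -> bool:
    """True if the clone carries nucleotides from both progenitors."""
    labels = {label for _, label in profile}
    return {"1", "2"} <= labels


# Hypothetical aligned sequences.
p1, p2 = "AAAAAA", "ACACAC"
for clone in ("AAAAAA", "AAACAC"):
    prof = clone_profile(clone, p1, p2)
    print(clone, prof, looks_recombinant(prof))
```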