Journal ArticleDOI

Is it better to add taxa or characters to a difficult phylogenetic problem?

01 Mar 1998-Systematic Biology (Oxford University Press)-Vol. 47, Iss: 1, pp 9-17
TL;DR: The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation using a four-taxon tree representing a difficult phylogenetic problem with an extreme situation of long branch attraction.
Abstract: The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation. The conditions of this study were constrained but allowed for systematic investigation of certain parameters. The starting point for the study was a four-taxon tree in the "Felsenstein zone," representing a difficult phylogenetic problem with an extreme situation of long branch attraction. Taxa were added sequentially to this tree in a manner specifically designed to break up the long branches, and for each tree data matrices of different sizes were simulated. Phylogenetic trees were reconstructed from these data using the criteria of parsimony and maximum likelihood. Phylogenetic accuracy was measured in three ways: (1) proportion of trees that are completely correct, (2) proportion of correctly reconstructed branches in all trees, and (3) proportion of trees in which the original four-taxon statement is correctly reconstructed. Accuracy improved dramatically with the addition of taxa and much more slowly with the addition of characters. If taxa can be added to break up long branches, it is much more preferable to add taxa than characters. (Long branch attraction; parsimony; phylogenetic reconstruction; simulation; taxon sampling.)
Citations
Journal ArticleDOI
23 Oct 2003-Nature
TL;DR: The results suggest that data sets consisting of single or a small number of concatenated genes have a significant probability of supporting conflicting topologies, and have important implications for resolving branches of the tree of life.
Abstract: One of the most pervasive challenges in molecular phylogenetics is the incongruence between phylogenies obtained using different data sets, such as individual genes. To systematically investigate the degree of incongruence, and potential methods for resolving it, we screened the genome sequences of eight yeast species and selected 106 widely distributed orthologous genes for phylogenetic analyses, singly and by concatenation. Our results suggest that data sets consisting of single or a small number of concatenated genes have a significant probability of supporting conflicting topologies. By contrast, analyses of the entire data set of concatenated genes yielded a single, fully resolved species tree with maximum support. Comparable results were obtained with a concatenation of a minimum of 20 genes; substantially more genes than commonly used but a small fraction of any genome. These results have important implications for resolving branches of the tree of life.

1,490 citations

Journal ArticleDOI
TL;DR: A new large-scale phylogeny of squamate reptiles is presented that includes new, resurrected, and modified subfamilies within gymnophthalmid and scincid lizards, and boid, colubrid, and lamprophiid snakes.
Abstract: The extant squamates (>9400 known species of lizards and snakes) are one of the most diverse and conspicuous radiations of terrestrial vertebrates, but no studies have attempted to reconstruct a phylogeny for the group with large-scale taxon sampling. Such an estimate is invaluable for comparative evolutionary studies, and to address their classification. Here, we present the first large-scale phylogenetic estimate for Squamata. The estimated phylogeny contains 4161 species, representing all currently recognized families and subfamilies. The analysis is based on up to 12896 base pairs of sequence data per species (average = 2497 bp) from 12 genes, including seven nuclear loci (BDNF, c-mos, NT3, PDC, R35, RAG-1, and RAG-2), and five mitochondrial genes (12S, 16S, cytochrome b, ND2, and ND4). The tree provides important confirmation for recent estimates of higher-level squamate phylogeny based on molecular data (but with more limited taxon sampling), estimates that are very different from previous morphology-based hypotheses. The tree also includes many relationships that differ from previous molecular estimates and many that differ from traditional taxonomy. We present a new large-scale phylogeny of squamate reptiles that should be a valuable resource for future comparative studies. We also present a revised classification of squamates at the family and subfamily level to bring the taxonomy more in line with the new phylogenetic hypothesis. This classification includes new, resurrected, and modified subfamilies within gymnophthalmid and scincid lizards, and boid, colubrid, and lamprophiid snakes.

1,381 citations


Cites background from "Is it better to add taxa or charact..."

  • ...In addition, limited taxon sampling is potentially a serious issue for phylogenetic accuracy [25-28]....

  • ...In the best-case scenario, these conflicts may be resolved because our results are correct, possibly reflecting the beneficial effects of adding taxa and the associated subdivision of long branches [25,26,28,87, 192-194]....

Journal ArticleDOI
TL;DR: A phylogenetic analysis of a combined data set for 560 angiosperms and seven outgroups based on three genes, 18S rDNA, rbcL, and atpB, representing a total of 4733 bp is presented, resulting in the most highly resolved and strongly supported topology yet obtained for angiosperms.

1,288 citations

Journal ArticleDOI
TL;DR: This work has demonstrated the power of the phylogenomics approach, which has the potential to provide answers to several fundamental evolutionary questions, but challenges for the future have also been revealed.
Abstract: As more complete genomes are sequenced, phylogenetic analysis is entering a new era — that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms on the basis of the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to several fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life might prove difficult, if not impossible, to resolve with confidence.

1,165 citations

Journal ArticleDOI
22 Oct 2015-Nature
TL;DR: The results of the divergence time analyses are congruent with the palaeontological record, supporting a major radiation of crown birds in the wake of the Cretaceous–Palaeogene (K–Pg) mass extinction.
Abstract: Although reconstruction of the phylogeny of living birds has progressed tremendously in the last decade, the evolutionary history of Neoaves--a clade that encompasses nearly all living bird species--remains the greatest unresolved challenge in dinosaur systematics. Here we investigate avian phylogeny with an unprecedented scale of data: >390,000 bases of genomic sequence data from each of 198 species of living birds, representing all major avian lineages, and two crocodilian outgroups. Sequence data were collected using anchored hybrid enrichment, yielding 259 nuclear loci with an average length of 1,523 bases for a total data set of over 7.8 × 10^7 bases. Bayesian and maximum likelihood analyses yielded highly supported and nearly identical phylogenetic trees for all major avian lineages. Five major clades form successive sister groups to the rest of Neoaves: (1) a clade including nightjars, other caprimulgiforms, swifts, and hummingbirds; (2) a clade uniting cuckoos, bustards, and turacos with pigeons, mesites, and sandgrouse; (3) cranes and their relatives; (4) a comprehensive waterbird clade, including all diving, wading, and shorebirds; and (5) a comprehensive landbird clade with the enigmatic hoatzin (Opisthocomus hoazin) as the sister group to the rest. Neither of the two main, recently proposed Neoavian clades--Columbea and Passerea--were supported as monophyletic. The results of our divergence time analyses are congruent with the palaeontological record, supporting a major radiation of crown birds in the wake of the Cretaceous-Palaeogene (K-Pg) mass extinction.

1,094 citations

References
Journal ArticleDOI
TL;DR: Parsimony or minimum evolution methods were first introduced into phylogenetic inference by Camin and Sokal (1965), and a number of other parsimony methods have since appeared in the systematic literature and found widespread use in studies of molecular evolution.
Abstract: Felsenstein, J. (Department of Genetics, University of Washington, Seattle, WA 98195) 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401-410. For some simple three- and four-species cases involving a character with two states, it is determined under what conditions several methods of phylogenetic inference will fail to converge to the true phylogeny as more and more data are accumulated. The methods are the Camin-Sokal parsimony method, the compatibility method, and Farris's unrooted Wagner tree parsimony method. In all cases the conditions for this failure (which is the failure to be statistically consistent) are essentially that parallel changes exceed informative, nonparallel changes. It is possible for these methods to be inconsistent even when change is improbable a priori, provided that evolutionary rates in different lineages are sufficiently unequal. It is by extension of this approach that we may provide a sound methodology for evaluating methods of phylogenetic inference. [Numerical cladistics; phylogenetic inference; maximum likelihood estimation; parsimony; compatibility.] Parsimony or minimum evolution methods were first introduced into phylogenetic inference by Camin and Sokal (1965). This class of methods for inferring an evolutionary tree from discrete-character data involves making a reconstruction of the changes in a given set of characters on a given tree, counting the smallest number of times that a given kind of event need have happened, and using this as the measure of the adequacy of the evolutionary tree. (Alternatively, one can compute the weighted sum of the numbers of times several different kinds of events have occurred.) One attempts to find that evolutionary tree which requires the fewest of these evolutionary events to explain the observed data.
Camin and Sokal treated the case of irreversible changes along a character state tree, minimizing the number of changes of character states required. A number of other parsimony methods have since appeared in the systematic literature (Kluge and Farris, 1969; Farris, 1969, 1970, 1972, 1977; Farris, Kluge, and Eckhardt, 1970) and parsimony methods have also found widespread use in studies of molecular evolution (Fitch and Margoliash, 1967, 1970; Dayhoff and Eck, 1968; see also Fitch, 1973). Cavalli-Sforza and Edwards (1967; Edwards and Cavalli-Sforza, 1964) earlier formulated a minimum evolution method for continuous-character data. An alternative methodology for phylogenetic inference is the compatibility method, introduced by Le Quesne (1969, 1972). He suggested that phylogenetic inference be based on finding the largest possible set of characters which could simultaneously have all states be uniquely derived on the same tree. The estimate of the phylogeny is then taken to be that tree. While Le Quesne's specific suggestions as to how this might be done have been criticized by Farris (1969), his general approach, which is based on Camin and Sokal's (1965) concept of the compatibility of two characters, has been made rigorous and extended in a series of papers by G. F. Estabrook, C. S. Johnson, Jr., and F. R. McMorris (Estabrook,

3,220 citations

Journal ArticleDOI
TL;DR: Two exploratory parsimony analyses of DNA sequences from 475 and 499 species of seed plants, respectively, representing all major taxonomic groups indicate that rbcL sequence variation contains historical evidence appropriate for phylogenetic analysis at this taxonomic level of sampling.
Abstract: We present the results of two exploratory parsimony analyses of DNA sequences from 475 and 499 species of seed plants, respectively, representing all major taxonomic groups. The data are exclusively from the chloroplast gene rbcL, which codes for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO or RuBPCase). We used two different state-transformation assumptions resulting in two sets of cladograms: (i) equal-weighting for the 499-taxon analysis; and (ii) a procedure that differentially weights transversions over transitions within characters and codon positions among characters for the 475-taxon analysis. The degree of congruence between these results and other molecular, as well as morphological, cladistic studies indicates that rbcL sequence variation contains historical evidence appropriate for phylogenetic analysis at this taxonomic level of sampling. Because the topologies presented are necessarily approximate and cannot be evaluated adequately for internal support, these results should be assessed from the perspective of their predictive value and used to direct future studies, both molecular and morphological. In both analyses, the three genera of Gnetales are placed together as the sister group of the flowering plants, and the anomalous aquatic Ceratophyllum (Ceratophyllaceae) is sister to all other flowering plants. Several major lineages identified correspond well with at least some recent taxonomic schemes for angiosperms, particularly those of Dahlgren and Thorne. The basalmost clades within the angiosperms are orders of the apparently polyphyletic subclass Magnoliidae sensu Cronquist. The most conspicuous feature of the topology is that the major division is not monocot versus dicot, but rather one correlated with general pollen type: uniaperturate versus triaperturate. 
The Dilleniidae and Hamamelidae are the only subclasses that are grossly polyphyletic; an examination of the latter is presented as an example of the use of these broad analyses to focus more restricted studies. A broadly circumscribed Rosidae is paraphyletic to Asteridae and Dilleniidae. Subclass Caryophyllidae is monophyletic and derived from within Rosidae in the 475-taxon analysis but is sister to a group composed of broadly delineated Asteridae and Rosidae in the 499-taxon study.

1,976 citations

Journal ArticleDOI
TL;DR: The overall conclusions from this study are that irregular A,C,G,T compositions are an important and possible general cause of patterns that can mislead tree-reconstruction methods, even when high bootstrap values are obtained.
Abstract: We report a new transformation, the LogDet, that is consistent for sequences with differing nucleotide composition and that have arisen under simple but asymmetric stochastic models of evolution. This transformation is required because existing methods tend to group sequences on the basis of their nucleotide composition, irrespective of their evolutionary history. This effect of differing nucleotide frequencies is illustrated by using a tree-selection criterion on a simple distance measure defined solely on the basis of base composition, independent of the actual sequences. The new LogDet transformation uses determinants of the observed divergence matrices and works because multiplication of determinants (real numbers) is commutative, whereas multiplication of matrices is not, except in special symmetric cases. The use of determinants thus allows more general models of evolution with asymmetric rates of nucleotide change. The transformation is illustrated on a theoretical data set (where existing methods select the wrong tree) and with three biological data sets: chloroplasts, birds/mammals (nuclear), and honeybees (mitochondrial). The LogDet transformation reinforces the logical distinction between transformations on the data and tree-selection criteria. The overall conclusions from this study are that irregular A,C,G,T compositions are an important and possible general cause of patterns that can mislead tree-reconstruction methods, even when high bootstrap values are obtained. Consequently, many published studies may need to be reexamined.

980 citations


"Is it better to add taxa or charact..." refers methods in this paper

  • ...For example, workers have determined that long-branch attraction can mislead phylogenetic reconstruction methods, as can base composition bias or other forms of nonrandom convergence in the data (Felsenstein, 1978; Huelsenbeck and Hillis, 1993; Lockhart et al., 1994; Swofford et al., 1996)....

Journal ArticleDOI
TL;DR: The importance of the critical fossils seems to reside in their relative primitiveness, and the simplest explanation for their more conservative nature is that they have had less time to evolve.

974 citations

Journal ArticleDOI
TL;DR: Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches, and maximum likelihood was the most successful method overall, although for short sequences Fitch-Margoliash and neighbor joining were sometimes better.
Abstract: Using simulated data, we compared five methods of phylogenetic tree estimation: parsimony, compatibility, maximum likelihood, Fitch-Margoliash, and neighbor joining. For each combination of substitution rates and sequence length, 100 data sets were generated for each of 50 trees, for a total of 5,000 replications per condition. Accuracy was measured by two measures of the distance between the true tree and the estimate of the tree, one measure sensitive to accuracy of branch lengths and the other not. The distance-matrix methods (Fitch-Margoliash and neighbor joining) performed best when they were constrained from estimating negative branch lengths; all comparisons with other methods used this constraint. Parsimony and compatibility had similar results, with compatibility generally inferior; Fitch-Margoliash and neighbor joining had similar results, with neighbor joining generally slightly inferior. Maximum likelihood was the most successful method overall, although for short sequences Fitch-Margoliash and neighbor joining were sometimes better. Bias of the estimates was inferred by measuring whether the independent estimates of a tree for different data sets were closer to the true tree than to each other. Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches. When rates of evolution varied among different sites, all methods showed signs of inaccuracy and bias.

875 citations