scispace - formally typeset
Author

Ying Cao

Other affiliations: Tokyo Institute of Technology
Bio: Ying Cao is an academic researcher from the Graduate University for Advanced Studies. The author has contributed to research in the topics Phylogenetic tree and Phylogenetics. The author has an h-index of 24 and has co-authored 31 publications receiving 2,495 citations. Previous affiliations of Ying Cao include Tokyo Institute of Technology.

Papers
Journal Article
TL;DR: The combination of SINE and flanking sequence analysis suggests a topology and set of divergence times for odontocete relationships, offering alternative explanations for several long-standing problems in cetacean evolution.
Abstract: SINE (short interspersed element) insertion analysis elucidates contentious aspects in the phylogeny of toothed whales and dolphins (Odontoceti), especially river dolphins. Here, we characterize 25 informative SINEs inserted into unique genomic loci during evolution of odontocetes to construct a cladogram, and determine a total of 2.8 kb per taxon of the flanking sequences of these SINE loci to estimate divergence times among lineages. We demonstrate that: (i) Odontocetes are monophyletic; (ii) Ganges River dolphins, beaked whales, and ocean dolphins diverged (in this order) after sperm whales; (iii) three other river dolphin taxa, namely the Amazon, La Plata, and Yangtze river dolphins, form a monophyletic group with Yangtze River dolphins being the most basal; and (iv) the rapid radiation of extant cetacean lineages occurred some 28–33 million years B.P., in strong accord with the fossil record. The combination of SINE and flanking sequence analysis suggests a topology and set of divergence times for odontocete relationships, offering alternative explanations for several long-standing problems in cetacean evolution.

228 citations
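
A minimal sketch of the logic behind a SINE-based cladogram, assuming each insertion is free of homoplasy, so that the taxa sharing an insertion form a clade. Two insertion-defined groups can sit on the same tree only if they are disjoint or nested; a conflicting pair would point to homoplasy or incomplete lineage sorting. The taxa (A to E), locus names and function names are hypothetical placeholders, not the 25 loci analysed in the paper.

```python
from itertools import combinations


def compatible(a: frozenset, b: frozenset) -> bool:
    """Two insertion-defined groups fit on one tree if disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def conflicts(insertions: dict) -> list:
    """Pairs of loci whose taxon groups cannot both be clades."""
    return [
        (x, y)
        for (x, gx), (y, gy) in combinations(insertions.items(), 2)
        if not compatible(gx, gy)
    ]


def clades(insertions: dict) -> list:
    """Distinct insertion-defined groups, largest (outermost) first."""
    return sorted(set(insertions.values()), key=len, reverse=True)


# Hypothetical data: locus -> taxa (A..E) that carry the SINE at that locus.
data = {
    "locus1": frozenset("ABCDE"),
    "locus2": frozenset("BCDE"),
    "locus3": frozenset("DE"),
    "locus4": frozenset("DE"),  # a second insertion supporting the same clade
}

print(conflicts(data))  # [] : every pair is nested, so one tree fits them all
print([sorted(c) for c in clades(data)])
```

Each nested group corresponds to one node of the cladogram; the divergence-time part of the study, based on flanking sequences, is not modelled here.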

Journal Article
TL;DR: The overall evidence of the maximum likelihood analysis suggests that Rodentia is an outgroup to the other four eutherian orders and that Cetacea and Artiodactyla form a clade with Carnivora as a sister taxon irrespective of the assumed model for amino acid substitutions.
Abstract: The phylogenetic relationships among Primates (human), Artiodactyla (cow), Cetacea (whale), Carnivora (seal), and Rodentia (mouse and rat) were estimated from the inferred amino acid sequences of the mitochondrial genomes using Marsupialia (opossum), Aves (chicken), and Amphibia (Xenopus) as an outgroup. The overall evidence of the maximum likelihood analysis suggests that Rodentia is an outgroup to the other four eutherian orders and that Cetacea and Artiodactyla form a clade with Carnivora as a sister taxon irrespective of the assumed model for amino acid substitutions. Although there remains an uncertainty concerning the relation among Artiodactyla, Cetacea, and Carnivora, the existence of a clade formed by these three orders and the outgroup status of Rodentia to the other eutherian orders seems to be firmly established. However, analyses of individual genes do not necessarily conform to this conclusion, and some of the genes reject the putatively correct tree with nearly 5% significance. Although this discrepancy can be due to convergent or parallel evolution in the specific genes, it was pointed out that, even without a particular reason, such a discrepancy can occur in 5% of the cases if the branching among the orders in question occurred within a short period. Due to uncertainty about the assumed model underlying the phylogenetic inference, this can occur even more frequently. This demonstrates the importance of analyzing enough sequences to avoid the danger of concluding an erroneous tree.

227 citations
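
The point that a discrepancy "can occur in 5% of the cases" can be made concrete with a quick calculation. Assuming, as an illustrative simplification, that each gene provides an independent test at the 5% level, the chance that at least one gene rejects the correct tree grows quickly with the number of genes examined. Genes on one mitochondrial genome are not truly independent, and the function name is hypothetical.

```python
def prob_any_rejection(n_genes: int, alpha: float = 0.05) -> float:
    """P(at least one of n independent gene-level tests rejects a true tree).

    Each test has false-rejection probability alpha; independence is an
    illustrative simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_genes


for n in (1, 5, 12):
    print(n, round(prob_any_rejection(n), 3))
```

With 12 genes the probability is already about 0.46, which is why a single gene rejecting the combined tree is weak evidence on its own.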

Journal Article
TL;DR: The results and a site-by-site examination of the sequences clearly suggest that convergent or parallel evolution has occurred in ND1 between primates and rodents and/or between ferungulates and the outgroup.
Abstract: The phylogenetic relationship among primates, ferungulates (artiodactyls + cetaceans + perissodactyls + carnivores), and rodents was examined using proteins encoded by the H strand of mtDNA, with marsupials and monotremes as the outgroup. Trees estimated from individual proteins were compared in detail with the tree estimated from all 12 proteins (either concatenated or summing up log-likelihood scores for each gene). Although the overall evidence strongly suggests ((primates, ferungulates), rodents), the ND1 data clearly support another tree, ((primates, rodents), ferungulates). To clarify whether this contradiction is due to (1) a stochastic (sampling) error, (2) minor model-based errors (e.g., ignoring site rate variability), or (3) convergent and parallel evolution (specifically between either primates and rodents or ferungulates and the outgroup), the ND1 genes from many additional species of primates, rodents, other eutherian orders, and the outgroup (marsupials + monotremes) were sequenced. The phylogenetic analyses were extensive and aimed to eliminate the following artifacts as possible causes of the aberrant result: base composition biases, unequal site substitution rates, or the cumulative effects of both. Neither more sophisticated evolutionary analyses nor the addition of species changed the previous conclusion. That is, the statistical support for grouping rodents and primates to the exclusion of all other taxa fluctuates upward or downward in quite a tight range centered near 95% confidence. These results and a site-by-site examination of the sequences clearly suggest that convergent or parallel evolution has occurred in ND1 between primates and rodents and/or between ferungulates and the outgroup. While the primate/rodent grouping is strange, ND1 also throws some interesting light on the relationships of some eutherian orders, marsupials, and monotremes. In these parts of the tree, ND1 shows no apparent tendency for unexplained convergences.

221 citations
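
The abstract compares the tree supported by all 12 proteins, scored here by summing per-gene log-likelihoods, with the trees favoured by individual genes. A minimal sketch of that bookkeeping, using invented log-likelihood values and hypothetical gene labels: it picks the tree with the highest summed score and lists genes whose own best tree disagrees, the situation the abstract reports for ND1. It does not assess statistical significance.

```python
def summed_lnl(per_gene: dict) -> dict:
    """Sum each tree's log-likelihood over genes (genes scored separately)."""
    totals = {}
    for scores in per_gene.values():
        for tree, lnl in scores.items():
            totals[tree] = totals.get(tree, 0.0) + lnl
    return totals


def dissenting_genes(per_gene: dict) -> tuple:
    """Return the best tree overall and the genes whose own best tree differs."""
    totals = summed_lnl(per_gene)
    best = max(totals, key=totals.get)
    dissent = [g for g, s in per_gene.items() if max(s, key=s.get) != best]
    return best, dissent


# Invented log-likelihoods; T1 = ((primates, ferungulates), rodents),
# T2 = ((primates, rodents), ferungulates).
per_gene = {
    "gene_a": {"T1": -5012.4, "T2": -5008.9},
    "gene_b": {"T1": -7301.2, "T2": -7309.8},
    "gene_c": {"T1": -9120.5, "T2": -9126.1},
}

print(dissenting_genes(per_gene))  # ('T1', ['gene_a'])
```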

Journal Article
TL;DR: Congruence arguments to support elephant and armadillo together are striking, suggesting a superordinal group composed of Xenarthra and African endemic mammals, which in turn may be near the root of the placental subtree.
Abstract: We look at the higher-order phylogeny of mammals, analyzing in detail the complete mtDNA sequences of more than 40 species. We test the support for several proposed superordinal relationships. To this end, we apply a number of recently programmed methods and approaches, plus better-established methods. New pairwise tests show highly significant evidence that amino acid frequencies are changing among nearly all the genomes studied when unvaried sites are ignored. LogDet amino acid distances, with modifications to take into account invariant sites, are combined with bootstrapping and the Neighbor Joining algorithm to account for these violations of standard models. To weight the more slowly evolving sites, we exclude the more rapidly evolving sites from the data by using "site stripping". This leads to changing optimal trees with nearly all methods. The bootstrap support for many hypotheses varies widely between methods, and few hypotheses can claim unanimous support from these data. Rather, we uncover good evidence that many of the earlier branching patterns in the placental subtree could be incorrect, including the placement of the root. The tRNA genes, for example, favor a split between the group (hedgehog, rodents, primates) and all other sequenced placentals. Such a grouping is not ruled out by the amino acid sequence data. A grouping of all rodents plus rabbit, the old Glires hypothesis, is also feasible with stripped amino acid data, and rodent monophyly is also common. The elephant sequence allows confident rejection of the older taxon Ferungulata (Simpson, 1945). In its place, the new taxa Scrotifera and Fereuungulata are defined. A new likelihood ratio test is used to detect differences between the optimal tree for tRNA and that for amino acids. While not clearly significant as made, some results indicate the test is tending towards significance with more general models of evolution. Individual placement tests suggest alternative positions for hedgehog and elephant. Congruence arguments to support elephant and armadillo together are striking, suggesting a superordinal group composed of Xenarthra and African endemic mammals, which in turn may be near the root of the placental subtree. Thus, while casting doubt on some recent conclusions, the analyses are also unveiling some interesting new possibilities. (amino acid composition; invariant sites; LogDeterminant distances; mammal phylogeny; mitochondrial DNA genomes; Proboscidea; statistical tests; tRNA.)

173 citations
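
LogDet (paralinear) distances are designed to stay consistent when base or amino acid composition varies among sequences, the model violation reported above. A minimal sketch for two aligned, gap-free DNA sequences, using the 4x4 joint frequency matrix F and the diagonal marginal-frequency matrices Px and Py in d = -(1/4)[ln det F - (1/2)(ln det Px + ln det Py)]. The paper applied an amino acid (20x20) version with an invariant-sites modification; neither is implemented here, and the function name is hypothetical.

```python
import numpy as np

BASES = "ACGT"


def logdet_distance(x: str, y: str) -> float:
    """LogDet/paralinear distance between two aligned, gap-free DNA sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    idx = {b: i for i, b in enumerate(BASES)}
    F = np.zeros((4, 4))
    for a, b in zip(x.upper(), y.upper()):
        F[idx[a], idx[b]] += 1  # joint counts: row = base in x, column = base in y
    F /= F.sum()
    px, py = F.sum(axis=1), F.sum(axis=0)  # marginal base frequencies
    sign, log_det_f = np.linalg.slogdet(F)
    if sign <= 0 or np.any(px == 0) or np.any(py == 0):
        raise ValueError("divergence matrix is singular or has non-positive determinant")
    # ln det of a diagonal marginal matrix is the sum of its log frequencies.
    return -0.25 * (log_det_f - 0.5 * (np.log(px).sum() + np.log(py).sum()))


print(logdet_distance("ACGTACGTAC", "ACGTACGTAA"))
```

Identical sequences give a distance of zero, and the distance is symmetric because swapping the sequences only transposes F.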

Journal Article
TL;DR: The mtDNA analysis suggests that four lineages exist within the clade of Eschrichtiidae + Balaenopteridae, including a sister relationship between the humpback and fin whales, and a monophyletic group formed by the blue, sei, and Bryde's whales, each of which represents a newly recognized phylogenetic relationship in Mysticeti.
Abstract: The phylogenetic relationships among baleen whales (Order: Cetacea) remain uncertain despite extensive research in cetacean molecular phylogenetics and a potential morphological sample size of over 2 million animals harvested. Questions remain regarding the number of species and the monophyly of genera, as well as higher order relationships. Here, we approach mysticete phylogeny with complete mitochondrial genome sequence analysis. We determined complete mtDNA sequences of 10 extant Mysticeti species, inferred their phylogenetic relationships, and estimated node divergence times. The mtDNA sequence analysis concurs with previous molecular studies in the ordering of the principal branches, with Balaenidae (right whales) at the base as the sister group to all other mysticetes, followed by Neobalaenidae (pygmy right whale), Eschrichtiidae (gray whale), and finally Balaenopteridae (rorquals + humpback whale). The mtDNA analysis further suggests that four lineages exist within the clade of Eschrichtiidae + Balaenopteridae, including a sister relationship between the humpback and fin whales, and a monophyletic group formed by the blue, sei, and Bryde's whales, each of which represents a newly recognized phylogenetic relationship in Mysticeti. We also estimated the divergence times of all extant mysticete species, accounting for evolutionary rate heterogeneity among lineages. When the mtDNA divergence estimates are compared with the mysticete fossil record, several lineages have molecular divergence estimates strikingly older than indicated by paleontological data. We suggest this discrepancy reflects both a large amount of ancestral polymorphism and long generation times of ancestral baleen whale populations.

158 citations
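
The divergence times above were estimated while accounting for rate heterogeneity among lineages, which needs relaxed-clock machinery. As a much simpler point of reference, the sketch below shows the strict molecular clock that such methods generalise: a calibration pair of known age fixes a rate, and other pairwise distances are converted into times. All numbers and function names are hypothetical, and distances are assumed to be already corrected for multiple substitutions.

```python
def clock_rate(d_calib: float, t_calib: float) -> float:
    """Per-lineage rate (substitutions/site/Myr) from a calibrated pair.

    Under a strict clock the pairwise distance accumulates along both
    lineages, so d = 2 * rate * t.
    """
    return d_calib / (2.0 * t_calib)


def divergence_time(d: float, rate: float) -> float:
    """Time (Myr) since two lineages at pairwise distance d split."""
    return d / (2.0 * rate)


rate = clock_rate(d_calib=0.12, t_calib=30.0)  # hypothetical calibration
print(divergence_time(0.06, rate))            # about 15 Myr
```

A strict clock assumes one rate for every lineage; when rates differ, as the abstract allows for, these simple estimates can be biased in either direction.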


Cited by
Journal Article
TL;DR: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML), which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses.
Abstract: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (dS and dN) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C and runs under Windows, Mac OSX, and UNIX systems. It is available at http://abacus.gene.ucl.ac.uk/software/paml.html.

10,773 citations
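
Inference of positive selection of the kind described for PAML usually rests on a likelihood ratio test between nested models, for example the codeml site models M7 and M8, which differ by two parameters. The sketch below computes the test statistic and its chi-square p-value for that two-degree-of-freedom case, where the survival function has the closed form exp(-x/2), plus the omega = dN/dS ratio. It does not call PAML: the log-likelihoods would be read from its output, the values shown are hypothetical, and boundary refinements to the chi-square reference are omitted.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple:
    """Likelihood ratio test for nested models that differ by two parameters.

    Returns (statistic, p-value). For 2 degrees of freedom the chi-square
    survival function has the closed form exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clip tiny negatives from optimiser noise
    return stat, math.exp(-stat / 2.0)


def omega(dn: float, ds: float) -> float:
    """dN/dS ratio; values above 1 are the usual signature of positive selection."""
    return dn / ds


# Hypothetical log-likelihoods for a null (e.g. M7) and alternative (e.g. M8) model.
stat, p = lrt_df2(lnl_null=-2400.0, lnl_alt=-2394.0)
print(stat, round(p, 4))
```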

Journal Article
TL;DR: A modification of the KH test that accounts for multiple testing is presented, because the KH test was designed for comparing two topologies but is often used for comparing many topologies.
Abstract: The maximum-likelihood method for inferring molecular phylogeny (Felsenstein 1981) is being widely used. The probabilistic model for generating the molecular sequences is specified by the substitution process and the tree topology. The parameters for the substitution process and the branch lengths are estimated by maximizing the likelihood, and then the tree topology is estimated by maximizing the maximized likelihood. To obtain the confidence limit of the topology, the test of Kishino and Hasegawa (1989), referred to as the KH test, is often used in practice. The same idea that is the basis for the KH test is also found in the statistical literature (Linhart 1988; Vuong 1989). The KH test was designed for comparing two topologies but is often used for comparing many topologies. This use of the KH test leads to overconfidence in a wrong tree, because the sampling error due to the selection of the topology is overlooked. In this note, we present a modification of the KH test that takes multiple testing into account. Let a index the topologies and L

4,049 citations
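
A minimal sketch of a Shimodaira-Hasegawa-style test for many candidate topologies, working from stored per-site log-likelihoods. It assumes the RELL approximation (resampling sites and re-summing stored per-site values rather than re-optimising each tree) and centres each tree's bootstrap replicates so that, under the null, all candidate trees are equally good; this is one common way to implement the multiple-comparison correction the note describes. The function name, seed handling and toy data are hypothetical.

```python
import random


def sh_test(site_lnl: dict, n_boot: int = 1000, seed: int = 0) -> dict:
    """Shimodaira-Hasegawa-style test using RELL bootstrap replicates.

    site_lnl maps tree name -> per-site log-likelihoods (same site order for
    every tree). Returns tree name -> p-value; a small p-value means the tree
    is significantly worse than the best tree in the candidate set.
    """
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    totals = {t: sum(v) for t, v in site_lnl.items()}
    best = max(totals.values())
    observed = {t: best - totals[t] for t in trees}

    # RELL: resample sites with replacement and re-sum stored per-site values.
    rng = random.Random(seed)
    boot = {t: [] for t in trees}
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in trees:
            boot[t].append(sum(site_lnl[t][i] for i in idx))

    # Centre each tree's replicates on its own bootstrap mean.
    for t in trees:
        mean = sum(boot[t]) / n_boot
        boot[t] = [x - mean for x in boot[t]]

    counts = dict.fromkeys(trees, 0)
    for b in range(n_boot):
        top = max(boot[t][b] for t in trees)
        for t in trees:
            if top - boot[t][b] >= observed[t]:
                counts[t] += 1
    return {t: counts[t] / n_boot for t in trees}


# Toy data: tree B is worse than tree A by 0.5 log-units at every site.
data_rng = random.Random(1)
tree_a = [-data_rng.uniform(1.0, 3.0) for _ in range(200)]
tree_b = [x - 0.5 for x in tree_a]
print(sh_test({"A": tree_a, "B": tree_b}, n_boot=200))
```

The best tree in the set always receives p = 1, and tree B, worse at every site, is rejected. The plain two-tree KH test differs in comparing one pair of trees without taking the maximum over the candidate set.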

Journal Article
TL;DR: It is argued that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages.
Abstract: Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). (AIC; Bayes factors; BIC; likelihood ratio tests; model averaging; model uncertainty; model selection; multimodel inference.) It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian estimation. We know that the choice of one model or another affects many, if not all, stages of phylogenetic inference. 
For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002; Buckley

3,712 citations
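
AIC-based model averaging, as described in the abstract, can be sketched in a few lines: compute AIC = 2k - 2 lnL for each model, convert AIC differences into Akaike weights, and average a parameter over the models that contain it, renormalising the weights over those models. The fitted log-likelihoods, parameter counts (which here exclude branch lengths) and gamma-shape estimates below are hypothetical, as are the function names.

```python
import math


def aic(lnl: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 lnL."""
    return 2 * k - 2 * lnl


def akaike_weights(models: dict) -> dict:
    """models maps name -> (lnL, number of free parameters)."""
    scores = {m: aic(lnl, k) for m, (lnl, k) in models.items()}
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def model_average(weights: dict, estimates: dict) -> float:
    """Weighted mean of a parameter over the models that contain it,
    with weights renormalised over those models only."""
    w = {m: weights[m] for m in estimates}
    return sum(w[m] * estimates[m] for m in w) / sum(w.values())


# Hypothetical fits: name -> (lnL, k).
models = {"JC": (-5000.0, 0), "HKY+G": (-4950.0, 5), "GTR+G": (-4945.0, 9)}
weights = akaike_weights(models)
alpha_hat = model_average(weights, {"HKY+G": 0.50, "GTR+G": 0.60})  # gamma shape
print(weights, alpha_hat)
```

Rather than committing to the single best model, the averaged estimate carries some of the model-selection uncertainty that the abstract emphasises.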

Journal Article
TL;DR: This work has built a tool for the selection of the best-fit model of evolution, among a set of candidate models, for a given protein sequence alignment in order to study protein evolution and phylogenetic inference.
Abstract: Summary: Using an appropriate model of amino acid replacement is very important for the study of protein evolution and phylogenetic inference. We have built a tool for the selection of the best-fit model of evolution, among a set of candidate models, for a given protein sequence alignment. Availability: ProtTest is available under the GNU license from http://darwin.uvigo.es Contact: fabascal@uvigo.es

3,150 citations
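
Selecting a best-fit replacement model for a protein alignment amounts to ranking fitted candidates by an information criterion. A minimal sketch, assuming criteria such as the small-sample corrected AIC (AICc) and BIC, with the alignment length taken as the sample size n (itself a debatable choice). It is not ProtTest's code; the log-likelihoods and parameter counts (branch lengths plus model parameters) are hypothetical, and the model names are only labels.

```python
import math


def aicc(lnl: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; n is the sample size (here, alignment sites)."""
    return -2 * lnl + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian Information Criterion."""
    return -2 * lnl + k * math.log(n)


def rank_models(models: dict, n: int, criterion=bic) -> list:
    """Return model names sorted from best (lowest score) to worst."""
    return sorted(models, key=lambda m: criterion(models[m][0], models[m][1], n))


# Hypothetical fits on a 300-site protein alignment: name -> (lnL, k).
models = {
    "JTT": (-10250.0, 37),
    "WAG": (-10230.0, 37),
    "WAG+G": (-10100.0, 38),
    "WAG+I+G": (-10099.0, 39),
}
print(rank_models(models, n=300))
print(rank_models(models, n=300, criterion=aicc))
```

In this toy case the extra invariant-sites parameter improves lnL by only one unit, so both criteria prefer the simpler WAG+G.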

Journal Article
TL;DR: The Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of organellar genomes and allows the use of BLAST searches against a custom database, and conservation of basepairing in the secondary structure of animal mitochondrial tRNAs to identify and annotate genes.
Abstract: Summary: The Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of organellar (plant chloroplast and animal mitochondrial) genomes. It is a Web-based package that allows the use of BLAST searches against a custom database, and conservation of basepairing in the secondary structure of animal mitochondrial tRNAs to identify and annotate genes. DOGMA provides a graphical user interface for viewing and editing annotations. Annotations are stored on our password-protected server to enable repeated sessions of working on the same genome. Finished annotations can be extracted for direct submission to GenBank. Availability: http://phylocluster.biosci.utexas.edu/dogma/ Supplementary information: Detailed documentation and tutorials for annotating both animal mitochondrial and plant chloroplast genomes can be found on the DOGMA home page.

2,754 citations
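
DOGMA uses conservation of base pairing in the secondary structure of animal mitochondrial tRNAs as evidence when identifying genes. The sketch below is not DOGMA's algorithm; it only illustrates the underlying check, scoring what fraction of positions in a candidate helix can form Watson-Crick or G-U wobble pairs. The function name and example arms are hypothetical, and any pass/fail cutoff would be an additional assumption.

```python
# Watson-Crick pairs plus G-U wobble, in RNA alphabet.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_pairing(five_arm: str, three_arm: str, allow_wobble: bool = True) -> float:
    """Fraction of positions in a helix whose bases can pair.

    Both arms are given 5'->3'. The first base of five_arm pairs with the
    last base of three_arm, so three_arm is reversed before comparison.
    DNA input is accepted by mapping T to U.
    """
    if len(five_arm) != len(three_arm):
        raise ValueError("stem arms must have equal length")
    allowed = (CANONICAL | WOBBLE) if allow_wobble else CANONICAL
    a = five_arm.upper().replace("T", "U")
    b = three_arm.upper().replace("T", "U")[::-1]
    paired = sum((x, y) in allowed for x, y in zip(a, b))
    return paired / len(a)


print(stem_pairing("GCGGATT", "AATCCGC"))  # hypothetical acceptor-stem arms
```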