scispace - formally typeset

Showing papers in "Molecular Biology and Evolution" in 2001


Journal Article
TL;DR: The new model outperforms the Dayhoff and JTT models with respect to maximum-likelihood values for a large majority of the protein families in the authors' database and suggests that it provides a better overall fit to the evolutionary process in globular proteins and may lead to more accurate phylogenetic tree estimates.
Abstract: Phylogenetic inference from amino acid sequence data uses mainly empirical models of amino acid replacement and is therefore dependent on those models. Two of the more widely used models, the Dayhoff and JTT models, are estimated using similar methods that can utilize large numbers of sequences from many unrelated protein families but are somewhat unsatisfactory because they rely on assumptions that may lead to systematic error and discard a large amount of the information within the sequences. The alternative method of maximum-likelihood estimation may utilize the information in the sequence data more efficiently and suffers from no systematic error, but it has previously been applicable to relatively few sequences related by a single phylogenetic tree. Here, we combine the best attributes of these two methods using an approximate maximum-likelihood method. We implemented this approach to estimate a new model of amino acid replacement from a database of globular protein sequences comprising 3,905 amino acid sequences split into 182 protein families. While the new model has an overall structure similar to those of other commonly used models, there are significant differences. The new model outperforms the Dayhoff and JTT models with respect to maximum-likelihood values for a large majority of the protein families in our database. This suggests that it provides a better overall fit to the evolutionary process in globular proteins and may lead to more accurate phylogenetic tree estimates. Potentially, this matrix, and the methods used to generate it, may also be useful in other areas of research, such as biological sequence database searching, sequence alignment, and protein structure prediction, for which an accurate description of amino acid replacement is required.
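To make "maximum-likelihood values" concrete, the sketch below scores a pairwise alignment under a fixed replacement model by computing P(t) = exp(Qt). It is a minimal illustration, not the authors' approximate maximum-likelihood estimator. The function names are hypothetical, the 4-state Jukes-Cantor matrix stands in for a 20-state amino acid matrix such as Dayhoff, JTT, or the new model, and the divergence t is fixed rather than optimized.

```python
import math


def mat_mult(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, squarings=10, terms=20):
    """Approximate P(t) = exp(Q t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this update, term holds A^k / k!
        term = [[x / k for x in row] for row in mat_mult(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mult(result, result)  # exp(A)^(2^s) = exp(Q t)
    return result


def pairwise_log_likelihood(seq1, seq2, q, freqs, alphabet, t):
    """Sum over aligned sites of log(pi_x * P_xy(t)) for a reversible model."""
    p = expm(q, t)
    index = {c: i for i, c in enumerate(alphabet)}
    return sum(math.log(freqs[index[x]] * p[index[x]][index[y]])
               for x, y in zip(seq1, seq2))


# Jukes-Cantor rate matrix scaled to one expected substitution per unit time.
JC69 = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

print(pairwise_log_likelihood("ACGTACGT", "ACGTACGA", JC69, [0.25] * 4, "ACGT", 0.1))
```

Comparing two candidate matrices on the same family would amount to calling `pairwise_log_likelihood` with each Q and its equilibrium frequencies, optimizing t for each, and keeping the larger value.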

2,647 citations


Journal Article
TL;DR: A reversible jump Markov chain Monte Carlo approach is developed for estimating the posterior distribution of phylogenies from aligned DNA/RNA sequences under several hierarchical evolutionary models; the Kimura model is found to be too restrictive, and the Hasegawa, Kishino, and Yano model can be rejected for some data sets.
Abstract: We develop a reversible jump Markov chain Monte Carlo approach to estimating the posterior distribution of phylogenies based on aligned DNA/RNA sequences under several hierarchical evolutionary models. Using a proper, yet nontruncated and uninformative prior, we demonstrate the advantages of the Bayesian approach to hypothesis testing and estimation in phylogenetics by comparing different models for the infinitesimal rates of change among nucleotides, for the number of rate classes, and for the relationships among branch lengths. We compare the relative probabilities of these models and the appropriateness of a molecular clock using Bayes factors. Our most general model, first proposed by Tamura and Nei, parameterizes the infinitesimal change probabilities among nucleotides (A, G, C, T/U) into six parameters, consisting of three parameters for the nucleotide stationary distribution, two rate parameters for nucleotide transitions, and another parameter for nucleotide transversions. Nested models include the Hasegawa, Kishino, and Yano model with equal transition rates and the Kimura model with a uniform stationary distribution and equal transition rates. To illustrate our methods, we examine simulated data, 16S rRNA sequences from 15 contemporary eubacteria, halobacteria, eocytes, and eukaryotes, 9 primates, and the entire HIV genome of 11 isolates. We find that the Kimura model is too restrictive, that the Hasegawa, Kishino, and Yano model can be rejected for some data sets, that there is evidence for more than one rate class and a molecular clock among similar taxa, and that a molecular clock can be rejected for more distantly related taxa.
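The nesting described above can be seen directly in the rate matrix. The sketch assumes the common Tamura-Nei form in which the off-diagonal rate from nucleotide i to j is a transition or transversion rate multiplied by the stationary frequency of j, with the matrix scaled to one expected substitution per unit time. The function name and nucleotide order are illustrative choices, and the reversible jump sampler itself is not shown.

```python
NUCS = "AGCT"          # purines A, G first; then pyrimidines C, T
PURINES = set("AG")


def tn93_rate_matrix(freqs, alpha_r, alpha_y, beta):
    """Tamura-Nei style rate matrix.

    freqs   -- stationary frequencies in NUCS order (A, G, C, T), summing to 1
    alpha_r -- purine transition rate (A <-> G)
    alpha_y -- pyrimidine transition rate (C <-> T)
    beta    -- transversion rate
    Off-diagonal Q[i][j] = rate(i, j) * freqs[j]; rows sum to zero, and the
    matrix is scaled so the expected substitution rate at stationarity is 1.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i == j:
                continue
            if a in PURINES and b in PURINES:
                rate = alpha_r
            elif a not in PURINES and b not in PURINES:
                rate = alpha_y
            else:
                rate = beta
            q[i][j] = rate * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal entry is still 0.0 at this point
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]


# The nested special cases:
tn93 = tn93_rate_matrix([0.3, 0.2, 0.2, 0.3], 4.0, 2.0, 1.0)
hky = tn93_rate_matrix([0.3, 0.2, 0.2, 0.3], 3.0, 3.0, 1.0)  # equal transition rates
k80 = tn93_rate_matrix([0.25] * 4, 3.0, 3.0, 1.0)            # plus uniform frequencies
```

Setting `alpha_r` equal to `alpha_y` gives the Hasegawa, Kishino, and Yano form, and additionally making `freqs` uniform gives the Kimura form.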

791 citations


Journal Article
TL;DR: Computer simulation is used to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites, and use of the χ² distribution is found to make the test conservative, especially when the data contain very short and highly similar sequences.
Abstract: The selective pressure at the protein level is usually measured by the nonsynonymous/synonymous rate ratio (omega = dN/dS), with omega < 1, omega = 1, and omega > 1 indicating purifying (or negative) selection, neutral evolution, and diversifying (or positive) selection, respectively. The omega ratio is commonly calculated as an average over sites. As every functional protein has some amino acid sites under selective constraints, averaging rates across sites leads to low power to detect positive selection. Recently developed models of codon substitution allow the omega ratio to vary among sites and appear to be powerful in detecting positive selection in empirical data analysis. In this study, we used computer simulation to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites. The test compares two nested models: one that allows for sites under positive selection (with omega > 1), and another that does not, with the χ² distribution used for significance testing. We found that use of the χ² distribution makes the test conservative, especially when the data contain very short and highly similar sequences. Nevertheless, the LRT is powerful. Although the power can be low with only 5 or 6 sequences in the data, it was nearly 100% in data sets of 17 sequences. Sequence length, sequence divergence, and the strength of positive selection were also found to affect the power of the LRT. The exact distribution assumed for the omega ratio over sites was found not to affect the effectiveness of the LRT.
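The test statistic is twice the log-likelihood difference between the nested models, compared against a χ² distribution. The sketch assumes the two models differ by 2 free parameters (the abstract does not state the degrees of freedom), and the log-likelihood values in the example are hypothetical. For an even number of degrees of freedom the χ² upper tail has a closed form, so no statistics library is needed.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """2 * (lnL of the model allowing omega > 1 minus lnL of the model that does not)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


lnl_null, lnl_alt = -1000.0, -995.0  # hypothetical log-likelihoods
stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf_even_df(stat, 2)   # assumes the models differ by 2 parameters
print(stat, p_value)
```

Because the χ² reference was found to make the test conservative, a p-value computed this way tends to err on the large side.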

671 citations


Journal Article
TL;DR: Simulation results demonstrate that the accuracy of divergence time estimation is substantially enhanced when constraints are included, and a new parameterization more effectively captures the phylogenetic structure of rate evolution on a tree.
Abstract: Rates of molecular evolution vary over time and, hence, among lineages. In contrast, widely used methods for estimating divergence times from molecular sequence data assume constancy of rates. Therefore, methods for estimation of divergence times that incorporate rate variation are attractive. Improvements on a previously proposed Bayesian technique for divergence time estimation are described. A new parameterization more effectively captures the phylogenetic structure of rate evolution on a tree. Fossil information and other evidence can now be included in Bayesian analyses in the form of constraints on divergence times. Simulation results demonstrate that the accuracy of divergence time estimation is substantially enhanced when constraints are included.
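One simple way to express fossil constraints is as hard bounds that truncate the prior on node ages: any proposal that violates a bound, or that places a child node older than its parent, gets prior probability zero. This is a hedged sketch of that idea only. The tree encoding, the function names, and the flat `log_density` used in the example are hypothetical and do not reproduce the authors' rate parameterization.

```python
import math


def satisfies_constraints(parent, ages, bounds):
    """Check node ages against tree order and fossil-style bounds.

    parent -- dict mapping each node to its parent (the root maps to None)
    ages   -- dict mapping each node to its age (time before present)
    bounds -- dict mapping constrained nodes to (min_age, max_age); None = open
    """
    for node, par in parent.items():
        if par is not None and ages[par] <= ages[node]:
            return False  # a parent must be strictly older than its child
    for node, (lo, hi) in bounds.items():
        if lo is not None and ages[node] < lo:
            return False
        if hi is not None and ages[node] > hi:
            return False
    return True


def log_constrained_prior(parent, ages, bounds, log_density):
    """Unconstrained log prior truncated to the constraint set (unnormalized)."""
    if not satisfies_constraints(parent, ages, bounds):
        return -math.inf
    return log_density(ages)


def flat(ages):
    """Placeholder log density used only for illustration."""
    return 0.0


# Hypothetical tree ((A, B)x, C)root with all tips sampled at the present.
PARENT = {"A": "x", "B": "x", "x": "root", "C": "root", "root": None}
BOUNDS = {"x": (25.0, None)}  # e.g., a fossil implying x is at least 25 units old
```

In an MCMC sampler, a proposal scored as `-inf` would simply be rejected, which is how hard constraints confine the posterior.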

617 citations


Journal Article
TL;DR: The structure and evolution of the protein interaction network of the yeast Saccharomyces cerevisiae is analyzed and it is shown that the persistence of redundant interaction partners is the exception rather than the rule.
Abstract: In this paper, the structure and evolution of the protein interaction network of the yeast Saccharomyces cerevisiae is analyzed. The network is viewed as a graph whose nodes correspond to proteins. Two proteins are connected by an edge if they interact. The network resembles a random graph in that it consists of many small subnets (groups of proteins that interact with each other but do not interact with any other protein) and one large connected subnet comprising more than half of all interacting proteins. The number of interactions per protein appears to follow a power law distribution. Within approximately 200 Myr after a duplication, the products of duplicate genes become almost equally likely to (1) have common protein interaction partners and (2) be part of the same subnetwork as two proteins chosen at random from within the network. This indicates that the persistence of redundant interaction partners is the exception rather than the rule. After gene duplication, the likelihood that an interaction gets lost exceeds 2.2 × 10⁻³ per Myr. New interactions are estimated to evolve at a rate that is approximately three orders of magnitude smaller. Every 300 Myr, as many as half of all interactions may be replaced by new interactions.
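The graph description above (proteins as nodes, interactions as edges, one large connected subnet plus many small ones) maps onto a few basic graph routines. The sketch below, using only the Python standard library, finds connected subnets with breadth-first search, tabulates the number of interactions per protein, and lists shared partners for a duplicate pair. The protein names and edges are hypothetical toy data.

```python
from collections import Counter, defaultdict, deque


def build_graph(edges):
    """Undirected graph as a dict of neighbor sets; self-interactions are ignored."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Breadth-first search; returns subnets as lists, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def degree_distribution(graph):
    """Counter mapping number of interactions -> number of proteins."""
    return Counter(len(neighbors) for neighbors in graph.values())


def shared_partners(graph, p, q):
    """Interaction partners common to two proteins (e.g., a duplicate pair)."""
    return (graph[p] & graph[q]) - {p, q}


TOY_EDGES = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4"), ("P5", "P6")]
```

On a real network, fitting a power law would use the tail of `degree_distribution`, and tracking `shared_partners` for duplicate pairs of different ages is one way to see redundant partners being lost.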

535 citations


Journal Article
TL;DR: Analysis of complete genome coding DNA sequences of Drosophila, C. elegans, and yeast indicated that expansions of codon repeats corresponding to small hydrophilic amino acids are tolerated more, while strong selection pressures probably eliminate codon repeats encoding hydrophobic and basic amino acids.
Abstract: Complete chromosome/genome sequences available from humans, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae were analyzed for the occurrence of mono-, di-, tri-, and tetranucleotide repeats. In all of the genomes studied, dinucleotide repeat stretches tended to be longer than other repeats. Additionally, tetranucleotide repeats in humans and trinucleotide repeats in Drosophila also seemed to be longer. Although the trends for different repeats are similar between different chromosomes within a genome, the density of repeats may vary between different chromosomes of the same species. The abundance or rarity of various di- and trinucleotide repeats in different genomes cannot be explained by nucleotide composition of a sequence or potential of repeated motifs to form alternative DNA structures. This suggests that in addition to nucleotide composition of repeat motifs, characteristic DNA replication/repair/recombination machinery might play an important role in the genesis of repeats. Moreover, analysis of complete genome coding DNA sequences of Drosophila, C. elegans, and yeast indicated that expansions of codon repeats corresponding to small hydrophilic amino acids are tolerated more, while strong selection pressures probably eliminate codon repeats encoding hydrophobic and basic amino acids. The locations and sequences of all of the repeat loci detected in genome sequences and coding DNA sequences are available at http://www.ncl-india.org/ssr and could be useful for further studies.
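Perfect tandem repeats of 1 to 4 nt can be located with a backreference regular expression. The minimum copy numbers below are hypothetical thresholds, not the criteria used in the study. Motifs that are themselves repeats of a shorter unit (for example AA or ATAT) are skipped, so each run is reported under its shortest period.

```python
import re

# Hypothetical minimum copy numbers per motif length (not the study's criteria).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit ('AT' yes, 'ATAT' no)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_repeats(seq, min_copies=MIN_COPIES):
    """Return sorted (start, motif, copies) tuples for perfect 1-4 nt tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in (1, 2, 3, 4):
        # ([ACGT]{k}) captures a motif; \1{n,} requires n further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_repeats("GG" + "CA" * 8 + "TTT" + "A" * 12 + "G"))
```

Restricting the scan to in-frame trinucleotide runs in coding sequences would be the natural starting point for the codon repeat analysis described above.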

522 citations


Journal Article
TL;DR: Network integration and multitasking using trans-acting RNA molecules produced in parallel with protein-coding sequences may underpin both the evolution of developmentally sophisticated multicellular organisms and the rapid expansion of phenotypic complexity into uncontested environments such as those initiated in the Cambrian radiation and those seen after major extinction events.
Abstract: Eukaryotic phenotypic diversity arises from multitasking of a core proteome of limited size. Multitasking is routine in computers, as well as in other sophisticated information systems, and requires multiple inputs and outputs to control and integrate network activity. Higher eukaryotes have a mosaic gene structure with a dual output, mRNA (protein-coding) sequences and introns, which are released from the pre-mRNA by posttranscriptional processing. Introns have been enormously successful as a class of sequences and comprise up to 95% of the primary transcripts of protein-coding genes in mammals. In addition, many other transcripts (perhaps more than half) do not encode proteins at all, but appear both to be developmentally regulated and to have genetic function. We suggest that these RNAs (eRNAs) have evolved to function as endogenous network control molecules which enable direct gene-gene communication and multitasking of eukaryotic genomes. Analysis of a range of complex genetic phenomena in which RNA is involved or implicated, including co-suppression, transgene silencing, RNA interference, imprinting, methylation, and transvection, suggests that a higher-order regulatory system based on RNA signals operates in the higher eukaryotes and involves chromatin remodeling as well as other RNA-DNA, RNA-RNA, and RNA-protein interactions. The evolution of densely connected gene networks would be expected to result in a relatively stable core proteome due to the multiple reuse of components, implying that cellular differentiation and phenotypic variation in the higher eukaryotes result primarily from variation in the control architecture.
Thus, network integration and multitasking using trans-acting RNA molecules produced in parallel with protein-coding sequences may underpin both the evolution of developmentally sophisticated multicellular organisms and the rapid expansion of phenotypic complexity into uncontested environments such as those initiated in the Cambrian radiation and those seen after major extinction events.

459 citations


Journal Article
TL;DR: Results address several outstanding issues and indicate that apicomplexan and dinoflagellate plastids appear to be the result of a single endosymbiotic event which occurred relatively early in eukaryotic evolution, also giving rise to the plastids of heterokonts and perhaps cryptomonads.
Abstract: The phylum Apicomplexa encompasses a large number of intracellular protozoan parasites, including the causative agents of malaria (Plasmodium), toxoplasmosis (Toxoplasma), and many other human and animal diseases. Apicomplexa have recently been found to contain a relic, nonphotosynthetic plastid that has attracted considerable interest as a possible target for therapeutics. This plastid is known to have been acquired by secondary endosymbiosis, but when this occurred and from which type of alga it was acquired remain uncertain. Based on the molecular phylogeny of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) genes, we provide evidence that the apicomplexan plastid is homologous to plastids found in dinoflagellates, close relatives of apicomplexa that contain secondary plastids of red algal origin. Surprisingly, apicomplexan and dinoflagellate plastid-targeted GAPDH sequences were also found to be closely related to the plastid-targeted GAPDH genes of heterokonts and cryptomonads, two other groups that contain secondary plastids of red algal origin. These results address several outstanding issues: (1) apicomplexan and dinoflagellate plastids appear to be the result of a single endosymbiotic event which occurred relatively early in eukaryotic evolution, also giving rise to the plastids of heterokonts and perhaps cryptomonads; (2) apicomplexan plastids are derived from a red algal ancestor; and (3) the ancestral state of apicomplexan parasites was photosynthetic.

413 citations


Journal Article
TL;DR: It is concluded that purposeful higher-density taxonomic sampling, subsequent sequencing efforts, and phylogenetic analyses of their mitogenomes may be decisive in resolving persistent controversies over higher-level relationships of teleosts, the most diversified group of all vertebrates, comprising over 23,500 extant species.
Abstract: Although adequate resolution of higher-level relationships of organisms apparently requires longer DNA sequences than those currently being analyzed, limitations of time and resources present difficulties in obtaining such sequences from many taxa. For fishes, these difficulties have been overcome by the development of a PCR-based approach for sequencing the complete mitochondrial genome (mitogenome), which employs a long PCR technique and many fish-versatile PCR primers. In addition, recent studies have demonstrated that such mitogenomic data are useful and decisive in resolving persistent controversies over higher-level relationships of teleosts. As a first step toward resolution of higher teleostean relationships, which have been described as the "(unresolved) bush at the top of the tree," we investigated relationships using mitogenomic data from 48 purposefully chosen teleosts, of which those from 38 were newly determined during the present study (a total of 632,315 bp), using the above method. Maximum-parsimony and maximum-likelihood analyses were conducted with the data set that comprised concatenated nucleotide sequences from 12 protein-coding genes (excluding the ND6 gene and third codon positions) and 22 transfer RNA (tRNA) genes (stem regions only) from the 48 species. The trees resulting from the two methods were well resolved and largely congruent, with many internal branches supported by high statistical values. The tree topologies themselves, however, exhibited considerable variation from the previous morphology-based cladistic hypotheses, with most of the latter being confidently rejected by the mitogenomic data. Such incongruence resulted largely from the phylogenetic positions or limits of long-standing problematic taxa, which were quite unexpected from previous morphological and molecular analyses.
We concluded that the present study provided a basis of and guidelines for future investigations of teleostean evolutionary mitogenomics and that purposeful higher-density taxonomic sampling, subsequent sequencing efforts, and phylogenetic analyses of their mitogenomes may be decisive in resolving persistent controversies over higher-level relationships of teleosts, the most diversified group of all vertebrates, comprising over 23,500 extant species.
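The character filtering described above (dropping ND6 and third codon positions before concatenating protein-coding genes) can be sketched as follows. Gene names, the `(name, sequence)` input format, and the assumption that each sequence is an in-frame coding region are illustrative; extraction of tRNA stem regions is not shown.

```python
def drop_third_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def concatenate_genes(genes, exclude=("ND6",)):
    """Concatenate (name, sequence) pairs in order, skipping excluded genes."""
    return "".join(drop_third_positions(seq)
                   for name, seq in genes if name not in exclude)


# Hypothetical toy input for one species.
toy = [("COX1", "ATGGCC"), ("ND6", "TTTAAA"), ("ND1", "ATGTTA")]
print(concatenate_genes(toy))
```

Applying the same function to every species in the same gene order keeps the concatenated columns aligned, provided the input genes were aligned beforehand.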

379 citations


Journal Article
TL;DR: A nested cladistic analysis (NCA) demonstrated that both population structure processes (recurrent gene flow restricted by isolation by distance and long-distance dispersals) and population history events were instrumental in explaining this tripartite division of global NRY diversity.
Abstract: We examined 43 biallelic polymorphisms on the nonrecombining portion of the Y chromosome (NRY) in 50 human populations encompassing a total of 2,858 males to study the geographic structure of Y-chromosome variation. Patterns of NRY diversity varied according to geographic region and method/level of comparison. For example, populations from Central Asia had the highest levels of heterozygosity, while African populations exhibited a higher level of mean pairwise differences among haplotypes. At the global level, 36% of the total variance of NRY haplotypes was attributable to differences among populations (i.e., ΦST = 0.36). When a series of AMOVA analyses was performed on different groupings of the 50 populations, high levels of among-groups variance (ΦCT) were found between Africans, Native Americans, and a single group containing all 36 remaining populations. The same three population groupings formed distinct clusters in multidimensional scaling plots. A nested cladistic analysis (NCA) demonstrated that both population structure processes (recurrent gene flow restricted by isolation by distance and long-distance dispersals) and population history events (contiguous range expansions and long-distance colonizations) were instrumental in explaining this tripartite division of global NRY diversity. As in our previous analyses of smaller NRY data sets, the NCA detected a global contiguous range expansion out of Africa at the level of the total cladogram. Our new results support a general scenario in which, after an early out-of-Africa range expansion, global-scale patterns of NRY variation were mainly influenced by migrations out of Asia. Two other notable findings of the NCA were (1) Europe as a "receiver" of intercontinental signals primarily from Asia, and (2) the large number of intracontinental signals within Africa.
Our AMOVA analyses also supported the hypothesis that patrilocality effects are evident at local and regional scales, rather than at intercontinental and global levels. Finally, our results underscore the importance of subdivision of the human paternal gene pool and imply that caution should be exercised when using models and experimental strategies based on the assumption of panmixia.
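The two diversity summaries contrasted above can be computed from haplotype strings. The sketch assumes each haplotype is a string of biallelic states and uses Nei's unbiased gene diversity as the haploid analogue of heterozygosity; that choice of estimator is an assumption, and AMOVA and nested cladistic analysis are not shown.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


# Hypothetical sample of four haplotypes scored at three biallelic sites.
sample = ["000", "000", "011", "111"]
print(haplotype_diversity(sample), mean_pairwise_differences(sample))
```

The two statistics can rank populations differently: a sample with many distinct but closely related haplotypes has high haplotype diversity but few pairwise differences, which is one way the Central Asian and African patterns above can coexist.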

309 citations



Journal Article
TL;DR: This is the first report of ITS regions from a single cyanobacterial isolate not only different in configuration, but also, within one configuration, different in sequence.
Abstract: We amplified, TA-cloned, and sequenced the 16S-23S internal transcribed spacer (ITS) regions from single isolates of several cyanobacterial species, Calothrix parietina, Scytonema hyalinum, Coelodesmium wrangelii, Tolypothrix distorta, and a putative new genus (isolates SRS6 and SRS70), to investigate the potential of this DNA sequence for phylogenetic and population genetic studies. All isolates carried ITS regions containing the sequences coding for two tRNA molecules (tRNA-Ile and tRNA-Ala). We retrieved additional sequences without tRNA features from both C. parietina and S. hyalinum. Furthermore, in S. hyalinum, we found two of these non-tRNA-encoding regions to be identical in length but different in sequence. This is the first report of ITS regions from a single cyanobacterial isolate not only different in configuration, but also, within one configuration, different in sequence. The potential of the ITS region as a tool for studying molecular systematics and population genetics is significant, but the presence of multiple nonidentical rRNA operons poses problems. Multiple nonidentical rRNA operons may impact both studies that depend on comparisons of phylogenetically homologous sequences and those that employ restriction enzyme digests of PCR products. We review current knowledge of the numbers and kinds of 16S-23S ITS regions present across bacterial groups and plastids, and we discuss broad patterns congruent with higher-level systematics of prokaryotes.

Journal Article
TL;DR: The results indicate that many Acropora species belong to a species complex or syngameon and that morphology has little predictive value with regard to syngameon composition.
Abstract: This study examines molecular relationships across a wide range of species in the mass spawning scleractinian coral genus Acropora. Molecular phylogenies were obtained for 28 species using DNA sequence analyses of two independent markers, a nuclear intron and the mtDNA putative control region. Although the compositions of the major clades in the phylogenies based on these two markers were similar, there were several important differences. This, in combination with the fact that many species were not monophyletic, suggests either that introgressive hybridization is occurring or that lineage sorting is incomplete. The molecular tree topologies bear little similarity to the results of a recent cladistic analysis based on skeletal morphology and are at odds with the fossil record. We hypothesize that these conflicting results may be due to the same morphology having evolved independently more than once in Acropora and/or the occurrence of extensive interspecific hybridization and introgression in combination with morphology being determined by a small number of genes. Our results indicate that many Acropora species belong to a species complex or syngameon and that morphology has little predictive value with regard to syngameon composition. Morphological species in the genus often do not correspond to genetically distinct evolutionary units. Instead, species that differ in timing of gamete release tend to constitute genetically distinct clades.

Journal Article
Xun Gu
TL;DR: The subtree likelihood provides a solution that is computationally feasible and robust against the uncertainty of the phylogeny in the large gene family with many member genes (clusters), which appears to be a normal case in postgenomics.
Abstract: Based on the observed alignment pattern (i.e., amino acid configuration), we studied two basic types of functional divergence of a protein family. Type I functional divergence after gene duplication results in altered functional constraints (i.e., different evolutionary rates) between duplicate genes, whereas type II results in no altered functional constraints but a radical change in amino acid properties between them (e.g., charge or hydrophobicity). Two statistical approaches, the subtree likelihood and the whole-tree likelihood, were developed for estimating the coefficients of (type I or type II) functional divergence. Numerical algorithms for obtaining maximum-likelihood estimates are also provided. Moreover, a posterior-based site-specific profile is implemented to predict critical amino acid residues that are responsible for type I and/or type II functional divergence after gene duplication. Using examples, we compared the current likelihood method with a previously developed fast method; both give similar results. For handling altered functional constraints (type I functional divergence) in large gene families with many member genes (clusters), which appear to be the normal case in postgenomics, the subtree likelihood provides a solution that is computationally feasible and robust against uncertainty in the phylogeny. The cost of this feasibility is reduced accuracy of the approximation when amino acid frequencies are very skewed. The potential bias and its correction are discussed.
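The posterior-based site profile can be illustrated with Bayes' rule for a two-state mixture: given a coefficient θ of functional divergence and the likelihoods L1 and L0 of a site's pattern with and without divergence, the posterior probability of divergence is θ·L1 / (θ·L1 + (1 − θ)·L0). This is a generic sketch; the likelihood values and the 0.67 cutoff are hypothetical, and computing L1 and L0 from the subtree or whole-tree likelihood is not shown.

```python
def posterior_divergence(theta, lik_divergent, lik_conserved):
    """P(functional divergence | site pattern) for a two-state mixture with weight theta."""
    numerator = theta * lik_divergent
    return numerator / (numerator + (1.0 - theta) * lik_conserved)


def rank_sites(theta, site_likelihoods, cutoff=0.67):
    """site_likelihoods: iterable of (site, L1, L0) tuples.

    Returns (site, posterior) pairs at or above the cutoff, highest first.
    """
    scored = [(site, posterior_divergence(theta, l1, l0))
              for site, l1, l0 in site_likelihoods]
    return sorted((s for s in scored if s[1] >= cutoff),
                  key=lambda s: s[1], reverse=True)


# Hypothetical per-site likelihoods under divergence (L1) and no divergence (L0).
sites = [(10, 0.5, 0.05), (11, 0.1, 0.1), (12, 0.4, 0.01)]
print(rank_sites(0.2, sites))
```

Sites passing the cutoff are the candidates for the critical residues mentioned above; the threshold is a user choice rather than part of the method.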

Journal Article
TL;DR: Exons of the growth hormone receptor (GHR) and breast cancer susceptibility (BRCA1) genes were sequenced for a wide diversity of rodents and other mammals and combined with mitochondrial 12S rRNA sequences and previously published von Willebrand factor sequences; GHR sequences support the monophyly of Rodentia.
Abstract: The order Rodentia contains half of all extant mammal species, and from an evolutionary standpoint, there are persistent controversies surrounding the monophyly of the order, divergence dates for major lineages, and relationships among families. Exons of growth hormone receptor (GHR) and breast cancer susceptibility (BRCA1) genes were sequenced for a wide diversity of rodents and other mammals and combined with sequences of the mitochondrial 12S rRNA gene and previously published sequences of von Willebrand factor (vWF). Rodents exhibit rates of amino acid replacement twice those observed for nonrodents, and this rapid rate of evolution influences estimates of divergence dates. Based on GHR sequences, monophyly is supported, with the estimated divergence between hystricognaths and most sciurognaths dating to about 75 MYA. Most estimated dates of divergence are consistent with the fossil record, including a date of 23 MYA for Mus-Rattus divergence. These dates are considerably later than those derived from some other molecular studies. Among combined and separate analyses of the various gene sequences, moderate to strong support was found for several clades. GHR appears to have greater resolving power than do 12S or vWF. Despite its complete unresponsiveness to growth hormone, Cavia (and other hystricognaths) exhibits a conservative rate of change in the intracellular domain of GHR.
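Why a faster rate matters for dating follows from the strict-clock relation d = (r_a + r_b)·t between the distance d separating two lineages and the rates along each of them. The sketch uses hypothetical numbers: if both rodent lineages evolve about twice as fast as nonrodents, applying a nonrodent rate to a rodent distance roughly doubles the estimated divergence time.

```python
def divergence_time(distance, rate_a, rate_b=None):
    """Strict-clock divergence time from d = (r_a + r_b) * t.

    distance       -- expected substitutions per site separating the two sequences
    rate_a, rate_b -- substitutions per site per Myr along each lineage;
                      rate_b defaults to rate_a (a single clock, d = 2 r t)
    """
    if rate_b is None:
        rate_b = rate_a
    return distance / (rate_a + rate_b)


d = 0.15            # hypothetical distance between two rodents
r_nonrodent = 1e-3  # hypothetical nonrodent rate
print(divergence_time(d, r_nonrodent))      # nonrodent rate applied to rodents
print(divergence_time(d, 2 * r_nonrodent))  # rate doubled on both rodent lineages
```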

Journal Article
TL;DR: The topology of protein domain networks generated with data from the ProDom, Pfam, and Prosite domain databases was studied and it was found that these networks exhibited small-world and scale-free topologies with a high degree of local clustering accompanied by a few long-distance connections.
Abstract: Several technical, social, and biological networks were recently found to demonstrate scale-free and small-world behavior instead of random graph characteristics. In this work, the topology of protein domain networks generated with data from the ProDom, Pfam, and Prosite domain databases was studied. It was found that these networks exhibited small-world and scale-free topologies with a high degree of local clustering accompanied by a few long-distance connections. Moreover, these observations apply not only to the complete databases, but also to the domain distributions in proteomes of different organisms. The extent of connectivity among domains reflects the evolutionary complexity of the organisms considered.
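Local clustering, one ingredient of small-world behavior, can be computed on a domain graph. The sketch assumes two domains are linked when they occur in the same protein (the abstract does not define the edges) and uses hypothetical domain lists. Average clustering much higher than in a random graph of the same size and density would be one signature of the small-world pattern.

```python
from collections import defaultdict
from itertools import combinations


def domain_graph(proteins):
    """Link every pair of distinct domains found in the same protein (assumed edge rule)."""
    graph = defaultdict(set)
    for domains in proteins:
        unique = set(domains)
        for d in unique:
            graph.setdefault(d, set())  # keep single-domain proteins as isolated nodes
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    neighbors = graph[node] - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a in neighbors for b in neighbors if a < b and b in graph[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


# Hypothetical domain compositions of three proteins.
toy = domain_graph([["kinase", "SH2", "SH3"], ["SH2", "SH3"], ["PH"]])
print(average_clustering(toy))
```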

Journal Article
TL;DR: Variations in cichlid spectral sensitivity have arisen through evolution of gene regulation, rather than through changes in opsin amino acid sequence.
Abstract: Spectral tuning of visual pigments is typically accomplished through changes in opsin amino acid sequence. Within a given opsin class, changes at a few key sites control wavelength specificity. To investigate known differences in the visual pigment spectral sensitivity of the Lake Malawi cichlids, Metriaclima zebra (368, 488, and 533 nm) and Dimidiochromis compressiceps (447, 536, and 569 nm), we sequenced cone opsin genes from these species as well as Labeotropheus fuelleborni and Oreochromis niloticus. These cichlids have five distinct classes of cone opsin genes, including two unique SWS-2 genes. Comparisons of the inferred amino acid sequences from the five cone opsin genes of M. zebra, D. compressiceps, and L. fuelleborni show the sequences to be nearly identical. Therefore, evolution of key opsin sites cannot explain the differences in visual pigment sensitivities. Real-time PCR demonstrates that different cichlid species express different subsets of the available cone opsin genes. Metriaclima zebra and L. fuelleborni express a complement of genes which give them UV-shifted visual pigments, while D. compressiceps expresses a different set to produce a red-shifted visual system. Thus, variations in cichlid spectral sensitivity have arisen through evolution of gene regulation, rather than through changes in opsin amino acid sequence.

Journal Article
TL;DR: The resulting time estimates indicate that many major clades of modern birds had their origins within the Cretaceous and suggest that modern birds may have coexisted with other avian lineages for an extended period during the Cretaceous.
Abstract: Molecular clocks can be calibrated using fossils within the group under study (internal calibration) or outside of the group (external calibration). Both types of calibration have their advantages and disadvantages. An internal calibration may reduce extrapolation error but may not be from the best fossil record, raising the issue of nonindependence. An external calibration may be more independent but also may have a greater extrapolation error. Here, we used the advantages of both methods by applying a sequential calibration to avian molecular clocks. We estimated a basal divergence within birds, the split between fowl (Galliformes) and ducks (Anseriformes), to be 89.8 ± 6.97 MYA using an external calibration and 12 rate-constant nuclear genes. In turn, this time estimate was used as an internal calibration for three species-rich avian molecular data sets: mtDNA, DNA-DNA hybridization, and transferrin immunological distances. The resulting time estimates indicate that many major clades of modern birds had their origins within the Cretaceous. This supports earlier studies that identified large gaps in the avian fossil record and suggests that modern birds may have coexisted with other avian lineages for an extended period during the Cretaceous. The new time estimates are concordant with a continental breakup model for the origin of ratites.
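The sequential calibration reduces to two applications of the same strict-clock arithmetic: derive a rate from an externally dated split, date the fowl-duck split with it, then reuse that date as the calibration point for another data set. All distances and the external age below are hypothetical; only the 89.8 MYA figure comes from the text, and the sketch ignores the ± 6.97 uncertainty.

```python
def clock_rate(distance, age):
    """Per-lineage rate from a dated split under a strict clock: d = 2 r T."""
    return distance / (2.0 * age)


def clock_age(distance, rate):
    """Age of a split from its distance and a per-lineage rate: T = d / (2 r)."""
    return distance / (2.0 * rate)


# Step 1 (external calibration): a hypothetical split of age 300 Myr at distance 0.6
# in the rate-constant genes, then a hypothetical fowl-duck distance in the same genes.
rate_nuclear = clock_rate(0.6, 300.0)
fowl_duck_age = clock_age(0.1796, rate_nuclear)  # about 89.8 with these made-up numbers

# Step 2 (internal calibration): reuse that age in a second data set, where the
# fowl-duck distance and the target node distance are hypothetical.
rate_second = clock_rate(0.40, fowl_duck_age)
target_age = clock_age(0.55, rate_second)
print(fowl_duck_age, target_age)
```

Because the second step only rescales distances by the calibrated age, any error in the first-step date propagates proportionally to every node dated in the second data set.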

Journal Article
TL;DR: Here, a model allowing covarion-like evolution of DNA sequences is introduced, and this model allows the site-specific rate to vary between lineages by adding as few as two parameters to the widely used among-site rate variation model.
Abstract: Here, a model allowing covarion-like evolution of DNA sequences is introduced. In contrast to standard representation of the distribution of evolutionary rates, this model allows the site-specific rate to vary between lineages. This is achieved by adding as few as two parameters to the widely used among-site rate variation model, namely, (1) the proportion of sites undergoing rate changes and (2) the rate of rate change. This model is implemented in the likelihood framework, allowing parameter estimation, comparison of models, and tree reconstruction. An application to ribosomal RNA sequences suggests that covarions (i.e., site-specific rate changes) play an important role in the evolution of these molecules. Neglecting them results in a severe underestimate of the variance of rates across sites. It has, however, little influence on the estimation of ancestral G+C contents obtained from a nonhomogeneous model, or on the resulting inferences about the evolution of thermophily. This theoretical effort should be useful for the study of protein adaptation, which presumably proceeds in a typical covarion-like manner.
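A covarion-like process can be written as one generator over (rate category, nucleotide) pairs: within a category the nucleotide matrix Q is scaled by that category's rate, and a switching matrix moves sites between categories at rate ν. The sketch assumes that a switching site draws its new category uniformly from K equiprobable rate categories; the names, the Jukes-Cantor Q, and the category rates are illustrative. Mixing switching sites with non-switching ones (the proportion parameter) would be a separate likelihood mixture and is not shown.

```python
def switching_generator(k, nu):
    """Rate-category switching: events at rate nu, new category uniform over k."""
    off = nu / k
    return [[off if i != j else -off * (k - 1) for j in range(k)] for i in range(k)]


def covarion_generator(q, rates, nu):
    """Generator on (category, nucleotide) states, equal to diag(rates) x Q + S x I.

    State index = category * n + nucleotide, where n = len(q).
    """
    k, n = len(rates), len(q)
    s = switching_generator(k, nu)
    g = [[0.0] * (k * n) for _ in range(k * n)]
    for a in range(k):
        for b in range(k):
            for i in range(n):
                for j in range(n):
                    row, col = a * n + i, b * n + j
                    if a == b:
                        g[row][col] += rates[a] * q[i][j]  # substitution within category
                    if i == j:
                        g[row][col] += s[a][b]             # rate change, same nucleotide
    return g


JC = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
G = covarion_generator(JC, [0.5, 1.5], 0.3)  # two hypothetical rate categories
```

Each row of the combined generator sums to zero because both the scaled Q and the switching matrix do, so it is a valid continuous-time Markov generator.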

Journal ArticleDOI
TL;DR: An intuitive visual framework, the generalized skyline plot, is presented, based on a genealogy inferred from the sequences, that provides a nonparametric estimate of effective population size through time and employs a small-sample Akaike information criterion to objectively choose the optimal grouping strategy.
Abstract: We present an intuitive visual framework, the generalized skyline plot, to explore the demographic history of sampled DNA sequences. This approach is based on a genealogy inferred from the sequences and provides a nonparametric estimate of effective population size through time. In contrast to previous related procedures, the generalized skyline plot is more applicable to cases where the underlying tree is not fully resolved and the data are not highly variable. This is achieved by the grouping of adjacent coalescent intervals. We employ a small-sample Akaike information criterion to objectively choose the optimal grouping strategy. We investigate the performance of our approach using simulation and subsequently apply it to HIV-1 sequences from central Africa and mtDNA sequences from red pandas.
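The interval-grouping idea can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation. Coalescent intervals are given tip to root as (number of lineages k, interval length) pairs. Adjacent intervals are pooled until a pool is at least `eps` long, and a short leftover pool is merged into the previous group (the pooling rule is an assumption). Within a group, the population-size parameter theta is treated as constant, so its maximum-likelihood estimate is the sum of C(k, 2) * t over the group divided by the number of intervals in it. AICc counts one parameter per group (also an assumption). With `eps = 0`, each interval is its own group, which gives the classic skyline estimate C(k, 2) * t.

```python
import math


def pairs(k):
    """Number of lineage pairs that can coalesce when k lineages exist."""
    return k * (k - 1) / 2.0


def group_intervals(intervals, eps):
    """Pool adjacent (k, length) intervals, tip to root, until each pool is >= eps long."""
    groups, current, total = [], [], 0.0
    for k, length in intervals:
        current.append((k, length))
        total += length
        if total >= eps:
            groups.append(current)
            current, total = [], 0.0
    if current:  # leftover pool shorter than eps
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


def skyline(groups):
    """ML estimate of theta for each group, assuming theta is constant within a group."""
    return [sum(pairs(k) * t for k, t in g) / len(g) for g in groups]


def log_likelihood(groups, thetas):
    """Coalescent log-likelihood of the interval lengths given per-group theta."""
    ll = 0.0
    for g, theta in zip(groups, thetas):
        for k, t in g:
            c = pairs(k)
            ll += math.log(c / theta) - c * t / theta
    return ll


def aicc(log_l, n_params, n_obs):
    """Small-sample AIC; infinite when the correction term is undefined."""
    denom = n_obs - n_params - 1
    if denom <= 0:
        return math.inf
    return -2.0 * log_l + 2.0 * n_params + 2.0 * n_params * (n_params + 1) / denom


def best_grouping(intervals, eps_values):
    """Return (AICc, eps, thetas) for the candidate eps with the lowest AICc."""
    results = []
    for eps in eps_values:
        groups = group_intervals(intervals, eps)
        thetas = skyline(groups)
        ll = log_likelihood(groups, thetas)
        results.append((aicc(ll, len(groups), len(intervals)), eps, thetas))
    return min(results)
```

Plotting the returned `thetas` against the cumulative interval boundaries gives the stepwise skyline. Larger `eps` values trade temporal detail for less noisy estimates, and AICc arbitrates that trade-off.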

Journal ArticleDOI
TL;DR: Overall, D. melanogaster autosomes harbor an excess of amino acid replacement polymorphisms relative to D. simulans, and range expansion from Africa appears to have had little effect on synonymous-to-replacement polymorphism ratios.
Abstract: Surveys of molecular variation in Drosophila melanogaster and Drosophila simulans have suggested that diversity outside of Africa is a subset of that within Africa. It has been argued that reduced levels of diversity in non-African populations reflect a population bottleneck, adaptation to temperate climates, or both. Here, I summarize the available single-nucleotide polymorphism data for both species. A simple "out of Africa" bottleneck scenario is consistent with geographic patterns for loci on the X chromosome but not with loci on the autosomes. Interestingly, there is a trend toward lower nucleotide diversity on the X chromosome relative to autosomes in non-African populations of D. melanogaster, but the opposite trend is seen in African populations. In African populations, autosomal inversion polymorphisms in D. melanogaster may contribute to reduced autosome diversity relative to the X chromosome. To elucidate the role that selection might play in shaping patterns of variability, I present a summary of within- and between-species patterns of synonymous and replacement variation in both species. Overall, D. melanogaster autosomes harbor an excess of amino acid replacement polymorphisms relative to D. simulans. Interestingly, range expansion from Africa appears to have had little effect on synonymous-to-replacement polymorphism ratios.

Journal ArticleDOI
TL;DR: A novel star contraction algorithm is presented here which rigorously identifies starlike nodes (clusters) diagnostic of prehistoric demographic expansions of East Asian and Papuan lineages.
Abstract: In the past decade, mitochondrial DNA (mtDNA) of 826 representative East Asians and Papuans has been typed by high-resolution (14-enzyme) restriction fragment length polymorphism (RFLP) analysis. Compared with mtDNA control region sequencing, RFLP typing of the complete human mitochondrial DNA generally yields a cleaner phylogeny, the nodes of which can be dated assuming a molecular clock. We present here a novel star contraction algorithm that rigorously identifies starlike nodes (clusters) diagnostic of prehistoric demographic expansions. Applying it to the Asian and Papuan data, we date the out-of-Africa migration of the ancestral mtDNA types that founded all Eurasian (including Papuan) lineages at 54,000 years. While the proto-Papuan mtDNA continued expanding at this time along a southern route to Papua New Guinea, the proto-Eurasian mtDNA appears to have drifted genetically and does not show any comparable demographic expansion until 30,000 years ago. By this time, the East Asian, Indian, and European mtDNA pools seem to have separated from each other, as postulated by the weak Garden of Eden model. The East Asian expansion entered America about 25,000 years ago but was then restricted on both sides of the Pacific to more southerly latitudes during the Last Glacial Maximum around 20,000 years ago, coinciding with a chronological gap in our expansion dates. Repopulation of northern Asian latitudes occurred after the Last Glacial Maximum, obscuring the ancestral Asian gene pool of Amerinds.
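The abstract does not spell out the star contraction algorithm itself, so the sketch below covers only the generic step of dating a starlike node once it has been identified. It uses the ρ statistic (the mean number of mutations separating sampled descendants from the founder haplotype), which is one common way to date such nodes in mtDNA studies. Whether the authors used exactly this estimator, and the calibration value passed as `years_per_mutation`, are assumptions here.

```python
def rho(mutations_from_founder):
    """Mean number of mutations between each sampled descendant and the founder haplotype."""
    if not mutations_from_founder:
        raise ValueError("need at least one descendant")
    return sum(mutations_from_founder) / len(mutations_from_founder)


def node_age(mutations_from_founder, years_per_mutation):
    """Strict-clock age of a starlike node: rho times the time per mutation."""
    return rho(mutations_from_founder) * years_per_mutation


# Hypothetical star: six descendants, each 0 to 2 mutations from the founder.
descendants = [0, 1, 1, 2, 0, 2]
age = node_age(descendants, years_per_mutation=20_000)  # hypothetical calibration
```

A starlike topology matters here because, in a star, the descendant lineages are roughly independent, so averaging their founder distances gives a reasonable clock estimate. Deeply nested substructure would violate that.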

Journal ArticleDOI
TL;DR: The data suggest that the same climatic phenomenon synchronized the onset of genetic divergence of lineages in all three species flocks, such that their most recent evolutionary history seems to be linked to the same external modulators of adaptive radiation.
Abstract: Water level fluctuations are important modulators of speciation processes in tropical lakes, in that they temporarily form or break down barriers to gene flow among adjacent populations and/or incipient species. Time estimates of the most recent major lowstands of the three African Great Lakes are thus crucial to infer the relative timescales of explosive speciation events in cichlid species flocks. Our approach combines geological evidence with genetic divergence data from the fastest-evolving mtDNA segment of cichlid fishes from the three Great East African Lakes. In this way, we show for each of the three lakes that individuals sampled from several populations that are currently isolated by long geographic distances and/or deep water form clusters of equally closely related haplotypes. The distribution of identical or equally closely related haplotypes in a lake basin allows delineation of the extent of lake level fluctuations. Our data suggest that the same climatic phenomenon synchronized the onset of genetic divergence of lineages in all three species flocks, such that their most recent evolutionary history seems to be linked to the same external modulators of adaptive radiation. We calibrated the molecular clock of the control region by dating the Lake Malawi species flock, via the divergence between the utaka-cichlid and mbuna-cichlid lineages, to a minimum of 570,000 years and a maximum of 1 Myr. This suggests that the low-lake-level period which established the observed patterns of genetic relatedness dates back less than 57,000 years, probably even to 17,000-12,400 years ago, when Lake Victoria dried up and Lakes Malawi and Tanganyika were also low. A rapid rise of all three lakes about 11,000 years ago established the large-scale population subdivisions observed today.
Over that period of time, a multitude of species originated in Lakes Malawi and Victoria with an impressive degree of morphological and ecological differentiation, whereas the Tanganyikan taxa that were exposed to the same habitat changes hardly diverged ecologically and morphologically. Our findings also show that patterns of genetic divergences of stenotopic organisms provide valuable feedback on geological and sedimentological time estimates for lake level changes.
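The bounded calibration can be propagated with simple arithmetic. This sketch assumes a strict clock in which both the calibration divergence and the observed haplotype divergence are pairwise distances, so the factor of two cancels. The distances in the example are hypothetical. Only the 570,000-year and 1-Myr calibration bounds come from the abstract.

```python
def clock_bounds(calibration_distance, min_age, max_age, observed_distance):
    """Age range (in the units of the calibration ages) for an observed distance.

    A younger calibration implies a faster rate and therefore a younger
    estimate, so the minimum calibration age yields the lower bound.
    """
    fast_rate = calibration_distance / min_age   # rate implied by the minimum age
    slow_rate = calibration_distance / max_age   # rate implied by the maximum age
    return observed_distance / fast_rate, observed_distance / slow_rate


# Hypothetical distances; calibration bounds of 570,000 years and 1 Myr.
low, high = clock_bounds(0.10, 570_000, 1_000_000, 0.005)
```

Because the estimate scales linearly with the calibration age, the ratio of the upper to the lower bound is always max_age / min_age, whatever distances are used.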

Journal ArticleDOI
TL;DR: It is predicted that local point mutations continually generate considerable genetic variation that is capable of altering gene expression and that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution.
Abstract: Although the evolution of protein-coding sequences within genomes is well understood, the same cannot be said of the cis-regulatory regions that control transcription. Yet, changes in gene expression are likely to constitute an important component of phenotypic evolution. We simulated the evolution of new transcription factor binding sites via local point mutations. The results indicate that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution. Even combinations of two new binding sites evolve very quickly. We predict that local point mutations continually generate considerable genetic variation that is capable of altering gene expression.
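One ingredient of such a simulation is finding where a single local point mutation would create a new binding site. The sketch below assumes an exact-match motif model (real transcription factor binding is usually mismatch tolerant, and the paper's scoring may differ) and a hypothetical motif. It lists the windows that differ from the motif at exactly one position.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def one_mutation_away(sequence, motif):
    """Start positions where a single point mutation would create an exact motif match."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1)
            if hamming(sequence[i:i + k], motif) == 1]


# Hypothetical promoter fragment and motif.
near_sites = one_mutation_away("GGTATAATGG", "TATAAA")
```

Counting such near-matches across a regulatory region gives a rough sense of how many mutational targets exist. Simulating how new sites spread and fix in a population is a separate step not shown here.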

Journal ArticleDOI
TL;DR: The combination of a targeted mutator mechanism to generate high variability with the subsequent action of diversifying selection on highly expressed variants might explain both the hypervariability of conopeptides and the large number of unique sequences per species.
Abstract: Hypervariability is a prominent feature of large gene families that mediate interactions between organisms, such as venom-derived toxins or immunoglobulins. In order to study mechanisms for evolution of hypervariability, we examined an EST-generated assemblage of 170 distinct conopeptide sequences from the venoms of five species of marine Conus snails. These sequences were assigned to eight gene families, defined by conserved elements in the signal domain and untranslated regions. Order-of-magnitude differences were observed in the expression levels of individual conopeptides, with five to seven transcripts typically comprising over 50% of the sequenced clones in a given species. The conopeptide precursor alignments revealed four striking features peculiar to the mature peptide domain: (1) an accelerated rate of nucleotide substitution, (2) a bias for transversions over transitions in nucleotide substitutions, (3) a position-specific conservation of cysteine codons within the hypervariable region, and (4) a preponderance of nonsynonymous substitutions over synonymous substitutions. We propose that the first three observations argue for a mutator mechanism targeted to mature domains in conopeptide genes, combining a protective activity specific for cysteine codons and a mutagenic polymerase that exhibits transversion bias, such as DNA polymerase V. The high dN/dS ratio is consistent with positive or diversifying selection, and further analyses by intraspecific/interspecific gene tree contingency tests weakly support recent diversifying selection in the evolution of conopeptides. Since only the most highly expressed transcripts segregate in gene trees according to the feeding specificity of the species, diversifying selection might be acting primarily on these sequences.
The combination of a targeted mutator mechanism to generate high variability with the subsequent action of diversifying selection on highly expressed variants might explain both the hypervariability of conopeptides and the large number of unique sequences per species.
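To show what the dN/dS ratio measures, here is a minimal sketch of the final step of its calculation: Jukes-Cantor-corrected distances computed from already-counted nonsynonymous and synonymous differences and sites, as in Nei-Gojobori-style counting methods. Site counting itself is omitted, the counts in the example are hypothetical, and this is not necessarily the estimator the authors used.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of corrected nonsynonymous to synonymous distances."""
    if syn_diffs == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn / ds


# Hypothetical counts for a pair of mature-domain sequences.
ratio = dn_ds(30, 150, 5, 50)
```

A ratio near 1 is the neutral expectation, a ratio below 1 indicates purifying selection, and a ratio above 1, as reported for the conopeptide mature domains, is consistent with positive or diversifying selection.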

Journal ArticleDOI
TL;DR: It is suggested that animal mtDNA molecules may recombine regularly and that the extent to which this generates new haplotypes may depend only on the frequency of biparental inheritance of the mitochondrial genome.
Abstract: The assumption that animal mitochondrial DNA (mtDNA) does not undergo homologous recombination is based on indirect evidence, yet it has had an important influence on our understanding of mtDNA repair and mutation accumulation (and thus mitochondrial disease and aging) and on biohistorical inferences made from population data. Recently, several studies have suggested recombination in primate mtDNA on the basis of patterns of frequency distribution and linkage associations of mtDNA mutations in human populations, but others have failed to produce similar evidence. Here, we provide direct evidence for homologous mtDNA recombination in mussels, where heteroplasmy is the rule in males. Our results indicate a high rate of mtDNA recombination. Coupled with the observation that mammalian mitochondria contain the enzymes needed for the catalysis of homologous recombination, these findings suggest that animal mtDNA molecules may recombine regularly and that the extent to which this generates new haplotypes may depend only on the frequency of biparental inheritance of the mitochondrial genome. This generalization must, however, await evidence from animal species with typical maternal mtDNA inheritance.

Journal ArticleDOI
TL;DR: The time of divergence of the insect proteins from the malacostracan hemocyanin subunits was estimated to be about 430-440 MYA, providing support for the notion that the Hexapoda evolved from the same crustacean lineage as the Malacostraca.
Abstract: Arthropod hemocyanins are members of a protein superfamily that also comprises the arthropod phenoloxidases (tyrosinases), crustacean pseudohemocyanins (cryptocyanins), and insect storage hexamerins. The evolution of these proteins was inferred by neighbor-joining, maximum-parsimony, and maximum-likelihood methods. Monte Carlo shuffling approaches provided evidence against a discernible relationship between the arthropod hemocyanin superfamily and molluscan hemocyanins or nonarthropodan tyrosinases. Within the arthropod hemocyanin superfamily, the phenoloxidases probably emerged early in the (eu-)arthropod stemline and thus form the most likely outgroup. The respiratory hemocyanins evolved from these enzymes before the radiation of the extant euarthropodan subphyla. Due to different functional constraints, replacement rates vary greatly between the clades. Divergence times were thus estimated assuming local molecular clocks using several substitution models. The results were consistent and indicated the separation of the cheliceratan and crustacean hemocyanins close to 600 MYA. The different subunit types of the multihexameric cheliceratan hemocyanin have a rather conservative structure and diversified in the arachnidan stemline between 550 and 450 MYA. By contrast, the separation of the crustacean (malacostracan) hemocyanin subunits probably occurred only about 200 MYA. The nonrespiratory pseudohemocyanins evolved within the Decapoda about 215 MYA. The insect hemocyanins and storage hexamerins emerged independently from the crustacean hemocyanins. The time of divergence of the insect proteins from the malacostracan hemocyanins was estimated to be about 430-440 MYA, providing support for the notion that the Hexapoda evolved from the same crustacean lineage as the Malacostraca.

Journal ArticleDOI
TL;DR: The expression of duplicated cyp19 genes in two different tissues highlights the evolutionary significance of maintaining two active genes on duplicated zebrafish chromosomes for specific functions in the ovary and the brain.
Abstract: Cytochrome P450 aromatase (Cyp19) is an enzyme that catalyzes the synthesis of estrogens, thereby controlling various physiological functions of estrogens. We isolated two cyp19 cDNAs, termed cyp19a and cyp19b, from zebrafish. These genes are located in linkage groups 18 and 25, respectively. Detailed gene mapping indicated that zebrafish linkage groups 18 and 25 may have arisen from the same ancestral chromosome by a chromosome duplication event. Cyp19a is expressed mainly in the follicular cells lining the vitellogenic oocytes in the ovary during vitellogenesis. Cyp19b is expressed abundantly in the brain, at the hypothalamus and ventral telencephalon, extending to the olfactory bulbs. The expression of duplicated cyp19 genes in two different tissues highlights the evolutionary significance of maintaining two active genes on duplicated zebrafish chromosomes for specific functions in the ovary and the brain.

Journal ArticleDOI
TL;DR: The nuclear exons consistently performed better than mitochondrial protein and rRNA-tRNA coding genes on a per-residue basis in recovering benchmark clades and nuclear genes having appropriate rates of substitution should receive strong consideration in efforts to reconstruct deep-level phylogenetic relationships.
Abstract: Both mitochondrial and nuclear gene sequences have been employed in efforts to reconstruct deep-level phylogenetic relationships. A fundamental question in molecular systematics concerns the efficacy of different types of sequences in recovering clades at different taxonomic levels. We compared the performance of four mitochondrial data sets (cytochrome b, cytochrome oxidase II, NADH dehydrogenase subunit I, 12S rRNA-tRNA-16S rRNA) and eight nuclear data sets (exonic regions of alpha-2B adrenergic receptor, aquaporin, β-casein, gamma-fibrinogen, interphotoreceptor retinoid binding protein, kappa-casein, protamine, von Willebrand factor) in recovering deep-level mammalian clades. We employed parsimony and minimum-evolution with a variety of distance corrections for superimposed substitutions. In 32 different pairwise comparisons between these mitochondrial and nuclear data sets, we used the maximum set of overlapping taxa. In each case, the variable-length bootstrap was used to resample at the size of the smaller data set. The nuclear exons consistently performed better than mitochondrial protein and rRNA-tRNA coding genes on a per-residue basis in recovering benchmark clades. We also concatenated nuclear genes for overlapping taxa and made comparisons with concatenated mitochondrial protein-coding genes from complete mitochondrial genomes. The variable-length bootstrap was used to score the recovery of benchmark clades as a function of the number of resampled base pairs. In every case, the nuclear concatenations were more efficient than the mitochondrial concatenations in recovering benchmark clades. Among genes included in our study, the nuclear genes were much less affected by superimposed substitutions. Nuclear genes having appropriate rates of substitution should receive strong consideration in efforts to reconstruct deep-level phylogenetic relationships.
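The variable-length bootstrap can be sketched as ordinary column resampling with replacement, except that the number of drawn columns is chosen freely, for example as the length of the smaller data set in a pairwise comparison. This is a minimal sketch. The alignment is assumed to be a dictionary of equal-length sequences, and `recovers_clade` is a hypothetical user-supplied function, because tree building and clade checking are outside its scope.

```python
import random


def resample_columns(alignment, n_columns, rng):
    """Draw n_columns alignment columns with replacement.

    alignment maps taxon name -> aligned sequence (all the same length).
    n_columns may differ from the alignment length, which is what makes
    this a variable-length bootstrap.
    """
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(n_columns)]
    # The same column indices are used for every taxon, so columns stay aligned.
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def clade_support(alignment, n_columns, recovers_clade, replicates=100, seed=0):
    """Fraction of replicates for which recovers_clade(resampled_alignment) is true.

    recovers_clade is hypothetical: it should build a tree from the resampled
    alignment and report whether the benchmark clade appears.
    """
    rng = random.Random(seed)
    hits = sum(bool(recovers_clade(resample_columns(alignment, n_columns, rng)))
               for _ in range(replicates))
    return hits / replicates
```

Calling `clade_support` over a range of `n_columns` values traces benchmark-clade recovery as a function of the number of resampled base pairs. Comparing two data sets at the same `n_columns` puts them on an equal per-residue footing.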

Journal ArticleDOI
TL;DR: The present study is the first to report the successful application of the SINE method in demonstrating the existence of such possible "ancient" incomplete lineage sorting, and discusses the possibility that it may be very difficult to resolve the species phylogeny of a group that radiated explosively, even by resolving the genealogies of more than 10 nuclear loci, as a consequence of incomplete lineage sorting during speciation.
Abstract: Lake Tanganyika harbors numerous endemic species of extremely diverse cichlid fish that have been classified into 12 major taxonomic groups known as tribes. Analysis of short interspersed element (SINE) insertion data has been acknowledged to be a powerful tool for the elucidation of phylogenetic relationships, and we applied this method in an attempt to clarify such relationships among these cichlids. We studied insertion patterns of 38 SINEs in total, 24 of which supported the monophyly of three clades. The other 14 loci revealed extensive incongruence in terms of the patterns of SINE insertions. These incongruencies most likely stem from a period of adaptive radiation. One possible explanation for this phenomenon is the extensive incomplete lineage sorting of alleles for the presence or absence of a SINE during successive speciation events which took place about 5-10 MYA. The present study is the first to report the successful application of the SINE method in demonstrating the existence of such possible "ancient" incomplete lineage sorting. We discuss the possibility that it may be very difficult to resolve the species phylogeny of a group that radiated explosively, even by resolving the genealogies of more than 10 nuclear loci, as a consequence of incomplete lineage sorting during speciation.