Showing papers in "Molecular Biology and Evolution in 2002"


Journal ArticleDOI
TL;DR: It is found that, for an extensive variety of cases, the most powerful tests for detecting population growth are Fu's F(S) test and the newly developed R(2) test.
Abstract: A number of statistical tests for detecting population growth are described. We compared the statistical power of these tests with that of others available in the literature. The tests evaluated fall into three categories: those tests based on the distribution of the mutation frequencies, on the haplotype distribution, and on the mismatch distribution. We found that, for an extensive variety of cases, the most powerful tests for detecting population growth are Fu's F(S) test and the newly developed R(2) test. The behavior of the R(2) test is superior for small sample sizes, whereas F(S) is better for large sample sizes. We also show that some popular statistics based on the mismatch distribution are very conservative.

1,929 citations


Journal ArticleDOI
TL;DR: A semiparametric smoothing method is developed using penalized likelihood, a saturated model in which every lineage has a separate rate combined with a roughness penalty that discourages rates from varying too much across a phylogeny.
Abstract: Rates of molecular evolution vary widely between lineages, but quantification of how rates change has proven difficult. Recently proposed estimation procedures have mainly adopted highly parametric approaches that model rate evolution explicitly. In this study, a semiparametric smoothing method is developed using penalized likelihood. A saturated model in which every lineage has a separate rate is combined with a roughness penalty that discourages rates from varying too much across a phylogeny. A data-driven cross-validation criterion is then used to determine an optimal level of smoothing. This criterion is based on an estimate of the average prediction error associated with pruning lineages from the tree. The methods are applied to three data sets of six genes across a sample of land plants. Optimally smoothed estimates of absolute rates entailed 2- to 10-fold variation across lineages.

1,877 citations
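
The penalized-likelihood idea in the abstract above (a separate rate for every lineage, plus a roughness penalty weighted by a smoothing level) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The Poisson model for substitution counts, the squared-difference penalty between a branch and its parent branch, and all function and parameter names are assumptions made for the example.

```python
import math

def poisson_loglik(counts, rates, durations):
    """Log-likelihood of substitution counts, ASSUMING each branch count
    is Poisson-distributed with mean rate * duration (illustrative choice)."""
    total = 0.0
    for x, r, t in zip(counts, rates, durations):
        mean = r * t
        total += x * math.log(mean) - mean - math.lgamma(x + 1)
    return total

def roughness(rates, parent):
    """Sum of squared rate differences between each branch and its parent.
    parent[k] is the index of branch k's parent branch, or None at the root.
    This particular penalty form is an assumption for the sketch."""
    return sum((rates[k] - rates[p]) ** 2
               for k, p in enumerate(parent) if p is not None)

def penalized_loglik(counts, rates, durations, parent, smoothing):
    """Saturated-model log-likelihood minus smoothing * roughness penalty."""
    fit = poisson_loglik(counts, rates, durations)
    return fit - smoothing * roughness(rates, parent)

# Hypothetical toy tree: branch 0 is the parent of branches 1 and 2.
counts = [12, 5, 9]
durations = [10.0, 4.0, 6.0]
parent = [None, 0, 0]
smooth_rates = [1.2, 1.2, 1.2]   # clock-like: no penalty
rough_rates = [1.2, 1.25, 1.5]   # rates differ: penalized
```

A larger smoothing value pulls the best-scoring rates toward a clock-like solution. The abstract describes choosing that value by cross-validation, which is not shown here.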


Journal ArticleDOI
TL;DR: Previous codon-based models are extended to allow the omega ratio to vary both among sites and among lineages, and the new models are implemented in the likelihood framework; they may be useful for identifying positive selection along prespecified lineages that affects only a few sites in the protein.
Abstract: The nonsynonymous (amino acid-altering) to synonymous (silent) substitution rate ratio (omega = d(N)/d(S)) provides a measure of natural selection at the protein level, with omega = 1, <1, and >1 indicating neutral evolution, purifying selection, and positive selection, respectively. Previous studies that used this measure to detect positive selection have often taken an approach of pairwise comparison, estimating substitution rates by averaging over all sites in the protein. As most amino acids in a functional protein are under structural and functional constraints and adaptive evolution probably affects only a few sites at a few time points, this approach of averaging rates over sites and over time has little power. Previously, we developed codon-based substitution models that allow the omega ratio to vary either among lineages or among sites. In this paper we extend previous models to allow the omega ratio to vary both among sites and among lineages and implement the new models in the likelihood framework. These models may be useful for identifying positive selection along prespecified lineages that affects only a few sites in the protein. We apply those branch-site models as well as previous branch- and site-specific models to three data sets: the lysozyme genes from primates, the tumor suppressor BRCA1 genes from primates, and the phytochrome (PHY) gene family in angiosperms. Positive selection is detected in the lysozyme and BRCA genes by both the new and the old models. However, only the new models detected positive selection acting on lineages after gene duplication in the PHY gene family. Additional tests on several data sets suggest that the new models may be useful in detecting positive selection after gene duplication in gene family evolution.

1,265 citations


Journal ArticleDOI
TL;DR: The role of recombination and HGT in giving phenotypic "coherence" to prokaryotic taxa at all levels of inclusiveness, the implications of these processes for the reconstruction and meaning of "phylogeny," and new views of prokaryotic adaptation and diversification based on gene acquisition and exchange are discussed.
Abstract: Accumulating prokaryotic gene and genome sequences reveal that the exchange of genetic information through both homology-dependent recombination and horizontal (lateral) gene transfer (HGT) is far more important, in quantity and quality, than hitherto imagined. The traditional view, that prokaryotic evolution can be understood primarily in terms of clonal divergence and periodic selection, must be augmented to embrace gene exchange as a creative force, itself responsible for much of the pattern of similarities and differences we see between prokaryotic microbes. Rather than replacing periodic selection on genetic diversity, gene loss, and other chromosomal alterations as important players in adaptive evolution, gene exchange acts in concert with these processes to provide a rich explanatory paradigm, some of whose implications we explore here. In particular, we discuss (1) the role of recombination and HGT in giving phenotypic "coherence" to prokaryotic taxa at all levels of inclusiveness, (2) the implications of these processes for the reconstruction and meaning of "phylogeny," and (3) new views of prokaryotic adaptation and diversification based on gene acquisition and exchange.

936 citations


Journal ArticleDOI
TL;DR: The identification of a fourth and novel type of globin in mouse, man, and zebrafish has been reported, which indicates that the vertebrate myoglobins are in fact a specialized intracellular globin that evolved in adaptation to the special needs of muscle cells.
Abstract: Vertebrates possess multiple respiratory globins that differ in terms of structure, function, and tissue distribution. Three types of globins have been described so far: hemoglobin facilitates the transport of oxygen in the blood, myoglobin serves oxygen transport and storage in the muscle, and neuroglobin has a yet unidentified function in nerve cells. Here we report the identification of a fourth and novel type of globin in mouse, man, and zebrafish. It is expressed in apparently all types of human tissue and therefore has been called cytoglobin (CYGB). Mouse and human CYGBs comprise 190 amino acids; the zebrafish CYGB, 174 amino acids. The human CYGB gene is located on chromosome 17q25. The mammalian genes display a unique exon-intron pattern with an additional exon resulting in a C-terminal extension of the protein, which is absent in the fish CYGB. Phylogenetic analyses suggest that the CYGBs had a common ancestor with vertebrate myoglobins. This indicates that the vertebrate myoglobins are in fact a specialized intracellular globin that evolved in adaptation to the special needs of muscle cells.

474 citations


Journal ArticleDOI
TL;DR: The study defines a robust nucleotide and amino acid mitochondrial molecular clock encompassing five insect orders, including the Blattaria, and explores Tilyard's theory proposing that the terrestrial transition of the aquatic arthropod ancestor to the insects is associated with a particular plant group (early vascular plants).
Abstract: A unified understanding of >390 Myr of insect evolution requires insight into their origin. Molecular clocks are widely applied for evolutionary dating, but clocks for the class Insecta have remained elusive. We now define a robust nucleotide and amino acid mitochondrial molecular clock encompassing five insect orders, including the Blattaria (cockroaches), Orthoptera (crickets and locusts), Hemiptera (true bugs), Diptera, and Lepidoptera (butterflies and moths). Calibration of the clock using one of the earliest, most extensive fossil records for insects (the early ancestors of extant Blattaria) was congruent with all available insect fossils, with biogeographic history, with the Cambrian explosion, and with independent dating estimates from Lepidopteran families. In addition, dates obtained from both nucleotide and amino acid clocks were congruent with each other. Of particular interest to vector biology is the early date of the emergence of triatomine bugs (99.8-93.5 MYA), coincident with the formation of the South American continent during the breakup of Gondwanaland. More generally, we reveal the insects arising from a common ancestor with the Anostraca (fairy shrimps) at around the Silurian-Ordovician boundary (434.2-421.1 MYA) coinciding with the earliest plant megafossil. We explore Tilyard's theory proposing that the terrestrial transition of the aquatic arthropod ancestor to the insects is associated with a particular plant group (early vascular plants). The major output of the study is a comprehensive series of dates for deep-branching points within insect evolution that can act as calibration points for further dating studies within insect families and genera.

455 citations


Journal ArticleDOI
TL;DR: An analysis of the evolutionary dynamics of transcription factor binding sites whose function had been experimentally verified in promoters of 51 human genes, compared with homologous sequences in other primate species and rodents, shows extensive divergence.
Abstract: Comparisons between human and rodent DNA sequences are widely used for the identification of regulatory regions (phylogenetic footprinting), and the importance of such intergenomic comparisons for promoter annotation is expanding. The efficacy of such comparisons for the identification of functional regulatory elements hinges on the evolutionary dynamics of promoter sequences. Although it is widely appreciated that conservation of sequence motifs may provide a suggestion of function, it is not known what proportion of the functional binding sites in humans is conserved in distant species. In this report, we present an analysis of the evolutionary dynamics of transcription factor binding sites whose function had been experimentally verified in promoters of 51 human genes and compare their sequence to homologous sequences in other primate species and rodents. Our results show that there is extensive divergence within the nucleotide sequence of transcription factor binding sites. Using direct experimental data from functional studies in both human and rodents for 20 of the regulatory regions, we estimate that 32%-40% of the human functional sites are not functional in rodents. This is evidence that there is widespread turnover of transcription factor binding sites. These results have important implications for the efficacy of phylogenetic footprinting and the interpretation of the pattern of evolution in regulatory sequences.

438 citations


Journal ArticleDOI
TL;DR: The phylogenetic backbone of the East Asian mtDNA tree is determined by using published complete mtDNA sequences and assessing both coding and control region variation in 69 Han individuals from southern China, confirming that the East Asian mtDNA pool is locally region-specific and completely covered by the two superhaplogroups M and N.
Abstract: We determine the phylogenetic backbone of the East Asian mtDNA tree by using published complete mtDNA sequences and assessing both coding and control region variation in 69 Han individuals from southern China. This approach assists in the interpretation of published mtDNA data on East Asians based on either control region sequencing or restriction fragment length polymorphism (RFLP) typing. Our results confirm that the East Asian mtDNA pool is locally region-specific and completely covered by the two superhaplogroups M and N. The phylogenetic partitioning based on complete mtDNA sequences corroborates existing RFLP-based classification of Asian mtDNA types and supports the distinction between northern and southern populations. We describe new haplogroups M7, M8, M9, N9, and R9 and demonstrate by way of example that hierarchically subdividing the major branches of the mtDNA tree aids in recognizing the settlement processes of any particular region in appropriate time scale. This is illustrated by the characteristically southern distribution of haplogroup M7 in East Asia, whereas its daughter-groups, M7a and M7b2, specific for Japanese and Korean populations, testify to a presumably (pre-)Jomon contribution to the modern mtDNA pool of Japan.

409 citations


Journal ArticleDOI
TL;DR: It is clear that the genome of Drosophila melanogaster has undergone few gene duplications in the recent past and has far fewer gene families than C. elegans, and yeast has the smallest number of duplicate genes.
Abstract: We conducted a detailed analysis of duplicate genes in three complete genomes: yeast, Drosophila, and Caenorhabditis elegans. For two proteins to be assigned to the same family we used two criteria: (1) their similarity is ≥ I, where I = 30% if L ≥ 150 a.a. and I = 0.01n + 4.8L^(-0.32(1 + exp(-L/1000))) if L < 150 a.a., with L the length of the alignable region; and (2) the alignable region is ≥ 80% of the longer protein. We found it very important to delete isoforms (caused by alternative splicing), same genes with different names, and proteins derived from repetitive elements. We estimated that there were 530, 674, and 1,219 protein families in yeast, Drosophila, and C. elegans, respectively, so, as expected, yeast has the smallest number of duplicate genes. However, for the duplicate pairs with the number of substitutions per synonymous site (K(S)) < 0.01, Drosophila has only seven pairs, whereas yeast has 58 pairs and nematode has 153 pairs. After considering the possible effects of codon usage bias and gene conversion, these numbers became 6, 55, and 147, respectively. Thus, Drosophila appears to have far fewer young duplicate genes than do yeast and nematode. The larger numbers of duplicate pairs with K(S) < 0.01 in yeast and C. elegans were probably largely caused by block duplications. At any rate, it is clear that the genome of Drosophila melanogaster has undergone few gene duplications in the recent past and has far fewer gene families than C. elegans.

405 citations
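
The length-dependent similarity cutoff quoted in the abstract above can be written out as a small function. This is a sketch under stated assumptions rather than the authors' code: the function names are invented for the example, the exponent is read as `L ** (-0.32 * (1 + exp(-L / 1000)))`, the second branch is taken to apply when L is below 150 residues, the 80% rule is taken to apply to the alignable region, and the value n = 6 in the usage lines is an illustrative assumption that the abstract as shown does not state.

```python
import math

def identity_threshold(aligned_len, n):
    """Minimum similarity (as a fraction) for an alignable region of
    aligned_len residues. Flat 30% at 150 residues or more; the
    length-dependent formula below that (assumed reading of the abstract)."""
    if aligned_len >= 150:
        return 0.30
    exponent = -0.32 * (1.0 + math.exp(-aligned_len / 1000.0))
    return 0.01 * n + 4.8 * aligned_len ** exponent

def same_family(similarity, aligned_len, longer_len, n):
    """Apply both criteria: the alignable region covers at least 80% of the
    longer protein (assumed reading), and similarity meets the cutoff."""
    long_enough = aligned_len >= 0.80 * longer_len
    return long_enough and similarity >= identity_threshold(aligned_len, n)

# Hypothetical usage; n = 6 is an assumption for illustration only.
print(identity_threshold(200, n=6))
print(same_family(0.40, aligned_len=180, longer_len=200, n=6))
```

Shorter alignments need a higher similarity to pass, which guards against short chance matches being counted as duplicates.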


Journal ArticleDOI
TL;DR: It is found that functional cis-regulatory variation is widespread in the human genome and that the consequent variation in gene expression is twofold or greater for 63% of the genes surveyed, and the distinctive consequences of cis-regulatory variation for the genotype-phenotype relationship are outlined.
Abstract: Changes in gene expression and regulation--due in particular to the evolution of cis-regulatory DNA sequences--may underlie many evolutionary changes in phenotypes, yet little is known about the distribution of such variation in populations. We present in this study the first survey of experimentally validated functional cis-regulatory polymorphism. These data are derived from more than 140 polymorphisms involved in the regulation of 107 genes in Homo sapiens, the eukaryote species with the most available data. We find that functional cis-regulatory variation is widespread in the human genome and that the consequent variation in gene expression is twofold or greater for 63% of the genes surveyed. Transcription factor-DNA interactions are highly polymorphic, and regulatory interactions have been gained and lost within human populations. On average, humans are heterozygous at more functional cis-regulatory sites (>16,000) than at amino acid positions (<13,000), in part because of an overrepresentation among the former in multiallelic tandem repeat variation, especially (AC)(n) dinucleotide microsatellites. The role of microsatellites in gene expression variation may provide a larger store of heritable phenotypic variation, and a more rapid mutational input of such variation, than has been realized. Finally, we outline the distinctive consequences of cis-regulatory variation for the genotype-phenotype relationship, including ubiquitous epistasis and genotype-by-environment interactions, as well as underappreciated modes of pleiotropy and overdominance. Ordinary small-scale mutations contribute to pervasive variation in transcription rates and consequently to patterns of human phenotypic variation.

392 citations


Journal ArticleDOI
TL;DR: Recombination in HIV-1 seems to be much more widespread than previously thought, which might have serious implications on vaccine development and on the reliability of previous inferences of HIV-1 evolutionary history and dynamics.
Abstract: The performance of 14 different recombination detection methods was evaluated by analyzing several empirical data sets where the presence of recombination has been suggested or where recombination is assumed to be absent. In general, recombination methods seem to be more powerful with increasing levels of divergence, but different methods showed distinct performance. Substitution methods using summary statistics gave more accurate inferences than most phylogenetic methods. However, definitive conclusions about the presence of recombination should not be derived on the basis of a single method. Performance patterns observed from the analysis of real data sets coincided very well with previous computer simulation results. Previous recombination inferences from some of the data sets analyzed here should be reconsidered. In particular, recombination in HIV-1 seems to be much more widespread than previously thought. This finding might have serious implications on vaccine development and on the reliability of previous inferences of HIV-1 evolutionary history and dynamics.

Journal ArticleDOI
TL;DR: In this article, the performance of Bayes prediction of amino acids under positive selection by computer simulation was evaluated, and it was shown that using a large number of lineages is the best way to improve the accuracy and power.
Abstract: Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rates or by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. The Bayes theorem was used to calculate the posterior probabilities of each site falling into these site classes. Here, we evaluate the performance of Bayes prediction of amino acids under positive selection by computer simulation. We measured the accuracy by the proportion of predicted sites that were truly under selection and the power by the proportion of true positively selected sites that were predicted by the method. The accuracy was slightly better for longer sequences, whereas the power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for medium or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when data contained only a few highly similar sequences. However, sampling a large number of lineages improved the performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positive selection sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
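
The two ingredients described in the abstract above, posterior probabilities of site classes from Bayes' theorem and the accuracy and power measures used in the simulations, can be sketched as below. This is a minimal illustration with invented function names and made-up numbers. The per-class site likelihoods are taken as given inputs, since computing them under a codon substitution model is outside the scope of the sketch.

```python
def site_class_posteriors(priors, likelihoods):
    """Bayes' theorem for one site: posterior of class k is proportional to
    (proportion of sites in class k) * (likelihood of the site data under k)."""
    joint = [p * f for p, f in zip(priors, likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    return [j / total for j in joint]

def accuracy_and_power(predicted, truly_selected):
    """Accuracy: share of predicted sites that are truly under selection.
    Power: share of truly selected sites that were predicted."""
    predicted, truly_selected = set(predicted), set(truly_selected)
    hits = predicted & truly_selected
    accuracy = len(hits) / len(predicted) if predicted else float("nan")
    power = len(hits) / len(truly_selected) if truly_selected else float("nan")
    return accuracy, power

# Hypothetical numbers: three site classes, the last one with omega > 1.
posteriors = site_class_posteriors([0.5, 0.3, 0.2], [0.1, 0.2, 0.4])
accuracy, power = accuracy_and_power({1, 2, 3, 4}, {2, 3, 5})
```

In practice a site would be predicted as positively selected when its posterior probability for the omega > 1 class exceeds some chosen cutoff; that cutoff is not specified in the abstract.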

Journal ArticleDOI
TL;DR: Maximum likelihood models for prepartitioned data sets, which account for the heterogeneity among site partitions by using different omega parameters for the partitions, are implemented and applied to data sets of the major histocompatibility complex (MHC) class I alleles and of the abalone sperm lysin genes.
Abstract: The nonsynonymous to synonymous substitution rate ratio (omega = d(N)/d(S)) provides a sensitive measure of selective pressure at the protein level, with omega values <1, =1, and >1 indicating purifying selection, neutral evolution, and diversifying selection, respectively. Maximum likelihood models of codon substitution developed recently account for variable selective pressures among amino acid sites by employing a statistical distribution for the omega ratio among sites. Those models, called random-sites models, are suitable when we do not know a priori which sites are under what kind of selective pressure. Sometimes prior information (such as the tertiary structure of the protein) might be available to partition sites in the protein into different classes, which are expected to be under different selective pressures. It is then sensible to use such information in the model. In this paper, we implement maximum likelihood models for prepartitioned data sets, which account for the heterogeneity among site partitions by using different omega parameters for the partitions. The models, referred to as fixed-sites models, are also useful for combined analysis of multiple genes from the same set of species. We apply the models to data sets of the major histocompatibility complex (MHC) class I alleles from human populations and of the abalone sperm lysin genes. Structural information is used to partition sites in MHC into two classes: those in the antigen recognition site (ARS) and those outside. Positive selection is detected in the ARS by the fixed-sites models. Similarly, sites in lysin are classified into the buried and solvent-exposed classes according to the tertiary structure, and positive selection was detected at the solvent-exposed sites. The random-sites models identified a number of sites under positive selection in each data set, confirming and elaborating the results of the fixed-sites models. The analysis demonstrates the utility of the fixed-sites models, as well as the power of previous random-sites models, which do not use the prior information to partition sites.

Journal ArticleDOI
TL;DR: Although the inactivation of the hominoid Uox gene was caused by independent nonsense or frameshift mutations, the gene has taken a two-step deterioration process, first in the promoter and second in the coding region during primate evolution.
Abstract: We have determined and compared the promoter, coding, and intronic sequences of the urate oxidase (Uox) gene of various primate species. Although we confirm the previous observation that the inactivation of the gene in the clade of the human and the great apes results from a single CGA to TGA nonsense mutation in exon 2, we find that the inactivation in the gibbon lineage results from an independent nonsense mutation at a different CGA codon in exon 2 or from either one-base deletion in exon 3 or one-base insertion in exon 5, contrary to the previous claim that the cause is a 13-bp deletion in exon 2. We also find that compared with other organisms, the primate functional Uox gene is exceptional in terms of usage of CGA codons which are prone to TGA nonsense mutations. Nevertheless, we demonstrate rather strong selective constraint against nonsynonymous sites of the functional Uox gene and argue that this observation is consistent with the fact that the Uox gene is unique in the genome and evolutionarily conserved not only among animals but also among eukaryotes. Another finding that there are a few substitutions in the cis-acting element or CAAT-box (or both) of primate functional Uox genes may explain the lowered transcriptional activity. We suggest that although the inactivation of the hominoid Uox gene was caused by independent nonsense or frameshift mutations, the gene has taken a two-step deterioration process, first in the promoter and second in the coding region during primate evolution. It is also argued that the high concentration of uric acid in the blood of humans and nonhuman primates has developed molecular coevolution with the xanthine oxidoreductase in purine metabolism. However, it remains to be answered whether loss of Uox activity in hominoids is related to protection from oxidative damage and the prolonged life span.

Journal ArticleDOI
TL;DR: This work demonstrates that 95% of the variable positions in amino acid sequences of mitochondrial cytochrome b underwent dramatic variations of substitution rate among vertebrate lineages, and opens several new avenues of research, such as the understanding of the evolution of functional constraints or the improvement of phylogenetic reconstruction methods.
Abstract: Because of functional constraints, substitution rates vary among the positions of a protein but are usually assumed to be constant at a given site during evolution. The distribution of the rates across the sequence positions generally fits a Gamma distribution. Models of sequence evolution were accordingly designed and led to improved phylogenetic reconstruction. However, it has been convincingly demonstrated that the evolutionary rate of a given position is not always constant throughout time. We called such within-site rate variations heterotachy (for "different speed" in Greek). Yet, heterotachy was found among homologous sequences of distantly related organisms, often with different functions. In such cases, the functional constraints are likely different, which would explain the different distribution of variable sites. To evaluate the importance of heterotachy, we focused on amino acid sequences of mitochondrial cytochrome b, for which the function is likely the same in all vertebrates. Using 2,038 sequences, we demonstrate that 95% of the variable positions are heterotachous, i.e., underwent dramatic variations of substitution rate among vertebrate lineages. Heterotachy even occurs at small evolutionary scale, and in these cases it is very unlikely to be related to functional changes. Since a large number of sequences are required to efficiently detect heterotachy, the extent of this phenomenon could not be estimated for all proteins yet. It could be as large as for cytochrome b, since this protein is not a peculiar case. The observations made here open several new avenues of research, such as the understanding of the evolution of functional constraints or the improvement of phylogenetic reconstruction methods.

Journal ArticleDOI
TL;DR: There is a good correspondence between the genomic regions associated with reproductive isolation and the regions that show little or no evidence of gene flow, and a model in which D. pseudoobscura and D. persimilis have exchanged genes at some loci is supported.
Abstract: The divergence of Drosophila pseudoobscura from its close relatives, D. persimilis and D. pseudoobscura bogotana, was examined using the pattern of DNA sequence variation in a common set of 50 inbred lines at 11 loci from diverse locations in the genome. Drosophila pseudoobscura and D. persimilis show a marked excess of low-frequency variation across loci, consistent with a model of recent population expansion in both species. The different loci vary considerably, both in polymorphism levels and in the levels of polymorphisms that are shared by different species pairs. A major question we address is whether these patterns of shared variation are best explained by gene flow or by persistence since common ancestry. A new test of gene flow, based on patterns of linkage disequilibrium, is developed. The results from these, and other tests, support a model in which D. pseudoobscura and D. persimilis have exchanged genes at some loci. However, the pattern of variation suggests that most gene flow, although occurring after speciation began, was not recent. There is less evidence of gene flow between D. pseudoobscura and D. p. bogotana. The results are compared with recent work on the genomic locations of genes that contribute to reproductive isolation between D. pseudoobscura and D. persimilis. We show that there is a good correspondence between the genomic regions associated with reproductive isolation and the regions that show little or no evidence of gene flow.

Journal ArticleDOI
TL;DR: The ILD test has only limited power to detect incongruence caused by differences in the evolutionary conditions or in the tree topology, except when numerous characters are present and the substitution rate is homogeneous from site to site.
Abstract: This paper examines the efficiency of the incongruence length difference test (ILD) proposed by Farris et al. (1994) for assessing the incongruence between sets of characters. DNA sequences were simulated under various evolutionary conditions: (1) following symmetric or asymmetric trees, (2) with various mutation rates, (3) with constant or variable evolutionary rates along the branches, and (4) with different among-site substitution rates. We first compared two sets of sequences generated along the same tree and under the same evolutionary conditions. The probability of a Type-I error (wrongly rejecting the true hypothesis of congruence) was substantially below the standard 5% level of significance given by the ILD test; this finding indicates that the choice of the 5% level is rather conservative in this case. We then compared two data sets, still generated along the same tree, but under different evolutionary conditions (constant vs. variable evolutionary rate, homogeneity vs. heterogeneity rate of substitution). Under these conditions, the probability of rejecting the true hypothesis of congruence was greater than the 5% given by the ILD test and increased with the number of sites and the degree to which the tree was asymmetric. Finally, the comparison of the two data sets, simulated under contrasting tree structures (symmetric vs. asymmetric) but under the same evolutionary conditions, led us to reject the hypothesis of congruence, albeit weakly, particularly when the number of informative sites was low and among-site substitution rate heterogeneous. We conclude that the ILD test has only limited power to detect incongruence caused by differences in the evolutionary conditions or in the tree topology, except when numerous characters are present and the substitution rate is homogeneous from site to site.

Journal ArticleDOI
TL;DR: Calibration of nucleotide sequence divergence rates among six pairs of geminates in the Arcidae suggests that divergence rates can be greatly overestimated when dates corresponding to final closure of the Central American Seaway are used to calibrate the molecular clocks of marine organisms.
Abstract: Calibration of nucleotide sequence divergence rates provides an important method by which to test many hypotheses of evolution. In the absence of an adequate fossil record, geological events, rather than the first appearances of sister taxa in the geological record, are often used to calibrate molecular clocks. The formation of the Isthmus of Panama, which isolated the tropical western Atlantic and eastern Pacific oceans, is one such event that is frequently used to infer rates of nucleotide sequence divergence. Isthmian calibrations assume that morphologically similar "geminate" species living now on either side of the isthmus were isolated geographically by the latest stages of seaway closure 3.1-3.5 MYA. Here, I have applied calibration dates from the fossil record to cytochrome c oxidase-1 (CO1) and nuclear histone-3 (H3) divergences among six pairs of geminates in the Arcidae to test this hypothesis. Analysis of CO1 first and third positions yield geminate divergences that predate final seaway closure, and on the basis of CO1 first positions, times for all six geminates are significantly greater than 3.5 Myr. H3 sequences produce much more recent geminate divergences, some that are younger than 3.1 Myr. But H3-derived estimates for all arcid geminates are not significantly different from both 0 and 15 Myr. According to CO1, one of the two most divergent pairs, Arca mutabilis and A. imbricata, split more than 30 MYA. This date is compatible with the fossil record, which indicates that these species were morphologically distinct at least 16-21 MYA. Across all CO1 nucleotide sites, divergence rates for arcids are slower than the rates reported for other taxa on the basis of isthmian calibrations, with the exception of rates determined from the least divergent species pair in larger surveys of multiple transisthmian pairs. 
Rate differences between arcids and some taxa may be real, but these data suggest that divergence rates can be greatly overestimated when dates corresponding to final closure of the Central American Seaway are used to calibrate the molecular clocks of marine organisms.
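
The calibration argument in this abstract comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration assuming a strict clock and a pairwise divergence expressed in percent; the function name, the 7% divergence, and the 14 Myr alternative date are hypothetical and are not values from the paper.

```python
def pairwise_rate(divergence_pct, split_time_myr):
    """Pairwise divergence rate in percent per Myr (hypothetical helper)."""
    if split_time_myr <= 0:
        raise ValueError("split time must be positive")
    return divergence_pct / split_time_myr

# Hypothetical geminate pair showing 7% pairwise sequence divergence.
divergence = 7.0

# Rate if the split is dated to final seaway closure (3.5 MYA).
rate_isthmian = pairwise_rate(divergence, 3.5)

# Rate if the pair actually split earlier (hypothetical 14 MYA).
rate_older = pairwise_rate(divergence, 14.0)

# Factor by which the isthmian calibration inflates the rate.
overestimate = rate_isthmian / rate_older
```

If the true split predates seaway closure, the same observed divergence is spread over more time, so a rate calibrated on the closure date is inflated by the ratio of the two dates.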

Journal ArticleDOI
TL;DR: Rodentia is the largest order of placental mammals, with approximately 2,050 species divided into 28 families, and it is also one of the most controversial with respect to its monophyly, relationships between families, and divergence dates.
Abstract: Rodentia is the largest order of placental mammals, with approximately 2,050 species divided into 28 families. It is also one of the most controversial with respect to its monophyly, relationships between families, and divergence dates. Here, we have analyzed and compared the performance of three nuclear genes (von Willebrand Factor, interphotoreceptor retinoid-binding protein, and Alpha 2B adrenergic receptor) for a large taxonomic sampling, covering the whole rodent and placental diversity. The phylogenetic results significantly support rodent monophyly, the association of Rodentia with Lagomorpha (the Glires clade), and a Glires + Euarchonta (Primates, Dermoptera, and Scandentia) clade. The resolution of relationships among rodents is also greatly improved. The currently recognized families are divided here into seven well-defined clades (Anomaluromorpha, Castoridae, Ctenohystrica, Geomyoidea, Gliridae, Myodonta, and Sciuroidea) that can be grouped into three major clades: Ctenohystrica, Gliridae + Sciuroidea, and a mouse-related clade (Anomaluromorpha, Castoridae + Geomyoidea, and Myodonta). Molecular datings based on these three genes suggest that the rodent radiation took place at the transition between Paleocene and Eocene. The divergence between rodents and lagomorphs is placed just at the K-T boundary and the first splits among placentals in the Late Cretaceous. Our results thus tend to reconcile molecular and morphological-paleontological insights.

Journal ArticleDOI
TL;DR: The microsatellite-based estimate of the effective population size of maize is more than an order of magnitude less than previously reported values based on nucleotide sequence variation.
Abstract: Microsatellites are important tools for plant breeding, genetics, and evolution, but few studies have analyzed their mutation pattern in plants. In this study, we estimated the mutation rate for 142 microsatellite loci in maize (Zea mays subsp. mays) in two different experiments of mutation accumulation. The mutation rate per generation was estimated to be 7.7 x 10(-4) for microsatellites with dinucleotide repeat motifs, with a 95% confidence interval from 5.2 x 10(-4) to 1.1 x 10(-3). For microsatellites with repeat motifs of more than 2 bp in length, no mutations were detected; so we could only estimate the upper 95% confidence limit of 5.1 x 10(-5) for the mutation rate. For dinucleotide repeat microsatellites, we also determined that the variance of change in the number of repeats (sigma(m)2) is 3.2. We sequenced 55 of the 73 observed mutations, and all mutations proved to be changes in the number of repeats in the microsatellite or in mononucleotide tracts flanking the microsatellite. There is a higher probability to mutate to an allele of larger size. There is heterogeneity in the mutation rate among dinucleotide microsatellites and a positive correlation between the number of repeats in the progenitor allele and the mutation rate. The microsatellite-based estimate of the effective population size of maize is more than an order of magnitude less than previously reported values based on nucleotide sequence variation.
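
When no mutations are observed, only an upper confidence limit on the rate can be reported, as the abstract notes for the longer repeat motifs. One common way to obtain such a limit is sketched below, assuming independent binomial trials. The abstract does not state which method the authors used, and the function name and the count of 60,000 scored allele transmissions are hypothetical.

```python
def upper_limit_zero_events(n_trials, alpha=0.05):
    """Upper (1 - alpha) confidence limit on a per-trial rate when zero
    events were seen in n_trials independent trials (binomial assumption).

    Solves (1 - u) ** n_trials = alpha for u.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - alpha ** (1.0 / n_trials)

# Hypothetical example: 60,000 allele transmissions scored, no mutations seen.
u = upper_limit_zero_events(60000)
```

For large numbers of trials this is close to the familiar approximation of about 3 divided by the number of trials.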

Journal ArticleDOI
TL;DR: Progress toward answering several remaining questions about Wolbachia evolution—such as which of their host effects are primitive and which are derived, the type of animals they first invaded, and how they were transferred between arthropods and nematodes—is currently hindered by a poor understanding of the relationships between the supergroups.
Abstract: Obligate intracellular bacteria of the genus Wolbachia (Class Alphaproteobacteria, Order Rickettsiales) are currently divided into four taxonomic supergroups on the basis of clustering patterns in ftsZ-based phylogenetic trees (Werren, Zhang, and Guo 1995; Bandi et al. 1998). Supergroups A and B are found only in arthropods, whereas C and D are found only in filarial nematodes. The term supergroup has recently been employed to avoid confusion with designation of more closely related groups based on wsp sequences (Zhou, Rousset, and O’Neill 1998). Wolbachia have generated substantial interest in recent years (Zimmer 2001), primarily because of the effects they have on their arthropod hosts, which include induction of cytoplasmic incompatibility (CI), parthenogenesis, feminization, and male-killing (reviewed in Stouthammer, Breeuwer, and Hurst 1999). Estimation of the phylogenetic relationships within each supergroup has provided useful information about the evolution and biology of these bacteria. The phylogenies of both A and B members have been found to be incongruent with that of their hosts, strongly suggesting horizontal transfer (Werren, Zhang, and Guo 1995). Recently, direct evidence for this phenomenon was found for parasitic wasps sharing a common food source (Huigens et al. 2000). Unlike the case of arthropods, the phylogeny of each nematode Wolbachia supergroup (C and D) appears to match that of their hosts (Casiraghi et al. 2001a), although further gene sequencing studies are required to confirm this. Such phylogenetic congruence would suggest a strictly dependent relationship, and this idea is supported by evidence that removal of Wolbachia using antibiotics has negative effects on the filariae they reside in (Bandi et al. 1999; Langworthy et al. 2000). 
Progress toward answering several remaining questions about Wolbachia evolution—such as which of their host effects are primitive and which are derived, the type of animals they first invaded, and how they were transferred between arthropods and nematodes—is currently hindered by a poor understanding of the relationships between the supergroups. An improved estimate of Wolbachia phylogeny at this level will require: (1) the inclusion of sequence information from diverse, possibly as-yet-unknown taxa, (2) an appropriate choice of genes and outgroups, and (3) the use of sound data analysis techniques which enable statistical assessment

Journal ArticleDOI
TL;DR: For investigating certain questions regarding the evolution of codon usage bias, it is useful to have a summary statistic describing the pattern of codon usage across all amino acids; with such statistics one can explore general patterns, such as the relationship of codon usage bias to recombination rate, gene length, or synonymous substitution rate.
Abstract: The phenomenon of codon usage bias has been important in the study of evolution because it provides examples of weak selection working at the molecular level. During the last two decades, evidence has accumulated that some examples of codon usage bias are driven by selection, particularly for species of fungi (e.g., Bennetzen and Hall 1982; Ikemura 1985), bacteria (e.g., Ikemura 1981; Sharp and Li 1987), and insects (e.g., Akashi 1997; Moriyama and Powell 1997). This connection between codon usage bias and selection has been important in stimulating the development of alternatives to strictly neutral theories of molecular evolution (Ohta and Gillespie 1996; Kreitman and Antezana 2000). Uncertainty persists, however, regarding the phylogenetic distribution of codon usage bias (such as whether selection-based codon usage bias is present in mammals; e.g., Karlin and Mrazek 1996; Iida and Akashi 2000; Sueoka and Kawanishi 2000; Smith and Eyre-Walker 2001a; Urrutia and Hurst 2001). Questions also remain as to what models of selection underlie codon preferences (Kreitman and Antezana 2000), and specifically whether the presence of suboptimal codons is the result of mutation and drift, variation in selection pressure across sites, or antagonistic selection pressures (Smith and Eyre-Walker 2001b). For investigating certain questions regarding the evolution of codon usage bias, it is useful to have a summary statistic describing the pattern of codon usage across all amino acids. Many summary statistics have already been developed to describe the patterns of codon usage. They can be divided roughly into two classes (Comeron and Aguade 1998). One class summarizes the usage of certain preferred codons, and the other compares every codon’s usage to a null distribution (typically uniform usage of synonymous codons). The former class of methods has the disadvantage that it requires prior knowledge of the preferred codons.
With summary statistics one can explore general patterns, such as the relationship of codon usage bias to recombination rate, gene length, or synonymous substitution rate. Observing broad patterns such as these has already provided insight into the evolutionary dynamics of codon usage bias (Kliman and Hey 1993; Akashi and Eyre-Walker 1998; Kreitman and Comeron 1999).
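
The second class of statistics described above compares observed codon usage to a null distribution. A minimal sketch of that idea for a single amino acid is given below, using a chi-square deviation from uniform usage. This illustrates the general approach only and is not the statistic proposed in the paper; the function name and the lysine counts are hypothetical.

```python
def chi_square_vs_uniform(codon_counts):
    """Chi-square deviation of synonymous codon counts from uniform usage.

    codon_counts maps each synonymous codon of ONE amino acid to its
    observed count; codons with zero observations must still be listed.
    """
    total = sum(codon_counts.values())
    k = len(codon_counts)
    if total == 0 or k == 0:
        return 0.0
    expected = total / k  # uniform null: every synonymous codon equally used
    return sum((obs - expected) ** 2 / expected
               for obs in codon_counts.values())

# Hypothetical lysine counts in one gene: AAA used 30 times, AAG 10 times.
biased = chi_square_vs_uniform({"AAA": 30, "AAG": 10})
unbiased = chi_square_vs_uniform({"AAA": 20, "AAG": 20})
```

A value of zero means usage matches the uniform null exactly, and larger values indicate stronger bias for that amino acid. A genome-wide summary would need to combine such values across amino acids.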

Journal ArticleDOI
TL;DR: This work presents a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences and shows that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site.
Abstract: Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the evolutionary distance estimated will be biased, which may result in biased estimates of divergence times and substitution rates, and may lead to erroneous branching patterns in the inferred phylogenies. Here we present a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences. Results from computer simulations and empirical data analyses for human and mouse genes are presented to demonstrate that the proposed modification reduces the estimation bias considerably and that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site. We also discuss the relationship of the substitution and mutation rate estimates when the substitution pattern is not the same in the lineages leading to the two sequences compared.

Journal ArticleDOI
TL;DR: It is suggested that a mechanism involving complete mtDNA duplication followed by the loss of genes, predetermined by their transcriptional polarity and location in the genome, could generate this gene arrangement from the one ancestral for arthropods.
Abstract: We determined the complete mtDNA sequences of the millipedes Narceus annularus and Thyropygus sp. (Arthropoda: Diplopoda) and identified in both genomes all 37 genes typical for metazoan mtDNA. The arrangement of these genes is identical in the two millipedes, but differs from that inferred to be ancestral for arthropods by the location of four genes/gene clusters. This novel gene arrangement is unusual for animal mtDNA, in that genes with opposite transcriptional polarities are clustered in the genome and the two clusters are separated by two non-coding regions. The only exception to this pattern is the gene for cysteine tRNA, which is located in the part of the genome that otherwise contains all genes with the opposite transcriptional polarity. We suggest that a mechanism involving complete mtDNA duplication followed by the loss of genes, predetermined by their transcriptional polarity and location in the genome, could generate this gene arrangement from the one ancestral for arthropods. The proposed mechanism has important implications for phylogenetic inferences that are drawn on the basis of gene arrangement comparisons.

Journal ArticleDOI
TL;DR: The abundance and distribution of transposable elements (TEs) in a representative part of the euchromatic genome of Drosophila melanogaster were studied by analyzing the sizes and locations of TEs in the genomic sequences of chromosomes 2R, X, and 4.
Abstract: The abundance and distribution of transposable elements (TEs) in a representative part of the euchromatic genome of Drosophila melanogaster were studied by analyzing the sizes and locations of TEs of all known families in the genomic sequences of chromosomes 2R, X, and 4. TEs contribute to up to 2% of the sequenced DNA, which corresponds roughly to the euchromatin of these chromosomes. This estimate is lower than that previously available from in situ data and suggests that TEs accumulate in the heterochromatin more intensively than was previously thought. We have also found that TEs are not distributed at random in the chromosomes and that their abundance is more strongly associated with local recombination rates, rather than with gene density. The results are compatible with the ectopic exchange model, which proposes that selection against deleterious effects of chromosomal rearrangements is a major force opposing element spread in the genome of this species. Selection against insertional mutations also influences the observed patterns, such as an absence of insertions in coding regions. The results of the analyses are discussed in the light of recent findings on the distribution of TEs in other species.

Journal ArticleDOI
TL;DR: This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae, and suggests a new model for the evolution of sex.
Abstract: This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes: (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them.
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.
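
For readers unfamiliar with the "GC3s" measure used in this abstract, the sketch below shows one common way to compute it: the fraction of G or C at the third position of codons that have synonymous alternatives. It assumes the standard genetic code, excluding ATG, TGG, and the stop codons. The paper's exact definition may differ, and the function name and example sequence are hypothetical.

```python
# Codons without synonymous alternatives (Met, Trp) plus stop codons,
# under the standard genetic code (an assumption of this sketch).
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc3s(cds):
    """Fraction of G/C at third positions of synonymous codons in a CDS.

    Returns None when the sequence has no usable codons.
    """
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable_length, 3)]
    usable = [c for c in codons
              if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not usable:
        return None
    return sum(c[2] in "GC" for c in usable) / len(usable)

# Hypothetical toy sequence: ATG GCC AAA TTT GGG TGA
value = gc3s("ATGGCCAAATTTGGGTGA")
```

In the toy sequence, four codons are usable (GCC, AAA, TTT, GGG), and two of them end in G or C.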

Journal ArticleDOI
TL;DR: This article presents a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions and, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.
Abstract: Bayesian inference is becoming a common statistical approach to phylogenetic estimation because, among other reasons, it allows for rapid analysis of large data sets with complex evolutionary models. Conveniently, Bayesian phylogenetic methods use currently available stochastic models of sequence evolution. However, as with other model-based approaches, the results of Bayesian inference are conditional on the assumed model of evolution: inadequate models (models that poorly fit the data) may result in erroneous inferences. In this article, I present a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions. By evaluating a model's posterior predictive performance, an adequate model can be selected for a Bayesian phylogenetic study. Although I present a single test statistic that assesses the overall (global) performance of a phylogenetic model, a variety of test statistics can be tailored to evaluate specific features (local performance) of evolutionary models to identify sources of failure. The method presented here, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.
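
The posterior predictive check described above can be sketched in a few lines, assuming that data sets have already been simulated from posterior draws of the tree and model parameters. That simulation step is not shown. The abstract does not name the test statistic, so the multinomial statistic over site patterns used here is an assumption of this sketch, and the function names and toy alignments are hypothetical.

```python
import math
from collections import Counter

def multinomial_statistic(alignment):
    """Sum of N_i * ln(N_i) over unique site patterns, minus N * ln(N).

    alignment is a non-empty list of equal-length sequences (one per taxon).
    """
    patterns = Counter(zip(*alignment))  # one tuple per alignment column
    n = sum(patterns.values())
    if n == 0:
        raise ValueError("alignment has no sites")
    return sum(c * math.log(c) for c in patterns.values()) - n * math.log(n)

def posterior_predictive_p(observed, simulated):
    """Fraction of simulated data sets whose statistic is at least as
    large as the observed one."""
    if not simulated:
        raise ValueError("need at least one simulated data set")
    t_obs = multinomial_statistic(observed)
    return sum(multinomial_statistic(a) >= t_obs
               for a in simulated) / len(simulated)

# Toy data: the observed alignment has a single repeated site pattern.
observed = ["AAAA", "AAAA"]
simulated = [["ACGT", "ACGT"], ["AAAA", "AAAA"]]
p_value = posterior_predictive_p(observed, simulated)
```

A p-value near 0 or 1 suggests that the model rarely generates data resembling the observed alignment under this statistic.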

Journal ArticleDOI
TL;DR: The studies reveal an ancient duplication of an MIKC-type gene that occurred before the separation of the lineages that led to extant mosses and vascular plants more than about 450 MYA, and strongly suggest that the most recent common ancestor of mosses and vascular plants contained at least one MIKC(c)-type and one MIKC(*)-type gene.
Abstract: Characterization of seven MADS-box genes, termed PPM1-PPM4 and PpMADS1-PpMADS3, from the moss model species Physcomitrella patens is reported. Phylogeny reconstructions and comparison of exon-intron structures revealed that the genes described here represent two different classes of homologous, yet distinct, MIKC-type MADS-box genes, termed MIKC(c)-type genes, where "(c)" stands for "classic" (PPM1, PPM2, PpMADS1), and MIKC(*)-type genes (PPM3, PPM4, PpMADS2, PpMADS3). The two gene classes deviate from each other in a characteristic way, especially in a sequence stretch termed intervening region. MIKC(c)-type genes are abundantly present in all land plants which have been investigated in this respect, and give rise to well-known gene types such as floral meristem and organ identity genes. In contrast, LAMB1 from the clubmoss Lycopodium annotinum was identified as the only other MIKC(*)-type gene published so far. Our findings strongly suggest that the most recent common ancestor of mosses and vascular plants contained at least one MIKC(c)-type and one MIKC(*)-type gene. Our studies thus reveal an ancient duplication of an MIKC-type gene that occurred before the separation of the lineages that led to extant mosses and vascular plants more than about 450 MYA. The identification of bona fide K-domains in both MIKC(*)-type and MIKC(c)-type proteins suggests that the K-domain is more ancient than is suggested by a recent alternative hypothesis. MIKC(*)-type genes may have escaped identification in ferns and seed plants so far. It seems more likely, however, that they represent a class of genes which has been lost in the lineage which led to extant ferns and seed plants. The high number of P. patens MADS-box genes and the presence of a K-box in the coding region and of some potential binding sites for MADS-domain proteins and other transcription factors in the putative promoter regions of these genes suggest that MADS-box genes in mosses are involved in complex gene regulatory networks similar to those in flowering plants.

Journal ArticleDOI
TL;DR: It is shown that genes expressed solely in spermatozoa represent a highly diverged subset among mouse and human tissue-specific orthologs, and the molecular action of sexual selection on a variety of characters involved in mammalian sperm function is highlighted.
Abstract: A growing number of genes involved in sex and reproduction have been demonstrated to be rapidly evolving. Here, we show that genes expressed solely in spermatozoa represent a highly diverged subset among mouse and human tissue-specific orthologs. The average rate of nonsynonymous substitutions per site (Ka) is significantly higher in sperm proteins (mean Ka = 0.18; N = 35) than in proteins expressed specifically in all other tissues (mean Ka = 0.074; N = 473). No differences, however, are found in the synonymous substitution rate (Ks) between tissues, suggesting that selective forces, and not mutation rate, explain the high rate of replacement substitutions in sperm proteins. Four out of 19 sperm-specific genes with characterized function demonstrated evidence of strong positive Darwinian selection, including a protein involved in gene regulation, Protamine-1 (PRM1), a protein involved in glycolysis, GAPDS, and two egg-binding proteins, Adam-2 precursor (ADAM2) and sperm-adhesion molecule-1 (SAM1). These results demonstrate the rapid evolution of sperm-specific genes and highlight the molecular action of sexual selection on a variety of characters involved in mammalian sperm function.

Journal ArticleDOI
TL;DR: Maximum likelihood and Bayesian phylogenetic analyses of a 47 placental taxa data set resolved the phylogeny of Xenarthra with some evidence for two radiation events in armadillos and provided a strongly supported picture of placental interordinal relationships.
Abstract: Extant xenarthrans (armadillos, anteaters and sloths) are among the most derived placental mammals ever evolved. South America was the cradle of their evolutionary history. During the Tertiary, xenarthrans experienced an extraordinary radiation, whereas South America remained isolated from other continents. The 13 living genera are relics of this earlier diversification and represent one of the four major clades of placental mammals. Sequences of the three independent protein-coding nuclear markers alpha2B adrenergic receptor (ADRA2B), breast cancer susceptibility (BRCA1), and von Willebrand Factor (VWF) were determined for 12 of the 13 living xenarthran genera. Comparative evolutionary dynamics of these nuclear exons using a likelihood framework revealed contrasting patterns of molecular evolution. All codon positions of BRCA1 were shown to evolve in a strikingly similar manner, and third codon positions appeared less saturated within placentals than those of ADRA2B and VWF. Maximum likelihood and Bayesian phylogenetic analyses of a 47 placental taxa data set rooted by three marsupial outgroups resolved the phylogeny of Xenarthra with some evidence for two radiation events in armadillos and provided a strongly supported picture of placental interordinal relationships. This topology was fully compatible with recent studies, dividing placentals into the Southern Hemisphere clades Afrotheria and Xenarthra and a monophyletic Northern Hemisphere clade (Boreoeutheria) composed of Laurasiatheria and Euarchontoglires. Partitioned likelihood statistical tests of the position of the root, under different character partition schemes, identified three almost equally likely hypotheses for early placental divergences: a basal Afrotheria, an Afrotheria + Xenarthra clade, or a basal Xenarthra (Epitheria hypothesis). We took advantage of the extensive sampling realized within Xenarthra to assess its impact on the location of the root on the placental tree. 
By resampling taxa within Xenarthra, the conservative Shimodaira-Hasegawa likelihood-based test of alternative topologies was shown to be sensitive to both character and taxon sampling.