
Showing papers in "Molecular Biology and Evolution in 1996"


Journal ArticleDOI
TL;DR: A versatile method, quartet puzzling, is introduced to reconstruct the topology (branching pattern) of a phylogenetic tree based on DNA or amino acid sequence data and outperforms neighbor joining in some cases with high transition/transversion bias.
Abstract: A versatile method, quartet puzzling, is introduced to reconstruct the topology (branching pattern) of a phylogenetic tree based on DNA or amino acid sequence data. This method applies maximum-likelihood tree reconstruction to all possible quartets that can be formed from n sequences. The quartet trees serve as starting points to reconstruct a set of optimal n-taxon trees. The majority rule consensus of these trees defines the quartet puzzling tree and shows groupings that are well supported. Computer simulations show that the performance of quartet puzzling to reconstruct the true tree is always equal to or better than that of neighbor joining. For some cases with high transition/transversion bias quartet puzzling outperforms neighbor joining by a factor of 10. The application of quartet puzzling to mitochondrial RNA and tRNA(Val) sequences from amniotes demonstrates the power of the approach. A PHYLIP-compatible ANSI C program, PUZZLE, for analyzing nucleotide or amino acid sequence data is available.

2,620 citations
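
The abstract above says that maximum-likelihood reconstruction is applied to all possible quartets formed from n sequences. As a minimal sketch of how large that set is, the Python below enumerates the quartets and the three unrooted topologies each one can take. The function names and taxon labels are illustrative assumptions, not part of PUZZLE, and the likelihood evaluation and puzzling steps are not attempted here.

```python
from itertools import combinations
from math import comb


def quartets(taxa):
    """Return every 4-taxon subset of the input, in sorted order."""
    return list(combinations(sorted(taxa), 4))


def quartet_topologies(quartet):
    """Return the three possible unrooted topologies for four taxa.

    Each topology is a pair of pairs: ((a, b), (c, d)) means a and b sit
    on one side of the internal branch and c and d on the other.
    """
    a, b, c, d = quartet
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


# Hypothetical taxon labels, used only to exercise the functions.
taxa = ["human", "cow", "chicken", "frog", "trout"]
all_quartets = quartets(taxa)

# The number of quartets equals "n choose 4".
print(len(all_quartets), comb(len(taxa), 4))
for topology in quartet_topologies(all_quartets[0]):
    print(topology)
```

Because the count grows as n choose 4, the number of quartet evaluations rises quickly with the number of sequences.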


Journal ArticleDOI
TL;DR: It is concluded that a proportion of the inferred change in the nonmitochondrial sequences occurred before transposition, and that Sitobion aphids (and other species exhibiting mtDNA transposition) may be important for studying the molecular evolution of mtDNA and pseudogenes.
Abstract: Polymerase chain reaction (PCR) products corresponding to 803 bp of the cytochrome oxidase subunits I and II region of mitochondrial DNA (mtDNA COI-II) were deduced to consist of multiple haplotypes in three Sitobion species. We investigated the molecular basis of these observations. PCR products were cloned, and six clones from one individual per species were sequenced. In each individual, one sequence was found commonly, but also two or three divergent sequences were seen. The divergent sequences were shown to be nonmitochondrial by sequencing from purified mtDNA and Southern blotting experiments. All seven nonmitochondrial clones sequenced to completion were unique. Nonmitochondrial sequences have a high proportion of unique sites, and very few characters are shared between nonmitochondrial clones to the exclusion of mtDNA. From these data, we infer that fragments of mtDNA have been transposed separately (probably into aphid chromosomes), at a frequency only known to be equalled in humans. The transposition phenomenon appears to occur infrequently or not at all in closely related genera and other aphids investigated. Patterns of nucleotide substitution in mtDNA inferred over a parsimony tree are very different from those in transposed sequences. Compared with mtDNA, nonmitochondrial sequences have less codon position bias, more even exchanges between A, G, C and T, and a higher proportion of nonsynonymous replacements. Although these data are consistent with the transposed sequences being under less constraint than mtDNA, changes in the nonmitochondrial sequences are not random: there remains significant position bias, and probable excesses of synonymous replacements and of conservative inferred amino acid replacements. We conclude that a proportion of the inferred change in the nonmitochondrial sequences occurred before transposition. 
We believe that Sitobion aphids (and other species exhibiting mtDNA transposition) may be important for studying the molecular evolution of mtDNA and pseudogenes. However, our data highlight the need to establish the true evolutionary relationships between sequences in comparative investigations.

1,076 citations


Journal ArticleDOI
TL;DR: The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences and it is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability.
Abstract: The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of a phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with three rates, the effort involved is no greater than three times that for a single rate. This “Hidden Markov Model” method allows for rates to differ between sites and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However, it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using β-hemoglobin DNA sequences in eight mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.

831 citations
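
The recursive summation over rate assignments described in the abstract above can be illustrated with a standard forward recursion. This is a minimal sketch, not the authors' program: the per-site likelihoods are made-up placeholders (in a real analysis they would come from a likelihood computation on the tree for each rate category), and all names are assumptions. A brute-force enumeration is included only to check the recursion on a tiny input.

```python
from itertools import product


def hmm_site_likelihood(site_lik, trans, stationary):
    """Sum the likelihood over all rate assignments with a forward recursion.

    site_lik[i][k]: likelihood of the data at site i given rate category k
    trans[k][l]:    probability that a site in category k is followed by l
    stationary[k]:  probability of category k at the first site
    """
    n_rates = len(stationary)
    forward = [stationary[k] * site_lik[0][k] for k in range(n_rates)]
    for lik in site_lik[1:]:
        forward = [
            lik[l] * sum(forward[k] * trans[k][l] for k in range(n_rates))
            for l in range(n_rates)
        ]
    return sum(forward)


def brute_force_likelihood(site_lik, trans, stationary):
    """Same quantity by explicit enumeration; feasible only for tiny inputs."""
    n_rates = len(stationary)
    total = 0.0
    for path in product(range(n_rates), repeat=len(site_lik)):
        p = stationary[path[0]] * site_lik[0][path[0]]
        for i in range(1, len(path)):
            p *= trans[path[i - 1]][path[i]] * site_lik[i][path[i]]
        total += p
    return total


# Placeholder numbers: three sites, two rate categories.
site_lik = [[0.10, 0.02], [0.05, 0.08], [0.01, 0.03]]
trans = [[0.9, 0.1], [0.1, 0.9]]  # neighboring sites tend to share a rate
stationary = [0.5, 0.5]

print(hmm_site_likelihood(site_lik, trans, stationary))
print(brute_force_likelihood(site_lik, trans, stationary))
```

For long alignments the products underflow, so a practical version would rescale the forward values at each site or work with logarithms.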


Journal ArticleDOI
TL;DR: Analyses of the genomes of three prokaryotes revealed a new type of genomic compartmentalization of base frequencies, showing that the substitution patterns of the two strands of DNA were asymmetric.
Abstract: Analyses of the genomes of three prokaryotes, Escherichia coli, Bacillus subtilis, and Haemophilus influenzae, revealed a new type of genomic compartmentalization of base frequencies. There was a departure from intrastrand equifrequency between A and T or between C and G, showing that the substitution patterns of the two strands of DNA were asymmetric. The positions of the boundaries between these compartments were found to coincide with the origin and terminus of chromosome replication, and there were more A-T and C-G deviations in intergenic regions and third codon positions, suggesting that a mutational bias was responsible for this asymmetry. The strand asymmetry was found to be due to a difference in base compositions of transcripts in the leading and lagging strands. This difference is sufficient to affect codon usage, but it is small compared to the effects of gene expressivity and amino-acid composition.

726 citations
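
The departure from intrastrand equifrequency described in the abstract above can be quantified with simple skew ratios computed on one strand. The sketch below is an illustration only: the sign convention, the window scheme, and the function names are assumptions rather than details taken from the paper.

```python
def base_skews(seq):
    """Return ((A-T)/(A+T), (C-G)/(C+G)) for one strand; None if undefined."""
    seq = seq.upper()
    a, t, c, g = (seq.count(base) for base in "ATCG")
    at_skew = (a - t) / (a + t) if a + t else None
    cg_skew = (c - g) / (c + g) if c + g else None
    return at_skew, cg_skew


def windowed_skews(seq, window, step):
    """Compute both skews in sliding windows; returns (start, AT, CG) tuples."""
    return [
        (start, *base_skews(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence, used only to exercise the functions.
for row in windowed_skews("AATTGGCCAATC", window=4, step=4):
    print(row)
```

Under equifrequency both ratios are zero; a change of sign along the chromosome is the kind of boundary the abstract relates to the origin and terminus of replication.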


Journal ArticleDOI
TL;DR: The results obtained represent the first molecular evidence for a Tardigrada + Arthropoda clade and indicate the need to review those obtained solely on morphological characters.
Abstract: The complete 18S rDNA gene sequence of Macrobiotus group hufelandi (Tardigrada) was obtained and aligned with 18S rDNA and rRNA gene sequences of 24 metazoans (mainly protostomes). Discrete character (maximum-parsimony) and distance (neighbor-joining) methods were used to infer their phylogeny. The evolution of bootstrap proportions with sequence length (pattern of resolved nodes, PRN) was studied to test the resolution of the nodes in neighbor-joining trees. The results show that arthropods are monophyletic. Tardigrades represent the sister group of arthropods (in parsimony analyses) or they are related to crustaceans (distance analysis and PRN). Arthropoda are divided into two main evolutionary lines, the Hexapoda + Crustacea line (weakly supported), and the Myriapoda + Chelicerata line. The Hexapoda + Crustacea line includes Pentastomida, but the internal resolution is far from clear. The Insecta (Ectognatha) are monophyletic, but no evidence for the monophyly of Hexapoda is found. The Chelicerata are a monophyletic group and the Myriapoda cluster close to Arachnida. Overall, the results obtained represent the first molecular evidence for a Tardigrada + Arthropoda clade. In addition, the congruence between molecular phylogenies of the Arthropoda from other authors and the one obtained here indicates the need to review those obtained solely on morphological characters.

529 citations


Journal ArticleDOI
TL;DR: It is found that recent population expansions and mutation rate heterogeneity have similar effects on several polymorphism indicators, like the shape and the mean of the observed pairwise difference distribution, or the number of segregating sites, so that the inferred size of a population expansion appears overestimated if nucleotides have dissimilar substitution rates.
Abstract: In order to study the effect of mutation rate heterogeneity on patterns of DNA polymorphism, we simulated samples of DNA sequences with gamma-distributed nucleotide substitution rates in stationary and expanding populations. We find that recent population expansions and mutation rate heterogeneity have similar effects on several polymorphism indicators, like the shape and the mean of the observed pairwise difference distribution, or the number of segregating sites. The inferred size of population expansion thus appears overestimated if nucleotides have dissimilar substitution rates. Interestingly, population expansion and uneven mutation rates have contrasting effects on Tajima's D statistic when acting separately, and the consequence on the associated test of selective neutrality is investigated. The patterns of polymorphism of several human populations analyzed for the mitochondrial control region are examined, mainly showing the difficulty in quantifying the respective contribution of past demographic history and uneven mutation rates from a single sampled evolutionary process. However, substitution rates appear more heterogeneous in the second hypervariable segment of the control region than in the first segment.

420 citations
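
The indicators named in the abstract above (mean pairwise difference, number of segregating sites, and Tajima's D) can be computed directly from an alignment. The sketch below uses the standard formula for Tajima's D and assumes aligned, gap-free sequences of equal length, counting each variable column as one segregating site. The function name and the toy sequences are illustrative assumptions, not data from the study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length, gap-free sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for a nonzero variance")
    if len(set(map(len, seqs))) != 1:
        raise ValueError("sequences must have equal length")

    # S: number of alignment columns with more than one character state.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments: only singletons, then two equally common haplotypes.
print(tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]))
print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]))
```

An excess of low-frequency variants pushes D below zero, while variants at intermediate frequency push it above zero, which is why the two toy alignments give values of opposite sign.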


Journal ArticleDOI
TL;DR: It is concluded that many variables are affecting levels of DNA polymorphism in Drosophila, from properties of nucleotides to population history and, perhaps, mating structure.
Abstract: We have summarized and analyzed all available nuclear DNA sequence polymorphism studies for three species of Drosophila, D. melanogaster (24 loci), D. simulans (12 loci), and D. pseudoobscura (5 loci). Our major findings are: (1) The average nucleotide heterozygosity ranges from about 0.4% to 2% depending upon species and function of the region, i.e., coding or noncoding. (2) Compared to D. simulans and D. pseudoobscura (which are about equally variable), D. melanogaster displays a low degree of DNA polymorphism. (3) Noncoding introns and 3' and 5' flanking DNA show less polymorphism than silent sites within coding DNA. (4) X-linked genes are less variable than autosomal genes. (5) Transition (Ts) and transversion (Tv) polymorphisms are about equally frequent in non-coding DNA and at fourfold degenerate sites in coding DNA while Ts polymorphisms outnumber Tv polymorphisms by about 2:1 in total coding DNA. The increased Ts polymorphism in coding regions is likely due to the structure of the genetic code: silent changes are more often Ts's than are replacement substitutions. (6) The proportion of replacement polymorphisms is significantly higher in D. melanogaster than in D. simulans. (7) The level of variation in coding DNA and the adjacent noncoding DNA is significantly correlated indicating regional effects, most notably recombination. (8) Surprisingly, the level of polymorphism at silent coding sites in D. melanogaster is positively correlated with degree of codon usage bias. (9) Three proposed tests of the neutral theory of DNA polymorphisms have been performed on the data: Tajima's test, the HKA test, and the McDonald-Kreitman test. About half of the loci fail to conform to the expectations of neutral theory by one of the tests. We conclude that many variables are affecting levels of DNA polymorphism in Drosophila, from properties of nucleotides to population history and, perhaps, mating structure.
No simple, all-encompassing explanation satisfactorily accounts for the data.

409 citations


Journal ArticleDOI
TL;DR: It is found that 17 gene groups can be the candidates for the genes on which positive selection may operate, but these genes are found to occupy only about 0.5% of the vast number of gene groups so far available.
Abstract: We conducted a systematic search for the candidate genes on which positive selection may operate, on the premise that for such genes the number of nonsynonymous substitutions is expected to be larger than that of synonymous substitutions when the nucleotide sequences of genes under investigation are compared with each other. By obtaining 3,595 groups of homologous sequences from the DDBJ, EMBL, and GenBank DNA sequence databases, we found that 17 gene groups can be the candidates for the genes on which positive selection may operate. Thus, such genes are found to occupy only about 0.5% of the vast number of gene groups so far available. Interestingly enough, 9 out of the 17 gene groups were the surface antigens of parasites or viruses.

380 citations


Journal ArticleDOI
TL;DR: The patterns of polymorphism and divergence at charge-altering amino acid sites are presented for the Drosophila ND5 gene to examine the evolution of functionally distinct mutations and suggest that opposing evolutionary pressures may act on different regions of mitochondrial genes and genomes.
Abstract: Recent studies of mitochondrial DNA (mtDNA) variation in mammals and Drosophila have shown an excess of amino acid variation within species (replacement polymorphism) relative to the number of silent and replacement differences fixed between species. To examine further this pattern of nonneutral mtDNA evolution, we present sequence data for the ND3 and ND5 genes from 59 lines of Drosophila melanogaster and 29 lines of D. simulans. Of interest are the frequency spectra of silent and replacement polymorphisms, and potential variation among genes and taxa in the departures from neutral expectations. The Drosophila ND3 and ND5 data show no significant excess of replacement polymorphism using the McDonald-Kreitman test. These data are in contrast to significant departures from neutrality for the ND3 gene in mammals and other genes in Drosophila mtDNA (cytochrome b and ATPase 6). Pooled across genes, however, both Drosophila and human mtDNA show very significant excesses of amino acid polymorphism. Silent polymorphisms at ND5 show a significantly higher variance in frequency than replacement polymorphisms, and the latter show a significant skew toward low frequencies (Tajima's D = -1.954). These patterns are interpreted in light of the nearly neutral theory where mildly deleterious amino acid haplotypes are observed as ephemeral variants within species but do not contribute to divergence. The patterns of polymorphism and divergence at charge-altering amino acid sites are presented for the Drosophila ND5 gene to examine the evolution of functionally distinct mutations. Excess charge-altering polymorphism is observed at the carboxyl terminal and excess charge-altering divergence is detected at the amino terminal. While the mildly deleterious model fits as a net effect in the evolution of nonrecombining mitochondrial genomes, these data suggest that opposing evolutionary pressures may act on different regions of mitochondrial genes and genomes.

380 citations
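
The McDonald-Kreitman comparison mentioned in the abstract above contrasts replacement and silent changes within species (polymorphism) with those fixed between species (divergence). As a minimal sketch, the Python below computes a commonly used summary ratio of the 2x2 table and a two-sided Fisher exact p-value. The ratio is not named in the abstract, and the counts, the choice of test, and the function names are illustrative assumptions rather than values or methods from the study.

```python
from math import comb


def neutrality_index(pn, ps, dn, ds):
    """(Pn/Ps) / (Dn/Ds); values above 1 mean excess replacement polymorphism."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(pn, ps, dn, ds):
    """Two-sided Fisher exact p-value for the table [[pn, ps], [dn, ds]]."""
    row1, row2, col1 = pn + ps, dn + ds, pn + dn
    total = row1 + row2

    def prob(a):
        # Hypergeometric probability of seeing `a` in the top-left cell.
        return comb(row1, a) * comb(row2, col1 - a) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(pn)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: replacement/silent polymorphisms and fixed differences.
pn, ps, dn, ds = 10, 10, 5, 20
print(neutrality_index(pn, ps, dn, ds))
print(fisher_exact_two_sided(pn, ps, dn, ds))
```

The ratio is undefined when a silent or fixed-replacement count is zero, so real data may need a different summary in that case.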


Journal ArticleDOI
TL;DR: This work investigated the performance of all mitochondrial protein-coding genes to recover two expected phylogenies of tetrapods and mammals and found that simple length differences and rate differences between these genes cannot account for their different phylogenetic performance.
Abstract: A large number of studies in evolutionary biology utilize phylogenetic information obtained from mitochondrial DNA. Researchers place trust in this molecule and expect it generally to be a reliable marker for addressing questions ranging from population genetics to phylogenies among distantly related lineages. Yet, regardless of the phylogenetic method and weighting treatment, individual mitochondrial genes might potentially produce misleading evolutionary inferences and hence might not constitute an adequate representation either of the entire mitochondrial genome or of the evolutionary history of the organisms from which they are derived. We investigated the performance of all mitochondrial protein-coding genes to recover two expected phylogenies of tetrapods and mammals. According to these tests, mitochondrial protein-coding genes can be roughly classified into three groups of good (ND4, ND5, ND2, cytb, and COI), medium (COII, COIII, ND1, and ND6), and poor (ATPase 6, ND3, ATPase 8, and ND4L) phylogenetic performers in recovering these expected trees among phylogenetically distant relatives. How general our findings are is unclear. Simple length differences and rate differences between these genes cannot account for their different phylogenetic performance. The phylogenetic performance of these mitochondrial genes might depend on various factors that play a role in determining the probability of discovering the correct phylogeny, such as the density of lineage creation events in time, the phylogenetic "depth" of the question, lineage-specific rate heterogeneity, and the completeness of taxa representation.

378 citations


Journal ArticleDOI
TL;DR: These results show that polymorphism in mate recognition loci required for rapid evolution of sexual isolation can arise within natural populations.
Abstract: Bindin is a gamete recognition protein of sea urchins that mediates species-specific attachment of sperm to an egg-surface receptor during fertilization. Sequences of bindin from closely related urchins show fixed species-specific differences. Within species, highly polymorphic bindin alleles result from point substitution, insertion/deletion, and recombination. Since speciation, positive selection favoring allelic variants has generated diversity in bindin polypeptides. Intraspecific bindin variation can be tolerated by the egg receptor, which suggests functional parallels between this system and other flexible recognition systems, including immune recognition. These results show that polymorphism in mate recognition loci required for rapid evolution of sexual isolation can arise within natural populations.

Journal ArticleDOI
TL;DR: The rbcL phylogeny reveals a surprising number of gene relationships that are fundamentally incongruent with organismal relationships as inferred from multiple lines of other molecular evidence, suggesting that the rubisco operon has undergone multiple events of both horizontal gene transfer and gene duplication in different lineages.
Abstract: Previous work has shown that molecular phylogenies of plastids, cyanobacteria, and proteobacteria based on the rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase) genes rbcL and rbcS are incongruent with molecular phylogenies based on other genes and are also incompatible with structural and biochemical information. Although it has been much speculated that this is the consequence of a single horizontal gene transfer (of a proteobacterial or mitochondrial rubisco operon into plastids of rhodophytic and chromophytic algae), neither this hypothesis nor the alternative hypothesis of ancient gene duplication has been examined in detail. We have conducted phylogenetic analyses of all available bacterial rbcL sequences, and representative plastid sequences, in order to explore these alternative hypotheses and fully examine the complexity of rubisco gene evolution. The rbcL phylogeny reveals a surprising number of gene relationships that are fundamentally incongruent with organismal relationships as inferred from multiple lines of other molecular evidence. On the order of six horizontal gene transfers are implied by the form I (L8S8) rbcL phylogeny, two between cyanobacteria and proteobacteria, one between proteobacteria and plastids, and three within proteobacteria. Alternatively, a single ancient duplication of the form I rubisco operon, followed by repeated and pervasive differential loss of one operon or the other, would account for much of this incongruity. In all probability, the rubisco operon has undergone multiple events of both horizontal gene transfer and gene duplication in different lineages.

Journal ArticleDOI
TL;DR: Divergent genes from Caenorhabditis elegans and Saccharomyces cerevisiae that have been proposed to represent the novel classes delta- and epsilon-tubulin were found to be specifically related to gamma-tubulins from animals and fungi respectively, and therefore are best seen as rapidly evolving orthologues of gamma-tubulin.
Abstract: The tubulin gene family, which includes alpha-, beta-, and gamma-tubulin subfamilies, is composed of highly conserved proteins which are the principal structural and functional components of eukaryotic microtubules. We are interested in (1) establishing when in eukaryotic evolution the duplications leading to paralogous alpha, beta, and gamma subfamilies occurred and (2) the possible utility of tubulin sequences in reconstructing organismal phylogeny. To broaden the taxonomic representation of alpha-tubulins so that it roughly equals that of beta-tubulins, alpha-tubulin genes from three Microsporidia (Encephalitozoon hellem, Nosema locustae, and Spraguea lophii), two Parabasalia (Monocercomonas sp. and Trichomitus batrachorum), and one Heterolobosean (Acrasis rosea) were sequenced. With these new genes, phylogenetic trees of alpha- and beta-tubulins were constructed and compared. Trees were congruent with each other, but incongruent with other molecular phylogenies. The agreement between alpha- and beta-tubulin trees could arise by the co-adaptation of one molecule to variants of the other as a result of their intimate steric association in microtubules. Thus, these trees may not be providing independent support for the phylogenetic results. However, one of these unexpected results, that microsporidia cluster with fungi, is supported by other circumstantial evidence, and may therefore reflect a real relationship despite the basal position usually assigned to microsporidia. Relationships between the three tubulins were also examined by constructing trees of all three types. These trees were found to be of limited value for determining the position of the root within each subfamily because of the great interfamily distances, but they do confirm the classification of all known genes into three monophyletic subfamilies.
Divergent genes from Caenorhabditis elegans and Saccharomyces cerevisiae that have been proposed to represent the novel classes delta- and epsilon-tubulin were found to be specifically related to gamma-tubulins from animals and fungi respectively, and therefore are best seen as rapidly evolving orthologues of gamma-tubulin.

Journal ArticleDOI
TL;DR: The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny were evaluated, and the utility of the currently used optimization principles in phylogenetic construction is questioned.
Abstract: The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction.
Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.

Journal ArticleDOI
TL;DR: This work advocates the use of conserved motifs and other secondary structure information for assessing sequencing fidelity, and is similar to previous models but is more specific to mitochondrial DNA, fitting both invertebrate and vertebrate groups.
Abstract: Secondary structure models are an important step for aligning sequences, understanding probabilities of nucleotide substitutions, and evaluating the reliability of phylogenetic reconstructions. A set of conserved sequence motifs is derived from comparative sequence analysis of 184 invertebrate and vertebrate taxa (including many taxa from the same genera, families, and orders) with reference to a secondary structure model for domain III of animal mitochondrial small subunit (12S) ribosomal RNA. A template is presented to assist with secondary structure drawing. Our model is similar to previous models but is more specific to mitochondrial DNA, fitting both invertebrate and vertebrate groups, including taxa with markedly different nucleotide compositions. The second half of the domain III sequence can be difficult to align precisely, even when secondary structure information is considered. This is especially true for comparisons of anciently diverged taxa, but well-conserved motifs assist in determining biologically meaningful alignments. Patterns of conservation and variability in both paired and unpaired regions make differential phylogenetic weighting in terms of "stems" and "loops" unsatisfactory. We emphasize looking carefully at the sequence data before and during analyses, and advocate the use of conserved motifs and other secondary structure information for assessing sequencing fidelity.

Journal ArticleDOI
TL;DR: It is shown that the differences that exist between the different operons are ignored when sequences are obtained either after cloning of a single operon or directly from polymerase chain reaction (PCR) products, and that if gene conversions that homogenize the rRNA multigene family are rare events, some nodes in phylogenetic trees will reflect these recombination events and these trees may therefore be gene trees rather than organismal trees.
Abstract: We have analyzed what phylogenetic signal can be derived by small subunit rRNA comparison for bacteria of different but closely related genera (enterobacteria) and for different species or strains within a single genus (Escherichia or Salmonella), and finally how similar are the ribosomal operons within a single organism (Escherichia coli). These sequences have been analyzed by neighbor-joining, maximum likelihood, and parsimony. The robustness of each topology was assessed by bootstrap. Sequences were obtained for the seven rrn operons of E. coli strain PK3. These data demonstrated differences located in three highly variable domains. Their nature and localization suggest that since the divergence of E. coli and Salmonella typhimurium, most point mutations that occurred within each gene have been propagated among the gene family by conversions involving short domains, and that homogenization by conversions may not have affected the entire sequence of each gene. We show that the differences that exist between the different operons are ignored when sequences are obtained either after cloning of a single operon or directly from polymerase chain reaction (PCR) products. Direct sequencing of PCR products produces a mean sequence in which mutations present in the most variable domains become hidden. Cloning a single operon results in a sequence that differs from that of the other operons and of the mean sequence by several point mutations. For identification of unknown bacteria at the species level or below, a mean sequence or the sequence of a single nonidentified operon should therefore be avoided. Taking into account the seven operons and therefore mutations that accumulate in the most variable domains would perhaps increase tree resolution. 
However, if gene conversions that homogenize the rRNA multigene family are rare events, some nodes in phylogenetic trees will reflect these recombination events and these trees may therefore be gene trees rather than organismal trees.

Journal ArticleDOI
TL;DR: The analyses support a range of 11-16 cross-species transmissions throughout the history of these sequences, and an extension of a nesting procedure is offered for sequence data that forms nested clades used in hypothesis testing, which is superior at establishing relationships and identifying instances of transmission.
Abstract: Using two sets of nucleotide sequences of the human and simian T-cell leukemia/lymphoma virus type I (HTLV-I/STLV-I), one consisting of 522 bp of the env gene from 70 viral strains and the other a 140-bp segment from the pol gene of 52 viral strains, I estimated cladograms based on a statistical parsimony procedure that was developed specifically to estimate within-species gene trees. An extension of a nesting procedure is offered for sequence data that forms nested clades used in hypothesis testing. The nested clades were used to test three hypotheses relating to transmission of HTLV/STLV sequences: (1) Have cross-species transmissions occurred and, if so, how many? (2) In what direction have they occurred? (3) What are the geographic relationships of these transmission events? The analyses support a range of 11-16 cross-species transmissions throughout the history of these sequences. Additionally, outgroup weights were assigned to haplotypes using arguments from coalescence theory to infer directionality of transmission events. Conclusions on geographic origins of transmission events and particular viral strains are inconclusive due to small samples and inadequate sampling design. Finally, this approach is compared directly to results obtained from a traditional maximum parsimony approach and found to be superior at establishing relationships and identifying instances of transmission.

Journal ArticleDOI
TL;DR: The level of synonymous codon bias is shown to be positively correlated with gene length in Escherichia coli genes that are thought to be expressed at similar levels; it is argued that the positive correlation could be caused by selection to avoid missense errors during translation.
Abstract: The level of synonymous codon bias is shown to be positively correlated to gene length in Escherichia coli genes which are thought to be expressed at similar levels; these are genes whose products are present in multimeric proteins in equimolar amounts. It is argued that the positive correlation could be caused by selection to avoid missense errors during translation. Since the cost of producing a protein is proportional to its length, selection in favor of codons which increase accuracy should be greater in longer genes, and long genes should therefore have higher synonymous codon bias. It is also shown that there is variation in synonymous codon use which is independent of either expression level, gene length, amino acid composition, or chromosomal location. This variation is consistent with selection for translational accuracy but may have other origins.
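The length argument in this abstract can be made concrete with a toy calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that any single missense error makes the whole protein useless, that cost is one unit per residue, and that "accurate" codons halve a hypothetical per-codon error rate. The function names and the rates are invented for the example.

```python
def expected_wasted_cost(length: int, error_rate: float) -> float:
    """Expected residues wasted per synthesis under the toy cost model."""
    # Probability that at least one of `length` codons is mistranslated.
    p_faulty = 1.0 - (1.0 - error_rate) ** length
    # Assumed cost model: one unit per residue, all lost if the protein is faulty.
    return length * p_faulty


def saving_from_accurate_codons(length: int,
                                sloppy_rate: float = 5e-4,
                                accurate_rate: float = 2.5e-4) -> float:
    """Cost saved per synthesis by using the (hypothetical) more accurate codons."""
    return (expected_wasted_cost(length, sloppy_rate)
            - expected_wasted_cost(length, accurate_rate))


# Per-codon benefit of accuracy in a short gene and in a long gene.
short_gene_per_codon = saving_from_accurate_codons(100) / 100
long_gene_per_codon = saving_from_accurate_codons(1000) / 1000
print(long_gene_per_codon > short_gene_per_codon)
```

Under these assumptions the saving per codon is larger in the longer gene, which is the direction of selection the abstract argues for. Real error costs are unlikely to be all-or-nothing, so the numbers carry no weight beyond showing the trend.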

Journal ArticleDOI
TL;DR: The pattern of sequence and length variation within and between species, together with the capability of the arrays to form stable secondary structures, suggests that the dominant mechanism involved in the evolution of these arrays is unidirectional replication slippage.
Abstract: The complete mitochondrial DNA (mtDNA) control region was amplified and directly sequenced in two species of shrew, Crocidura russula and Sorex araneus (Insectivora, Mammalia). The general organization is similar to that found in other mammals: a central conserved region surrounded by two more variable domains. However, we have found in shrews the simultaneous presence of arrays of tandem repeats in potential locations where repeats tend to occur separately in other mammalian species. These locations correspond to regions which are associated with a possible interruption of the replication processes, either at the end of the three-stranded D-loop structure or toward the end of the heavy-strand replication. In the left domain the repeated sequences (R1 repeats) are 78 bp long, whereas in the right domain the repeats are 12 bp long in C. russula and 14 bp long in S. araneus (R2 repeats). Variation in the copy number of these repeated sequences results in mtDNA control region length differences. Southern blot analysis indicates that level of heteroplasmy (more than one mtDNA form within an individual) differs between species. A comparative study of the R2 repeats in 12 additional species representing three shrew subfamilies provides useful indications for the understanding of the origin and the evolution of these homologous tandemly repeated sequences. An asymmetry in the distribution of variants within the arrays, as well as the constant occurrence of shorter repeated sequences flanking only one side of the R2 arrays, could be related to asymmetry in the replication of each strand of the mtDNA molecule. The pattern of sequence and length variation within and between species, together with the capability of the arrays to form stable secondary structures, suggests that the dominant mechanism involved in the evolution of these arrays is unidirectional replication slippage.

Journal ArticleDOI
TL;DR: A comparative study of the transfer RNA genes in animal mitochondrial and nuclear genomes demonstrates that the former accumulate nucleotide substitutions much more rapidly than do the latter, and several lines of evidence are consistent with the idea that the excess substitutions are mildly deleterious.
Abstract: The accumulation of deleterious mutations is thought to be a major factor preventing the long-term persistence of obligately asexual lineages relative to their sexual ancestors. This phenomenon is also of potential relevance to sexual species that harbor asexually propagating organelle genomes. A comparative study of the transfer RNA genes in animal mitochondrial and nuclear genomes demonstrates that the former accumulate nucleotide substitutions much more rapidly than do the latter, and several lines of evidence are consistent with the idea that the excess substitutions are mildly deleterious. First, the average binding stability between complementary strands in the stems of mitochondrial tRNAs is less than half that in nuclear tRNAs. Second, most loop sizes in the mitochondrial tRNAs have experienced a net reduction in size over evolutionary time, and they are nearly 50 times more variable in the mitochondrial than in the nuclear genome. Third, although nearly 20% of the nucleotides in nuclear tRNA genes (particularly those involved in tertiary interactions) are invariant across all animal taxa and all tRNA species, there are no invariant sites in the mitochondrial tRNAs. These observations, as well as results from recent laboratory experiments, are consistent with the hypothesis that nonrecombining organelle genomes are subject to gradual loss of fitness due to the cumulative chance fixation of mildly deleterious mutations.

Journal ArticleDOI
TL;DR: The ITS data indicated a near contemporary divergence of domesticated maize and its two closest wild relatives.
Abstract: Ribosomal internal transcribed spacer (ITS) sequences were used to evaluate the phylogenetics of Zea and Tripsacum. Maximum likelihood and polymorphism parsimony were used for phylogenetic reconstructions. Zea ITS nucleotide diversity was high compared to other plant species, but approximately equivalent to other maize loci. Coalescence of ITS alleles was rapid relative to other nuclear loci; however, there was still much diversity within populations. Zea and Tripsacum form a clade clearly differentiated from all other Poaceae. Four Zea ITS pseudogenes were identified by phylogenetic position and nucleotide composition. The phylogenetic position of Z. mays ssp. huehuetenangensis was clearly established as basal to the other Z. mays. The ITS phylogeny disfavored a Z. luxurians and Z. diploperennis clade, which conflicted with some previous studies. The introgression of Z. mays alleles into Z. perennis and Z. diploperennis was also established. The ITS data indicated a near contemporary divergence of domesticated maize and its two closest wild relatives.

Journal ArticleDOI
TL;DR: The results suggest that the duplicate state of the control-region-like sequences has long persisted in snake mtDNAs, possibly since the original insertion of the control-region-like sequence and tRNA(Leu) gene into the tRNA gene cluster, which occurred in the early stage of the divergence of snakes.
Abstract: Mitochondrial DNA (mtDNA) regions corresponding to two major tRNA gene clusters were amplified and sequenced for the Japanese pit viper, himehabu. In one of these clusters, which in most vertebrates characterized to date contains three tightly connected genes for tRNA(Ile), tRNA(Gln), and tRNA(Met), a sequence of approximately 1.3 kb was found to be inserted between the genes for tRNA(Ile) and tRNA(Gln). The insert consists of a control-region-like sequence possessing some conserved sequence blocks, and short flanking sequences which may be folded into tRNA(Pro), tRNA(Phe), and tRNA(Leu) genes. Several other snakes belonging to different families were also found to possess a control-region-like sequence and tRNA(Leu) gene between the tRNA(Ile) and tRNA(Gln) genes. We also sequenced a region surrounded by genes for cytochrome b and 12S rRNA, where the control region and genes for tRNA(Pro) and tRNA(Phe) are normally located in the mtDNAs of most vertebrates. In this region of three examined snakes, a control-region-like sequence exists that is almost completely identical to the one found between the tRNA(Ile) and tRNA(Gln) genes. The mtDNAs of these snakes thus possess two nearly identical control-region-like sequences which are otherwise divergent to a large extent between the species. These results suggest that the duplicate state of the control-region-like sequences has long persisted in snake mtDNAs, possibly since the original insertion of the control-region-like sequence and tRNA(Leu) gene into the tRNA gene cluster, which occurred in the early stage of the divergence of snakes. It is also suggested that the duplicated control-region-like sequences at two distant locations of mtDNA have evolved concertedly by a mechanism such as frequent gene conversion. The secondary structures of the determined tRNA genes point to the operation of simplification pressure on the TΨC arm of snake mitochondrial tRNAs.

Journal ArticleDOI
TL;DR: The results suggest that harbor seals are regionally philopatric, on the scale of several hundred kilometers. However, genetic discontinuities may exist, even between neighboring populations such as those on the Scottish and east English coasts or the east and west Baltic.
Abstract: The harbor seal (Phoca vitulina) has one of the broadest geographic distributions of any pinniped, stretching from the east Baltic, west across the Atlantic and Pacific Oceans to southern Japan. Although individuals may travel several hundred kilometers on annual feeding migrations, harbor seals are generally believed to be philopatric, returning to the same areas each year to breed. Consequently, seals from different areas are likely to be genetically differentiated, with levels of genetic divergence increasing with distance. Differentiation may also be caused by long-standing topographic barriers such as the polar sea ice. We analyzed samples of 227 harbor seals from 24 localities and defined 34 genotypes based on 435 bp of control region sequence. Phylogenetic analysis and analysis of molecular variance showed that populations in the Atlantic and Pacific Oceans and east and west coast populations of these oceans are significantly differentiated. Within these four regions, populations that are geographically farthest apart generally are the most differentiated and often do not share genotypes or differ in genotype frequency. The average corrected sequence divergence between populations in the Atlantic and Pacific Oceans is 3.28% +/- 0.38% and those among populations within each of these oceans are 0.75% +/- 0.69% and 1.19% +/- 0.65%, respectively. Our results suggest that harbor seals are regionally philopatric, on the scale of several hundred kilometers. However, genetic discontinuities may exist, even between neighboring populations such as those on the Scottish and east English coasts or the east and west Baltic. The mitochondrial data are consistent with an ancient isolation of populations in both oceans, due to the development of polar sea ice. In the Atlantic and Pacific, populations appear to have been colonized from west to east with the European populations showing the most recent common ancestry. 
We suggest the recent ancestry of European seal populations may reflect recolonization from Ice Age refugia after the last glaciation.

Journal ArticleDOI
TL;DR: Phylogenetic analyses of the casein data suggest that hippopotamid artiodactyls are more closely related to cetaceans than to other artiodactyls (even-toed hoofed mammals), and an analysis of the nuclear casein sequences combined with published mitochondrial cytochrome b DNA sequences also supports the Cetacea/Hippopotamidae sister group.
Abstract: The inferred transition from terrestrial hoofed mammal to fully aquatic cetacean has been intensively studied with fossil evidence. However, large sections of this remarkable evolutionary sequence are missing. Phylogenetic analysis of extant taxa may help to fill in some of these gaps. In this report, kappa-casein (exon 4) and beta-casein (exon 7) milk protein genes from cetaceans and other placental mammals were PCR-amplified, sequenced, and aligned to previously published sequences. Phylogenetic analyses of the casein data suggest that hippopotamid artiodactyls are more closely related to cetaceans than to other artiodactyls (even-toed hoofed mammals). An analysis of the nuclear casein sequences combined with published mitochondrial cytochrome b DNA sequences also supports the Cetacea/Hippopotamidae sister group. This affinity implies that some of the aquatic traits of cetaceans were derived in the common ancestor of Cetacea and Hippopotamidae. An extant "missing link" to Cetacea may have been overlooked by science since the description of the semiaquatic Hippopotamus in 1758. Paleontological information is grossly inconsistent with this hypothesis. If the casein phylogeny is accurate, large gaps in the fossil record as well as extensive morphological reversals and convergences must be acknowledged.

Journal ArticleDOI
TL;DR: The analysis of the deep-level phylogenetic signal in the highly conserved but short 5.8S and hypervariable ITS2 sequences indicates that ITS region sequences can diagnose organismal origins and phylogenetic relationships at many phylogenetic levels and provide a useful paradigm for molecular evolutionary study.
Abstract: The similarity of certain reported angiosperm rDNA internal transcribed spacer (ITS) region sequences to those of green algae prompted our analysis of the deep-level phylogenetic signal in the highly conserved but short 5.8S and hypervariable ITS2 sequences. We found that 5.8S sequences yield phylogenetic trees similar to but less well supported than those generated by a ca. 10-fold longer alignment from 18S rDNA sequences, as well as independent evidence. We attribute this result to our finding that, compared to 18S, the 5.8S has a higher proportion of sites subject to vary and greater among-site substitution rate homogeneity. We also determined that our phylogenetic results are not likely affected by intramolecular compensatory mutation to maintain RNA secondary structure nor by evident systematic biases in base composition. Despite historical homology, there appears to be no ITS2 primary sequence similarity shared between fungi, green algae, and angiosperms. ITS2 sequences within each of these groups, however, share sufficient similarity to cluster correctly on the basis of alignability. Our results indicate that ITS region sequences can diagnose organismal origins and phylogenetic relationships at many phylogenetic levels and provide a useful paradigm for molecular evolutionary study.

Journal ArticleDOI
TL;DR: Recently developed reduced consensus methods can be used to summarize much additional agreement on hypothesised phylogenetic relationships among multiple trees and are designed to produce more informative summaries of bootstrap analyses and foster more informed assessment of the strengths and weaknesses of complex phylogenetic hypotheses.
Abstract: Bootstrap analyses are usually summarized with majority-rule component consensus trees. This consensus method is based on replicated components and, like all component consensus methods, it is insensitive to other kinds of agreement between trees. Recently developed reduced consensus methods can be used to summarize much additional agreement on hypothesised phylogenetic relationships among multiple trees. The new methods are "strict" in the sense that they require agreement among all the trees being compared for any relationships to be represented in a consensus tree. Majority-rule reduced consensus methods are described and their use in bootstrap analyses is illustrated with a hypothetical and a real example. The new methods provide summaries of the bootstrap proportions of all n-taxon statements/partitions and facilitate the identification of hypotheses of relationships that are supported by high bootstrap proportions, in spite of a lack of support for particular components or clades. In practice majority-rule reduced consensus profiles may contain many trees. The size of the profile can be reduced by constraints on minimal bootstrap proportions and/or cardinality of the included trees. Majority-rule reduced consensus trees can also be selected a posteriori from the profile. Surrogates to the majority-rule reduced consensus methods using partition tables or tree pruning options provided by widely used phylogenetic inference software are also described. The methods are designed to produce more informative summaries of bootstrap analyses and thereby foster more informed assessment of the strengths and weaknesses of complex phylogenetic hypotheses.
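The contrast between component consensus and the pruning-based surrogate mentioned in this abstract can be sketched in a few lines. This is a simplified illustration, not the reduced consensus methods themselves: trees are assumed to be rooted and are represented only as sets of clades, the three "bootstrap trees" are invented, and the unstable taxon `X` is chosen by hand rather than discovered by the method. All function names are hypothetical.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the trees, with their proportions."""
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


def prune(tree, taxon, all_taxa):
    """Remove `taxon` from every clade; drop clades that become trivial."""
    remaining = len(all_taxa) - 1
    pruned = {clade - {taxon} for clade in tree}
    # Keep only clades with at least two taxa that are smaller than the full set.
    return {c for c in pruned if 2 <= len(c) < remaining}


taxa = frozenset("ABCDX")
# Invented bootstrap trees; each is the set of its non-trivial clades.
# X moves around while the backbone (((A,B),C),D) stays fixed.
trees = [
    {frozenset("AX"), frozenset("ABX"), frozenset("ABCX")},   # ((((A,X),B),C),D)
    {frozenset("AB"), frozenset("CX"), frozenset("ABCX")},    # (((A,B),(C,X)),D)
    {frozenset("AB"), frozenset("ABC"), frozenset("ABCD")},   # ((((A,B),C),D),X)
]

full = majority_rule_clades(trees)
reduced = majority_rule_clades([prune(t, "X", taxa) for t in trees])

for label, result in (("all taxa", full), ("X pruned", reduced)):
    for clade, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
        print(label, "".join(sorted(clade)), round(support, 2))
```

In this toy data the clade (A,B,C) appears in only one of the three full trees, so the component consensus does not show it, yet it is present in every tree once `X` is pruned. That is the kind of agreement the abstract says component methods miss.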

Journal ArticleDOI
TL;DR: An evolutionary model that combines protein secondary structure and amino acid replacement is introduced that allows likelihood analysis of aligned protein sequences and does not require the underlying secondary structures of these sequences to be known.
Abstract: An evolutionary model that combines protein secondary structure and amino acid replacement is introduced. It allows likelihood analysis of aligned protein sequences and does not require the underlying secondary (or tertiary) structures of these sequences to be known. One component of the model describes the organization of secondary structure along a protein sequence and another specifies the evolutionary process for each category of secondary structure. A database of proteins with known secondary structures is used to estimate model parameters representing these two components. Phylogeny, the third component of the model, can be estimated from the data set of interest. As an example, we employ our model to analyze a set of sucrose synthase sequences. For the evolution of sucrose synthase, a parametric bootstrap approach indicates that our model is statistically preferable to one that ignores secondary structure.

Journal ArticleDOI
TL;DR: It is proposed that these regulatory mechanisms act at the level of the transposase protein subunits by promoting the assembly of oligomeric forms, or of mixed-subunit oligomers, that have reduced activity, and that they may apply generally to mariner-like elements (MLEs).
Abstract: Genetic studies of the mariner transposable element Mos1 have revealed two novel types of regulatory mechanisms. In one mechanism, overproduction of the wild-type transposase reduces the overall level of transposase activity as assayed by the excision of a nonautonomous mariner target element. This mechanism is termed overproduction inhibition (OPI). Another mechanism is observed in a class of hypomorphic missense mutations in the transposase. In the presence of wild-type Mos1 transposase, these mutations exhibit dominant-negative complementation (DNC) that antagonizes the activity of the wild-type transposase. We propose that these regulatory mechanisms act at the level of the transposase protein subunits by promoting the assembly of oligomeric forms, or of mixed-subunit oligomers, that have reduced activity. We suggest that these regulatory mechanisms may apply generally to mariner-like elements (MLEs). Overproduction inhibition may help explain why the MLE copy number reaches very different levels in different species. Dominant-negative complementation may help explain why most naturally occurring copies of MLEs have been mutationally inactivated.

Journal ArticleDOI
TL;DR: Predictions are developed regarding the relative prevalence of different classes of mutations and are found to compare favorably with reports from the literature; in particular, point mutations in target loci were the dominant form of resistance for both lab and field selection.
Abstract: To the prevailing biochemical/physiological classification of mechanisms of organismal resistance to toxicants, an additional molecular dimension is proposed. Predictions are developed regarding the relative prevalence of different classes of mutations and are found to compare favorably with reports from the literature. In particular, point mutations in target loci were the dominant form of resistance for both lab and field selection. Amplifications of target loci were less common than structural mutations, and more common for lab-selected than for field-selected strains. Amplification was the most common mechanism of up-regulation of metabolizing enzymes. In comparison, only one mutation involving cis-regulation and several involving trans-acting regulation were found. Mutations involving gene disruption and down-regulation were uncommon, but were found in appropriate cases, i.e., when toxicants stimulated rather than inhibited target function and when metabolizing enzymes converted toxicants into more toxic metabolites. Additional phenomena of likely but uncertain importance are genetic "succession," recombinational limitation, and negative cross-resistance. More work on these phenomena and on quantification of fitness costs of resistance is recommended.

Journal ArticleDOI
TL;DR: Group I introns were discovered inserted at the same position in the nuclear small-subunit ribosomal DNA (nuc-ssu-rDNA) in several species of homobasidiomycetes (mushroom-forming fungi), and phylogenetic analyses of intron sequences suggest that the mushroom introns are monophyletic and are nested within a clade that contains four other introns that insert at the same position as the mushroom introns.
Abstract: Group I introns were discovered inserted at the same position in the nuclear small-subunit ribosomal DNA (nuc-ssu-rDNA) in several species of homobasidiomycetes (mushroom-forming fungi). Based on conserved intron sequences, a pair of intron-specific primers was designed for PCR amplification and sequencing of intron-containing rDNA repeats. Using the intron-specific primers together with flanking rDNA primers, a PCR assay was conducted to determine presence or absence of introns in 39 species of homobasidiomycetes. Introns were confined to the genera Panellus, Clavicorona, and Lentinellus. Phylogenetic analyses of nuc-ssu-rDNA and mitochondrial ssu-rDNA sequences suggest that Clavicorona and Lentinellus are closely related, but that Panellus is not closely related to these. The simplest explanation for the distribution of the introns is that they have been twice independently gained via horizontal transmission, once on the lineage leading to Panellus, and once on the lineage leading to Lentinellus and Clavicorona. BLAST searches using the introns from Panellus and Lentinellus as query sequences retrieved 16 other similar group I introns of nuc-ssu-rDNA and nuclear large-subunit rDNA (nuc-lsu-rDNA) from fungal and green algal hosts. Phylogenetic analyses of intron sequences suggest that the mushroom introns are monophyletic, and are nested within a clade that contains four other introns that insert at the same position as the mushroom introns, two from different groups of fungi and two from green algae. The distribution of host lineages and insertion sites among the introns suggests that horizontal and vertical transmission, homing, and transposition have been factors in intron evolution. As distinctive, heritable features of nuclear rDNAs in certain lineages, group I introns have promise as phylogenetic markers. Nevertheless, the possibility of horizontal transmission and homing also suggests that their use poses certain pitfalls.