Showing papers in "Journal of Molecular Evolution in 1997"


Journal ArticleDOI
TL;DR: Estimates of amelioration times indicate that the entire Escherichia coli chromosome contains more than 600 kb of horizontally transferred, protein-coding DNA, which predicts that the E. coli and Salmonella enterica lineages have each gained and lost more than 3 megabases of novel DNA since their divergence.
Abstract: Although bacterial species display wide variation in their overall GC contents, the genes within a particular species' genome are relatively similar in base composition. As a result, sequences that are novel to a bacterial genome—i.e., DNA introduced through recent horizontal transfer—often bear unusual sequence characteristics and can be distinguished from ancestral DNA. At the time of introgression, horizontally transferred genes reflect the base composition of the donor genome; but, over time, these sequences will ameliorate to reflect the DNA composition of the new genome because the introgressed genes are subject to the same mutational processes affecting all genes in the recipient genome. This process of amelioration is evident in a large group of genes involved in host-cell invasion by enteric bacteria and can be modeled to predict the amount of time required after transfer for foreign DNA to resemble native DNA. Furthermore, models of amelioration can be used to estimate the time of introgression of foreign genes in a chromosome. Applying this approach to a 1.43-megabase continuous sequence, we have calculated that the entire Escherichia coli chromosome contains more than 600 kb of horizontally transferred, protein-coding DNA. Estimates of amelioration times indicate that this DNA has accumulated at a rate of 31 kb per million years, which is on the order of the amount of variant DNA introduced by point mutations. This rate predicts that the E. coli and Salmonella enterica lineages have each gained and lost more than 3 megabases of novel DNA since their divergence.
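The closing extrapolation can be checked with simple arithmetic. The sketch below assumes a divergence time of about 100 million years between the two lineages; that figure is not stated in the abstract and is used here only for illustration.

```python
# Back-of-the-envelope check of the extrapolation in the abstract.
RATE_KB_PER_MYR = 31           # acquisition rate reported in the abstract
ASSUMED_DIVERGENCE_MYR = 100   # ASSUMPTION: not given in the abstract


def dna_gained_mb(rate_kb_per_myr, divergence_myr):
    """Return the DNA acquired along one lineage, in megabases."""
    return rate_kb_per_myr * divergence_myr / 1000.0


gained = dna_gained_mb(RATE_KB_PER_MYR, ASSUMED_DIVERGENCE_MYR)
print(f"{gained:.1f} Mb gained per lineage under the assumed divergence time")
```

Under that assumption the product is a little over 3 megabases, in line with the abstract's statement; a different divergence time scales the result proportionally.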

856 citations


Journal ArticleDOI
TL;DR: The results do not support the notion that selection pressure induces complementary oligonucleotides in close proximity and therefore numerous secondary structures in prokaryotic DNA, as the genomic G+C content does not behave in the same way as that of folded RNA with respect to optimal growth temperature.
Abstract: G:C pairs are more stable than A:T pairs because they have an additional hydrogen bond. This has led to many studies on the correlation between the guanine+cytosine (G+C) content of nucleic acids and temperature over the last 20 years. We collected the optimal growth temperatures (Topt) and the G+C contents of genomic DNA; 23S, 16S, and 5S ribosomal RNAs; and transfer RNAs for 764 prokaryotic species. No correlation was found between genomic G+C content and Topt, but there were striking correlations between the G+C content of ribosomal and transfer RNA stems and Topt. Two explanations have been proposed—neutral evolution and selection pressure—for the approximate equalities of G and C (respectively, A and T) contents within each strand of DNA molecules. Our results do not support the notion that selection pressure induces complementary oligonucleotides in close proximity and therefore numerous secondary structures in prokaryotic DNA, as the genomic G+C content does not behave in the same way as that of folded RNA with respect to optimal growth temperature.
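The kind of comparison described above can be sketched as follows: compute the G+C fraction of each sequence, then correlate it with optimal growth temperature. The function names and the toy data are hypothetical and are not taken from the paper; Pearson's coefficient is used here as one plausible measure, not necessarily the statistic the authors applied.

```python
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a DNA/RNA sequence."""
    seq = seq.upper().replace("U", "T")
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, T or U")
    return sum(base in "GC" for base in counted) / len(counted)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data (NOT from the paper): (Topt in deg C, stem sequence).
toy = [(25, "GAUCAUGAUC"), (37, "GGUCAUGACC"),
       (60, "GGCCAUGGCC"), (85, "GGCCGCGGCC")]
temps = [t for t, _ in toy]
gcs = [gc_content(s) for _, s in toy]
print(f"r = {pearson_r(temps, gcs):.3f}")
```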

391 citations


Journal ArticleDOI
TL;DR: A common secondary structure that is conserved despite wide intra- and interfamilial primary sequence divergence is revealed and may contribute significantly to the value of ITS-2 sequences in phylogenetic analyses at several taxonomic levels, but particularly in characterizing populations and species.
Abstract: Sequences of the Internal Transcribed Spacer 2 (ITS-2) regions of the nuclear rDNA repeats from 111 organisms of the family Volvocaceae (Chlorophyta) and unicellular organisms of the Volvocales, including Chlamydomonas reinhardtii, were determined. The use of thermodynamic energy optimization to generate secondary structures and phylogenetic comparative analysis of the spacer regions revealed a common secondary structure that is conserved despite wide intra- and interfamilial primary sequence divergence. The existence of this conserved higher-order structure is supported by the presence of numerous compensating basepair changes as well as by an evolutionary history of insertions and deletions that nevertheless maintains major aspects of the overall structure. Furthermore, this general structure is preserved across broad phylogenetic lines, as it is observed in the ITS-2s of other chlorophytes, including flowering plants; previous reports of common ITS-2 secondary structures in other eukaryotes were restricted to the order level. The reported ITS-2 structure possesses important conserved structural motifs which may help to mediate cleavages in the ITS-2 that occur during rRNA transcript processing. Their recognition can guide further studies of eukaryotic rRNA processing, and their application to sequence alignments may contribute significantly to the value of ITS-2 sequences in phylogenetic analyses at several taxonomic levels, but particularly in characterizing populations and species.

331 citations


Journal ArticleDOI
TL;DR: It is concluded that synonymous codon usage in Drosophila is well explained by tRNA availability and is probably influenced by developmental changes in relative abundance.
Abstract: Codon usage bias of 1,117 Drosophila melanogaster genes, as well as fewer D. pseudoobscura and D. virilis genes, was examined from the perspective of relative abundance of isoaccepting tRNAs and their changes during development. We found that each amino acid contributes about equally and highly significantly to overall codon usage bias, with the exception of Asp, which had very low contribution to overall bias. Asp was also the only amino acid that did not show a clear preference for one of its synonymous codons. Synonymous codon usage in Drosophila was consistent with "optimal" codons deduced from the isoaccepting tRNA availability. Interestingly, amino acids whose major isoaccepting tRNAs change during development did not show as strong bias as those with developmentally unchanged tRNA pools. Asp is the only amino acid for which the major isoaccepting tRNAs change between larval and adult stages. We conclude that synonymous codon usage in Drosophila is well explained by tRNA availability and is probably influenced by developmental changes in relative abundance.
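One common way to quantify a preference among synonymous codons is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all codons in its family were used equally. The abstract does not say which bias measure the authors used, so the sketch below is only an illustration of the general idea; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, family):
    """RSCU for each codon in one synonymous family.

    Returns None for every codon if the amino acid never occurs.
    """
    total = sum(counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: None for codon in family}
    expected = total / len(family)  # equal-usage expectation
    return {codon: counts.get(codon, 0) / expected for codon in family}


ASP = ("GAT", "GAC")  # the two aspartate codons

# Hypothetical toy sequence (NOT from the paper): one GAT and three GAC.
toy_counts = count_codons("GATGACGACGAC")
print(rscu(toy_counts, ASP))
```

A value of 1.0 for every codon in a family would indicate no preference, which is the pattern the abstract reports for Asp.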

318 citations


Journal ArticleDOI
TL;DR: A bootstrapped distance tree of 500 small subunit ribosomal RNA sequences from organisms belonging to the so-called crown of eukaryote evolution suggests that animals, true fungi, and choanoflagellates share a common origin, and shows that when among-site rate variation is ignored the animals no longer appear as a monophyletic grouping in most distance trees.
Abstract: In this study we constructed a bootstrapped distance tree of 500 small subunit ribosomal RNA sequences from organisms belonging to the so-called crown of eukaryote evolution. Taking into account the substitution rate of the individual nucleotides of the rRNA sequence alignment, our results suggest that (1) animals, true fungi, and choanoflagellates share a common origin: The branch joining these taxa is highly supported by bootstrap analysis (bootstrap support [BS] > 90%), (2) stramenopiles and alveolates are sister groups (BS = 75%), (3) within the alveolates, dinoflagellates and apicomplexans share a common ancestor (BS > 95%), while in turn they both share a common origin with the ciliates (BS > 80%), and (4) within the stramenopiles, heterokont algae, hyphochytriomycetes, and oomycetes form a monophyletic grouping well supported by bootstrap analysis (BS > 85%), preceded by the well-supported successive divergence of labyrinthulomycetes and bicosoecids. On the other hand, many evolutionary relationships between crown taxa are still obscure on the basis of 18S rRNA. The branching order between the animal-fungal-choanoflagellates clade and the chlorobionts, the alveolates and stramenopiles, red algae, and several smaller groups of organisms remains largely unresolved. When among-site rate variation is not considered, the inferred tree topologies are inferior to those where the substitution rate spectrum for the 18S rRNA is taken into account. This is primarily indicated by the erroneous branching of fast-evolving sequences. Moreover, when different substitution rates among sites are not considered, the animals no longer appear as a monophyletic grouping in most distance trees.

271 citations


Journal ArticleDOI
TL;DR: Computer simulation shows that two currently available methods for inferring ancestral amino acids (maximum parsimony and maximum likelihood) and a newly developed distance method all give reliable inference when the divergence of amino acid sequences is low.
Abstract: Information about protein sequences of ancestral organisms is important for identifying critical amino acid substitutions that have caused the functional change of proteins in evolution. Using computer simulation, we studied the accuracy of ancestral amino acids inferred by two currently available methods (maximum-parsimony [MP] and maximum-likelihood [ML] methods) in addition to a distance method, which was newly developed in this paper. All three methods give reliable inference when the divergence of amino acid sequences is low. When the extent of sequence divergence is high, however, the ML and distance methods give more accurate results than the MP method, particularly when the phylogenetic tree includes long branches. The accuracy of inferred ancestral amino acids does not change very much when a few present-day sequences are added or eliminated. When an incorrect model of amino acid substitution is used for the ML and distance methods, the accuracy decreases, but it is still higher than that for the MP method. When the tree topology used is partially incorrect, the accuracy in the correct part of the tree is virtually unaffected. The posterior probability of inferred ancestral amino acids computed by the ML and distance methods is an unbiased estimate of the true probability when a correct substitution model is used but may become an overestimate when a simpler model is used.

244 citations


Journal ArticleDOI
TL;DR: It is proposed that changes to the secondary structure of tRNACys may destroy function of the origin for light-strand replication which, in turn, may facilitate shifts in gene order.
Abstract: A phylogenetic tree for major lineages of iguanian lizards is estimated from 1,488 aligned base positions (858 informative) of newly reported mitochondrial DNA sequences representing coding regions for eight tRNAs, ND2, and portions of ND1 and COI. Two well-supported groups are defined, the Acrodonta and the Iguanidae (sensu lato). This phylogenetic hypothesis is used to investigate evolutionary shifts in mitochondrial gene order, origin for light-strand replication, and secondary structure of tRNACys. These three characters shift together on the branch leading to acrodont lizards. Plate tectonics and the fossil record indicate that these characters changed in the Jurassic. We propose that changes to the secondary structure of tRNACys may destroy function of the origin for light-strand replication which, in turn, may facilitate shifts in gene order.

235 citations


Journal ArticleDOI
TL;DR: The Felidae family represents a challenge for molecular phylogenetic reconstruction because it consists of 38 living species that evolved from a relatively recent common ancestor (10–15 million years ago).
Abstract: The Felidae family represents a challenge for molecular phylogenetic reconstruction because it consists of 38 living species that evolved from a relatively recent common ancestor (10–15 million years ago). We have determined mitochondrial DNA sequences from two genes that evolve at relatively rapid evolutionary rates, 16S rRNA (379 bp) and NADH dehydrogenase subunit 5 (NADH-5, 318 bp), from multiple individuals of 35 species. Based on separate and combined gene analyses using minimum evolution, maximum parsimony, and maximum likelihood phylogenetic methods, we recognized eight significant clusters or species clades that likely reflect separate monophyletic evolutionary radiations in the history of this family. The clusters include (1) ocelot lineage, (2) domestic cat lineage, (3) Panthera genus, (4) puma group, (5) Lynx genus, (6) Asian leopard cat group, (7) caracal group, and (8) bay cat group. The results confirm and extend previously hypothesized associations in most cases, but in others, e.g., the bay cat group, suggest novel phylogenetic relationships. The results are compared and evaluated with molecular, cytogenetic, and morphological data to derive a phylogenetic synthesis of felid evolutionary history.

213 citations


Journal ArticleDOI
TL;DR: The comparison of FGF and FGF receptor sequences in vertebrates and nonvertebrates shows that the FGF and FGF receptor families have evolved through phases of gene duplications, one of which may have coincided with the emergence of vertebrates, in relation with their new system of body scaffold.
Abstract: FGFs (fibroblast growth factors) play major roles in a number of developmental processes. Recent studies of several human disorders, and concurrent analysis of gene knock-out and properties of the corresponding recombinant proteins have shown that FGFs and their receptors are prominently involved in the development of the skeletal system in mammals. We have compared the sequences of the nine known mammalian FGFs, FGFs from other vertebrates, and three additional sequences that we extracted from existing databases: two human FGF sequences that we tentatively designated FGF10 and FGF11, and an FGF sequence from Caenorhabditis elegans. Similarly, we have compared the sequences of the four FGF receptor paralogs found in chordates with four non-chordate FGF receptors, including one recently identified in C. elegans. The comparison of FGF and FGF receptor sequences in vertebrates and nonvertebrates shows that the FGF and FGF receptor families have evolved through phases of gene duplications, one of which may have coincided with the emergence of vertebrates, in relation with their new system of body scaffold.

210 citations


Journal ArticleDOI
TL;DR: The collembolan COII gene showed the lowest A + T content of all insects so far examined, confirming that the well-known A + T bias in insect mitochondrial genes tends to increase from the basal to apical orders.
Abstract: The sequence of the mitochondrial COII gene has been widely used to estimate phylogenetic relationships at different taxonomic levels across insects. We investigated the molecular evolution of the COII gene and its usefulness for reconstructing phylogenetic relationships within and among four collembolan families. The collembolan COII gene showed the lowest A + T content of all insects so far examined, confirming that the well-known A + T bias in insect mitochondrial genes tends to increase from the basal to apical orders. Fifty-seven percent of all nucleotide positions were variable and most of the third codon positions appeared free to vary. Values of genetic distance between congeneric species and between families were remarkably high; in some cases the latter were higher than divergence values between other orders of insects. The remarkably high divergence levels observed here provide evidence that collembolan taxa are quite old; divergence levels among collembolan families equaled or exceeded divergences among pterygote insect orders. Once the saturated third-codon positions (which violated stationarity of base frequencies) were removed, the COII sequences contained phylogenetic information, but the extent of that information was overestimated by parsimony methods relative to likelihood methods. In the phylogenetic analysis, consistent statistical support was obtained for the monophyly of all four genera examined, but relationships among genera/families were not well supported. Within the genus Orchesella, relationships were well resolved and agreed with allozyme data. Within the genus Isotomurus, although three pairs of populations were consistently identified, these appeared to have arisen in a burst of evolution from an earlier ancestor. Isotomurus italicus always appeared as basal and I. palustris appeared to harbor a cryptic species, corroborating allozyme data.

203 citations


Journal ArticleDOI
TL;DR: A new type of unsupervised, growing, self-organizing neural network is proposed that expands itself by following the taxonomic relationships among the sequences being classified, making it an excellent tool for phylogenetic analysis of a large number of sequences.
Abstract: We propose a new type of unsupervised, growing, self-organizing neural network that expands itself by following the taxonomic relationships that exist among the sequences being classified. The binary tree topology of this neural network, contrary to other more classical neural network topologies, permits an efficient classification of sequences. The growing nature of this procedure allows it to be stopped at the desired taxonomic level without the necessity of waiting until a complete phylogenetic tree is produced. This novel approach presents a number of other interesting properties, such as a time for convergence which is, approximately, a linear function of the number of sequences. Computer simulation and a real example show that the algorithm accurately finds the phylogenetic tree that relates the data. All this makes the neural network presented here an excellent tool for phylogenetic analysis of a large number of sequences.

Journal ArticleDOI
TL;DR: This new approach combines the high-level description of molecular function with pair statistics that express genome organization, expected to complement traditional methods of sequence analysis in the study of genomic structure, function, and evolution.
Abstract: An approach for genome comparison, combining function classification of gene products and sequence comparison, is presented. The genomes of Haemophilus influenzae and Escherichia coli are analyzed, and all genes are classified into nine major functional classes, corresponding to important cellular processes. To study gene order relationships and genome organization in the two bacteria, we performed statistics on neighboring pairs of genes. To estimate the significance of the observations, a statistical model based on binomial distributions has been developed. Significant patterns of gene order are observed within, as well as between, the two bacterial genomes: Functionally related genes tend to be neighbors more often than do unrelated genes. Some of these groups represent well-known operons, but additional gene clusters are identified. These clusters correspond to genomic elements that have been conserved during bacterial evolution. In addition to nearest-neighbor relationships, the method is also useful to study the relative direction of transcription in genomes, which is also highly conserved between homologous gene pairs. This new approach combines the high-level description of molecular function with pair statistics that express genome organization. It is expected to complement traditional methods of sequence analysis in the study of genomic structure, function, and evolution.
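A binomial significance test for neighboring gene pairs can be sketched as below. The abstract does not give the details of the authors' model, so the null hypothesis used here is an assumption made for illustration: gene classes are assigned to neighbors independently, so the chance that two neighbors share a class is the sum of squared class frequencies. All names and numbers are hypothetical.

```python
from math import comb


def same_class_probability(class_freqs):
    """Chance that two independently drawn genes fall in the same class.

    ASSUMPTION: this independence null is illustrative, not the paper's model.
    """
    if abs(sum(class_freqs) - 1.0) > 1e-9:
        raise ValueError("class frequencies must sum to 1")
    return sum(f * f for f in class_freqs)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers (NOT from the paper): nine equally frequent classes,
# 1000 neighboring pairs, 180 of which share a functional class.
p_null = same_class_probability([1 / 9] * 9)
p_value = binomial_upper_tail(180, 1000, p_null)
print(f"null probability = {p_null:.4f}, upper-tail p-value = {p_value:.3g}")
```

A small upper-tail probability would indicate that same-class neighbors occur more often than the independence null predicts.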

Journal ArticleDOI
TL;DR: The observation of several families of alleles at the population level provides information about the evolutionary history and mutation processes of microsatellites and may have implications for the use of these markers in phylogenetic, linkage disequilibrium studies, and gene mapping.
Abstract: Microsatellite DNA sequences have become the dominant source of nuclear genetic markers for most applications. It is important to investigate the basis of variation between alleles and to know if current assumptions about the mechanisms of microsatellite mutation (that is to say, variations involving simple changes in the number of repeats) are correct. We have characterized, by DNA sequencing, the human alleles of a new highly informative (CA)n repeat localized approximately 20 kb centromeric to the HLA-B gene. Although 12 alleles were identified based on conventional length criteria, sequencing of the alleles demonstrated that differences between alleles were more complex than previously assumed: A high degree of microsatellite variability is due to variation in the region immediately flanking the repeat. These data indicate that the mutational process which generates polymorphism in this region has involved not only simple changes in the number of dinucleotide CA repeats but also perturbations in the nonrepeated 5' and 3' flanking sequences. Three families of alleles (not visible from the overall length of the alleles), with presumably separate evolutionary histories, exist and can give rise to size homoplasy. In effect, we can observe alleles of the same size with different internal structures which are separated by a significant amount of variation. Although allelic homoplasy for non-interrupted microsatellite loci has been suggested between different species, it has not been unequivocally demonstrated within species. A strong association is noted between alleles defined at the sequence level and HLA-B alleles. The observation of several families of alleles at the population level provides information about the evolutionary history and mutation processes of microsatellites and may have implications for the use of these markers in phylogenetic, linkage disequilibrium studies, and gene mapping.

Journal ArticleDOI
TL;DR: In animal mitochondria, homologous genes that differ in guanine plus cytosine (G + C) content code for proteins whose amino acid content differs in a manner that relates to the G + C content of the codons; this is shown with square plots, a new method that combines graphical visualization and statistical analysis of compositional differences in both DNA and protein.
Abstract: We show that in animal mitochondria homologous genes that differ in guanine plus cytosine (G + C) content code for proteins differing in amino acid content in a manner that relates to the G + C content of the codons. DNA sequences were analyzed using square plots, a new method that combines graphical visualization and statistical analysis of compositional differences in both DNA and protein. Square plots divide codons into four groups based on first and second position A + T (adenine plus thymine) and G + C content and indicate differences in amino acid content when comparing sequences that differ in G + C content. When sequences are compared using these plots, the amino acid content is shown to correlate with the nucleotide bias of the genes. This amino acid effect is shown in all protein-coding genes in the mitochondrial genome, including cox I, cox II, and cyt b, mitochondrial genes which are commonly used for phylogenetic studies. Furthermore, nucleotide content differences are shown to affect the content of all amino acids with A + T- and G + C-rich codons. We speculate that phylogenetic analysis of genes so affected may tend erroneously to indicate relatedness (or lack thereof) based only on amino acid content.
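The four-way codon grouping that underlies a square plot can be sketched as follows. Each codon is classified by whether its first and second positions are A/T or G/C. The group labels use the IUPAC letters W (A or T) and S (G or C); the labels and function names are hypothetical conveniences, and the plotting and statistical steps of the published method are not reproduced here.

```python
from collections import Counter

GROUPS = ("WW", "WS", "SW", "SS")  # first and second codon position classes


def codon_group(codon):
    """Classify a codon by the A/T (W) or G/C (S) state of positions 1 and 2."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        raise ValueError(f"not an unambiguous codon: {codon!r}")
    return "".join("S" if base in "GC" else "W" for base in codon[:2])


def group_fractions(cds):
    """Fraction of in-frame codons of a coding sequence in each group."""
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    groups = Counter(codon_group(cds[i:i + 3]) for i in range(0, usable, 3))
    total = sum(groups.values())
    return {group: groups.get(group, 0) / total for group in GROUPS}


# Hypothetical toy sequence (NOT from the paper): one codon per group.
print(group_fractions("ATGGCAACGGAT"))
```

Comparing these fractions between homologous genes that differ in G + C content is the kind of contrast the abstract describes.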

Journal ArticleDOI
TL;DR: It is demonstrated that insect Vgs sequences can be confidently aligned with one another along their entire lengths and with multiple vertebrate and nematode Vg sequences along most of their spans, establishing conclusively that Vgs from the three phyla are homologous.
Abstract: The eggs of most oviparous animals are provisioned with a class of protein called vitellogenin (Vg) which is stored as the major component of yolk. Until recently, deduced amino acid sequences were available only from vertebrate and nematode Vgs, which proved to be homologous. The sequences of several insect Vgs are now known, but early attempts at pairwise alignments with vertebrate and nematode Vgs have been problematic, leading to conflicting conclusions about how closely insect Vgs are related to the others. In this paper we demonstrate that insect Vg sequences can be confidently aligned with one another along their entire lengths and with multiple vertebrate and nematode Vg sequences along most of their spans. Although divergence is high, conservation among insect, vertebrate, and nematode Vg sequences is widespread with a preponderance of glycine, proline, and cysteine residues among strictly conserved amino acids, establishing conclusively that Vgs from the three phyla are homologous. Areas of least-certain alignment are primarily in and around insect and vertebrate polyserine domains which are not homologous. Phylogenetic reconstructions of Vgs based on sequence identities indicate that the insect lineage is the most diverged and that the mammalian serum protein, apolipoprotein B-100, arose from a Vg ancestor after the nematode/vertebrate divergence.

Journal ArticleDOI
TL;DR: Sequence analyses and inspection of the few available three-dimensional structures suggest that the secondary structure of domain B varies with the enzyme specificity in the α-amylase family.
Abstract: The available amino acid sequences of the alpha-amylase family (glycosyl hydrolase family 13) were searched to identify their domain B, a distinct domain that protrudes from the regular catalytic (beta/alpha)8-barrel between the strand beta3 and the helix alpha3. The isolated domain B sequences were inspected visually and also analyzed by Hydrophobic Cluster Analysis (HCA) to find common features. Sequence analyses and inspection of the few available three-dimensional structures suggest that the secondary structure of domain B varies with the enzyme specificity. Domain B in these different forms, however, may still have evolved from a common ancestor. The largest number of different specificities was found in the group with structural similarity to domain B from Bacillus cereus oligo-1,6-glucosidase that contains an alpha-helix succeeded by a three-stranded antiparallel beta-sheet. These enzymes are alpha-glucosidase, cyclomaltodextrinase, dextran glucosidase, trehalose-6-phosphate hydrolase, neopullulanase, and a few alpha-amylases. Domain B of this type was observed also in some mammalian proteins involved in the transport of amino acids. These proteins show remarkable similarity with (beta/alpha)8-barrel elements throughout the entire sequence of enzymes from the oligo-1,6-glucosidase group. The transport proteins, in turn, resemble the animal 4F2 heavy-chain cell surface antigens, for which the sequences either lack domain B or contain only parts thereof. The similarities are compiled to indicate a possible route of domain evolution in the alpha-amylase family.

Journal ArticleDOI
TL;DR: It is suggested that the origin of the ancestral ssRNAP gene closely paralleled in time the introduction of mitochondria into eukaryotic cells through a eubacterial endosymbiosis.
Abstract: Many eukaryotic nuclear genomes as well as mitochondrial plasmids contain genes displaying evident sequence similarity to those encoding the single-subunit RNA polymerase (ssRNAP) of bacteriophage T7 and its relatives. We have collected and aligned these ssRNAP sequences and have constructed unrooted phylogenetic trees that demonstrate the separation of ssRNAPs into three well-defined and nonoverlapping clusters (phage-encoded, nucleus-encoded, and plasmid-encoded). Our analyses indicate that these three subfamilies of T7-like RNAPs shared a common ancestor; however, the order in which the groups diverged cannot be inferred from available data. On the basis of structural similarities and mutational data, we suggest that the ancestral ssRNAP gene may have arisen via duplication and divergence of a DNA polymerase or reverse transcriptase gene. Considering the current phylogenetic distribution of ssRNAP sequences, we further suggest that the origin of the ancestral ssRNAP gene closely paralleled in time the introduction of mitochondria into eukaryotic cells through a eubacterial endosymbiosis.

Journal ArticleDOI
TL;DR: The evolution of HBV represents a typical constrained evolution, and this mode of viral evolution is proposed to be called “constrained evolution.”
Abstract: With the aim of elucidating the evolution of a hepadnavirus family, we constructed molecular phylogenetic trees for 27 strains of hepatitis B virus (HBV) using both the unweighted pair-grouping and neighbor-joining methods. All five gene regions, P, C, S, X, and preS, were used to construct the phylogenetic trees. Using the phylogenetic trees obtained, we classified these strains into five major groups in which the strains were closely related to each other. Our classification reinforced our previous view that genetic classification is not always compatible with conventional classification determined by serological subtypes. Moreover, constraints on the evolutionary process of HBV were analyzed for amino-acid-altering (nonsynonymous) and silent (synonymous) substitutions, because two-thirds of the open reading frame (ORF), P, contains alternating overlapping ORFs. In our unique analysis of this interesting gene structure of HBV, the most frequent synonymous substitutions were observed in the nonoverlapped parts of the P and C genes. On the other hand, the number of synonymous substitutions per nucleotide site for the S gene was quite low, which appeared to reflect strongly constrained evolution. Because the P gene overlaps the S gene in a different frame, the low rate of synonymous substitution for the S gene can be explained by the evolutionary constraints which are imposed on the overlapping gene region. In other words, synonymous substitutions in the S gene can cause amino acid changes in its overlapping region in a different frame. Thus, the evolution of HBV is constrained by the overlapping genes. We propose calling this mode of viral evolution “constrained evolution.” The evolution of HBV represents a typical case of constrained evolution.

Journal ArticleDOI
TL;DR: The difference in synonymous substitution rates is due to a combination of two factors: a higher transitional mutation rate in mtDNA and constraints on nuclear genes due to selection for codon usage.
Abstract: Synonymous substitution rates in mitochondrial and nuclear genes of Drosophila were compared. To make accurate comparisons, we considered the following: (1) relative synonymous rates, which do not require divergence time estimates, should be used; (2) methods estimating divergence should take into account base composition; (3) only very closely related species should be used to avoid effects of saturation; (4) the heterogeneity of rates should be examined. We modified the methods estimating synonymous substitution numbers to account for base composition bias. By using these methods, we found that mitochondrial genes have 1.7–3.4 times higher synonymous substitution rates than the fastest nuclear genes or 4.5–9.0 times higher rates than the average nuclear genes. The average rate of synonymous transversions was 2.7 (estimated from the melanogaster species subgroup) or 2.9 (estimated from the obscura group) times higher in mitochondrial genes than in nuclear genes. Synonymous transversions in mitochondrial genes occurred at an approximately equivalent rate to those in the fastest nuclear genes. This last result is not consistent with the hypothesis that the difference in turnover rates between mitochondrial and nuclear genomes is the major factor determining higher synonymous substitution rates in mtDNA. We conclude that the difference in synonymous substitution rates is due to a combination of two factors: a higher transitional mutation rate in mtDNA and constraints on nuclear genes due to selection for codon usage.

Journal ArticleDOI
TL;DR: The analysis demonstrated that accuracy of the functional site prediction could be improved if one takes into account correlations between the site positions, and the accuracy of prediction by using human consensus sequences was tested on sequences from different organisms.
Abstract: We present here a new algorithm for functional site analysis. It is based on four main assumptions: each variation of nucleotide composition makes a different contribution to the overall binding free energy of interaction between a functional site and another molecule; nonfunctioning site-like regions (pseudosites) are absent or rare in genomes; there may be errors in the sample of sites; and nucleotides of different site positions are considered to be mutually dependent. In this algorithm, the site set is divided into subsets, each described by a certain consensus. Donor splice sites of the human protein-coding genes were analyzed. Comparing the results with other methods of donor splice site prediction has demonstrated a more accurate prediction of consensus sequences AG/GU(A,G), G/GUnAG, /GU(A,G)AG, /GU(A,G)nGU, and G/GUA than is achieved by weight matrix and consensus (A,C)AG/GU(A,G)AGU with mismatches. The probability of the first type error, E1, for the obtained consensus set was about 0.05, and the probability of the second type error, E2, was 0.15. The analysis demonstrated that accuracy of the functional site prediction could be improved if one takes into account correlations between the site positions. The accuracy of prediction by using human consensus sequences was tested on sequences from different organisms. Some differences in consensus sequences for the plant Arabidopsis sp., the invertebrate Caenorhabditis sp., and the fungus Aspergillus sp. were revealed. For the yeast Saccharomyces sp. only one conservative consensus, /GUA(U,A,C)G(U,A,C), was revealed (E1 = 0.03, E2 = 0.03). Yeast is a very interesting model to use for analysis of molecular mechanisms of splicing.
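
The consensus notation used in this abstract can be read mechanically, as in the following minimal sketch. It covers only the final matching step and is not the authors' algorithm for deriving the consensus set. The reading of the notation is an assumption: `/` is taken as the exon/intron boundary, `(A,G)` as a choice of nucleotides, and `n` as any nucleotide. The function names are hypothetical.

```python
import re


def consensus_to_regex(consensus):
    """Convert a consensus such as 'AG/GU(A,G)' into a compiled regex.

    Returns (pattern, boundary_offset), where boundary_offset is the number
    of consensus positions that precede the '/' boundary marker.
    """
    parts = []
    boundary_offset = 0
    n_positions = 0
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "/":
            # Boundary marker: remember where it falls, consume no nucleotide.
            boundary_offset = n_positions
            i += 1
        elif ch == "(":
            # Alternatives such as (A,G) become a character class [AG].
            close = consensus.index(")", i)
            alternatives = consensus[i + 1:close].replace(",", "").replace(" ", "")
            parts.append("[" + alternatives + "]")
            n_positions += 1
            i = close + 1
        elif ch == "n":
            parts.append("[ACGU]")
            n_positions += 1
            i += 1
        else:
            parts.append(ch)
            n_positions += 1
            i += 1
    # A lookahead lets finditer report overlapping candidate sites.
    return re.compile("(?=(" + "".join(parts) + "))"), boundary_offset


def find_sites(seq, consensus):
    """Return 0-based indices of the first intron base for every match."""
    pattern, offset = consensus_to_regex(consensus)
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    return [m.start() + offset for m in pattern.finditer(rna)]
```

For example, `find_sites("CCAGGUAAGU", "AG/GU(A,G)")` reports a single candidate boundary at index 4, the `G` of the `GU` dinucleotide. Exact matching of this kind ignores the mismatch tolerance and error estimates (E1, E2) discussed in the abstract.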

Journal ArticleDOI
TL;DR: The photolyase–blue-light photoreceptor family is composed of cyclobutane pyrimidine dimer (CPD) photolyases, (6-4) photolyases, and blue-light photoreceptors.
Abstract: The photolyase–blue-light photoreceptor family is composed of cyclobutane pyrimidine dimer (CPD) photolyases, (6-4) photolyases, and blue-light photoreceptors. CPD photolyase and (6-4) photolyase are involved in photoreactivation for CPD and (6-4) photoproducts, respectively. CPD photolyase is classified into two subclasses, class I and II, based on amino acid sequence similarity. Blue-light photoreceptors are essential light detectors for the early development of plants. The amino acid sequence of the receptor is similar to those of the photolyases, although the receptor does not show the activity of photoreactivation. To investigate the functional divergence of the family, the amino acid sequences of the proteins were aligned. The alignment suggested that the recognition mechanisms of the cofactors and the substrate of class I CPD photolyases (class I photolyases) are different from those of class II CPD photolyases (class II photolyases). We reconstructed the phylogenetic trees based on the alignment by the NJ method and the ML method. The phylogenetic analysis suggested that the ancestral gene of the family had encoded CPD photolyase and that the gene duplication of the ancestral proteins had occurred at least eight times before the divergence between eubacteria and eukaryotes.

Journal ArticleDOI
TL;DR: The study delineates the taxonomic level at which ITS sequences, in comparison to ribosomal gene sequences, are most useful in systematic and other studies and suggests a subset of the ITS-2 positions is relatively conserved.
Abstract: The determination of the secondary structure of the internal transcribed spacer (ITS) regions separating nuclear ribosomal RNA genes of Chlorophytes has improved the fidelity of alignment of nuclear ribosomal ITS sequences from related organisms. Application of this information to sequences from green algae and plants suggested that a subset of the ITS-2 positions is relatively conserved. Organisms that can mate are identical at all of these 116 positions, or differ by at most, one nucleotide change. Here we sequenced and compared the ITS-1 and ITS-2 of 40 green flagellates in search of the nearest relative to Chlamydomonas reinhardtii. The analysis clearly revealed one unique candidate, C. incerta. Several ancillary benefits of the analysis included the identification of mislabelled cultures, the resolution of confusion concerning C. smithii, the discovery of misidentified sequences in GenBank derived from a green algal contaminant, and an overview of evolutionary relationships among the Volvocales, which is congruent with that derived from rDNA gene sequence comparisons but improves upon its resolution. The study further delineates the taxonomic level at which ITS sequences, in comparison to ribosomal gene sequences, are most useful in systematic and other studies.

Journal ArticleDOI
TL;DR: Phylogenetic trees were drawn and analyzed based on the nucleotide sequences of the 1.5-kb gene fragment coding for the L and M subunits of the photochemical reaction center of various purple photosynthetic bacteria, which imply horizontal transfer of the genes that code for the photosynthetic apparatus in purple bacteria.
Abstract: Phylogenetic trees were drawn and analyzed based on the nucleotide sequences of the 1.5-kb gene fragment coding for the L and M subunits of the photochemical reaction center of various purple photosynthetic bacteria. These trees are mostly consistent with phylogenetic trees based on 16S rRNA and soluble cytochrome c, but differ in some significant details. This inconsistency implies horizontal transfer of the genes that code for the photosynthetic apparatus in purple bacteria. Possibilities of similar transfers of photosynthesis genes during the evolution of photosynthesis are discussed, especially for the establishment of oxygenic photosynthesis.

Journal ArticleDOI
TL;DR: It is proposed that RecA-like proteins derive evolutionarily from an assortment of independent domains and that the functional homologs of RecA in noneubacteria comprise an array of RecA-like proteins acting in series or cooperatively.
Abstract: Protein sequences with similarities to Escherichia coli RecA were compared across the major kingdoms of eubacteria, archaebacteria, and eukaryotes. The archaeal sequences branch monophyletically and are most closely related to the eukaryotic paralogous Rad51 and Dmc1 groups. A multiple alignment of the sequences suggests a modular structure of RecA-like proteins consisting of distinct segments, some of which are conserved only within subgroups of sequences. The eukaryotic and archaeal sequences share an N-terminal domain which may play a role in interactions with other factors and nucleic acids. Several positions in the alignment blocks are highly conserved within the eubacteria as one group and within the eukaryotes and archaebacteria as a second group, but compared between the groups these positions display nonconservative amino acid substitutions. Conservation within the RecA-like core domain identifies possible key residues involved in ATP-induced conformational changes. We propose that RecA-like proteins derive evolutionarily from an assortment of independent domains and that the functional homologs of RecA in noneubacteria comprise an array of RecA-like proteins acting in series or cooperatively.

Journal ArticleDOI
TL;DR: Despite strong constraints on sequence change, phylogenetic information was preserved in the group of recently differentiated species and supported the clustering of sequences into three major mtDNA groupings.
Abstract: Sequence differences in the tRNA-proline (tRNApro) end of the mitochondrial control-region of three species of Pacific butterflyfishes accumulated 33–43 times more rapidly than did changes within the mitochondrial cytochrome b gene (cytb). Rapid evolution in this region was accompanied by strong transition/transversion bias and large variation in the probability of a DNA substitution among sites. These substitution constraints placed an absolute ceiling on the magnitude of sequence divergence that could be detected between individuals. This divergence "ceiling" was reached rapidly and led to a decay in the relative rate of control-region/cytb evolution. A high rate of evolution in this section of the control-region of butterflyfishes stands in marked contrast to the patterns reported in some other fish lineages. Although the mechanism underlying rate variation remains unclear, all taxa with rapid evolution in the 5′-end of the control-region showed extreme transition biases. By contrast, in taxa with slower control-region evolution, transitions accumulated at nearly the same rate as transversions. More information is needed to understand the relationship between nucleotide bias and the rate of evolution in the 5′-end of the control-region. Despite strong constraints on sequence change, phylogenetic information was preserved in the group of recently differentiated species and supported the clustering of sequences into three major mtDNA groupings. Within these groups, very similar control-region sequences were widely distributed across the Pacific Ocean and were shared between recognized species, indicating a lack of mitochondrial sequence monophyly among species.

Journal ArticleDOI
TL;DR: It is found that dynamic rearrangements have so frequently occurred in eubacterial genomes as to break operon structures during evolution, even after the relatively recent divergence between E. coli and H. influenzae.
Abstract: To test the hypotheses that eubacterial genomes leave evolutionarily stable structures and that the variety of genome size is brought about through genome doubling during evolution, the genome structures of Haemophilus influenzae, Mycoplasma genitalium, Escherichia coli, and Bacillus subtilis were compared using the DNA sequences of the entire genome or substantial portions of genome. In these comparisons, the locations of orthologous genes were examined among different genomes. Using orthologous genes for the comparisons guaranteed that differences revealed in physical location would reflect changes in genome structure after speciation. We found that dynamic rearrangements have so frequently occurred in eubacterial genomes as to break operon structures during evolution, even after the relatively recent divergence between E. coli and H. influenzae. Interestingly, in such eubacterial genomes of high plasticity, we could find several highly conservative regions with the longest conserved region comprising the S10, spc, and α operons. This suggests that such exceptional conservative regions have undergone strong structural constraints during evolution.

Journal ArticleDOI
TL;DR: It is shown that sequences obtained from scleractinians are homologous to other metazoan 16S ribosomal sequences and fall into two distinct clades defined by size of the amplified gene product, which indicates that a reevaluation of evolutionary affinities in the order is needed.
Abstract: Relationships among families and suborders of scleractinian corals are poorly understood because of difficulties (1) in making inferences about the evolution of the morphological characters used in coral taxonomy and (2) in interpreting their 240-million-year fossil record. Here we describe patterns of molecular evolution in a segment of the mitochondrial (mt) 16S ribosomal gene from taxa of 14 families of corals and the use of this gene segment in a phylogenetic analysis of relationships within the order. We show that sequences obtained from scleractinians are homologous to other metazoan 16S ribosomal sequences and fall into two distinct clades defined by size of the amplified gene product. Comparisons of sequences from the two clades demonstrate that both sets of sequences are evolving under similar evolutionary constraints: they do not differ in nucleotide composition, numbers of transition and transversion substitutions, spatial patterns of substitutions, or in rates of divergence. The characteristics and patterns observed in these sequences, as well as the secondary structures, are similar to those observed in mt 16S ribosomal DNA sequences from other taxa. Phylogenetic analysis of these sequences shows that they are useful for evaluating relationships within the order. The hypothesis generated from this analysis differs from traditional hypotheses for evolutionary relationships among the Scleractinia and suggests that a reevaluation of evolutionary affinities in the order is needed.

Journal ArticleDOI
TL;DR: Comparison of complete genome sequences for different variants of hepatitis C virus (HCV) reveals several different constraints on sequence change, which need to be taken into account in interpreting sequence change in longitudinal studies.
Abstract: Comparison of complete genome sequences for different variants of hepatitis C virus (HCV) reveals several different constraints on sequence change. Synonymous changes are suppressed in coding regions at both 5′ and 3′ ends of the genome. No evidence was found for the existence of alternative reading frames or for a lower mutation frequency in these regions. Instead, suppression may be due to constraints imposed by RNA secondary structures identified within the core and NS5b genes. Nonsynonymous substitutions are less frequent than synonymous ones except in the hypervariable region of E2 and, to a lesser extent, in E1, NS2, and NS5b. Transitions are more frequent than transversions, particularly at the third position of codons where the bias is 16:1. In addition, nucleotide substitutions may not occur symmetrically since there is a bias toward G or C at the third position of codons, while T ↔ C transitions were twice as frequent as A ↔ G transitions. These different biases do not affect the phylogenetic analysis of HCV variants but need to be taken into account in interpreting sequence change in longitudinal studies.
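
The transition/transversion bias mentioned in this abstract rests on a simple classification of nucleotide differences, sketched below. The sketch only counts raw differences between two aligned sequences, it is not the authors' method and applies no correction for multiple substitutions at a site. It assumes the reading frame starts at index 0, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'transition', 'transversion', or None for a pair of bases."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a == b or a not in valid or b not in valid:
        return None  # identical bases, gaps, or ambiguity codes are skipped
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1, seq2, codon_position=None):
    """Count transitions and transversions between two aligned sequences.

    If codon_position is 1, 2, or 3, only sites at that codon position are
    counted, assuming the reading frame starts at index 0.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if codon_position is not None and i % 3 != codon_position - 1:
            continue
        kind = classify_substitution(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Calling `count_ts_tv(seq1, seq2, codon_position=3)` restricts the count to third codon positions, which is where the abstract reports the strongest bias.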

Journal ArticleDOI
TL;DR: Chemical natural selection appeared, the first step in the transition from inanimate to animate matter, and initiated the first animate property, fitness, i.e., the capacity to adapt to the environment and to survive.
Abstract: Theories of the origin of life have proposed hypotheses to link inanimate to animate matter. The theory proposed here derived the crucial stages in the origin of animate matter directly from the basic properties of inanimate matter. It asked what were the general characteristics of the link, rather than what might have been its chemical details. Life and its origin are shown to be one continuous physicochemical process of replication, random variation, and natural selection. Since life exists here and now, animate properties must have been initiated in the past somewhere. According to the theory, life originated from an as yet unknown elementary autocatalyst which occurred spontaneously, then replicated autocatalytically. As it multiplied to macroscopic abundance, its replicas gradually exhausted their reactants. Random chemical drift initiated diversity among autocatalysts. Diversity led to competition. Competition and depletion of reactants slowed down the rates of net replication of the autocatalysts. Some reached negative rates and became extinct, while those which stayed positive "survived." Thus chemical natural selection appeared, the first step in the transition from inanimate to animate matter. It initiated the first animate property, fitness, i.e., the capacity to adapt to the environment and to survive. As the environment was depleted of reactants, it was enriched with sequels, namely, with decomposition products and all other products which accompany autocatalysis. The changing environment exerted a selective pressure on autocatalysts to replace dwindling reactants by accumulating sequels. Sequels that were incorporated into the autocatalytic process became internal components of complex autocatalytic systems. Primitive forms of metabolism and organization were thus initiated. They evolved further by the same mechanism to ever higher levels of complexity, such as homochirality (handedness) and membranal enclosure.
Subsequent evolution by the same mechanism generated cellular metabolism, cell division, information carriers, and a genetic code. Theories of self-organization without natural selection are refuted.

Journal ArticleDOI
TL;DR: According to phylogenetic analyses and the calculation of evolutionary rates, class I* and II chitinases probably arose from different class I lineages by relatively recent deletion events; their potential involvement in host–pathogen interactions is discussed.
Abstract: The analysis of nuclear-encoded chitinase sequences from various angiosperms has allowed the categorization of the chitinases into discrete classes. Nucleotide sequences of their catalytic domains were compared in this study to investigate the evolutionary relationships between chitinase classes. The functionally distinct class III chitinases appear to be more closely related to fungal enzymes involved in morphogenesis than to other plant chitinases. The ordering of other plant chitinases into additional classes mainly relied on the presence of auxiliary domains (namely, a chitin-binding domain and a carboxy-terminal extension) flanking the main catalytic domain. The results of our phylogenetic analyses showed that classes I and IV form discrete and well-supported monophyletic groups derived from a common ancestral sequence that predates the divergence of dicots and monocots. In contrast, other sequences included in classes I* and II, lacking one or both types of auxiliary domains, were nested within class I sequences, indicating that they have a polyphyletic origin. According to phylogenetic analyses and the calculation of evolutionary rates, these chitinases probably arose from different class I lineages by relatively recent deletion events. The occurrence of such evolutionary trends in cultivated plants and their potential involvement in host–pathogen interactions are discussed.