
Showing papers in "Journal of Molecular Evolution in 1998"


Journal ArticleDOI
TL;DR: An analysis of conserved core sequences in 159 P-type ATPases indicates that the invention of new substrate specificities is accompanied by abrupt changes in the rate of sequence evolution, and a hitherto-unrecognized family of P-type ATPases has been identified that is expected to be represented in all the major phyla of eukarya.
Abstract: P-type ATPases make up a large superfamily of ATP-driven pumps involved in the transmembrane transport of charged substrates. We have performed an analysis of conserved core sequences in 159 P-type ATPases. The various ATPases group together in five major branches according to substrate specificity, and not according to the evolutionary relationship of the parental species, indicating that invention of new substrate specificities is accompanied by abrupt changes in the rate of sequence evolution. A hitherto-unrecognized family of P-type ATPases has been identified that is expected to be represented in all the major phyla of eukarya.

894 citations


Journal ArticleDOI
TL;DR: It is found that at 22 of the 48 nuclear loci examined, the nonsynonymous/synonymous rate ratio varies significantly across branches of the tree, which provides strong evidence against a strictly neutral model of molecular evolution.
Abstract: A maximum likelihood approach was used to estimate the synonymous and nonsynonymous substitution rates in 48 nuclear genes from primates, artiodactyls, and rodents. A codon-substitution model was assumed, which accounts for the genetic code structure, transition/transversion bias, and base frequency biases at codon positions. Likelihood ratio tests were applied to test the constancy of nonsynonymous to synonymous rate ratios among branches (evolutionary lineages). It is found that at 22 of the 48 nuclear loci examined, the nonsynonymous/synonymous rate ratio varies significantly across branches of the tree. The result provides strong evidence against a strictly neutral model of molecular evolution. Our likelihood estimates of synonymous and nonsynonymous rates differ considerably from previous results obtained from approximate pairwise sequence comparisons. The differences between the methods are explored by detailed analyses of data from several genes. Transition/transversion rate bias and codon frequency biases are found to have significant effects on the estimation of synonymous and nonsynonymous rates, and approximate methods do not adequately account for those factors. The likelihood approach is preferable, even for pairwise sequence comparison, because more realistic models of the mutation and substitution processes can be incorporated in the analysis.

611 citations


Journal ArticleDOI
TL;DR: It is concluded that the natural genetic code is extremely efficient at minimizing the effects of errors, but also that its structure reflects biases in these errors, as might be expected were the code the product of selection.
Abstract: Statistical and biochemical studies of the genetic code have found evidence of nonrandom patterns in the distribution of codon assignments. It has, for example, been shown that the code minimizes the effects of point mutation or mistranslation: erroneous codons are either synonymous or code for an amino acid with chemical properties very similar to those of the one that would have been present had the error not occurred. This work has suggested that the second base of codons is less efficient in this respect, by about three orders of magnitude, than the first and third bases. These results are based on the assumption that all forms of error at all bases are equally likely. We extend this work to investigate (1) the effect of weighting transition errors differently from transversion errors and (2) the effect of weighting each base differently, depending on reported mistranslation biases. We find that if the bias affects all codon positions equally, as might be expected were the code adapted to a mutational environment with transition/transversion bias, then any reasonable transition/transversion bias increases the relative efficiency of the second base by an order of magnitude. In addition, if we employ weightings to allow for biases in translation, then only 1 in every million random alternative codes generated is more efficient than the natural code. We thus conclude not only that the natural genetic code is extremely efficient at minimizing the effects of errors, but also that its structure reflects biases in these errors, as might be expected were the code the product of selection.

500 citations


Journal ArticleDOI
TL;DR: A study, based on computer simulation, of how the different methods proposed to evaluate the nonrandom use of synonymous codons are affected by the length of the coding region analyzed shows that some of these methods are heavily influenced by the number of codons and that the comparison of codon usage bias between coding regions of different lengths shows a methodological bias.
Abstract: Synonymous codons are not generally used at equal frequencies, and this trend is observed for most genes and organisms. Several methods have been proposed and used to estimate the degree of the nonrandom use of the different synonymous codons. The estimates obtained by these methods, however, show different levels of both precision and dispersion when coding regions of a finite number of codons are under analysis. Here, we present a study, based on computer simulation, of how the different methods proposed to evaluate the nonrandom use of synonymous codons are affected by the length of the coding region analyzed. The results show that some of these methods are heavily influenced by the number of codons and that the comparison of codon usage bias between coding regions of different lengths shows a methodological bias under different conditions of nonrandom use of synonymous codons. The study of the dispersion of the estimates obtained by the different methods gives, on the other hand, an indication of the methods to be applied to compare values of codon usage bias among coding regions of equivalent length.

300 citations


Journal ArticleDOI
TL;DR: A novel hypothesis for the origin of the eukaryotic cell is presented, based on a metabolic symbiosis (syntrophy) between a methanogenic archaeon (methanobacterial-like) and a δ-proteobacterium (an ancestral sulfate-reducing myxobacterium) that was originally mediated by interspecies H2 transfer in anaerobic, possibly moderately thermophilic, environments.
Abstract: We present a novel hypothesis for the origin of the eukaryotic cell, or eukaryogenesis, based on a metabolic symbiosis (syntrophy) between a methanogenic archaeon (methanobacterial-like) and a δ-proteobacterium (an ancestral sulfate-reducing myxobacterium). This syntrophic symbiosis was originally mediated by interspecies H2 transfer in anaerobic, possibly moderately thermophilic, environments. During eukaryogenesis, progressive cellular and genomic cointegration of both types of prokaryotic partners occurred. Initially, the establishment of permanent consortia, accompanied by extensive membrane development and close cell–cell interactions, led to a highly evolved symbiotic structure already endowed with some primitive eukaryotic features, such as a complex membrane system defining a protonuclear space (corresponding to the archaeal cytoplasm), and a protoplasmic region (derived from fusion of the surrounding bacterial cells). Simultaneously, bacterial-to-archaeal preferential gene transfer and eventual replacement took place. Bacterial genome extinction was thus accomplished by gradual transfer to the archaeal host, where genes adapted to a new genetic environment. Emerging eukaryotes would have inherited archaeal genome organization and dynamics and, consequently, most DNA-processing information systems. Conversely, primordial genes for social and developmental behavior would have been provided by the ancient myxobacterial symbiont. Metabolism would have been issued mainly from the versatile bacterial organotrophy, and progressively, methanogenesis was lost.

300 citations


Journal ArticleDOI
TL;DR: Base composition skews measured at third codon positions probably reflect mutational biases, whereas those measured over all bases in a sequence can be strongly affected by protein considerations due to the tendency in some bacteria for genes to be transcribed in the same direction that they are replicated.
Abstract: Variation in GC content, GC skew and AT skew along genomic regions was examined at third codon positions in completely sequenced prokaryotes. Eight out of nine eubacteria studied show GC and AT skews that change sign at the origin of replication. The leading strand in DNA replication is G-T rich at codon position 3 in six eubacteria, but C-T rich in two Mycoplasma species. In M. genitalium the AT and GC skews are symmetrical around the origin and terminus of replication, whereas its GC content variation has been shown to have a centre of symmetry elsewhere in the genome. Borrelia burgdorferi and Treponema pallidum show extraordinary extents of base composition skew correlated with direction of DNA replication. Base composition skews measured at third codon positions probably reflect mutational biases, whereas those measured over all bases in a sequence (or at codon positions 1 and 2) can be strongly affected by protein considerations due to the tendency in some bacteria for genes to be transcribed in the same direction that they are replicated. Consequently in some species the direction of skew for total genomic DNA is opposite to that for codon position 3.
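
The skew measurement described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes the common definitions GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T), takes a single in-frame coding sequence as input, and uses a hypothetical function name (`third_position_skews`) and a made-up test sequence.

```python
from collections import Counter


def third_position_skews(cds: str) -> dict:
    """Return GC and AT skew computed from third codon positions only.

    Assumes `cds` is an in-frame coding sequence on the strand of interest.
    Skew definitions used here (an assumption, not taken from the abstract):
        GC skew = (G - C) / (G + C)
        AT skew = (A - T) / (A + T)
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    third = seq[2:usable:3]               # bases at codon position 3
    counts = Counter(third)

    def skew(a: str, b: str) -> float:
        total = counts[a] + counts[b]
        # Return 0.0 when neither base occurs, to avoid dividing by zero.
        return (counts[a] - counts[b]) / total if total else 0.0

    return {"gc_skew": skew("G", "C"), "at_skew": skew("A", "T")}


# Made-up example: codons ATG GCG AAG TTT GCC have third bases G, G, G, T, C.
result = third_position_skews("ATGGCGAAGTTTGCC")
```

Applying the same function to genes grouped by their position relative to the origin of replication would show whether the skews change sign there. Restricting the count to position 3 is what separates mutational bias from the protein-level constraints acting on positions 1 and 2.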

281 citations


Journal ArticleDOI
TL;DR: Results imply a pattern of modular evolution of the Myb proteins centering on the possession of a helix-turn-helix motif, suggested to be a polyphyletic group related only by a "Myb-box" DNA-binding motif.
Abstract: The Myb family of proteins is a group of functionally diverse transcriptional activators found in both plants and animals that is characterized by a conserved DNA-binding domain of approximately 50 amino acids. Phylogenetic analyses of amino acid sequences of this family of proteins portray very disparate evolutionary histories in plants and animals. Animal Myb proteins have diverged from a common ancestor, while plants appear related only within the DNA-binding domain. Results imply a pattern of modular evolution of the Myb proteins centering on the possession of a helix-turn-helix motif. Based on this it is suggested that Myb proteins are a polyphyletic group related only by a "Myb-box" DNA-binding motif.

278 citations


Journal ArticleDOI
TL;DR: Standard phylogenetic methods and newer algorithms insensitive to such biases did not recover different branching patterns within the marine picophytoplankton group, and failed to cluster Prochlorococcus with chloroplasts or other chlorophyll b-containing prokaryotes.
Abstract: Cultured isolates of the unicellular planktonic cyanobacteria Prochlorococcus and marine Synechococcus belong to a single marine picophytoplankton clade. Within this clade, two deeply branching lineages of Prochlorococcus, two lineages of marine A Synechococcus and one lineage of marine B Synechococcus exhibit closely spaced divergence points with low bootstrap support. This pattern is consistent with a near-simultaneous diversification of marine lineages with divinyl chlorophyll b and phycobilisomes as photosynthetic antennae. Inferences from 16S ribosomal RNA sequences including data for 18 marine picophytoplankton clade members were congruent with results of psbB and petB and D sequence analyses focusing on five strains of Prochlorococcus and one strain of marine A Synechococcus. Third codon position and intergenic region nucleotide frequencies vary widely among members of the marine picophytoplankton group, suggesting that substitution biases differ among the lineages. Nonetheless, standard phylogenetic methods and newer algorithms insensitive to such biases did not recover different branching patterns within the group, and failed to cluster Prochlorococcus with chloroplasts or other chlorophyll b-containing prokaryotes. Prochlorococcus isolated from surface waters of stratified, oligotrophic ocean provinces predominate in a lineage exhibiting low G + C nucleotide frequencies at highly variable positions.

264 citations


Journal ArticleDOI
TL;DR: A sequential (step by step) Darwinian model for the evolution of life from the late stages of the RNA world through to the emergence of eukaryotes and prokaryotes, with a functional explanation that prokaryote ancestors underwent selection for thermophily and/or for rapid reproduction at least once in their history.
Abstract: We describe a sequential (step by step) Darwinian model for the evolution of life from the late stages of the RNA world through to the emergence of eukaryotes and prokaryotes. The starting point is our model, derived from current RNA activity, of the RNA world just prior to the advent of genetically-encoded protein synthesis. By focusing on the function of the protoribosome we develop a plausible model for the evolution of a protein-synthesizing ribosome from a high-fidelity RNA polymerase that incorporated triplets of oligonucleotides. With the standard assumption that during the evolution of enzymatic activity, catalysis is transferred from RNA → RNP → protein, the first proteins in the "breakthrough organism" (the first to have encoded protein synthesis) would be nonspecific chaperone-like proteins rather than catalytic. Moreover, because some RNA molecules that pre-date protein synthesis under this model now occur as introns in some of the very earliest proteins, the model predicts these particular introns are older than the exons surrounding them, the "introns-first" theory. Many features of the model for the genome organization in the final RNA world ribo-organism are more prevalent in the eukaryotic genome and we suggest that the prokaryotic genome organization (a single, circular genome with one center of replication) was derived from a "eukaryotic-like" genome organization (a fragmented linear genome with multiple centers of replication). The steps from the proposed ribo-organism RNA genome → eukaryotic-like DNA genome → prokaryotic-like DNA genome are all relatively straightforward, whereas the transition prokaryotic-like genome → eukaryotic-like genome appears impossible under a Darwinian mechanism of evolution, given the assumption of the transition RNA → RNP → protein. A likely molecular mechanism, "plasmid transfer," is available for the origin of prokaryotic-type genomes from a eukaryotic-like architecture.
Under this model prokaryotes are considered specialized and derived with reduced dependence on ssRNA biochemistry. A functional explanation is that prokaryote ancestors underwent selection for thermophily (high temperature) and/or for rapid reproduction (r selection) at least once in their history.

256 citations


Journal ArticleDOI
TL;DR: The model as developed serves as an outgroup to root the tree of life and is an alternative to using sequence data for inferring properties of the earliest cells.
Abstract: An RNA world is widely accepted as a probable stage in the early evolution of life. Two implications are that proteins have gradually replaced RNA as the main biological catalysts and that RNA has not taken on any major de novo catalytic function after the evolution of protein synthesis, that is, there is an essentially irreversible series of steps RNA → RNP → protein. This transition, as expected from a consideration of catalytic perfection, is essentially complete for reactions when the substrates are small molecules. Based on these principles we derive criteria for identifying RNAs in modern organisms that are relics from the RNA world and then examine the function and phylogenetic distribution of RNA for such remnants of the RNA world. This allows an estimate of the minimum complexity of the last ribo-organism, the stage just preceding the advent of genetically encoded protein synthesis. Despite the constraints placed on its size by a low fidelity of replication (the Eigen limit), we conclude that the genome of this organism reached a considerable level of complexity that included several RNA-processing steps. It would include a large protoribosome with many smaller RNAs involved in its assembly, pre-tRNAs and tRNA processing, an ability for recombination of RNA, some RNA editing, an ability to copy to the end of each RNA strand, and some transport functions. It is harder to recognize specific metabolic reactions that must have existed but synthetic and bio-energetic functions would be necessary. Overall, this requires that such an organism maintained a multiple copy, double-stranded linear RNA genome capable of recombination and splicing. The genome was most likely fragmented, allowing each "chromosome" to be replicated with minimum error, that is, within the Eigen limit. The model as developed serves as an outgroup to root the tree of life and is an alternative to using sequence data for inferring properties of the earliest cells.

242 citations


Journal ArticleDOI
TL;DR: The results and a site-by-site examination of the sequences clearly suggest that convergent or parallel evolution has occurred in ND1 between primates and rodents and/or between ferungulates and the outgroup.
Abstract: The phylogenetic relationship among primates, ferungulates (artiodactyls + cetaceans + perissodactyls + carnivores), and rodents was examined using proteins encoded by the H strand of mtDNA, with marsupials and monotremes as the outgroup. Trees estimated from individual proteins were compared in detail with the tree estimated from all 12 proteins (either concatenated or summing up log-likelihood scores for each gene). Although the overall evidence strongly suggests ((primates, ferungulates), rodents), the ND1 data clearly support another tree, ((primates, rodents), ferungulates). To clarify whether this contradiction is due to (1) a stochastic (sampling) error, (2) minor model-based errors (e.g., ignoring site rate variability), or (3) convergent and parallel evolution (specifically between either primates and rodents or ferungulates and the outgroup), the ND1 genes from many additional species of primates, rodents, other eutherian orders, and the outgroup (marsupials + monotremes) were sequenced. The phylogenetic analyses were extensive and aimed to eliminate the following artifacts as possible causes of the aberrant result: base composition biases, unequal site substitution rates, or the cumulative effects of both. Neither more sophisticated evolutionary analyses nor the addition of species changed the previous conclusion. That is, the statistical support for grouping rodents and primates to the exclusion of all other taxa fluctuates upward or downward in quite a tight range centered near 95% confidence. These results and a site-by-site examination of the sequences clearly suggest that convergent or parallel evolution has occurred in ND1 between primates and rodents and/or between ferungulates and the outgroup. While the primate/rodent grouping is strange, ND1 also throws some interesting light on the relationships of some eutherian orders, marsupials, and monotremes. In these parts of the tree, ND1 shows no apparent tendency for unexplained convergences.

Journal ArticleDOI
TL;DR: Their conserved organization and slow molecular evolution make D-loops of galliforms appropriate for phylogenetic studies, although homoplasy can be generated at a few hypervariable sites and at some sites which probably have mutated by strand slippage during DNA replication.
Abstract: The entire mitochondrial DNA control region (mtDNA D-loop) was sequenced in the seven extant species of Alectoris partridges. The D-loop length is very conserved (1155 ± 2 nucleotides), and substitution rates are lower than for the mitochondrial cytochrome b gene of the same species, on average. Comparative analyses suggest that these D-loops can be divided into three domains, corresponding to the highly variable peripheral domains I and III and to the central conserved domain II of vertebrates (Baker and Marshall 1997). Nevertheless, the first 161 nucleotides of domain I of the Alectoris, immediately flanking the tRNAGlu, evolve at an unusually low rate and show motifs similar to the mammalian extended termination-associated sequences [ETAS1 and ETAS2 (Sbisa et al. 1997)], which can form stable secondary structures. The second part of domain I contains a hypervariable region with two divergent copies of a tandemly repeated sequence described previously in other species of anseriforms and galliforms (Quinn and Wilson 1993; Fumihito et al. 1995). Some of the conserved sequence blocks of mammals can be mapped in the central domain of Alectoris. Domain III is highly variable and has sequences similar to mammalian CSB1. The bidirectional transcription promoter HSP/LSP box of the chicken is partially conserved among the Alectoris. This structural organization can be found in the anseriform and galliform species studied so far, suggesting that strong functional constraints might have controlled the evolution of the D-loop since the origin of Galloanserae. Their conserved organization and slow molecular evolution make D-loops of galliforms appropriate for phylogenetic studies, although homoplasy can be generated at a few hypervariable sites and at some sites which probably have mutated by strand slippage during DNA replication.
Phylogenetic analyses of D-loops of Alectoris are concordant with previously published cytochrome b and allozyme phylogenies (Randi 1996). Alectoris is monophyletic and includes three major clades: (1) basal barbara and melanocephala; (2) intermediate rufa and graeca; and (3) recent philbyi, magna, and chukar. Comparative description of the organization and substitution patterns of the mitochondrial control region can aid in mapping hypervariable sites and avoid some sources of homoplasy in data sets which are to be used in phylogenetic analyses.

Journal ArticleDOI
TL;DR: The complete mitochondrial genome sequence of the pig, Sus scrofa, was determined and identified the pig (Suiformes) as a sister group of a cow/whale clade, making Artiodactyla paraphyletic.
Abstract: The complete mitochondrial genome sequence of the pig, Sus scrofa, was determined. The length of the sequence presented is 16,679 nucleotides. This figure is not absolute, however, due to pronounced heteroplasmy caused by variable numbers of the motif GTACACGTGC in the control region of different molecules. A phylogenetic study was performed on the concatenated amino acid and nucleotide sequences of 12 protein-coding genes of the mitochondrial genome. The analysis identified the pig (Suiformes) as a sister group of a cow/whale clade, making Artiodactyla paraphyletic. The split between pig and cow/whale was molecularly dated at 65 million years before present.

Journal ArticleDOI
TL;DR: It is demonstrated that amino acids conserved in all MAPKs are located primarily in the center of the protein around the catalytic cleft, and it is concluded that these residues are important for maintaining proper folding into the gross structure common to all MAPKs.
Abstract: All currently sequenced stress-activated protein kinases (SAPKs), extracellular signal-regulated kinases (ERKs), and other mitogen-activated protein kinases (MAPKs) were analyzed by sequence alignment, phylogenetic tree construction, and three-dimensional structure modeling in order to classify members of the MAPK family. Based on this analysis the MAPK family was divided into three subgroups (SAPKs, ERKs, and MAPK3) that consist of at least nine subfamilies. Members of a given subfamily were exclusively from animals, plants, or yeast/fungi. A single signature sequence, [LIVM][TS]XX[LIVM]XT[RK][WY]YRXPX[LIVM] [LIVM], was identified that is characteristic for all MAPKs and sufficient to distinguish MAPKs from other members of the protein kinase superfamily. This signature sequence contains the phosphorylation site and is located on loop 12 of the three-dimensional structure of MAPKs. I also identified signature sequences that are characteristic for each of the nine subfamilies of MAPKs. By modeling the three-dimensional structure of three proteins for each MAPK subfamily based on the resolved atomic structures of rat ERK2 and murine p38, it is demonstrated that amino acids conserved in all MAPKs are located primarily in the center of the protein around the catalytic cleft. I conclude that these residues are important for maintaining proper folding into the gross structure common to all MAPKs. On the other hand, amino acids conserved in a given subfamily are located mainly in the periphery of MAPKs, indicating their possible importance for defining interactions with substrates, activators, and inhibitors. Within these subfamily-specific regions, amino acids were identified that represent unique residues occurring in only a single subfamily and their location was mapped in three-dimensional structure models. 
These unique residues are likely to be crucial for subfamily-specific interactions of MAPKs with substrates, inhibitors, or activators and, therefore, represent excellent targets for site-directed mutagenesis experiments.
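
The signature sequence quoted in the abstract can be turned into a regular expression for scanning protein sequences. The sketch below is illustrative only and rests on several assumptions: `X` is read as "any residue", the space before the final `[LIVM]` in the printed pattern is treated as a typesetting artifact, and the function names and the test peptide are hypothetical rather than taken from the paper.

```python
import re

# Signature as quoted in the abstract, with the stray space removed (assumption).
SIGNATURE = "[LIVM][TS]XX[LIVM]XT[RK][WY]YRXPX[LIVM][LIVM]"


def signature_to_regex(signature: str) -> re.Pattern:
    """Compile a bracket-and-X style signature into a regular expression.

    Bracketed groups such as [LIVM] are already valid regex character
    classes; a bare X is expanded to "any uppercase letter".
    """
    tokens = re.findall(r"\[[A-Z]+\]|[A-Z]", signature)
    body = "".join("[A-Z]" if tok == "X" else tok for tok in tokens)
    return re.compile(body)


MAPK_PATTERN = signature_to_regex(SIGNATURE)


def find_signature(protein: str) -> list:
    """Return (start_index, matched_segment) for every signature hit."""
    return [(m.start(), m.group()) for m in MAPK_PATTERN.finditer(protein.upper())]


# Synthetic peptide built to satisfy the pattern; it is not a real MAPK sequence.
hits = find_signature("AAALTAALATRWYRAPALLAAA")
```

A hit from a pattern like this only flags a candidate. The subfamily assignment described in the abstract relied on alignment, phylogenetic trees, and structure modeling as well.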

Journal ArticleDOI
TL;DR: The findings suggest that the strains of wild sheep from which domestic sheep originated were more closely related than were the B. primigenius subspecies which gave rise to B. indicus and B. taurus cattle.
Abstract: The complete mitochondrial DNA (mtDNA) molecule of the domestic sheep, Ovis aries, was sequenced, together with part of the mtDNA of a specimen representing the other major O. aries haplotype group. The length of the complete ovine mtDNA presented is 16,616 nucleotides (nt). This length is not absolute, however, due to heteroplasmy caused by the occurrence of different numbers of a 75-nt-long tandem repeat in the control region. The sequence data were included in analyses of intraspecific ovine molecular differences, molecular comparisons with bovine mtDNAs, and phylogenetic analyses based on complete mtDNAs. The comparisons with bovine mtDNAs were based on the central domains of the ovine control regions, representing both major ovine haplotype groups, and the corresponding domains of Bos taurus and B. indicus. The comparisons showed that the difference between the bovids was 1.4 times greater than the intraspecific ovine difference. These findings suggest that the strains of wild sheep from which domestic sheep originated were more closely related than were the B. primigenius subspecies which gave rise to B. indicus and B. taurus cattle. Datings based on complete mtDNAs suggest that the bovine and ovine lineages diverged about 30 million years before present. This dating is considerably earlier than that proposed previously.

Journal ArticleDOI
TL;DR: A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used).
Abstract: The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport. They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria, eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles) tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids (aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4 aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize better the alignments than do the individual sequence comparisons. The global individual consensus "matches" 87% with the consensus of consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used). The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment do not have a charge cluster (even at the 5% significance level). 
All eukaryotic HSP70 sequences have the analogous charge cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed, suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse origins.

Journal ArticleDOI
TL;DR: An up-to-date compilation of the immunoglobulin superfamily is reported, where all known members of the IgSF are classified on the basis of their common functional role and their distribution in tissue and in species.
Abstract: The immunoglobulin superfamily (IgSF) is a heterogenic group of proteins built on a common fold, called the Ig fold, which is a sandwich of two β sheets. Although members of the IgSF share a similar Ig fold, they differ in their tissue distribution, amino acid composition, and biological role.

Journal ArticleDOI
TL;DR: The 18S rRNA molecule is an unsuitable candidate for reconstructing the evolutionary history of all metazoan phyla, and the polytomies cannot be used as a single or reliable source of evidence to support the hypothesis of a Cambrian explosion.
Abstract: We document the phylogenetic behavior of the 18S rRNA molecule in 67 taxa from 28 metazoan phyla and assess the effects of among-site rate variation on reconstructing phylogenies of the animal kingdom. This empirical assessment was undertaken to clarify further the limits of resolution of the 18S rRNA gene as a phylogenetic marker and to address the question of whether 18S rRNA phylogenies can be used as a source of evidence to infer the reality of a Cambrian explosion. A notable degree of among-site rate variation exists between different regions of the 18S rRNA molecule, as well as within all classes of secondary structure. There is a significant negative correlation between inferred number of nucleotide substitutions and phylogenetic information, as well as with the degree of substitutional saturation within the molecule. Base compositional differences both within and between taxa exist and, in certain lineages, may be associated with long branches and phylogenetic position. Importantly, excluding sites with different degrees of nucleotide substitution significantly influences the topology and degree of resolution of maximum-parsimony phylogenies as well as neighbor-joining phylogenies (corrected and uncorrected for among-site rate variation) reconstructed at the metazoan scale. Together, these data indicate that the 18S rRNA molecule is an unsuitable candidate for reconstructing the evolutionary history of all metazoan phyla, and that the polytomies, i.e., unresolved nodes within 18S rRNA phylogenies, cannot be used as a single or reliable source of evidence to support the hypothesis of a Cambrian explosion.

Journal ArticleDOI
TL;DR: The findings show that recalculation is necessary of all molecular datings based directly or indirectly on a Cercopithecoidea/Hominoidea split 30 MYBP, and all hominoid divergences receive a much earlier dating.
Abstract: The complete mitochondrial DNA (mtDNA) molecule of the hamadryas baboon, Papio hamadryas, was sequenced and included in a molecular analysis of 24 complete mammalian mtDNAs. The particular aim of the study was to time the divergence between Cercopithecoidea and Hominoidea. That divergence, set at 30 million years before present (MYBP), was a fundamental reference for the original proposal of recent hominoid divergences, according to which the split among gorilla, chimpanzee, and Homo took place 5 MYBP. In the present study the validity of the postulated 30 MYBP dating of the Cercopithecoidea/Hominoidea divergence was examined by applying two independent nonprimate molecular references, the divergence between artiodactyls and cetaceans set at 60 MYBP and that between Equidae and Rhinocerotidae set at 50 MYBP. After calibration for differences in evolutionary rates, application of the two references suggested that the Cercopithecoidea/Hominoidea divergence took place >50 MYBP. Consistent with the marked shift in the dating of the Cercopithecoidea/Hominoidea split, all hominoid divergences receive a much earlier dating. Thus the estimated date of the divergence between Pan (chimpanzee) and Homo is 10–13 MYBP and that between Gorilla and the Pan/Homo lineage ≈17 MYBP. The same datings were obtained in an analysis of clocklike evolving genes. The findings show that recalculation is necessary of all molecular datings based directly or indirectly on a Cercopithecoidea/Hominoidea split 30 MYBP.

Journal ArticleDOI
TL;DR: The results show a peculiar and distinctly different DNA composition of SOPE with respect to the other obligate intracellular bacteria, and, combined with biological and biochemical data, they elucidate the evolution of symbiosis in S. oryzae.
Abstract: The principal intracellular symbiotic bacteria of the cereal weevil Sitophilus oryzae were characterized using the sequence of the 16S rDNA gene (rrs gene) and G + C content analysis. Polymerase chain reaction amplification with universal eubacterial primers of the rrs gene showed a single expected sequence of 1,501 bp. Comparison of this sequence with the available database sequences placed the intracellular bacteria of S. oryzae as members of the Enterobacteriaceae family, closely related to the free-living bacteria, Erwinia herbicola and Escherichia coli, and the endocytobiotic bacteria of the tsetse fly and aphids. Moreover, by high-performance liquid chromatography, we measured the genomic G + C content of the S. oryzae principal endocytobiotes (SOPE) as 54%, while the known genomic G + C content of most intracellular bacteria is about 39.5%. Furthermore, based on the third codon position G + C content and the rrs gene G + C content, we demonstrated that most intracellular bacteria except SOPE are A + T biased irrespective of their phylogenetic position. Finally, using the hsp60 gene sequence, the codon usage of SOPE was compared with that of two phylogenetically closely related bacteria: E. coli, a free-living bacterium, and Buchnera aphidicola, the intracellular symbiotic bacteria of aphids. Taken together, these results show a peculiar and distinctly different DNA composition of SOPE with respect to the other obligate intracellular bacteria, and, combined with biological and biochemical data, they elucidate the evolution of symbiosis in S. oryzae.
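The sequence-based part of this comparison (overall G + C content and G + C content at third codon positions) can be illustrated with a minimal Python sketch. This is not the authors' procedure: their genomic value was measured by high-performance liquid chromatography, and the function names and the toy sequence below are hypothetical, chosen only for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "GC" for base in counted) / len(counted)


def gc3_fraction(cds: str) -> float:
    """G + C fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    return gc_fraction(third_positions)


# Hypothetical toy coding sequence (ATG GCC AAA TTT), not data from the paper.
toy_cds = "ATGGCCAAATTT"
overall = gc_fraction(toy_cds)
third = gc3_fraction(toy_cds)
```

A genome that is A + T biased in the sense of the abstract would be expected to show a low value from `gc3_fraction`, since third codon positions are the least constrained by the encoded protein.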

Journal ArticleDOI
TL;DR: The phylogenetic analysis of MutS family protein sequences indicates that the S. glaucum mtMSH protein is more closely related to the nuclear DNA-encoded mitochondrial mismatch repair protein (MSH1) of the yeast Saccharomyces cerevisiae than to eukaryotic homologues involved in nuclear function, or to bacterial homologues.
Abstract: The nucleotide sequences of two segments of 6,737 ntp and 258 ntp of the 18.4-kb circular mitochondrial (mt) DNA molecule of the soft coral Sarcophyton glaucum (phylum Cnidaria, class Anthozoa, subclass Octocorallia, order Alcyonacea) have been determined. The larger segment contains the 3′ 191 ntp of the gene for subunit 1 of the respiratory chain NADH dehydrogenase (ND1), complete genes for cytochrome b (Cyt b), ND6, ND3, ND4L, and a bacterial MutS homologue (MSH), and the 5′ terminal 1,124 ntp of the gene for the large subunit rRNA (l-rRNA). These genes are arranged in the order given and all are transcribed from the same strand of the molecule. The smaller segment contains the 3′ terminal 134 ntp of the ND4 gene and a complete tRNAf-Met gene, and these genes are transcribed in opposite directions. As in the hexacorallian anthozoan, Metridium senile, the mt-genetic code of S. glaucum is near standard: that is, in contrast to the situation in mt-genetic codes of other invertebrate phyla, AGA and AGG specify arginine, and ATA specifies isoleucine. However, as appears to be universal for metazoan mt-genetic codes, TGA specifies tryptophan rather than termination. Also, as in M. senile the mt-tRNAf-Met gene has primary and secondary structural features resembling those of Escherichia coli initiator tRNA, including standard dihydrouridine and TψC loop sequences, and a mismatched nucleotide pair at the top of the amino-acyl stem. The presence of a mutS gene homologue, which has not been reported to occur in any other known mtDNA, suggests that there is mismatch repair activity in S. glaucum mitochondria. In support of this, phylogenetic analysis of MutS family protein sequences indicates that the S. glaucum mtMSH protein is more closely related to the nuclear DNA-encoded mitochondrial mismatch repair protein (MSH1) of the yeast Saccharomyces cerevisiae than to eukaryotic homologues involved in nuclear function, or to bacterial homologues.
Regarding the possible origin of the S. glaucum mtMSH gene, the phylogenetic analysis results, together with comparative base composition considerations, and the absence of an MSH gene in any other known mtDNA best support the hypothesis that S. glaucum mtDNA acquired the mtMSH gene from nuclear DNA early in the evolution of octocorals. The presence of mismatch repair activity in S. glaucum mitochondria might be expected to influence the rate of evolution of this organism’s mtDNA.

Journal ArticleDOI
TL;DR: The hypothesis that Archaea and Bacteria were differentiated by the occurrence of cells enclosed by membranes of phospholipids with G-1-P and G-3-P as a backbone, respectively is proposed.
Abstract: One of the most remarkable biochemical differences between the members of the two domains Archaea and Bacteria is the stereochemistry of the glycerophosphate backbone of phospholipids, which is exclusively of opposite configuration in the two domains. The enzyme responsible for the formation of the Archaea-specific glycerophosphate was found to be NAD(P)-linked sn-glycerol-1-phosphate (G-1-P) dehydrogenase; it was first purified from Methanobacterium thermoautotrophicum cells, and its gene was cloned. This structural gene, named egsA (enantiomeric glycerophosphate synthase), consisted of 1,041 bp and encoded an enzyme of 347 amino acid residues. The amino acid sequence deduced from the base sequence of the cloned gene (egsA) did not share any sequence similarity, except for the NAD-binding region, with that of NAD(P)-linked sn-glycerol-3-phosphate (G-3-P) dehydrogenase of Escherichia coli, which catalyzes the formation of the G-3-P backbone of bacterial phospholipids, while the deduced protein sequence of the enzyme revealed some similarity with bacterial glycerol dehydrogenases. Because G-1-P dehydrogenase and G-3-P dehydrogenase would have originated from different ancestral enzymes and it would be almost impossible to interchange the stereospecificity of the enzymes, it seems likely that the stereostructure of the membrane phospholipids of a cell has been maintained since the birth of the first cell. We propose here the hypothesis that Archaea and Bacteria were differentiated by the occurrence of cells enclosed by membranes of phospholipids with G-1-P and G-3-P as a backbone, respectively.

Journal ArticleDOI
TL;DR: Known RNA binding sites for specific amino acids are reviewed, their implications for the origin of coded peptide synthesis are considered, and an origins hypothesis that accommodates all the data, DRT (direct RNA templating), is formulated.
Abstract: Numerous RNA binding sites for specific amino acids are now known, coming predominantly from selection-amplification experiments. These sites are chemically discriminating despite being predominantly small, simple RNA structures: internal and bulge loops. Recent studies of sites for hydrophobic side chains suggest that there are other generalizable structural features which recur in hydrophobic RNA sites. Further, sites for hydrophobic side chains can contain codons for the bound amino acid, as has also long been known for the polar amino acid arginine. Such findings are comprehensively reviewed, and the implications for the origin of coded peptide synthesis are considered. An origins hypothesis which accommodates all the data, DRT (direct RNA templating), is formulated.

Journal ArticleDOI
TL;DR: Cytoplasmic intermediate filament (IF) proteins were cloned from a large number of invertebrate phyla using cDNA probes, the monoclonal antibody IFA, peptide sequence information, and various RT-PCR procedures; together with existing data, two IF prototypes emerge.
Abstract: We have cloned cytoplasmic intermediate filament (IF) proteins from a large number of invertebrate phyla using cDNA probes, the monoclonal antibody IFA, peptide sequence information, and various RT-PCR procedures. Novel IF protein sequences reported here include the urochordata and nine protostomic phyla, i.e., Annelida, Brachiopoda, Chaetognatha, Echiura, Nematomorpha, Nemertea, Platyhelminthes, Phoronida, and Sipuncula. Taken together with the wealth of data on IF proteins of vertebrates and the results on IF proteins of Cephalochordata, Mollusca, Annelida, and Nematoda, two IF prototypes emerge. The L-type, which includes 35 sequences from 11 protostomic phyla, shares with the nuclear lamins the long version of the coil 1b subdomain and, in most cases, a homology segment of some 120 residues in the carboxyterminal tail domain. The S-type, which includes all four subfamilies (types I to IV) of vertebrate IF proteins, lacks 42 residues in the coil 1b subdomain and the carboxyterminal lamin homology segment. Since IF proteins from all three phyla of the chordates have the 42-residue deletion, this deletion arose in a progenitor prior to the divergence of the chordates into the urochordate, cephalochordate, and vertebrate lineages, possibly already at the origin of the deuterostomic branch. Four phyla recently placed into the protostomia on grounds of their 18S rDNA sequences (Brachiopoda, Nemertea, Phoronida, and Platyhelminthes) show IF proteins of the L-type and fit by sequence identity criteria into the lophotrochozoic branch of the protostomia.

Journal ArticleDOI
TL;DR: This paper suggests, based on the energetic aspect of genome organization, that the emergence of eukaryotes was promoted by the establishment of an efficient energy-converting organelle, such as the mitochondrion, which was acquired by the endosymbiosis of ancient α-purple photosynthetic Gram-negative eubacteria that reorganized the prokaryotic metabolism of the archaebacterial-like ancestral host cells.
Abstract: One of the most important omissions in recent evolutionary theory concerns how eukaryotes could emerge and evolve. According to the currently accepted views, the first eukaryotic cell possessed a nucleus, an endomembrane system, and a cytoskeleton but had an inefficient prokaryotic-like metabolism. In contrast, one of the most ancient eukaryotes, the metamonada Giardia lamblia, was found to have formerly possessed mitochondria. In sharp contrast with the traditional views, this paper suggests, based on the energetic aspect of genome organization, that the emergence of eukaryotes was promoted by the establishment of an efficient energy-converting organelle, such as the mitochondrion. Mitochondria were acquired by the endosymbiosis of ancient α-purple photosynthetic Gram-negative eubacteria that reorganized the prokaryotic metabolism of the archaebacterial-like ancestral host cells. The presence of an ATP pool in the cytoplasm provided by this cell organelle allowed a major increase in genome size. This evolutionary change, the remarkable increase both in genome size and complexity, explains the origin of the eukaryotic cell itself.

Journal ArticleDOI
TL;DR: The data indicate that selection for translation efficiency plays a significant role in determining the codon bias of chloroplast genes but that it acts with different intensities in different lineages.
Abstract: In the plant chloroplast genome the codon usage of the highly expressed psbA gene is unique and is adapted to the tRNA population, probably due to selection for translation efficiency. In this study the role of selection on codon usage in each of the fully sequenced chloroplast genomes, in addition to Chlamydomonas reinhardtii, is investigated by measuring adaptation to this pattern of codon usage. A method is developed which tests selection on each gene individually by constructing sequences with the same amino acid composition as the gene and randomly assigning codons based on the nucleotide composition of noncoding regions of that genome. The codon bias of the actual gene is then compared to a distribution of random sequences. The data indicate that within the algae selection is strong in Cyanophora paradoxa, affecting a majority of genes, of intermediate intensity in Odontella sinensis, and weaker in Porphyra purpurea and Euglena gracilis. In the plants, selection is found to be quite weak in Pinus thunbergii and the angiosperms but there is evidence that an intermediate level of selection exists in the liverwort Marchantia polymorpha. The role of selection is then further investigated in two comparative studies. It is shown that average relative codon bias is correlated with expression level and that, despite saturation levels of substitution, there is a strong correlation among the algae genomes in the degree of codon bias of homologous genes. All of these data indicate that selection for translation efficiency plays a significant role in determining the codon bias of chloroplast genes but that it acts with different intensities in different lineages. In general it is stronger in the algae than the higher plants, but within the algae Euglena is found to have several unusual features which are noted. The factors that might be responsible for this variation in intensity among the various genomes are discussed.
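The per-gene randomization test described in this abstract can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the bias measure used here (the fraction of codons belonging to a user-supplied "preferred" set) is a stand-in for their measure of adaptation to psbA codon usage, the random sequences keep the gene's amino acid sequence (and therefore its composition), and all function names, the base frequencies, and the example inputs are hypothetical.

```python
import random
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid (or stop) they encode.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def preferred_fraction(codons, preferred):
    """Stand-in bias measure (assumption): share of codons in the preferred set."""
    return sum(c in preferred for c in codons) / len(codons)


def random_codons(codons, base_freq, rng):
    """Re-draw each codon among its synonyms, weighted by noncoding base frequencies."""
    out = []
    for codon in codons:
        choices = SYNONYMS[CODON_TO_AA[codon]]
        weights = [base_freq[c[0]] * base_freq[c[1]] * base_freq[c[2]] for c in choices]
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


def randomization_test(cds, preferred, base_freq, n=1000, seed=0):
    """Return (observed bias, share of random sequences at least as biased)."""
    rng = random.Random(seed)
    codons = split_codons(cds.upper())
    observed = preferred_fraction(codons, preferred)
    null = [preferred_fraction(random_codons(codons, base_freq, rng), preferred)
            for _ in range(n)]
    p_value = sum(x >= observed for x in null) / n
    return observed, p_value


# Hypothetical inputs: an A + T rich noncoding composition and a toy gene.
noncoding_freq = {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15}
observed, p_value = randomization_test("CTGGCCAAAGGC", {"CTG", "GCC"}, noncoding_freq)
```

A small `p_value` would indicate that the real gene uses preferred codons more often than expected from the background nucleotide composition alone, which is the kind of signal the abstract attributes to selection.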

Journal ArticleDOI
TL;DR: The hypothesis regarding the relationship of bats to other eutherian mammals is concordant with previous molecular studies and contrasts with hypotheses based solely on morphological criteria and an incomplete fossil record.
Abstract: The complete mitochondrial genome was obtained from a microchiropteran bat, Artibeus jamaicensis. The presumptive amino acid sequence for the protein-coding genes was compared with predicted amino acid sequences from several representatives of other mammalian orders. Data were analyzed using maximum parsimony, maximum likelihood, and neighbor joining. All analyses placed bats as the sister group of carnivores, perissodactyls, artiodactyls, and cetaceans (e.g., 100% bootstrap value with both maximum parsimony and neighbor joining). The data strongly support a new hypothesis about the origin of bats, specifically a bat/ferungulate grouping. None of the analyses supported the superorder Archonta (bats, flying lemurs, primates, and tree shrews). Our hypothesis regarding the relationship of bats to other eutherian mammals is concordant with previous molecular studies and contrasts with hypotheses based solely on morphological criteria and an incomplete fossil record. The A. jamaicensis mitochondrial DNA control region has a complex pattern of tandem repeats that differs from previously reported chiropteran control regions.

Journal ArticleDOI
TL;DR: The phylogeny obtained in Medicago suggests that none of the three subsections included in the study is monophyletic, and ETS appears to be a valuable source of information for solidifying ITS plant phylogeny.
Abstract: We have estimated the potential phylogenetic utility of the ribosomal external transcribed spacer (ETS) from the nuclear ribosomal region. The ETS was sequenced from 13 annual Medicago (Fabaceae) species upstream of a highly conserved motif which was found among many different organisms. In the genus Medicago, the ETS was found to evolve 1.5 times faster than the internal transcribed spacer and to be 1.5 times more informative. Reduced ribosomal maturation process constraints on ETS are proposed to explain the different evolutionary rates between the two spacers. Maximal phylogenetic resolution and support were obtained when the two spacers were analyzed together. No incongruence between the two spacers was found and ETS appears to be a valuable source of information for solidifying ITS plant phylogeny. The phylogeny obtained in Medicago suggests that none of the three subsections included in the study is monophyletic.

Journal ArticleDOI
TL;DR: Sequencing electromorphs at microsatellite loci shows that uncovering size homoplasy led to a more marked population structure in Bombus terrestris and Bulinus truncatus, and that the detection of size homoplasy may alter phylogenetic reconstructions in Apis mellifera.
Abstract: Size homoplasy was analyzed at microsatellite loci by sequencing electromorphs, that is, variants of the same size (base pairs). This study was conducted using five interrupted and/or compound loci in three invertebrate species, the honey bee Apis mellifera, the bumble bee Bombus terrestris, and the freshwater snail Bulinus truncatus. The 15 electromorphs sequenced turned out to hide 31 alleles (i.e., variants identical in sequence). Variation in the amount of size homoplasy was detected among electromorphs and loci. From one to seven alleles were detected per electromorph, and one locus did not show any size homoplasy in either bee species. The amount of size homoplasy was related to the sequencing effort, since the number of alleles was correlated with the number of copies of electromorphs sequenced, but also with the molecular structure of the core sequence at each locus. Size homoplasy within populations was detected only three times, meaning that size homoplasy was detected mostly among populations. We analyzed population structure, estimating Fst and a genetic distance, based on either electromorphs or alleles. Whereas little difference was found in A. mellifera, uncovering size homoplasy led to a more marked population structure in B. terrestris and B. truncatus. We also showed in A. mellifera that the detection of size homoplasy may alter phylogenetic reconstructions.
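The distinction between electromorphs (variants of the same size) and alleles (variants identical in sequence) can be made concrete with a minimal Python sketch. The function names and the toy repeat sequences are hypothetical and are not data from the study; the sketch only counts how many distinct sequences hide behind each fragment size.

```python
from collections import defaultdict


def alleles_per_electromorph(sequences):
    """Map each fragment size (bp) to the number of distinct sequences of that size."""
    by_size = defaultdict(set)
    for seq in sequences:
        by_size[len(seq)].add(seq.upper())
    return {size: len(variants) for size, variants in by_size.items()}


def homoplasious_sizes(sequences):
    """Sizes at which more than one allele shares the same electromorph."""
    counts = alleles_per_electromorph(sequences)
    return sorted(size for size, n in counts.items() if n > 1)


# Hypothetical sequenced copies: two 8-bp alleles share one electromorph.
toy_copies = ["CACACACA", "CACATACA", "CACACACA", "CACACA"]
counts = alleles_per_electromorph(toy_copies)
hidden = homoplasious_sizes(toy_copies)
```

In this toy input the 8-bp electromorph hides two alleles while the 6-bp electromorph hides one, so only the 8-bp class shows size homoplasy.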

Journal ArticleDOI
TL;DR: It is shown theoretically that the standard code may have been shaped by position-invariant forces such as mutation and base content, and may be a record of genetic process and patterns of mutation before the radiation of modern organisms and organelles.
Abstract: Distances between amino acids were derived from the polar requirement measure of amino acid polarity and Benner and co-workers' (1994) 74-100 PAM matrix. These distances were used to examine the average effects of amino acid substitutions due to single-base errors in the standard genetic code and equally degenerate randomized variants of the standard code. Second-position transitions conserved all distances on average, an order of magnitude more than did second-position transversions. In contrast, first-position transitions and transversions were about equally conservative. In comparison with randomized codes, second-position transitions in the standard code significantly conserved mean square differences in polar requirement and mean Benner matrix-based distances, but mean absolute value differences in polar requirement were not significantly conserved. The discrepancy suggests that these commonly used distance measures may be insufficient for strict hypothesis testing without more information. The translational consequences of single-base errors were then examined in different codon contexts, and similarities between these contexts explored with a hierarchical cluster analysis. In one cluster of codon contexts corresponding to the RNY and GNR codons, second-position transversions between C and G and transitions between C and U were most conservative of both polar requirement and the matrix-based distance. In another cluster of codon contexts, second-position transitions between A and G were most conservative. Despite the claims of previous authors to the contrary, it is shown theoretically that the standard code may have been shaped by position-invariant forces such as mutation and base content. These forces may have left heterogeneous signatures in the code because of differences in translational fidelity by codon position. 
A scenario for the origin of the code is presented wherein selection for error minimization could have occurred multiple times in disjoint parts of the code through a phyletic process of competition between lineages. This process permits error minimization without the disruption of previously useful messages, and does not predict that the code is optimally error-minimizing with respect to modern error. Instead, the code may be a record of genetic process and patterns of mutation before the radiation of modern organisms and organelles.