Showing papers in "Journal of Molecular Evolution in 2002"


Journal Article
TL;DR: A comprehensive analysis of substitution rates in 50 RNA viruses using a recently developed maximum likelihood phylogenetic method revealed a significant relationship between genetic divergence and isolation time for an extensive array of RNA viruses, although more rate variation was usually present among lineages than would be expected under the constraints of a molecular clock.
Abstract: The study of rates of nucleotide substitution in RNA viruses is central to our understanding of their evolution. Herein we report a comprehensive analysis of substitution rates in 50 RNA viruses using a recently developed maximum likelihood phylogenetic method. This analysis revealed a significant relationship between genetic divergence and isolation time for an extensive array of RNA viruses, although more rate variation was usually present among lineages than would be expected under the constraints of a molecular clock. Despite the lack of a molecular clock, the range of statistically significant variation in overall substitution rates was surprisingly narrow for those viruses where a significant relationship between genetic divergence and time was found, as was the case when synonymous sites were considered alone, where the molecular clock was rejected less frequently. An analysis of the ecological and genetic factors that might explain this rate variation revealed some evidence of significantly lower substitution rates in vector-borne viruses, as well as a weak correlation between rate and genome length. Finally, a simulation study revealed that our maximum likelihood estimates of substitution rates are valid, even if the molecular clock is rejected, provided that sufficiently large data sets are analyzed.

632 citations


Journal Article
TL;DR: The effect of recombination on phylogeny estimation depended on the relatedness of the sequences involved in the recombinational event and on the extent of the different regions with different phylogenetic histories.
Abstract: Phylogenetic studies based on DNA sequences typically ignore the potential occurrence of recombination, which may produce different alignment regions with different evolutionary histories. Traditional phylogenetic methods assume that a single history underlies the data. If recombination is present, can we expect the inferred phylogeny to represent any of the underlying evolutionary histories? We examined this question by applying traditional phylogenetic reconstruction methods to simulated recombinant sequence alignments. The effect of recombination on phylogeny estimation depended on the relatedness of the sequences involved in the recombinational event and on the extent of the different regions with different phylogenetic histories. Given the topologies examined here, when the recombinational event was ancient, or when recombination occurred between closely related taxa, one of the two phylogenies underlying the data was generally inferred. In this scenario, the evolutionary history corresponding to the majority of the positions in the alignment was generally recovered. Very different results were obtained when recombination occurred recently among divergent taxa. In this case, when the recombinational breakpoint divided the alignment in two regions of similar length, a phylogeny that was different from any of the true phylogenies underlying the data was inferred.

459 citations


Journal Article
TL;DR: To test whether disordered protein evolves more rapidly than ordered protein, pairwise genetic distances were compared between the ordered and the disordered regions of 26 protein families having at least one member with a structurally characterized region of disorder of 30 or more consecutive residues.
Abstract: The dominant view in protein science is that a three-dimensional (3-D) structure is a prerequisite for protein function. In contrast to this dominant view, there are many counterexample proteins that fail to fold into a 3-D structure, or that have local regions that fail to fold, and yet carry out function. Protein without fixed 3-D structure is called intrinsically disordered. Motivated by anecdotal accounts of higher rates of sequence evolution in disordered protein than in ordered protein we are exploring the molecular evolution of disordered proteins. To test whether disordered protein evolves more rapidly than ordered protein, pairwise genetic distances were compared between the ordered and the disordered regions of 26 protein families having at least one member with a structurally characterized region of disorder of 30 or more consecutive residues. For five families, there were no significant differences in pairwise genetic distances between ordered and disordered sequences. The disordered region evolved significantly more rapidly than the ordered region for 19 of the 26 families. The functions of these disordered regions are diverse, including binding sites for protein, DNA, or RNA and also including flexible linkers. The functions of some of these regions are unknown. The disordered regions evolved significantly more slowly than the ordered regions for the two remaining families. The functions of these more slowly evolving disordered regions include sites for DNA binding. More work is needed to understand the underlying causes of the variability in the evolutionary rates of intrinsically ordered and disordered protein.

415 citations


Journal Article
TL;DR: A phylogenetic analysis of the five major families of DNA polymerases is presented and it is proposed that the gamma DNA polymerase of the mitochondrion replication apparatus is of phage origin and that this gene replaced the one in the bacterial ancestor.
Abstract: A phylogenetic analysis of the five major families of DNA polymerase is presented. Viral and plasmid sequences are included in this compilation along with cellular enzymes. The classification by Ito and Braithwaite (Ito and Braithwaite 1991) of the A, B, C, D, and X families has been extended to accommodate the "Y family" of DNA polymerases that are related to the eukaryotic RAD30 and the bacterial UmuC gene products. After analysis, our data suggest that no DNA polymerase family was universally conserved among the three biological domains and no simple evolutionary scenario could explain that observation. Furthermore, viruses and plasmids carry a remarkably diverse set of DNA polymerase genes, suggesting that lateral gene transfer is frequent and includes non-orthologous gene displacements between cells and viruses. The relationships between viral and host genes appear very complex. We propose that the gamma DNA polymerase of the mitochondrion replication apparatus is of phage origin and that this gene replaced the one in the bacterial ancestor. Often there was no obvious relation between the viral and the host DNA polymerase, but an interesting exception concerned the family B enzymes, in which ancient gene exchange can be detected between the viruses and their hosts. Additional evidence for horizontal gene transfers between cells and viruses comes from an analysis of the small damage-inducible DNA polymerases. Taken together, these findings suggest a complex evolutionary history of the DNA replication apparatus that involved significant exchanges between viruses, plasmids, and their hosts.

244 citations


Journal Article
TL;DR: A new substitution matrix for maximum likelihood (ML) phylogenetic analysis which has been optimized on a dataset of 33 amino acid sequences from the retroviral Pol proteins and yields higher log-likelihood values on a range of datasets.
Abstract: Retroviral and other reverse transcriptase (RT)-containing sequences may be subject to unique evolutionary pressures, and models of molecular sequence evolution developed using other kinds of sequences may not be optimal. Here we develop and present a new substitution matrix for maximum likelihood (ML) phylogenetic analysis which has been optimized on a dataset of 33 amino acid sequences from the retroviral Pol proteins. When compared to other matrices, this model (rtREV) yields higher log-likelihood values on a range of datasets including lentiviruses, spumaviruses, betaretroviruses, gammaretroviruses, and other elements containing reverse transcriptase. We provide evidence that rtREV is a more realistic evolutionary model for analyses of the pol gene, although it is inapplicable to analyses involving the gag gene.

224 citations


Journal Article
TL;DR: Phylogenetic relationships among the NBS-LRR (nucleotide binding site–leucine-rich repeat) resistance gene homologues from 30 genera and nine families were evaluated relative to phylogenies for these taxa, suggesting preferential expansions or losses of certain RGH types within particular taxa and suggesting that no one species will provide models for all major sequence types in other taxa.
Abstract: Phylogenetic relationships among the NBS-LRR (nucleotide binding site–leucine-rich repeat) resistance gene homologues (RGHs) from 30 genera and nine families were evaluated relative to phylogenies for these taxa. More than 800 NBS-LRR RGHs were analyzed, primarily from Fabaceae, Brassicaceae, Poaceae, and Solanaceae species, but also from representatives of other angiosperm and gymnosperm families. Parsimony, maximum likelihood, and distance methods were used to classify these RGHs relative to previously observed gene subfamilies as well as within more closely related sequence clades. Grouping sequences using a distance cutoff of 250 PAM units (point accepted mutations per 100 residues) identified at least five ancient sequence clades with representatives from several plant families: the previously observed TIR gene subfamily and a minimum of four deep splits within the non-TIR gene subfamily. The deep splits in the non-TIR subfamily are also reflected in comparisons of amino acid substitution rates in various species and in ratios of nonsynonymous-to-synonymous nucleotide substitution rates (KA/KS values) in Arabidopsis thaliana. Lower KA/KS values in the TIR than the non-TIR sequences suggest greater functional constraints in the TIR subfamily. At least three of the five identified ancient clades appear to predate the angiosperm–gymnosperm radiation. Monocot sequences are absent from the TIR subfamily, as observed in previous studies. In both subfamilies, clades with sequences separated by approximately 150 PAM units are family but not genus specific, providing a rough measure of minimum dates for the first diversification event within these clades. Within any one clade, particular taxa may be dramatically over- or underrepresented, suggesting preferential expansions or losses of certain RGH types within particular taxa and suggesting that no one species will provide models for all major sequence types in other taxa.

219 citations


Journal Article
TL;DR: The genomic structure of the gene encoding the lysozyme of Mytilus edulis, the common mussel, is determined, and it is proposed that the origin of this domain can be traced back in evolution to the origin of bilaterian animals.
Abstract: We isolated and sequenced the cDNAs coding for lysozymes of six bivalve species. Alignment and phylogenetic analysis showed that, together with recently described bivalve lysozymes, the leech destabilase, and a number of putative proteins from extensive genomic and cDNA analyses, they belong to the invertebrate type of lysozymes (i type), first described by Jolles and Jolles (1975). We determined the genomic structure of the gene encoding the lysozyme of Mytilus edulis, the common mussel. We provide evidence that the central exon of this gene is homologous to the second exon of the chicken lysozyme gene, belonging to the c type. We propose that the origin of this domain can be traced back in evolution to the origin of bilaterian animals. Phylogenetic analysis suggests that i-type proteins form a monophyletic family.

204 citations


Journal Article
TL;DR: To evaluate the general utility of nuclear rDNA internal transcribed spacer (ITS) sequences for phylogenetic analyses of animal species groups and their broader relationships, sequences were obtained for 19 species of the genus Haliotis plus a keyhole limpet and a more distantly related gastropod.
Abstract: To evaluate the general utility of sequences of the nuclear rDNA internal transcribed spacer (ITS) regions for phylogenetic analyses of animal species groups and their broader relationships, sequences were obtained for 19 species of the genus Haliotis plus a keyhole limpet and a more distantly related gastropod, the Chilean abalone. Three subclades of Haliotis species appear consistently, each encompassing little variation. They are (A) the North Pacific species, (B) the European species, and (C) the Australian species. The one Caribbean species examined clearly groups with the North Pacific clade, not the European clade. H. midae (South Africa) and H. diversicolor supertexta (Taiwan) both diverge basal to the European and Australian species groups in the phylogenetic trees. Sequence comparisons showed that one species of Haliotis, H. iris from New Zealand, is quite distant from the remaining Haliotis species, almost as much as the more obvious outgroup, the keyhole limpet, an observation common to other DNA sequence analyses of these taxa. Using the rate of nucleotide change calculated from the sister Caribbean-Pacific pair, the length of the H. iris long branch is compatible with the suggestion that its ancestry became isolated on New Zealand at Gondwanan breakup. Use of ITS permits a totally independent estimate of the phylogenetic relationships, yet branching order was very similar to that established using other DNA regions studied previously, including those under strong positive selection. Knowledge of the RNA transcript secondary structure is particularly useful in the optimal alignment of more distantly related taxa. The RNA transcript secondary structure of Haliotis ITS2 shows conservation of features found also in ITS2 of angiosperms and algal taxa. Since ITS, particularly ITS2, is not saturated with nucleotide changes even at the family level, it should be useful for phylogenetic reconstruction of animal groups, not just at the species and genus levels but perhaps also for families and above.

196 citations


Journal Article
TL;DR: This analysis revealed that the MT-A70 family comprises four subfamilies with varying degrees of interrelatedness, and shows the permuted topology characteristic of the β class of MTases, which to date has only been known to include DNA MTases.
Abstract: MT-A70 is the S-adenosylmethionine-binding subunit of human mRNA:m(6)A methyltransferase (MTase), an enzyme that sequence-specifically methylates adenines in pre-mRNAs. The physiological importance yet limited understanding of MT-A70 and its apparent lack of similarity to other known RNA MTases combined to make this protein an attractive target for bioinformatic analysis. The sequence of MT-A70 was subjected to extensive in silico analysis to identify orthologous and paralogous polypeptides. This analysis revealed that the MT-A70 family comprises four subfamilies with varying degrees of interrelatedness. One subfamily is a small group of bacterial DNA:m(6)A MTases. The other three subfamilies are paralogous eukaryotic lineages, two of which have not been associated with MTase activity but include proteins having substantial regulatory effects. Multiple sequence alignments and structure prediction for members of all four subfamilies indicated a high probability that a consensus MTase fold domain is present. Significantly, this consensus fold shows the permuted topology characteristic of the β class of MTases, which to date has only been known to include DNA MTases.

185 citations


Journal Article
TL;DR: It is shown that aerobic prokaryotes display a significant increment in genomic GC% in relation to anaerobic ones, the first time that a link between a metabolic character and GC% has been found, independently of phylogenetic relationships and with a statistically significant amount of data.
Abstract: The huge variation in the genomic guanine plus cytosine content (GC%) among prokaryotes has been explained by two mutually exclusive hypotheses, namely, selectionist and neutralist. The former proposals have in common the assumption that this feature is a form of adaptation to some ecological or physiological condition. On the other hand, the neutralist interpretation states that the variations are due only to different mutational biases. Since all of the traits that have been proposed by the selectionists either appeared to be limited to certain genera or were invalidated by the availability of more data, they cannot be considered as a selective force influencing the genomic GC% across all prokaryotes. In this report we show that aerobic prokaryotes display a significant increment in genomic GC% in relation to anaerobic ones. This is the first time that a link between a metabolic character and GC% has been found, independently of phylogenetic relationships and with a statistically significant amount of data.

145 citations


Journal Article
TL;DR: This work reports a uniform substitution rate in IR-less genomes, and finds this rate to be at the level otherwise reserved for SC genes, and proposes that this acceleration is a direct result of the decrease in the copy number of the sequence.
Abstract: The chloroplast genomes of some species of legumes lack the large inverted repeat (IR) that is a trademark of most land-plant chloroplasts. Our analysis of chloroplast genes in legume species that have an IR shows that the synonymous (silent) substitution rate in IR genes is 2.3-fold lower than in single-copy (SC) genes, which is largely in agreement with earlier findings. Given that all genes in species that lack the IR are single-copy, what level of synonymous substitution exists in these genes? We report a uniform substitution rate in IR-less genomes, and moreover, we find this rate to be at the level otherwise reserved for SC genes. In other words, the synonymous substitution rate has accelerated in the remaining copy of the duplicate region. We propose that this acceleration is a direct result of the decrease in the copy number of the sequence, rather than an intrinsic property of the genes normally located in the IR.

Journal Article
TL;DR: A detailed genomic analysis of each of the sequences involved in the two lysine anabolic routes, as well as of genes from other routes related to them, demonstrates a clear relationship between the DAP and Arg routes, and between the AAA and Leu pathways.
Abstract: Among the different biosynthetic pathways found in extant organisms, lysine biosynthesis is peculiar because it has two different anabolic routes. One is the diaminopimelic acid (DAP) pathway, and the other is the α-aminoadipic acid (AAA) route. A variant of the AAA route that includes some enzymes involved in arginine and leucine biosyntheses has been recently reported in Thermus thermophilus (Nishida et al. 1999). Here we describe the results of a detailed genomic analysis of each of the sequences involved in the two lysine anabolic routes, as well as of genes from other routes related to them. No evidence was found of an evolutionary relationship between the DAP and AAA enzymes. Our results suggest that the DAP pathway is related to arginine metabolism, since the lysC, asd, dapC, dapE, and lysA genes from lysine biosynthesis are related to the argB, argC, argD, argE, and speAC genes, respectively, whose products catalyze different steps in arginine metabolism. This work supports previous reports on the relationship between AAA gene products and some enzymes involved in leucine biosynthesis and the tricarboxylic acid cycle (Irvin and Bhattacharjee 1998; Miyazaki et al. 2001). Here we discuss the significance of the recent finding that several genes involved in the arginine (Arg) and leucine (Leu) biosynthesis participate in a new alternative route of the AAA pathway (Miyazaki et al. 2001). Our results demonstrate a clear relationship between the DAP and Arg routes, and between the AAA and Leu pathways.

Journal Article
TL;DR: Analysis of a Bayesian phylogeny of the OXA β-lactamase genes shows that much of the diversity is the result of ancient events and that the OXA genes were mobilized from chromosomes to plasmids on at least two independent occasions that occurred millions of years ago.
Abstract: The OXA genes encode a class of β-lactamases that confer resistance to a wide range of β-lactam antibiotics. To determine whether the diversity of the OXA β-lactamases is the result of recent or ancient events, and to determine whether mobilization of the OXA genes from chromosomes to plasmids occurred recently or long ago, we have constructed a Bayesian phylogeny of the OXA β-lactamase genes. Analysis of that phylogeny shows that much of the diversity is the result of ancient events and that the OXA genes were mobilized from chromosomes to plasmids on at least two independent occasions that occurred millions of years ago. That observation contradicts the commonly held impression that mobilization of antibiotic resistance genes is strictly the result of modern use of antibiotics.

Journal Article
TL;DR: The phylogenetic distribution of recombination events among human HBV genotypes was examined, and genotypes A plus D, and genotypes B plus C, were found to have distinct patterns of recombination, suggesting differing epidemiological relationships among them; a new estimate of the mean substitution rate suggests that divergence of HBV in humans and apes has occurred only in the last 6000 years.
Abstract: Previous studies of the evolutionary history of hepatitis B virus (HBV) have been compromised by intergenotype recombination and complex patterns of nucleotide substitution, perhaps caused by differential selection pressures. We examined the phylogenetic distribution of recombination events among human HBV genotypes and found that genotypes A plus D, and genotypes B plus C, had distinct patterns of recombination suggesting differing epidemiological relationships among them. By analyzing the nonoverlapping regions of the viral genome we found strong bootstrap support for some intergenotypic groupings, with evidence of a division between human genotypes A–E from the viruses sampled from apes and human genotype F. However, the earliest events in the divergence of HBV remain uncertain. These uncertainties could not be explained by differential selection pressures, as the ratio of nonsynonymous-to-synonymous substitutions (dN/dS) did not vary extensively among lineages and there is no strong evidence for positive selection across the whole tree. Finally, we provide a new estimate of the mean substitution rate in HBV, 4.2 × 10⁻⁵, which suggests that divergence of HBV in humans and apes has occurred only in the last 6000 years.

Journal Article
TL;DR: In this article, the authors constructed an explicit model for the evolution of regulatory sequences, making use of the known biophysics of the binding of regulatory proteins to DNA sequences, under the assumption that fitness of a sequence depends only on its binding affinity to the regulatory protein.
Abstract: The mutation and selection of regulatory DNA sequences are presented as an ideal model system of molecular evolution where genotype, phenotype, and fitness can be explicitly and independently characterized. In this theoretical study, we construct an explicit model for the evolution of regulatory sequences, making use of the known biophysics of the binding of regulatory proteins to DNA sequences, under the assumption that fitness of a sequence depends only on its binding affinity to the regulatory protein. The model is confined to the mean field (i.e., infinite population size) limit. Using realistic values for all parameters, we determine the minimum fitness advantage needed to maintain a binding sequence, demonstrating explicitly the "error threshold" below which a binding sequence cannot survive the accumulated effect of mutation over long times. The commonly observed "fuzziness" in binding motifs arises naturally as a consequence of the balance between selection and mutation in our model. In addition, we devise a simple model for the evolution of multiple binding sequences in a given regulatory region. We find the number of evolutionarily stable binding sequences to increase in a step-like fashion with increasing fitness advantage, if multiple regulatory proteins can synergistically enhance gene transcription. We discuss possible experimental approaches to resolve open questions raised by our study.

Journal Article
TL;DR: Analysis of FST values, differentiation indexes, and geographic distances separating populations revealed that genetic differences between populations depended on the species' history of migration and colonization.
Abstract: Isoenzyme variation was assessed in 79 mosquito samples of Aedes aegypti, and susceptibility to a dengue 2 virus strain was evaluated in 83 samples. Analysis of FST values, differentiation indexes, and geographic distances separating populations revealed that genetic differences between populations depended on the species' history of migration and colonization. Three major clusters were identified: (1) the sylvan form, Ae. ae. formosus, from West Africa and some islands in the Indian Ocean; (2) the domestic form, Ae. ae. aegypti, from Southeast Asia and South America; and (3) Ae. ae. aegypti populations from the South Pacific islands. Two groups were identified on the basis of susceptibility to the dengue virus: (1) populations with high infection rates, mostly the Ae. ae. aegypti form, and (2) mosquitoes with lower infection rates, specifically Ae. ae. formosus. Other evolutionary and epidemiological implications of the genetic variability of Ae. aegypti are also discussed.

Journal Article
TL;DR: Application of the same methodology to the cAMP-binding domains, and subsequently to the region delimited by β-strands 6 and 7 of the crystal structures of bovine RIα and rat RIIβ, proved that this highly conserved region was enough to classify unequivocally the members of the PKA-R family.
Abstract: The members of the PKA regulatory subunit family (PKA-R family) were analyzed by multiple sequence alignment and clustering based on phylogenetic tree construction. According to the phylogenetic trees generated from multiple sequence alignment of the complete sequences, the PKA-R family was divided into four subfamilies (types I to IV). Members of each subfamily were exclusively from animals (types I and II), fungi (type III), and alveolates (type IV). Application of the same methodology to the cAMP-binding domains, and subsequently to the region delimited by β-strands 6 and 7 of the crystal structures of bovine RIα and rat RIIβ (the phosphate-binding cassette; PBC), proved that this highly conserved region was enough to classify unequivocally the members of the PKA-R family. A single signature sequence, F–G–E–[LIV]–A–L–[LIMV]–x(3)–[PV]–R–[ANQV]–A, corresponding to the PBC was identified which is characteristic of the PKA-R family and is sufficient to distinguish it from other members of the cyclic nucleotide-binding protein superfamily. Specific determinants for the A and B domains of each R-subunit type were also identified. Conserved residues defining the signature motif are important for interaction with cAMP or for positioning the residues that directly interact with cAMP. Conversely, residues that define subfamilies or domain types are not conserved and are mostly located on the loop that connects α-helix B′ and β strand 7.
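The signature sequence quoted in the abstract above is written in a PROSITE-like notation, and a pattern of that kind can be turned into a regular expression for scanning protein sequences. The following is a minimal sketch, not the authors' method: the function names `prosite_to_regex` and `find_signature` are hypothetical, the converter handles only the subset of the notation that appears in this signature (literal residues, bracketed alternatives, and `x(n)` wildcards), and the test sequence in the usage note is made up for illustration.

```python
import re

# Signature from the abstract, with ASCII hyphens as element separators.
PBC_SIGNATURE = "F-G-E-[LIV]-A-L-[LIMV]-x(3)-[PV]-R-[ANQV]-A"


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a regular expression.

    Assumption: only literal residues, [..] alternatives, and x / x(n)
    wildcards are supported; other PROSITE syntax is not handled.
    """
    # Accept en-dashes as separators too, as printed in the abstract.
    pattern = pattern.replace("\u2013", "-")
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if wildcard:
            count = wildcard.group(1)
            # x -> any one residue; x(n) -> any n residues
            parts.append("." if count is None else ".{%s}" % count)
        else:
            # Literal residues and [..] classes are already valid regex.
            parts.append(element)
    return "".join(parts)


PBC_REGEX = re.compile(prosite_to_regex(PBC_SIGNATURE))


def find_signature(sequence: str) -> list:
    """Return 0-based start positions of signature matches in a sequence."""
    return [m.start() for m in PBC_REGEX.finditer(sequence.upper())]
```

For example, `find_signature("MKFGELALIYGTPRAATV")` scans a synthetic 18-residue string (not a real PKA-R sequence) and reports the position where the motif begins; a sequence with a residue outside the allowed set at any fixed position yields an empty list.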

Journal Article
TL;DR: This study describes the ysa locus from A127/90, another strain of serotype O:8, and extends the sequence to several new genes encoding Ysp proteins which are the substrates of this secretion system, and a putative chaperone SycB.
Abstract: Several Gram negative bacteria use a complex system called "type III secretion system" (TTSS) to engage their host. The archetype of TTSS is the plasmid-encoded "Yop virulon" shared by the three species of pathogenic Yersinia (Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica). A second TTSS, called Ysa (for Yersinia secretion apparatus) was recently described in Y. enterocolitica 8081, a strain from serotype O:8. In this study, we describe the ysa locus from A127/90, another strain of serotype O:8, and we extend the sequence to several new genes encoding Ysp proteins which are the substrates of this secretion system, and a putative chaperone SycB. According to the deduced protein sequences, the ysa system from A127/90 is identical to that of 8081. It is different from the chromosome-encoded TTSS of Y. pestis but is instead closely related to the Mxi-Spa TTSS of Shigella and to the SPI-1 encoded TTSS of Salmonella enterica. We further demonstrated that the ysa locus is only present in biotype IB strains of Y. enterocolitica. Including this new Ysa system, a phylogenetic analysis of the 26 known TTSSs was carried out, based on the sequence analysis of three conserved proteins. All the TTSSs fall into five different clusters. The phylogenetic tree of these TTSSs is completely different from the evolutionary tree based on 16S RNA, indicating that TTSSs have been distributed by horizontal transfer.

Journal Article
TL;DR: A refined evolutionary scenario of the NAD(P)-dependent malate and NAD-dependent lactate dehydrogenase super-family is elaborated, in which the selection of L-LDH and the fate of L-MalDH during mitochondrial genesis are presented.
Abstract: The NAD(P)-dependent malate (L-MalDH) and NAD-dependent lactate (L-LDH) dehydrogenases form a large super-family that has been characterized in organisms belonging to the three domains of life. In the first part of this study, the group of [LDH-like] L-MalDH, which are malate dehydrogenases resembling lactate dehydrogenase, were analyzed and clearly defined with respect to the other enzymes. In the second part, the phylogenetic relationships of the whole super-family were presented by taking into account the [LDH-like] L-MalDH. The inferred tree unambiguously shows that two ancestral gene duplications, and not one as generally thought, are needed to explain both the distribution into two enzymatic functions and the observation of three main groups within the super-family: L-LDH, [LDH-like] L-MalDH, and dimeric L-MalDH. In addition, various cases of functional changes within each group were observed and analyzed. The direction of evolution was found to always be polarized: from enzymes with a high stringency of substrate recognition to enzymes with a broad substrate specificity. A specific phyletic distribution of the L-LDH, [LDH-like] L-MalDH, and dimeric L-MalDH over the Archaeal, Bacterial, and Eukaryal domains was observed. This was analyzed in the light of biochemical, structural, and genomic data available for the L-LDH, [LDH-like] L-MalDH, and dimeric L-MalDH. This analysis led to the elaboration of a refined evolutionary scenario of the super-family, in which the selection of L-LDH and the fate of L-MalDH during mitochondrial genesis are presented.

Journal Article
TL;DR: Comparisons between a wide range of Acropora species showed that a long hairpin predicted in rns-cox3 is phylogenetically conserved, and allowed the tentative identification of conserved sequence blocks.
Abstract: The complete nucleotide sequence of the mitochondrial genome of the coral Acropora tenuis has been determined. The 18,338 bp A. tenuis mitochondrial genome contains the standard metazoan complement of 13 protein-coding and two rRNA genes, but only the same two tRNA genes (trnM and trnW) as are present in the mtDNA of the sea anemone, Metridium senile. The A. tenuis nad5 gene is interrupted by a large group I intron which contains ten protein-coding genes and rns; M. senile has an intron at the same position but this contains only two protein-coding genes. Despite the large distance (about 11.5 kb) between the 5′-exon and 3′-exon boundaries, the A. tenuis nad5 gene is functional, as we were able to RT-PCR across the predicted intron splice site using total RNA from A. tenuis. As in M. senile, all of the genes in the A. tenuis mt genome have the same orientation, but their organization is completely different in these two zoantharians: The only common gene boundaries are those at each end of the group I intron and between trnM and rnl. Finally, we provide evidence that the rns-cox3 intergenic region in A. tenuis may correspond to the mitochondrial control region of higher animals. This region contains repetitive elements, and has the potential to form secondary structures of the type characteristic of vertebrate D-loops. Comparisons between a wide range of Acropora species showed that a long hairpin predicted in rns-cox3 is phylogenetically conserved, and allowed the tentative identification of conserved sequence blocks.

Journal ArticleDOI
TL;DR: This work reports the identification of a protein containing the MAC/perforin module from the invertebrate cephalochordate, amphioxus (Branchiostoma belcheri), using expressed sequence tag (EST) analysis of the notochord, the first molecular evidence for complement-mediated immunological cytotoxicity in invertebrates.
Abstract: The mammalian immune system has cytotoxic mechanisms, both cellular and humoral, that destroy the membrane integrity of target cells. The main effector molecules of these cytolytic mechanisms, namely perforin, used by killer lymphocytes, and the membrane attack complex (MAC) components of the complement system, share a unique module called the MAC/perforin module. Until now, both immunological cytotoxicity and the MAC/perforin module have been reported only in jawed vertebrates. Here, we report the identification of a protein containing the MAC/perforin module from the invertebrate cephalochordate, amphioxus (Branchiostoma belcheri), using expressed sequence tag (EST) analysis of the notochord. The deduced amino acid sequence of this molecule is most similar to the primary structure of human complement component C6 and is designated AmphiC6. AmphiC6 shares a unique modular structure, including the MAC/perforin module, with human C6 and other MAC components. Another EST clone predicts the presence of a thioester-containing protein with the closest structural similarity to vertebrate C3 (therefore designated AmphiC3). AmphiC3 retains most of the functionally important residues of vertebrate C3 and is shown by phylogenetic analysis to be derived directly from the common ancestor of vertebrate C3, C4, and C5. Only opsonic activity has been assigned to the invertebrate complement system until now. Therefore, this is the first molecular evidence for complement-mediated immunological cytotoxicity in invertebrates.

Journal ArticleDOI
TL;DR: Artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test the ability to recover the tree generated by the whole genome sequences when only partial data are available, and indicated that partial genomic data, when sampled randomly, could robustly recover the family tree created by the entire genome sequences.
Abstract: Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor the specific alignment used to compare gene sequences because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.

Journal ArticleDOI
TL;DR: The hypothesis that the three RNR classes diverged from a common ancestor currently represented by the anaerobic class III is supported and lateral transfer appears to have played a significant role in the evolution of this protein family.
Abstract: Ribonucleotide reductases (RNRs) are uniquely responsible for converting nucleotides to deoxynucleotides in all dividing cells. The three known classes of RNRs operate through a free radical mechanism but differ in the way in which the protein radical is generated. Class I enzymes depend on oxygen for radical generation, class II uses adenosylcobalamin, and the anaerobic class III requires S-adenosylmethionine and an iron–sulfur cluster. Despite their metabolic prominence, the evolutionary origin and relationships between these enzymes remain elusive. This gap in RNR knowledge can, to a major extent, be attributed to the fact that different RNR classes exhibit greatly diverged polypeptide chains, rendering homology assessments inconclusive. Evolutionary studies of RNRs conducted until now have focused on comparison of the amino acid sequence of the proteins, without considering how they fold into space. The present study is an attempt to understand the evolutionary history of RNRs taking into account their three-dimensional structure. We first infer the structural alignment by superposing the equivalent stretches of the three-dimensional structures of representatives of each family. We then use the structural alignment to guide the alignment of all publicly available RNR sequences. Our results support the hypothesis that the three RNR classes diverged from a common ancestor currently represented by the anaerobic class III. Also, lateral transfer appears to have played a significant role in the evolution of this protein family.

Journal ArticleDOI
TL;DR: A comparison of the tree topologies derived from SSU rDNA sequences with characters previously used in cryptophyte systematics revealed that the biliprotein type was congruent, but the type of inner periplast component incongruent, with the molecular trees, indicative of a hidden cellular dimorphism of presumably widespread occurrence throughout cryptophyte diversity.
Abstract: The plastid-bearing members of the Cryptophyta contain two functional eukaryotic genomes of different phylogenetic origin, residing in the nucleus and in the nucleomorph, respectively. These widespread and diverse protists thus offer a unique opportunity to study the coevolution of two different eukaryotic genomes within one group of organisms. In this study, the SSU rRNA genes of both genomes were PCR-amplified with specific primers and phylogenetic analyses were performed on different data sets using different evolutionary models. The results show that the composition of the principal clades obtained from the phylogenetic analyses of both genes was largely congruent, but striking differences in evolutionary rates were observed. These affected the topologies of the nuclear and nucleomorph phylogenies differently, resulting in long-branch attraction artifacts when simple evolutionary models were applied. Deletion of long-branch taxa stabilized the internal branching order in both phylogenies and resulted in a completely resolved topology in the nucleomorph phylogeny. A comparison of the tree topologies derived from SSU rDNA sequences with characters previously used in cryptophyte systematics revealed that the biliprotein type was congruent, but the type of inner periplast component incongruent, with the molecular trees. The latter is indicative of a hidden cellular dimorphism (cells with two periplast types present in a single clonal strain) of presumably widespread occurrence throughout cryptophyte diversity, which, in consequence, has far-reaching implications for cryptophyte systematics as it is practiced today.

Journal ArticleDOI
TL;DR: It is concluded that the universal genetic code originated not from a three-amino acid system but from a four-amino acid system, the GNC code encoding [GADV]-proteins, as the most primitive genetic code.
Abstract: We have previously proposed an SNS hypothesis on the origin of the genetic code (Ikehara and Yoshida 1998). The hypothesis predicts that the universal genetic code originated from the SNS code composed of 16 codons and 10 amino acids (S and N mean G or C and any of the four bases, respectively). However, it must have been very difficult to create the SNS code at one stroke in the beginning. Therefore, we searched for a simpler code than the SNS code, which could still encode water-soluble globular proteins with appropriate three-dimensional structures at a high probability using four conditions for globular protein formation (hydropathy, α-helix, β-sheet, and β-turn formations). Four amino acids (Gly [G], Ala [A], Asp [D], and Val [V]) encoded by the GNC code satisfied the four structural conditions well, but other codes in rows and columns in the universal genetic code table do not, except for the GNG code, a slightly modified form of the GNC code. Three three-amino acid systems ([D], Leu and Tyr; [D], Tyr and Met; Glu, Pro and Ile) also satisfied the above four conditions. However, some amino acids in the three systems are far more complex than those encoded by the GNC code. In addition, the amino acids in the three-amino acid systems are scattered in the universal genetic code table. Thus, we concluded that the universal genetic code originated not from a three-amino acid system but from a four-amino acid system, the GNC code encoding [GADV]-proteins, as the most primitive genetic code.

Journal ArticleDOI
TL;DR: The evolutionary patterns of hepatitis C virus, including the best-fitting nucleotide substitution model and the molecular clock hypothesis, were investigated by analyzing full-genome sequences available in the HCV database and the likelihood ratio test allowed us to discriminate among different evolutionary hypotheses.
Abstract: The evolutionary patterns of hepatitis C virus (HCV), including the best-fitting nucleotide substitution model and the molecular clock hypothesis, were investigated by analyzing full-genome sequences available in the HCV database. The likelihood ratio test allowed us to discriminate among different evolutionary hypotheses. The phylogeny of the six major HCV types was accurately inferred, and the final tree was rooted by reconstructing the hypothetical HCV common ancestor with the maximum likelihood method. The presence of phylogenetic noise and the relative nucleotide substitution rates in the different HCV genes were also examined. These results offer a general guideline for the future of HCV phylogenetic analysis and also provide important insights on HCV origin and evolution.

Journal ArticleDOI
TL;DR: It is concluded that glucose transporters in C. albicans comprise a family of 20 known members, with variable transcription in response to glucose concentration, with a possible functional convergence.
Abstract: We have identified a large family of glucose transporter genes (HGT1 to HGT20) from the human pathogenic yeast Candida albicans by screening of genomic sequences, reverse-transcription PCR assays, and phylogenetic analyses. The putative glucose transporter ORF sequences share among themselves 10-93% pairwise sequence identity and, in comparative analyses of predicted amino acid sequences, exhibit similarities to human and yeast transporters of the major facilitator superfamily (MFS): the predicted 12-transmembrane domains and sugar transporter signatures align closely to those of HXT transporters of Saccharomyces cerevisiae and GLUT transporters of humans, with amino acid residues at certain positions highly conserved throughout the families. Reverse-transcription PCR analyses demonstrated that the majority of the glucose transporters was transcribed in culture medium containing 2% glucose, while several were transcribed in the presence of low (0.2%) and/or high (5%) concentrations of glucose. Phylogenetic analyses revealed that there were three distinct clades of 20 HGT genes, which might represent three possible subfamilies. Additionally, HGT18 and HGT20 show a high overall sequence identity to the human GLUTs, indicating a possible functional convergence. We conclude that glucose transporters in C. albicans comprise a family of 20 known members, with variable transcription in response to glucose concentration.

Journal ArticleDOI
TL;DR: Analysis of new data from the complete mitochondrial genomes of a second monotreme, the spiny anteater, and another marsupial, the wombat yielded clear support for the Marsupionta hypothesis, consistent with a basal split between eutherians and marsupials/monotremes among extant mammals.
Abstract: The monotremes, the duck-billed platypus and the echidnas, are characterized by a number of unique morphological characteristics, which have led to the common belief that they represent the living survivors of an ancestral stock of mammals. Analysis of new data from the complete mitochondrial (mt) genomes of a second monotreme, the spiny anteater, and another marsupial, the wombat, yielded clear support for the Marsupionta hypothesis. According to this hypothesis marsupials are more closely related to monotremes than to eutherians, consistent with a basal split between eutherians and marsupials/monotremes among extant mammals. This finding was also supported by analysis of new sequences from a nuclear gene—18S rRNA. The mt genome of the wombat shares some unique features with previously described marsupial mtDNAs (tRNA rearrangement, a missing tRNALys, and evidence for RNA editing of the tRNAAsp). Molecular estimates of genetic divergence suggest that the divergence between the platypus and the spiny anteater took place ≈34 million years before present (MYBP), and that between South American and Australian marsupials ≈72 MYBP.

Journal ArticleDOI
TL;DR: The present findings provide a basis on which to classify JCV into types or subtypes, and they have several implications for the divergence and migration of human populations.
Abstract: The polyomavirus JC virus (JCV), the etiological agent of progressive multifocal leukoencephalopathy, is ubiquitous in the human population, infecting children asymptomatically, then persisting in the kidney. The main mode of transmission of JCV is from parents to children through long-term cohabitation. Twelve JCV subtypes that occupy unique domains in Europe, Africa, and Asia have been identified. Here, we attempted to elucidate the evolutionary relationships among JCV strains worldwide using the whole-genome approach with which a highly reliable phylogeny of JCV strains can be reconstructed. Sixty-five complete JCV DNA sequences, derived from various geographical regions and belonging to 11 of the 12 known subtypes, were subjected to phylogenetic analysis using three independent methods: the neighbor-joining, maximum parsimony, and maximum likelihood methods. The trees obtained with these methods consistently indicated that ancestral JCVs were divided into three superclusters, designated as Types A, B, and C. A split in Type A generated two subtypes, EU-a and -b, mainly containing European and Mediterranean strains. The first split in Type B generated Af2 (the major African subtype). Subsequent splits in Type B generated B1-c (a minor European subtype) and all seven Asian subtypes (B1-a, -b, -d, B2, MY, CY, and SC). Type C generated a single subtype (Af1), consisting of strains derived from western Africa. While the present findings provided a basis on which to classify JCV into types or subtypes, they have several implications for the divergence and migration of human populations.

Journal ArticleDOI
TL;DR: It is suggested that the homogenizing and diversifying roles of conversion interact to drive dynamic concerted evolution of the hsp70 genes.
Abstract: We analyzed nucleotide variation in the hsp70 genes of Drosophila melanogaster (five genes) and D. simulans (four genes) to characterize the homogenizing and diversifying roles of gene conversion in their evolution. Gene conversion within and between the 87A7 and 87C1 gene clusters homogenizes the hsp70 coding regions; in both D. melanogaster and D. simulans, same-cluster paralogues are virtually identical, and large intercluster conversion tracts diminish 87A7/87C1 divergence. Same-cluster paralogues share many polymorphisms, consistent with frequent intracluster conversion. Shared polymorphism is highly biased toward silent variation; homogenizing conversion interacts with purifying selection. In contrast to the coding regions, some hsp70 flanking regions show conversion-mediated diversification. Strong reductions of nucleotide variability and linkage disequilibria among conversion-mediated sites in hsp70Ab and hsp70Bb alleles sampled from a single natural population are consistent with a selective sweep. Comparison of the D. melanogaster and D. simulans hsp70 genes reveals whole-family fixed differences, consistent with rapid propagation of novel mutations among duplicate genes. These results suggest that the homogenizing and diversifying roles of conversion interact to drive dynamic concerted evolution of the hsp70 genes.