scispace - formally typeset

Showing papers in "Molecular Biology and Evolution" in 2000


Journal Article
TL;DR: A computerized method is presented that reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
Abstract: The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.

8,757 citations
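The block-selection rule described in this abstract can be sketched in Python. This is a simplified illustration, not the published program. The conservation thresholds, the minimum block length, and the treatment of flanking positions (a block must start and end on a highly conserved column) are assumptions chosen for readability, and the function names are hypothetical.

```python
from collections import Counter


def classify_column(column, conserved_frac=0.5, highly_frac=0.85):
    """Label one alignment column as 'gap', 'nonconserved', 'conserved' or 'highly'
    from the frequency of its most common residue (thresholds are assumptions)."""
    if "-" in column:
        return "gap"
    top_count = Counter(column).most_common(1)[0][1]
    frac = top_count / len(column)
    if frac >= highly_frac:
        return "highly"
    if frac >= conserved_frac:
        return "conserved"
    return "nonconserved"


def select_blocks(alignment, min_block_len=5, **thresholds):
    """Return half-open (start, end) column ranges to keep.

    A block is a maximal run of gap-free columns that are at least 'conserved',
    trimmed so it begins and ends on a 'highly' conserved column (a stand-in for
    the flanking-position rule), and kept only if it is long enough."""
    ncols = len(alignment[0])
    classes = [classify_column([seq[i] for seq in alignment], **thresholds)
               for i in range(ncols)]
    blocks, start = [], None
    for i, label in enumerate(classes + ["nonconserved"]):  # sentinel closes the final run
        if label in ("conserved", "highly"):
            if start is None:
                start = i
            continue
        if start is None:
            continue
        end = i
        while start < end and classes[start] != "highly":
            start += 1
        while end > start and classes[end - 1] != "highly":
            end -= 1
        if end - start >= min_block_len:
            blocks.append((start, end))
        start = None
    return blocks


# Hypothetical toy protein alignment.
alignment = ["MKVLAAGD-E", "MKVLAAGDQE", "MKILAAGDNE", "MKVLSAGDKE"]
blocks = select_blocks(alignment)
trimmed = ["".join(seq[s:e] for s, e in blocks) for seq in alignment]
print(blocks)   # [(0, 8)]
print(trimmed)  # ['MKVLAAGD', 'MKVLAAGD', 'MKILAAGD', 'MKVLSAGD']
```

The gapped column and the isolated conserved column after it are dropped; only the long, well-flanked run survives.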


Journal Article
TL;DR: A new approximate method is proposed that takes into account two major features of DNA sequence evolution: transition/transversion rate bias and base/codon frequency bias and is superior to earlier approximate methods and may be useful for analyzing large data sets, although maximum likelihood appears to always be the method of choice.
Abstract: Approximate methods for estimating the numbers of synonymous and nonsynonymous substitutions between two DNA sequences involve three steps: counting of synonymous and nonsynonymous sites in the two sequences, counting of synonymous and nonsynonymous differences between the two sequences, and correcting for multiple substitutions at the same site. We examine complexities involved in those steps and propose a new approximate method that takes into account two major features of DNA sequence evolution: transition/transversion rate bias and base/codon frequency bias. We compare the new method with maximum likelihood, as well as several other approximate methods, by examining infinitely long sequences, performing computer simulations, and analyzing a real data set. The results suggest that when there are transition/transversion rate biases and base/codon frequency biases, previously described approximate methods for estimating the nonsynonymous/synonymous rate ratio may involve serious biases, and the bias can be both positive and negative. The new method is, in general, superior to earlier approximate methods and may be useful for analyzing large data sets, although maximum likelihood appears to always be the method of choice.

1,552 citations
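The three steps listed in this abstract (counting sites, counting differences, correcting for multiple hits) can be illustrated with a Nei-Gojobori-style count followed by a Jukes-Cantor correction. This sketch deliberately ignores the transition/transversion and base/codon frequency biases that the new method accounts for, so it shows the skeleton of an earlier approximate method rather than the estimator proposed here. Excluding mutations to stop codons from the site counts is a simplifying assumption, and all names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in one codon: for each position, the fraction of
    single-base changes (ignoring changes to stop codons) that keep the amino acid."""
    aa, sites = CODE[codon], 0.0
    for pos in range(3):
        syn = total = 0
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue  # simplifying assumption: stop mutants are left out
            total += 1
            syn += CODE[mutant] == aa
        if total:
            sites += syn / total
    return sites


def codon_diffs(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged over all orders of the changes that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    s_total = n_total = 0.0
    paths = 0
    for order in permutations(positions):
        current, s, n, valid = c1, 0, 0, True
        for i in order:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODE[step] == "*":
                valid = False
                break
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        if valid:
            s_total, n_total, paths = s_total + s, n_total + n, paths + 1
    if paths == 0:
        raise ValueError(f"every pathway from {c1} to {c2} passes through a stop codon")
    return s_total / paths, n_total / paths


def jc_correct(p):
    """Jukes-Cantor correction for multiple hits (undefined for p >= 0.75)."""
    if p == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86(seq1, seq2):
    """Return (dS, dN) for two aligned, in-frame coding sequences without gaps."""
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    S = sum(syn_sites(c1) + syn_sites(c2) for c1, c2 in pairs) / 2  # step 1: sites
    N = 3 * len(pairs) - S
    Sd = Nd = 0.0
    for c1, c2 in pairs:  # step 2: differences
        s, n = codon_diffs(c1, c2)
        Sd += s
        Nd += n
    return jc_correct(Sd / S), jc_correct(Nd / N)  # step 3: multiple-hit correction


ds, dn = ng86("GCTGCTGCTGCT", "GCCGCTGCTGCT")
print(f"dS = {ds:.3f}, dN = {dn:.3f}")  # dS = 0.304, dN = 0.000
```

Because every site is weighted equally here, a sequence pair with strong transition bias or skewed codon usage would be miscounted in exactly the way the abstract warns about.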


Journal Article
TL;DR: The authors analyzed sequence variation for chalcone synthase (Chs) and alcohol dehydrogenase (Adh) loci in 28 species in the genera Arabidopsis and Arabis and related taxa from tribe Arabideae.
Abstract: We analyzed sequence variation for chalcone synthase (Chs) and alcohol dehydrogenase (Adh) loci in 28 species in the genera Arabidopsis and Arabis and related taxa from tribe Arabideae. Chs was single-copy in nearly all taxa examined, while Adh duplications were found in several species. Phylogenies constructed from both loci confirmed that the closest relatives of Arabidopsis thaliana include Arabidopsis lyrata, Arabidopsis petraea, and Arabidopsis halleri (formerly in the genus Cardaminopsis). Slightly more distant are the North American n = 7 Arabis (Boechera) species. The genus Arabis is polyphyletic: some unrelated species appear within this taxonomic classification, which has little phylogenetic meaning. Fossil pollen data were used to compute a synonymous substitution rate of 1.5 × 10^-8 substitutions per site per year for both Chs and Adh. Arabidopsis thaliana diverged from its nearest relatives about 5 MYA, and from Brassica roughly 24 MYA. Independent molecular and fossil data from several sources all provide similar estimates of evolutionary timescale in the Brassicaceae.

886 citations
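The dating step rests on simple arithmetic: if synonymous divergence accumulates independently along both lineages at rate r per site per year, two sequences separated by K_s synonymous substitutions per site split about T = K_s / (2r) years ago. The sketch below assumes the 1.5 × 10^-8 rate is a per-lineage rate; the K_s value in the example is hypothetical and chosen only to show the calculation.

```python
def divergence_time(ks, rate=1.5e-8):
    """Years since two lineages split, given synonymous divergence ks
    (substitutions per synonymous site) and a per-lineage rate per site per year.
    Divergence accumulates along both lineages, hence the factor of 2."""
    return ks / (2 * rate)


t = divergence_time(0.15)  # hypothetical ks, not a value reported in the paper
print(f"{t / 1e6:.1f} MYA")  # 5.0 MYA
```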


Journal Article
TL;DR: The intraspecific patterns of diversity and genetic differentiation observed in P. falciparum are strikingly similar to those seen in interspecific comparisons of plants and animals with differing levels of outcrossing, suggesting that similar processes may be involved.
Abstract: Multilocus genotyping of microbial pathogens has revealed a range of population structures, with some bacteria showing extensive recombination and others showing almost complete clonality. The population structure of the protozoan parasite Plasmodium falciparum has been harder to evaluate, since most studies have used a limited number of antigen-encoding loci that are known to be under strong selection. We describe length variation at 12 microsatellite loci in 465 infections collected from 9 locations worldwide. These data reveal dramatic differences in parasite population structure in different locations. Strong linkage disequilibrium (LD) was observed in six of nine populations. Significant LD occurred in all locations with prevalence <1% and in only two of five of the populations from regions with higher transmission intensities. Where present, LD results largely from the presence of identical multilocus genotypes within populations, suggesting high levels of self-fertilization in populations with low levels of transmission. We also observed dramatic variation in diversity and geographical differentiation in different regions. Mean heterozygosities in South American countries (0.3-0.4) were less than half those observed in African locations (0.76-0.8), with intermediate heterozygosities in the Southeast Asia/Pacific samples (0.51-0.65). Furthermore, variation was distributed among locations in South America (F_ST = 0.364) and within locations in Africa (F_ST = 0.007). The intraspecific patterns of diversity and genetic differentiation observed in P. falciparum are strikingly similar to those seen in interspecific comparisons of plants and animals with differing levels of outcrossing, suggesting that similar processes may be involved. The differences observed may also reflect the recent colonization of non-African populations from an African source, and the relative influences of epidemiology and population history are difficult to disentangle. These data reveal a range of population structures within a single pathogen species and suggest intimate links between patterns of epidemiology and genetic structure in this organism.

737 citations
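Summary statistics like the heterozygosities and F_ST values quoted above can be computed from per-locus allele lists. This sketch uses Nei's unbiased expected heterozygosity and a G_ST-style differentiation index, (H_T - H_S) / H_T, with populations weighted equally. The estimators actually used in the study may differ, and the allele sizes in the example are hypothetical.

```python
from collections import Counter


def allele_freqs(alleles):
    """Relative frequency of each allele (e.g., microsatellite repeat length)."""
    n = len(alleles)
    return {a: c / n for a, c in Counter(alleles).items()}


def gene_diversity(freqs):
    """1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    return 1 - sum(p * p for p in freqs.values())


def expected_het(alleles):
    """Nei's unbiased expected heterozygosity, n/(n-1) * (1 - sum p_i^2)."""
    n = len(alleles)
    return n / (n - 1) * gene_diversity(allele_freqs(alleles))


def fst(populations):
    """(H_T - H_S) / H_T for one locus, weighting populations equally.
    H_S averages within-population diversity; H_T uses mean allele frequencies.
    Undefined (division by zero) if the locus is monomorphic overall."""
    pop_freqs = [allele_freqs(p) for p in populations]
    h_s = sum(gene_diversity(f) for f in pop_freqs) / len(pop_freqs)
    alleles = set().union(*pop_freqs)
    mean = {a: sum(f.get(a, 0.0) for f in pop_freqs) / len(pop_freqs) for a in alleles}
    h_t = gene_diversity(mean)
    return (h_t - h_s) / h_t


print(round(expected_het([120, 122, 122, 124]), 3))  # 0.833
print(fst([[120] * 5, [124] * 5]))                   # 1.0 (fixed for different alleles)
print(fst([[120, 124], [120, 124]]))                 # 0.0 (identical populations)
```

Values near 0 mean most variation lies within locations (the African pattern above); larger values mean it is partitioned among locations (the South American pattern).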


Journal Article
TL;DR: It is shown that substitution rates at nonsynonymous sites are strongly negatively correlated with tissue distribution breadth: almost threefold lower in ubiquitous than in tissue-specific genes, and silent substitution rates do not vary with expression pattern, even in ubiquitously expressed genes.
Abstract: To determine whether gene expression patterns affect mutation rates and/or selection intensity in mammalian genes, we studied the relationships between substitution rates and tissue distribution of gene expression. For this purpose, we analyzed 2,400 human/rodent and 834 mouse/rat orthologous genes, and we measured (using expressed sequence tag data) their expression patterns in 19 tissues from three developmental stages. We show that substitution rates at nonsynonymous sites are strongly negatively correlated with tissue distribution breadth: almost threefold lower in ubiquitous than in tissue-specific genes. Nonsynonymous substitution rates also vary considerably according to the tissues: the average rate is twofold lower in brain-, muscle-, retina- and neuron-specific genes than in lymphocyte-, lung-, and liver-specific genes. Interestingly, 5' and 3' untranslated regions (UTRs) show exactly the same trend. These results demonstrate that the expression pattern is an essential factor in determining the selective pressure on functional sites in both coding and noncoding regions. Conversely, silent substitution rates do not vary with expression pattern, even in ubiquitously expressed genes. This latter result thus suggests that synonymous codon usage is not constrained by selection in mammals. Furthermore, this result also indicates that there is no reduction of mutation rates in genes expressed in the germ line, contrary to what had been hypothesized based on the fact that transcribed DNA is more efficiently repaired than nontranscribed DNA.

518 citations


Journal Article
TL;DR: The results provided support for the hypothesis that the Israel-Jordan area is the region in which barley was brought into culture and the Himalayas can be considered a region of domesticated barley diversification.
Abstract: Remains of barley (Hordeum vulgare) grains found at archaeological sites in the Fertile Crescent indicate that about 10,000 years ago the crop was domesticated there from its wild relative Hordeum spontaneum. The domestication history of barley is revisited based on the assumptions that DNA markers effectively measure genetic distances and that wild populations are genetically different and they have not undergone significant change since domestication. The monophyletic nature of barley domestication is demonstrated based on allelic frequencies at 400 AFLP polymorphic loci studied in 317 wild and 57 cultivated lines. The wild populations from Israel-Jordan are molecularly more similar than are any others to the cultivated gene pool. The results provided support for the hypothesis that the Israel-Jordan area is the region in which barley was brought into culture. Moreover, the diagnostic allele I of the homeobox gene BKn-3, rarely but almost exclusively found in Israel H. spontaneum, is pervasive in western landraces and modern cultivated varieties. In landraces from the Himalayas and India, the BKn-3 allele IIIa prevails, indicating that an allelic substitution has taken place during the migration of barley from the Near East to South Asia. Thus, the Himalayas can be considered a region of domesticated barley diversification.

493 citations


Journal Article
TL;DR: Compared with maximum-likelihood reconstruction, Weighbor is much faster while building trees that are qualitatively and quantitatively similar, and it appears to be relatively immune to the "long branches attract" and "long branch distracts" drawbacks observed with neighbor joining, BIONJ, and parsimony.
Abstract: We introduce a distance-based phylogeny reconstruction method called "weighted neighbor joining," or "Weighbor" for short. As in neighbor joining, two taxa are joined in each iteration; however, the Weighbor criterion for choosing a pair of taxa to join takes into account that errors in distance estimates are exponentially larger for longer distances. The criterion embodies a likelihood function on the distances, which are modeled as correlated Gaussian random variables with different means and variances, computed under a probabilistic model for sequence evolution. The Weighbor criterion consists of two terms, an additivity term and a positivity term, that quantify the implications of joining the pair. The first term evaluates deviations from additivity of the implied external branches, while the second term evaluates confidence that the implied internal branch has a positive branch length. Compared with maximum-likelihood phylogeny reconstruction, Weighbor is much faster, while building trees that are qualitatively and quantitatively similar. Weighbor appears to be relatively immune to the "long branches attract" and "long branch distracts" drawbacks observed with neighbor joining, BIONJ, and parsimony.

466 citations
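Weighbor changes how neighbor joining chooses which pair to join. For reference, the standard neighbor-joining step that Weighbor modifies can be sketched as follows. It uses the classic Q criterion rather than Weighbor's likelihood-based additivity and positivity terms, which are not implemented here. The four-taxon distance matrix in the example is hypothetical and perfectly additive.

```python
def nj_step(d):
    """One neighbor-joining iteration on a symmetric distance matrix d (list of lists, n >= 3).

    Picks the pair minimizing Q(i, j) = (n - 2) * d[i][j] - r[i] - r[j],
    with r[i] = sum_k d[i][k], computes the two new branch lengths, and returns
    the reduced matrix in which the joined node is appended as the last row/column."""
    n = len(d)
    r = [sum(row) for row in d]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
    b_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    b_j = d[i][j] - b_i
    keep = [k for k in range(n) if k not in (i, j)]
    # Distance from the new internal node to every remaining taxon.
    new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
    reduced = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
    reduced.append(new_row + [0.0])
    return (i, j), (b_i, b_j), reduced


# Additive distances for the tree ((A:1, B:2):1, (C:1, D:3)); taxa order A, B, C, D.
D = [[0, 3, 3, 5],
     [3, 0, 4, 6],
     [3, 4, 0, 4],
     [5, 6, 4, 0]]
pair, branches, reduced = nj_step(D)
print(pair, branches)  # (0, 1) (1.0, 2.0)
print(reduced)         # [[0, 4, 2.0], [4, 0, 4.0], [2.0, 4.0, 0.0]]
```

On additive data the step recovers the true cherry and branch lengths; Weighbor's motivation is that real distance estimates are noisy, with variance growing rapidly for long distances, which this criterion ignores.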


Journal Article
TL;DR: It is concluded that the calibration derived from the primate fossil record is too recent to be reliable and a number of problems in date estimation when the molecular clock does not hold are pointed out.
Abstract: Protein-coding genes of the mitochondrial genomes from 31 mammalian species were analyzed to estimate the speciation dates within primates and also between rats and mice. Three calibration points were used based on paleontological data: one at 20-25 MYA for the hominoid/cercopithecoid divergence, one at 53-57 MYA for the cetacean/artiodactyl divergence, and the third at 110-130 MYA for the metatherian/eutherian divergence. Both the nucleotide and the amino acid sequences were analyzed, producing conflicting results. The global molecular clock was clearly violated for both the nucleotide and the amino acid data. Models of local clocks were implemented using maximum likelihood, allowing different evolutionary rates for some lineages while assuming rate constancy in others. Surprisingly, the highly divergent third codon positions appeared to contain phylogenetic information and produced more sensible estimates of primate divergence dates than did the amino acid sequences. Estimated dates varied considerably depending on the data type, the calibration point, and the substitution model but differed little among the four tree topologies used. We conclude that the calibration derived from the primate fossil record is too recent to be reliable; we also point out a number of problems in date estimation when the molecular clock does not hold. Despite these obstacles, we derived estimates of primate divergence dates that were well supported by the data and were generally consistent with the paleontological record. Estimation of the mouse-rat divergence date, however, was problematic.

460 citations


Journal Article
TL;DR: Remarkably, more than 21% of all cysteines were found within C-(X)2-C motifs in Archaea, which may indicate that cysteine appeared in ancient metal-binding proteins first and was introduced into other proteins later.
Abstract: The occurrence and relative positions of cysteine residues were investigated in proteins of various species. Relative to the random expectation for an amino acid encoded by two codons (3.28%, i.e., 2 of 61 sense codons), cysteine is underrepresented in all organisms investigated. Representation of cysteine appears to correlate positively with the complexity of the organism, ranging between 2.26% in mammals and 0.5% in some members of the Archaebacteria. This observation, together with the results obtained from comparison of cysteine content of various ribosomal proteins, indicates that evolution takes advantage of increased use of cysteine residues. In all organisms studied except plants, two cysteines are frequently found two amino acid residues apart (C-(X)2-C motif). Such a motif is known to be present in a variety of metal-binding proteins and oxidoreductases. Remarkably, more than 21% of all cysteines were found within C-(X)2-C motifs in Archaea. This observation may indicate that cysteine appeared in ancient metal-binding proteins first and was introduced into other proteins later.

335 citations
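The 3.28% baseline corresponds to 2 of the 61 sense codons encoding cysteine. A sketch of the two quantities discussed (cysteine frequency and the share of cysteines lying in C-(X)2-C motifs) follows. Counting a cysteine as "in a motif" when it is either end of any C-X-X-C window is an assumption, as is the toy sequence.

```python
import re

EXPECTED_CYS = 2 / 61  # two Cys codons (TGT, TGC) among 61 sense codons


def cys_stats(seq):
    """Return (cysteine frequency, fraction of cysteines inside any C-X2-C motif)."""
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    in_motif = set()
    # Zero-width lookahead so overlapping motifs (e.g., CXXCXXC) are all found.
    for m in re.finditer(r"(?=C..C)", seq):
        in_motif.update((m.start(), m.start() + 3))
    freq = len(cys) / len(seq) if seq else 0.0
    frac = len(in_motif) / len(cys) if cys else 0.0
    return freq, frac


print(f"{EXPECTED_CYS:.2%}")           # 3.28%
print(cys_stats("MKCPKCGLCAAWLLCV"))   # (0.25, 0.75)
```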


Journal Article
TL;DR: A dynamic programming algorithm is presented for maximum-likelihood reconstruction of the set of all ancestral amino acid sequences in a phylogenetic tree; its running time scales linearly with the number of sequences, so it is applicable to practically any number of taxa.
Abstract: A dynamic programming algorithm is developed for maximum-likelihood reconstruction of the set of all ancestral amino acid sequences in a phylogenetic tree. To date, exhaustive algorithms that find the most likely set of ancestral states (joint reconstruction) have running times that scale exponentially with the number of sequences and are thus limited to very few taxa. The time requirement of our new algorithm scales linearly with the number of sequences and is therefore applicable to practically any number of taxa. A detailed description of the new algorithm and an example of its application to cytochrome b sequences are provided.

330 citations
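The joint reconstruction described here can be sketched for a single site. In this illustration, nucleotides under the Jukes-Cantor model stand in for amino acids under an empirical replacement model, the tree is a nested tuple of (name, branch length, children), and all names are hypothetical. One post-order pass stores, for each node and each possible parent state, the best subtree likelihood and the node state that achieves it. A pre-order traceback then reads off the joint assignment, so the work grows linearly with the number of nodes. Ties are broken by the order of STATES.

```python
import math

STATES = "ACGT"


def jc_prob(i, j, t):
    """Jukes-Cantor probability that state i becomes state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def up_pass(node, leaf_states, choice):
    """Post-order pass. node = (name, branch_length, children).
    Returns L with L[i] = best likelihood of the subtree hanging from node's branch
    given that node's parent is in state i; choice[name][i] records the node
    state that achieves it."""
    name, t, children = node
    if not children:  # leaf: its state is observed
        s = leaf_states[name]
        choice[name] = {i: s for i in STATES}
        return {i: jc_prob(i, s, t) for i in STATES}
    child_L = [up_pass(c, leaf_states, choice) for c in children]
    L, C = {}, {}
    for i in STATES:
        scores = {j: jc_prob(i, j, t) * math.prod(cl[j] for cl in child_L) for j in STATES}
        C[i] = max(scores, key=scores.get)
        L[i] = scores[C[i]]
    choice[name] = C
    return L


def joint_reconstruction(root, leaf_states):
    """Most likely joint assignment of states to all nodes at one site."""
    root_name, _, children = root
    choice = {}
    child_L = [up_pass(c, leaf_states, choice) for c in children]
    # Root: uniform JC equilibrium frequency (1/4) times the children's best likelihoods.
    root_scores = {k: 0.25 * math.prod(cl[k] for cl in child_L) for k in STATES}
    best = max(root_scores, key=root_scores.get)
    states = {root_name: best}

    def down(node, parent_state):  # pre-order traceback
        name, _, kids = node
        states[name] = choice[name][parent_state]
        for kid in kids:
            down(kid, states[name])

    for c in children:
        down(c, best)
    return states, root_scores[best]


tree = ("n0", 0.0, (
    ("n1", 0.1, (("A", 0.1, ()), ("B", 0.1, ()))),
    ("n2", 0.1, (("C", 0.1, ()), ("D", 0.1, ()))),
))
states, lik = joint_reconstruction(tree, {"A": "A", "B": "A", "C": "A", "D": "G"})
print({k: v for k, v in states.items() if k.startswith("n")})  # {'n0': 'A', 'n1': 'A', 'n2': 'A'}
```

A full reconstruction would repeat this per alignment column; the exhaustive alternative would instead score every one of the 4^(internal nodes) assignments.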


Journal Article
TL;DR: Overall, beta-tubulin phylogeny suggests that microsporidia evolved from a fungus sometime after the divergence of chytrids; chytrid alpha- and beta-tubulins were also found to be much less divergent than are tubulins from other fungi or microsporidia.
Abstract: Microsporidia are obligate intracellular parasites that were thought to be an ancient eukaryotic lineage based on molecular phylogenies using ribosomal RNA and translation elongation factors. However, this ancient origin of microsporidia has been contested recently, as several other molecular phylogenies suggest that microsporidia are closely related to fungi. Most of the protein trees that place microsporidia with fungi are not well sampled, however, and it is impossible to resolve whether microsporidia evolved from a fungus or from a protistan relative of fungi. We have sequenced beta-tubulins from 3 microsporidia, 4 chytrid fungi, and 12 zygomycete fungi, expanding the representation of beta-tubulin to include all four fungal divisions and a wide diversity of microsporidia. In phylogenetic trees including these new sequences, the overall topology of the fungal beta-tubulins generally matched the expected relationships among the four fungal divisions, although the zygomycetes were polyphyletic in some analyses. The microsporidia consistently fell within this fungal diversification, and not as a sister group to fungi. Overall, beta-tubulin phylogeny suggests that microsporidia evolved from a fungus sometime after the divergence of chytrids. We also found that chytrid alpha- and beta-tubulins are much less divergent than are tubulins from other fungi or microsporidia. In trees in which the only fungal representatives were the chytrids, microsporidia still branched with fungi (i.e., with chytrids), suggesting that the affiliation between microsporidian and fungal tubulins is not an artifact of long-branch attraction.

Journal Article
TL;DR: The structural and functional similarity between TCST kinases and eukaryotic protein kinases raises the possibility of a distant evolutionary relationship.
Abstract: Two-component signal transduction (TCST) systems are the principal means for coordinating responses to environmental changes in bacteria as well as some plants, fungi, protozoa, and archaea. These systems typically consist of a receptor histidine kinase, which reacts to an extracellular signal by phosphorylating a cytoplasmic response regulator, causing a change in cellular behavior. Although several model systems, including sporulation and chemotaxis, have been extensively studied, the evolutionary relationships between specific TCST systems are not well understood, and the ancestry of the signal transduction components is unclear. Phylogenetic trees of TCST components from 14 complete and 6 partial genomes, containing 183 histidine kinases and 220 response regulators, were constructed using distance methods. The trees showed extensive congruence in the positions of 11 recognizable phylogenetic clusters. Eukaryotic sequences were found almost exclusively in one cluster, which also showed the greatest extent of domain variability in its component proteins, and archaeal sequences mainly formed species-specific clusters. Three clusters in different parts of the kinase tree contained proteins with serine-phosphorylating activity. All kinases were found to be monophyletic with respect to other members of their superfamily, such as type II topoisomerases and Hsp90. Structural analysis further revealed significant similarity to the ATP-binding domain of eukaryotic protein kinases. TCST systems are of bacterial origin and radiated into archaea and eukaryotes by lateral gene transfer. Their components show extensive coevolution, suggesting that recombination has not been a major factor in their differentiation. Although histidine kinase activity is prevalent, serine kinases have evolved multiple times independently within this family, accompanied by a loss of the cognate response regulator(s). The structural and functional similarity between TCST kinases and eukaryotic protein kinases raises the possibility of a distant evolutionary relationship.

Journal Article
TL;DR: The successive emergence and amplification of distinct Ta L1 subfamilies shows that L1 evolution has been as active in recent human history as it has been found to be for rodent L1 families.
Abstract: L1 (LINE-1) elements constitute a large family of mammalian retrotransposons that have been replicating and evolving in mammals for more than 100 Myr and now compose 20% or more of the DNA of some mammals. Here, we investigated the evolutionary dynamics of the active human Ta L1 family and found that it arose approximately 4 MYA and subsequently differentiated into two major subfamilies, Ta-0 and Ta-1, each of which contains additional subsets. Ta-1, which has not heretofore been described, is younger than Ta-0 and now accounts for at least 50% of the Ta family. Although Ta-0 contains some active elements, the Ta-1 subfamily has replaced it as the replicatively dominant subfamily in humans; 69% of the loci that contain Ta-1 inserts are polymorphic for the presence or absence of the insert in human populations, as compared with 29% of the loci that contain Ta-0 inserts. The corresponding figure is 90% for loci that contain Ta-1d inserts, which are the youngest subset of Ta-1 and now account for about two thirds of the Ta-1 subfamily. The successive emergence and amplification of distinct Ta L1 subfamilies shows that L1 evolution has been as active in recent human history as it has been found to be for rodent L1 families. In addition, Ta-1 elements have been accumulating in humans at about the same rate per generation as recently evolved active rodent L1 subfamilies.

Journal Article
TL;DR: Comparisons of relative gene arrangements and of the nucleotide and inferred amino acid sequences among these and other published taxa provide strong support for an annelid-mollusk clade that excludes arthropods, and for the inclusion of pogonophorans within Annelida, rather than giving them separate phylum status.
Abstract: We report a contiguous region of more than half (> 7,500 nt) of the mitochondrial genomes for Platynereis dumerii (Annelida: Polychaeta), Helobdella robusta (Annelida: Hirudinida), and Galathealinum brachiosum (Pogonophora: Perviata). The relative arrangements of all 22 genes identified for Helobdella and Galathealinum are identical to one another and to their arrangements in the mtDNA of the previously studied oligochaete annelid Lumbricus. In contrast, Platynereis differs from these taxa in the positions of several tRNA genes and in having two additional tRNA genes (trnC and trnM) and a large noncoding sequence in this region. Comparisons of relative gene arrangements and of the nucleotide and inferred amino acid sequences among these and other published taxa provide strong support for an annelid-mollusk clade that excludes arthropods, and for the inclusion of pogonophorans within Annelida, rather than giving them separate phylum status. Gene arrangement comparisons include the first use of a recently described method on previously unpublished data. Although a variety of alternative initiation codons are typically used by mitochondrial protein-encoding genes, ATG appears to be the initiation codon for all but one of the genes reported here. The large noncoding region (1,091 nt) identified in Platynereis has no significant sequence similarity to the noncoding region of Lumbricus, although each contains runs of TA dinucleotides and of homopolymers, which could potentially serve as signaling elements. There is a strong bias in synonymous codon usage in Helobdella and especially in Galathealinum. In this latter taxon, 5 codons are completely unused, 13 are used three or fewer times, and G appears at third codon positions in only 26 of the 2,236 codons. Nucleotide composition bias appears to influence amino acid composition of the proteins.

Journal Article
TL;DR: The connection between the different versions of MP and ML methods is explored, and links between the two methods are described, including how MP can be regarded as an ML method when there is no common mechanism between sites.
Abstract: Methods such as maximum parsimony (MP) are frequently criticized as being statistically unsound and not being based on any "model." On the other hand, advocates of MP claim that maximum likelihood (ML) has some fundamental problems. Here, we explore the connection between the different versions of MP and ML methods, particularly in light of recent theoretical results. We describe links between the two methods: for example, we describe how MP can be regarded as an ML method when there is no common mechanism between sites (such as might occur with morphological data and certain forms of molecular data). In the process, we clarify certain historical points of disagreement between proponents of the two methodologies, including a discussion of several forms of the ML optimality criterion. We also describe some additional results that shed light on how much needs to be assumed about underlying models of sequence evolution in order to successfully reconstruct evolutionary trees.
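For readers less familiar with the MP criterion discussed here, Fitch's small-parsimony algorithm computes the minimum number of changes a tree requires at each site; summing over sites gives the parsimony score that MP minimizes. This is a standard textbook sketch for unordered characters on a rooted binary tree, not code from the paper; the tree and alignment are hypothetical.

```python
def fitch(node, leaf_states):
    """Fitch small-parsimony pass for one site on a rooted binary tree.
    Leaves are names (str); internal nodes are (left, right) tuples.
    Returns (candidate state set, minimum number of changes in the subtree)."""
    if isinstance(node, str):
        return {leaf_states[node]}, 0
    (left_set, left_cost), (right_set, right_cost) = (fitch(child, leaf_states) for child in node)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum of per-site Fitch scores; alignment maps taxon name -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[k] for name, seq in alignment.items()})[1]
               for k in range(length))


tree = (("A", "B"), ("C", "D"))
aln = {"A": "GGT", "B": "GTT", "C": "TGT", "D": "TTT"}
print(parsimony_score(tree, aln))  # 3
```

Note that the score counts changes without any branch lengths or substitution probabilities, which is the source of the "no model" criticism the abstract addresses.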

Journal Article
TL;DR: It is shown that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model.
Abstract: In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, the simple MP or ML tree search algorithms such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning regrafting (SPR) search are as efficient as the extensive search algorithms such as the SA plus tree bisection-reconnection (TBR) search in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model generally performs better than the HKY model even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that, at least in the present case, selecting a substitution model by using the likelihood ratio test or the AIC is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often performs better than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.
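The p distance and the simplest model-based correction, the Jukes-Cantor distance, can be computed as below; the HKY distance is not shown. The sketch ignores gaps and returns infinity once the JC correction is undefined (p ≥ 0.75). The sequences are hypothetical.

```python
import math


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    diffs = sum(a != b for a, b in zip(s1, s2))
    return diffs / len(s1)


def jc_distance(s1, s2):
    """Jukes-Cantor corrected distance: -3/4 * ln(1 - 4p/3); infinite when p >= 0.75."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


a, b = "ACGTACGTAC", "ACGTACGTTT"
print(p_distance(a, b))             # 0.2
print(round(jc_distance(a, b), 4))  # 0.2326
```

The correction always inflates the distance relative to p, and the inflation grows quickly as p approaches 0.75, which is where sampling error in short sequences matters most.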

Journal Article
TL;DR: Results show that a significant amount of the observed covariation among amino acid sites is due to structural/functional constraints, over and above the covariation arising from phylogenetic constraints.
Abstract: An information theoretic approach is used to examine the magnitude and origin of associations among amino acid sites in the basic helix-loop-helix (bHLH) family of transcription factors. Entropy and mutual information values are used to summarize the variability and covariability of amino acids comprising the bHLH domain for 242 sequences. When these quantitative measures are integrated with crystal structure data and summarized using helical wheels, they provide important insights into the evolution of three-dimensional structure in these proteins. We show that amino acid sites in the bHLH domain known to pack against each other have very low entropy values, indicating little residue diversity at these contact sites. Noncontact sites, on the other hand, exhibit significantly larger entropy values, as well as statistically significant levels of mutual information or association among sites. High levels of mutual information indicate significant amounts of intercorrelation among amino acid residues at these various sites. Using computer simulations based on a parametric bootstrap procedure, we are able to partition the observed covariation among various amino acid sites into that arising from phylogenetic (common ancestry) and stochastic causes and those resulting from structural and functional constraints. These results show that a significant amount of the observed covariation among amino acid sites is due to structural/functional constraints, over and above the covariation arising from phylogenetic constraints. These quantitative analyses provide a highly integrated evolutionary picture of the multidimensional dynamics of sequence diversity and protein structure.
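The entropy and mutual information measures can be computed per alignment column as sketched below, in bits. This shows only the raw statistics; the parametric-bootstrap step used to separate phylogenetic from structural covariation is not implemented, and the toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two columns: H(A) + H(B) - H(A, B)."""
    joint = list(zip(col_a, col_b))  # residue pairs, one per sequence
    return entropy(col_a) + entropy(col_b) - entropy(joint)


print(entropy("LLLL"))                     # invariant (contact-like) site: 0 bits
print(mutual_information("LLVV", "KKEE"))  # perfectly covarying sites: 1.0
print(mutual_information("LLVV", "KEKE"))  # independent sites: 0.0
```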

Journal Article
TL;DR: The amount of variation in seven crayfish species, including three populations of Orconectes luteus and two of Procambarus clarkii, was assessed by sequencing 3, 5, or 10 clones from the same individuals, for a total of 77 sequences.
Abstract: Intragenomic variation in ITS1 and ITS2 is known to exist but is widely ignored in phylogenetic studies using these gene regions. The amount of variation in seven crayfish species, including three populations of Orconectes luteus and two of Procambarus clarkii, was assessed by sequencing 3, 5, or 10 clones from the same individuals, for a total of 77 sequences. The ITS1 and ITS2 sequences reported here are some of the longest known, with aligned lengths of 760 and 1,300 bp, respectively. They contain multiple microsatellite insertions, all of which show considerable intragenomic variation in the number of repeat elements. This variation is enough to obscure phylogenetic relationships at the population level, although relationships between species can be estimated. Given the hybridization techniques used to locate microsatellites, multiple-copy regions like ITS1 and ITS2 will be preferentially found if they contain microsatellites, and in these cases the microsatellites will not behave as typical Mendelian markers and could give spurious results.

Journal Article
TL;DR: Intergeneric relationships of Pinaceae were studied using sequences of the chloroplast matK gene, the mitochondrial nad5 gene, and the low-copy nuclear gene 4CL; disagreement between the resulting three-genome phylogeny and previous hypotheses suggests that the morphology of both vegetative and reproductive organs has undergone convergent evolution within the pine family.
Abstract: In Pinaceae, the chloroplast, mitochondrial, and nuclear genomes are paternally, maternally, and biparentally inherited, respectively. Examining congruence and incongruence of gene phylogenies among the three genomes should provide insights into phylogenetic relationships within the family. Here we studied intergeneric relationships of Pinaceae using sequences of the chloroplast matK gene, the mitochondrial nad5 gene, and the low-copy nuclear gene 4CL. The 4CL gene may exist as a single copy in some species of Pinaceae, but it constitutes a small gene family with two or three members in others. Duplication and deletion of the 4CL gene occurred at a tempo such that paralogous loci are maintained within but not between genera. Exons of the 4CL gene have diverged approximately twice as fast as the matK gene and five times more rapidly than the nad5 gene. The partition-homogeneity test indicates that the three data sets are homogeneous. A combined analysis of the three gene sequences generated a well-resolved and strongly supported phylogeny. The combined phylogeny, which is topologically congruent with the three individual gene trees based on the Templeton test, is likely to represent the organismal phylogeny of Pinaceae. This phylogeny agrees to a certain extent with previous phylogenetic hypotheses based on morphological, anatomical, and immunological data. Disagreement between the previous hypotheses and the three-genome phylogeny suggests that morphology of both vegetative and reproductive organs has undergone convergent evolution within the pine family. The strongly supported monophyly of Nothotsuga longibracteata, Tsuga mertensiana, and Tsuga canadensis on all three gene phylogenies provides evidence against previous hypotheses of intergeneric hybrid origins of N. longibracteata and T. mertensiana. Divergence times of the genera were estimated based on sequence divergence of the matK gene, and they correspond well with the fossil record.

Journal ArticleDOI
TL;DR: It is shown that if theoretically possible code structures are limited to reflect plausible biological constraints, and amino acid similarity is quantified using empirical data of substitution frequencies, the canonical code is at or very close to a global optimum for error minimization across plausible parameter space.
Abstract: The evolutionary forces that produced the canonical genetic code before the last universal ancestor remain obscure. One hypothesis is that the arrangement of amino acid/codon assignments results from selection to minimize the effects of errors (e.g., mistranslation and mutation) on resulting proteins. If amino acid similarity is measured as polarity, the canonical code does indeed outperform most theoretical alternatives. However, this finding does not hold for other amino acid properties, ignores plausible restrictions on possible code structure, and does not address the naturally occurring nonstandard genetic codes. Finally, other analyses have shown that significantly better code structures are possible. Here, we show that if theoretically possible code structures are limited to reflect plausible biological constraints, and amino acid similarity is quantified using empirical data of substitution frequencies, the canonical code is at or very close to a global optimum for error minimization across plausible parameter space. This result is robust to variation in the methods and assumptions of the analysis. Although significantly better codes do exist under some assumptions, they are extremely rare and thus consistent with reports of an adaptive code: previous analyses which suggest otherwise derive from a misleading metric. However, all extant, naturally occurring, secondarily derived, nonstandard genetic codes do appear less adaptive. The arrangement of amino acid assignments to the codons of the standard genetic code appears to be a direct product of natural selection for a system that minimizes the phenotypic impact of genetic error. Potential criticisms of previous analyses appear to be without substance. That known variants of the standard genetic code appear less adaptive suggests that different evolutionary factors predominated before and after fixation of the canonical code. 
While the evidence for an adaptive code is clear, the process by which the code achieved this optimization requires further attention.
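The error-minimization comparison can be made concrete with the polarity-based measure that the abstract describes as the earlier approach (not the substitution-frequency measure the authors ultimately used). This is a minimal sketch assuming that (1) alternative codes keep the canonical synonymous-block structure and only permute which amino acid each block encodes, (2) the cost is the mean squared change in polar requirement over all single-base substitutions between sense codons, and (3) the polar requirement values below are correct as commonly tabulated. All function names are hypothetical.

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code, codons ordered with the first base varying slowest (TTT, TTC, TTA, ...).
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA_STRING)}

# Polar requirement values (Woese), as commonly tabulated; treat as an assumption.
POLAR = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
    "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
    "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}

def error_cost(code, prop):
    """Mean squared property change over all single-base substitutions between sense codons."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + b + codon[pos + 1:]]
                if aa2 == "*":
                    continue  # changes to stop codons are ignored in this simple measure
                total += (prop[aa] - prop[aa2]) ** 2
                n += 1
    return total / n

def shuffled_code(code, rng):
    """Random alternative code: same codon blocks and stops, amino acids permuted among blocks."""
    aas = sorted(set(code.values()) - {"*"})
    perm = dict(zip(aas, rng.sample(aas, len(aas))))
    return {c: (aa if aa == "*" else perm[aa]) for c, aa in code.items()}

def fraction_better(n_random=10_000, seed=1):
    """Canonical cost and the fraction of random codes with a strictly lower cost."""
    rng = random.Random(seed)
    canonical = error_cost(CODE, POLAR)
    better = sum(error_cost(shuffled_code(CODE, rng), POLAR) < canonical for _ in range(n_random))
    return canonical, better / n_random

if __name__ == "__main__":
    cost, frac = fraction_better(1_000)
    print(f"canonical cost = {cost:.3f}; fraction of random codes with lower cost = {frac:.4f}")
```

The returned fraction estimates how rare "better" codes are under these particular assumptions; swapping in a different similarity measure or weighting errors by codon position may change it, which is why the choice of metric matters in this debate.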

Journal ArticleDOI
TL;DR: The nucleotide contents of several completely sequenced genomes are analyzed, and it is shown that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins.
Abstract: We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.
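Why biased DNA should encode biased proteins can be illustrated with a null model in which every codon position draws bases independently from a genome-wide GC content, with no selection on the protein. This is a sketch under that assumption, not the authors' analysis; the grouping of amino acids into those with GC-rich codons (G, A, R, P) and those with AT-rich codons (F, Y, M, I, N, K) is an illustrative choice.

```python
import itertools

BASES = "TCAG"
# Standard genetic code, codons ordered with the first base varying slowest (TTT, TTC, TTA, ...).
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA_STRING)}

def expected_aa_freqs(gc):
    """Expected amino acid frequencies if each codon position independently has
    G = C = gc/2 and A = T = (1 - gc)/2 (a no-selection null model)."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freqs = {}
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        freqs[aa] = freqs.get(aa, 0.0) + p[codon[0]] * p[codon[1]] * p[codon[2]]
    total = sum(freqs.values())  # renormalize over sense codons only
    return {aa: f / total for aa, f in freqs.items()}

GC_RICH = set("GARP")    # illustrative: amino acids whose codons are mostly G/C
AT_RICH = set("FYMINK")  # illustrative: amino acids whose codons are mostly A/T

for gc in (0.3, 0.5, 0.7):
    f = expected_aa_freqs(gc)
    garp = sum(f[a] for a in GC_RICH)
    fymink = sum(f[a] for a in AT_RICH)
    print(f"GC={gc:.1f}: GARP={garp:.3f}  FYMINK={fymink:.3f}")
```

Under this model the GARP share rises and the FYMINK share falls as GC content increases, which is the direction of effect the abstract reports across real genomes.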

Journal ArticleDOI
TL;DR: Maximum-likelihood models of codon substitution were used to analyze sperm lysin genes of 25 abalone species to identify lineages and amino acid sites under diversifying selection, and the selective pressure indicated by the omega ratio was found to vary greatly among amino acid sites in lysin.
Abstract: Maximum-likelihood models of codon substitution were used to analyze sperm lysin genes of 25 abalone (Haliotis) species to identify lineages and amino acid sites under diversifying selection. The models used the nonsynonymous/synonymous rate ratio (omega = d(N)/d(S)) as an indicator of selective pressure and allowed the ratio to vary among lineages or sites. Likelihood ratio tests suggested significant variation in selective pressure among lineages. The variable selective pressure provided an explanation for the previous observation that the omega ratio is >1 in comparisons of closely related species and <1 in comparisons of distantly related species. Computer simulations demonstrated that saturation of nonsynonymous substitutions and constraint on lysin structure were unlikely to account for the observed pattern. Lineages linking closely related sympatric species appeared to be under diversifying selection, while lineages separating distantly related species from different geographic locations were associated with low evolutionary rates. The selective pressure indicated by the omega ratio was found to vary greatly among amino acid sites in lysin. Sites under potential diversifying selection were identified. Ancestral lysins were inferred to trace the route of evolution at individual sites and to provide lysin sequences for future laboratory studies.
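The likelihood ratio tests compare nested codon models, for example a single omega for all branches against a model in which omega varies among lineages. A minimal sketch of the test step only, assuming the standard chi-square approximation for twice the log-likelihood difference; fitting the codon models themselves (for example with PAML's codeml) is outside this sketch, and the log-likelihoods and degrees of freedom in the usage line are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom,
    via the series for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)  # very small p-values lose precision and may round to 0

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for nested models differing by df free parameters."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical: a one-ratio model vs. a model with two extra omega parameters.
stat, p = likelihood_ratio_test(-1000.0, -990.0, df=2)
print(f"2*dlnL = {stat:.1f}, p = {p:.2e}")
```

When the null hypothesis places a parameter on the boundary of its range, a mixture of chi-square distributions may be more appropriate than the plain chi-square used here.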

Journal ArticleDOI
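TL;DR: A widely held view of land plant relationships places liverworts as the first branch of the land plant tree, whereas some molecular analyses and a cladistic study of morphological characters indicate that hornworts are the earliest land plants; analyses of a four-gene data set that exclude or downweight saturated rbcL third-position transitions place hornworts as sister to all other land plants.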
Abstract: A widely held view of land plant relationships places liverworts as the first branch of the land plant tree, whereas some molecular analyses and a cladistic study of morphological characters indicate that hornworts are the earliest land plants. To help resolve this conflict, we used parsimony and likelihood methods to analyze a 6,095-character data set composed of four genes (chloroplast rbcL and small-subunit rDNA from all three plant genomes) from all major land plant lineages. In all analyses, significant support was obtained for the monophyly of vascular plants, lycophytes, ferns (including Psilotum and Equisetum), seed plants, and angiosperms. Relationships among the three bryophyte lineages were unresolved in parsimony analyses in which all positions were included and weighted equally. However, in parsimony and likelihood analyses in which rbcL third-codon-position transitions were either excluded or downweighted (due to apparent saturation), hornworts were placed as sister to all other land plants, with mosses and liverworts jointly forming the second deepest lineage. Decay analyses and Kishino-Hasegawa tests of the third-position-excluded data set showed significant support for the hornwort-basal topology over several alternative topologies, including the commonly cited liverwort-basal topology. Among the four genes used, mitochondrial small-subunit rDNA showed the lowest homoplasy and alone recovered essentially the same topology as the multigene tree. This molecular phylogeny presents new opportunities to assess paleontological evidence and morphological innovations that occurred during the early evolution of terrestrial plants.
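A Kishino-Hasegawa test compares two fixed topologies through their per-site log-likelihoods on the same alignment. A minimal sketch using the normal approximation, assuming independent sites and per-site log-likelihoods already computed by a phylogenetics program; the function name and the input values are hypothetical.

```python
import math
from statistics import fmean

def kishino_hasegawa(site_lnl_a, site_lnl_b):
    """Two-sided KH test. Returns (delta, z, p), where delta is the total lnL of tree A minus tree B."""
    if len(site_lnl_a) != len(site_lnl_b) or len(site_lnl_a) < 2:
        raise ValueError("need two equal-length lists of at least two site log-likelihoods")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    mean = fmean(diffs)
    # Variance of the summed difference, estimated from the spread of per-site differences.
    var_total = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_total == 0:
        raise ValueError("per-site differences have zero variance; test is undefined")
    delta = sum(diffs)
    z = delta / math.sqrt(var_total)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return delta, z, p

# Hypothetical per-site log-likelihoods for two trees on a four-site alignment.
delta, z, p = kishino_hasegawa([-1.0, -2.0, -3.0, -4.0], [-1.5, -2.0, -3.5, -4.0])
print(f"delta lnL = {delta:.2f}, z = {z:.3f}, p = {p:.3f}")
```

The test assumes both trees were specified in advance; comparing the best tree found by the search against alternatives can make it overconfident.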

Journal ArticleDOI
TL;DR: Analyses of SSU rDNA from the plastid and nuclear genomes of three anomalously pigmented dinoflagellate species indicate that they acquired their plastids via endosymbiosis of a haptophyte, and distance, parsimony, and maximum-likelihood phylogenetic analyses of plastid SSU rRNA gene sequences place the three species within the haptophyte clade.
Abstract: The three anomalously pigmented dinoflagellates Gymnodinium galatheanum, Gyrodinium aureolum, and Gymnodinium breve have plastids possessing 19'-hexanoyloxy-fucoxanthin as the major carotenoid rather than peridinin, which is characteristic of the majority of the dinoflagellates. Analyses of SSU rDNA from the plastid and the nuclear genome of these dinoflagellate species indicate that they have acquired their plastids via endosymbiosis of a haptophyte. The dinoflagellate plastid sequences appear to have undergone rapid sequence evolution, and there is considerable divergence between the three species. However, distance, parsimony, and maximum-likelihood phylogenetic analyses of plastid SSU rRNA gene sequences place the three species within the haptophyte clade. Pavlova gyrans is the most basal branching haptophyte and is the outgroup to a clade comprising the dinoflagellate sequences and those of other haptophytes. The haptophytes themselves are thought to have plastids of a secondary origin; hence, these dinoflagellates appear to have tertiary plastids. Both molecular and morphological data divide the plastids into two groups, where G. aureolum and G. breve have similar plastid morphology and G. galatheanum has plastids with distinctive features.

Journal ArticleDOI
TL;DR: Phylogenetic results suggest that the earliest neornithines were heavy-bodied, ground-dwelling, nonmarine birds, and this inference provides a possible explanation for the large gap in the early fossil record of birds.
Abstract: The traditional view of avian evolution places ratites and tinamous at the base of the phylogenetic tree of modern birds (Neornithes). In contrast, most recent molecular studies suggest that neognathous perching birds (Passeriformes) compose the oldest lineage of modern birds. Here, we report significant molecular support for the traditional view of neognath monophyly based on sequence analyses of nuclear and mitochondrial DNA (4.4 kb) from every modern avian order. Phylogenetic analyses further show that the ducks and gallinaceous birds are each other's closest relatives and together form the basal lineage of neognathous birds. To investigate why other molecular studies sampling fewer orders have reached different conclusions regarding neognath monophyly, we performed jackknife analyses on our mitochondrial data. Those analyses indicated taxon-sampling effects when basal galloanserine birds were included in combination with sparse taxon sampling. Our phylogenetic results suggest that the earliest neornithines were heavy-bodied, ground-dwelling, nonmarine birds. This inference, coupled with a fossil bias toward marine environments, provides a possible explanation for the large gap in the early fossil record of birds.

Journal ArticleDOI
TL;DR: Analysis of the effect of nucleotide bias on codon composition across the Arthropoda reveals a trend with the crustaceans represented showing the lowest proportion of AT-rich codons in mitochondrial protein genes, supporting the possibility that Crustacea are paraphyletic.
Abstract: The complete sequence of the mitochondrial genome of the giant tiger prawn, Penaeus monodon (Arthropoda, Crustacea, Malacostraca), is presented. The gene content and gene order are identical to those observed in Drosophila yakuba. The overall AT composition is lower than that observed in the known insect mitochondrial genomes, but higher than that observed in the other two crustaceans for which complete mitochondrial sequence is available. Analysis of the effect of nucleotide bias on codon composition across the Arthropoda reveals a trend with the crustaceans represented showing the lowest proportion of AT-rich codons in mitochondrial protein genes. Phylogenetic analysis among arthropods using concatenated protein-coding sequences provides further support for the possibility that Crustacea are paraphyletic. Furthermore, in contrast to data from the nuclear gene EF1alpha, the first complete sequence of a malacostracan mitochondrial genome supports the possibility that Malacostraca are more closely related to Insecta than to Branchiopoda.
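Composition and codon-level bias in a protein-coding sequence can be summarized with two simple statistics. A sketch, assuming the input is an in-frame coding sequence and that "AT-rich codon" means a codon with at least `min_at` A or T bases; that threshold is a hypothetical definition, since the abstract does not define the term, and the function names are illustrative.

```python
def at_content(seq):
    """Fraction of A/T bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(b in "AT" for b in seq) / len(seq)

def at_rich_codon_fraction(cds, min_at=3):
    """Fraction of complete codons containing at least min_at A/T bases (hypothetical definition)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        raise ValueError("no complete codons")
    return sum(sum(b in "AT" for b in c) >= min_at for c in codons) / len(codons)

# Toy sequence for illustration only.
seq = "ATGTTTAAAGCCTAA"
print(at_content(seq), at_rich_codon_fraction(seq))
```

Computing these per gene and per taxon gives the kind of cross-arthropod comparison of codon composition the abstract describes.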

Journal ArticleDOI
TL;DR: Meta-analysis of both GS genes from the different genera of rhizobia and other reference organisms suggests that the divergence times of the different rhizobium genera predate the existence of legumes, their host plants.
Abstract: Glutamine synthetase exists in at least two related forms, GSI and GSII, the sequences of which have been used in evolutionary molecular clock studies. GSI has so far been found exclusively in bacteria, and GSII has been found predominantly in eukaryotes. To date, only a minority of bacteria, including rhizobia, have been shown to express both forms of GS. The sequences of equivalent internal fragments of the GSI and GSII genes for the type strains of 16 species of rhizobia have been determined and analyzed. The GSI and GSII data sets do not produce congruent phylogenies with either neighbor-joining or maximum-likelihood analyses. The GSI phylogeny is broadly congruent with the 16S rDNA phylogeny for the same bacteria; the GSII phylogeny is not. There are three striking rearrangements in the GSII phylograms, all of which might be explained by horizontal gene transfer to Bradyrhizobium (probably from Mesorhizobium), to Rhizobium galegae (from Rhizobium), and to Mesorhizobium huakuii (perhaps from Rhizobium). There is also evidence suggesting intrageneric DNA transfer within Mesorhizobium. Meta-analysis of both GS genes from the different genera of rhizobia and other reference organisms suggests that the divergence times of the different rhizobium genera predate the existence of legumes, their host plants.

Journal ArticleDOI
TL;DR: Investigations of the mitochondrial genomes of two groups of human blood flukes within the genus Schistosoma revealed remarkable and unexpected differences in mitochondrial gene order between the two groups.
Abstract: [Extract] Mitochondrial genomes have been used in numerous studies to investigate phylogenetic relationships among eukaryotes at many levels (e.g., Smith et al. 1993; Boore et al. 1995; Boore, Lavrov, and Brown 1998). In recent years, the arrangement of genes in the mitochondrial genome has been regarded as a powerful record of historical relationships (Boore 1999). Changes in mitochondrial gene order are infrequent, even over considerable spans of time (Boore 1999), and are unlikely to exhibit homoplasy. Our research has focused on the relationships between two groups of human blood flukes within the genus Schistosoma. Our investigations of the mitochondrial genomes of these worms revealed striking divergences in mitochondrial gene order within the genus. The schistosomes are among the most significant parasites of humans in the developing world. The disease they cause, schistosomiasis, is second only to malaria in public health importance, affecting some 200 million people in 75 countries and giving rise to severe morbidity or mortality in tens of millions. Recent molecular studies (Barker and Blair 1996) have demonstrated that the deepest split in the genus is between the East Asian species utilizing prosobranch snail hosts and the African species utilizing pulmonate snails. The depth of this split has led some authors to propose an early Tertiary divergence (Despres et al. 1992). Species closely allied with the African group also occur in the Middle East, India, and parts of Southern Asia. One African species, Schistosoma mansoni, was probably introduced into the Americas by the slave trade during the 18th and 19th centuries (Despres, Imbert-Establet, and Monnerot 1993). The Asian group contains fewer recognized species, and these are found primarily in East Asia (the Philippines, China, Malaysia, Indonesia, Cambodia, and Laos).
There is a growing realization that African and East Asian schistosomes differ in many biological attributes, including morphological characters, infectivity to snails, range of definitive hosts, growth rates, egg production, prepatency periods, pathogenicity, and immunogenicity (McManus and Hope 1993). We expected our investigations of mitochondrial genomes in these two groups of species to provide more evidence of their phylogenetic distance. However, we were startled by the remarkable differences in mitochondrial gene order which came to light between the two groups and which we report here.

Journal ArticleDOI
TL;DR: The complete 14,985-nt sequence of the mitochondrial DNA of the horseshoe crab Limulus polyphemus is determined and it is suggested that the changes observed are not independent and that the stem-loop structure found in the noncoding regions of Limulus and Ixodes mtDNA may play the same role as that between trnN and trnC in vertebrates, i.e., the role of lagging strand origin of replication.
Abstract: We determined the complete 14,985-nt sequence of the mitochondrial DNA of the horseshoe crab Limulus polyphemus (Arthropoda: Xiphosura). This mtDNA encodes the 13 protein, 2 rRNA, and 22 tRNA genes typical for metazoans. The arrangement of these genes and about half of the sequence was reported previously; however, the sequence contained a large number of errors, which are corrected here. The two strands of Limulus mtDNA have significantly different nucleotide compositions. The strand encoding most mitochondrial proteins has 1.25 times as many A's as T's and 2.33 times as many C's as G's. This nucleotide bias correlates with the biases in amino acid content and synonymous codon usage in proteins encoded by different strands and with the number of non-Watson-Crick base pairs in the stem regions of encoded tRNAs. The sizes of most mitochondrial protein genes in Limulus are either identical to or slightly smaller than those of their Drosophila counterparts. The usage of the initiation and termination codons in these genes seems to follow patterns that are conserved among most arthropod and some other metazoan mitochondrial genomes. The noncoding region of Limulus mtDNA contains a potential stem-loop structure, and we found a similar structure in the noncoding region of the published mtDNA of the prostriate tick Ixodes hexagonus. A simulation study was designed to evaluate the significance of these secondary structures; it revealed that they are statistically significant. No significant, comparable structure can be identified for the metastriate ticks Rhipicephalus sanguineus and Boophilus microplus. The latter two animals also share a mitochondrial gene rearrangement and an unusual structure of mt-tRNA(C) that is exactly the same association of changes as previously reported for a group of lizards.
This suggests that the changes observed are not independent and that the stem-loop structure found in the noncoding regions of Limulus and Ixodes mtDNA may play the same role as that between trnN and trnC in vertebrates, i.e., the role of lagging strand origin of replication.
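Strand-specific composition such as the Limulus A:T and C:G ratios is often summarized as AT skew, (A - T)/(A + T), and GC skew, (G - C)/(G + C). A minimal sketch; the function names are hypothetical. Plugging in the ratios from the abstract for the strand encoding most proteins (A:T = 1.25, C:G = 2.33) gives an AT skew of about 0.11 and a GC skew of about -0.40.

```python
def base_counts(seq):
    """Return (A, T, C, G) counts for one strand of a nucleotide sequence."""
    seq = seq.upper()
    return seq.count("A"), seq.count("T"), seq.count("C"), seq.count("G")

def strand_skews(a, t, c, g):
    """AT skew = (A - T)/(A + T); GC skew = (G - C)/(G + C). Accepts counts or ratios."""
    return (a - t) / (a + t), (g - c) / (g + c)

# Ratios reported in the abstract, used directly as relative counts.
at_skew, gc_skew = strand_skews(1.25, 1.0, 2.33, 1.0)
print(f"AT skew = {at_skew:.2f}, GC skew = {gc_skew:.2f}")
```

Because skews depend only on ratios, the reported ratios can be used directly in place of raw counts.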

Journal ArticleDOI
TL;DR: Analysis of human DNA sequence data indicates that the deamination rate of 5-methylcytosine declines as GC content rises; further evidence indicates that cytosine deamination causes the majority of C-->T and G-->A transitions in mammals and that cytosine deamination and DNA base composition affect each other, forming a positive feedback loop that facilitates divergent genetic drift.
Abstract: DNA melting is rate-limiting for cytosine deamination, from which we infer that the rate of cytosine deamination should decline twofold for each 10% increase in GC content. Analysis of human DNA sequence data confirms that this is the case for 5-methylcytosine. Several lines of evidence further confirm that it is also the case for unmethylated cytosine and that cytosine deamination causes the majority of all C-->T and G-->A transitions in mammals. Thus, cytosine deamination and DNA base composition each affect the other, forming a positive feedback loop that facilitates divergent genetic drift to high or low GC content. Because a 10 °C increase in temperature in vitro increases the rate of cytosine deamination 5.7-fold, cytosine deamination must be highly dependent on body temperature, which is consistent with the dramatic differences between the isochores of warm-blooded versus cold-blooded vertebrates. Because this process involves both DNA melting and positive feedback, it would be expected to spread progressively (in evolutionary time) down the length of the chromosome, which is consistent with the large size of isochores in modern mammals.
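The two scaling claims in the abstract, a twofold decline per 10% increase in GC content and a 5.7-fold increase per 10 °C rise in vitro, can be written as exponential factors. A sketch, assuming that both rules extrapolate smoothly beyond a single 10-unit step and that they combine multiplicatively; the abstract states neither, and the function name is hypothetical.

```python
def relative_deamination_rate(delta_gc_percent, delta_temp_c=0.0):
    """Cytosine deamination rate relative to a reference region and temperature.
    Assumption: 2-fold decrease per +10 percentage points GC and 5.7-fold increase
    per +10 degrees C, extrapolated exponentially and combined multiplicatively."""
    return 2.0 ** (-delta_gc_percent / 10.0) * 5.7 ** (delta_temp_c / 10.0)

# Hypothetical comparisons: a region 20 points more GC-rich at the same temperature,
# and the same region at a body temperature 10 degrees C higher.
print(relative_deamination_rate(20))
print(relative_deamination_rate(20, 10))
```

Read together, the two factors illustrate the feedback loop described above: a GC-poor region deaminates faster, which produces C-->T and G-->A transitions that lower its GC content further, and a higher body temperature amplifies the effect.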