
Showing papers in "Molecular Biology and Evolution in 1998"


Journal Article
TL;DR: The likelihood analysis confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.
Abstract: An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (dN/dS) that is different from those of other lineages or greater than one. In this paper, several codon-based likelihood models that allow for variable dN/dS ratios among lineages were developed. They were then used to construct likelihood ratio tests to examine whether the dN/dS ratio is variable among evolutionary lineages, whether the ratio for a few lineages of interest is different from the background ratio for other lineages in the phylogeny, and whether the dN/dS ratio for the lineages of interest is greater than one. The tests were applied to the lysozyme genes of 24 primate species. The dN/dS ratios were found to differ significantly among lineages, indicating that the evolution of primate lysozymes is episodic, which is incompatible with the neutral theory. Maximum-likelihood estimates of parameters suggested that about nine nonsynonymous and zero synonymous nucleotide substitutions occurred in the lineage leading to hominoids, and the dN/dS ratio for that lineage is significantly greater than one. The corresponding estimates for the lineage ancestral to colobine monkeys were nine and one, and the dN/dS ratio for the lineage is not significantly greater than one, although it is significantly higher than the background ratio. The likelihood analysis thus confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.

1,373 citations
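
The likelihood ratio tests described in the abstract above compare nested codon models, such as one dN/dS ratio for all lineages versus a separate ratio for the lineages of interest. A minimal sketch of the test arithmetic follows, assuming the usual chi-square approximation for twice the log-likelihood difference. The function name and the log-likelihood values in the usage note are hypothetical and are not taken from the paper.

```python
import math


def lrt_p_value(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for two nested models.

    Assumes the chi-square approximation with `df` degrees of freedom.
    Only df = 1 and df = 2 are supported, because their tail
    probabilities have simple closed forms in the standard library.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < 0:
        raise ValueError("the alternative model should not fit worse than the nested null")
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return stat, p
```

For example, `lrt_p_value(-1000.0, -998.0, 1)` uses made-up log-likelihoods for a one-ratio model and a model with one extra ratio for a foreground lineage. A small p-value would suggest that the foreground ratio differs from the background ratio.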


Journal Article
TL;DR: A simple model for the evolution of the rate of molecular evolution with a Bayesian approach can serve as the basis for estimating dates of important evolutionary events even in the absence of the assumption of constant rates among evolutionary lineages.
Abstract: A simple model for the evolution of the rate of molecular evolution is presented. With a Bayesian approach, this model can serve as the basis for estimating dates of important evolutionary events even in the absence of the assumption of constant rates among evolutionary lineages. The method can be used in conjunction with any of the widely used models for nucleotide substitution or amino acid replacement. It is illustrated by analyzing a data set of rbcL protein sequences.

1,241 citations


Journal Article
TL;DR: There is rather poor equivalency of taxonomic rank across some of the vertebrates, by the yardstick of genetic divergence in this mtDNA gene, as well as genetic distances in allozymes.
Abstract: Mitochondrial cytochrome b (cytb) is among the most extensively sequenced genes to date across the vertebrates. Here, we employ nearly 2,000 cytb gene sequences from GenBank to calculate and compare levels of genetic distance between sister species, congeneric species, and confamilial genera within and across the major vertebrate taxonomic classes. The results of these analyses parallel and reinforce some of the principal trends in genetic distance estimates previously reported in a summary of the multilocus allozyme literature. In particular, surveyed avian taxa on average show significantly less genetic divergence than do same-rank taxa surveyed in other vertebrate groups, notably amphibians and reptiles. Various biological possibilities and taxonomic "artifacts" are considered that might account for this pattern. Regardless of the explanation, by the yardstick of genetic divergence in this mtDNA gene, as well as genetic distances in allozymes, there is rather poor equivalency of taxonomic rank across some of the vertebrates.

679 citations


Journal Article
R Peakall, S Gilmore, W Keys, M Morgante, A Rafalski
TL;DR: These findings and the emerging patterns in other plant studies suggest that in contrast to animals, successful cross-species amplification of SSRs in plants is largely restricted to congeners or closely related genera.
Abstract: We investigated the transferability of 31 soybean (Glycine max) simple sequence repeat (SSR) loci to wild congeners and to other legume genera. Up to 65% of the soybean primer pairs amplified SSRs within Glycine, but frequently, the SSRs were short and interrupted compared with those of soybeans. Nevertheless, 85% of the loci were polymorphic within G. clandestina. Cross-species amplification outside of the genus was much lower (3%-13%), with polymorphism restricted to one primer pair, AG81. AG81 amplified loci in Glycine, Kennedia, and Vigna (Phaseoleae), Vicia (Vicieae), Trifolium (Trifolieae), and Lupinus (Genisteae) within the Papilionoideae, and in Albizia within the Mimosoideae. The primer conservation at AG81 may be explained by its apparent proximity to the seryl-tRNA synthetase gene. Interspecific differences in allele size at AG81 loci reflected repeat length variation within the SSR region and indels in the flanking region. Alleles of identical size with different underlying sequences (size homoplasy) were observed. Our findings and the emerging patterns in other plant studies suggest that in contrast to animals, successful cross-species amplification of SSRs in plants is largely restricted to congeners or closely related genera. Because mutations in both the SSR region and the flanking region contribute to variation in allele size among species, knowledge of DNA sequence is essential before SSR loci can be meaningfully used to address applied and evolutionary questions.

467 citations


Journal Article
TL;DR: This paper compares several definitions of FST that are relevant to DNA sequence data, shows that these must be used with care when estimating migration parameters, and shows that FST is strongly influenced by the level of within-population diversity.
Abstract: Wright's FST and related statistics are often used to measure the extent of divergence among populations of the same species relative to the net genetic diversity within the species. This paper compares several definitions of FST which are relevant to DNA sequence data, and shows that these must be used with care when estimating migration parameters. It is also pointed out that FST is strongly influenced by the level of within-population diversity. In situations where factors such as selection on closely linked sites are expected to have stronger effects on within-population diversity at some loci than at others, differences among loci can result entirely from differences in within-population diversities. It is shown that several published cases of differences in FST among regions of high and low recombination in Drosophila may be caused in this way. For the purpose of comparisons of levels of between-population differences among loci or species which are subject to different intensities of forces that reduce variability within local populations, absolute measures of divergence between populations should be used in preference to relative measures such as FST.

461 citations
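
The point in the abstract above, that a relative measure such as FST depends on within-population diversity, can be illustrated with a small sketch. It assumes one common sequence-based form, FST = 1 - (mean within-population diversity) / (mean between-population diversity), which may not match every definition compared in the paper. The function name and the diversity values are hypothetical.

```python
def fst_relative(pi_within: float, pi_between: float) -> float:
    """Relative divergence: 1 - pi_within / pi_between.

    pi_within  : mean pairwise diversity within populations (assumed input)
    pi_between : mean pairwise diversity between populations (assumed input)
    """
    if pi_between <= 0:
        raise ValueError("pi_between must be positive")
    return 1.0 - pi_within / pi_between


# Two hypothetical loci with the SAME absolute between-population divergence
# but different within-population diversity (for example, one locus in a
# region of low recombination).
locus_high_diversity = fst_relative(pi_within=0.008, pi_between=0.010)
locus_low_diversity = fst_relative(pi_within=0.002, pi_between=0.010)
```

In this made-up case the two loci differ widely in FST even though the absolute measure, `pi_between`, is identical, which is the situation in which the abstract recommends absolute measures of divergence.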


Journal Article
TL;DR: The mechanistic models were found to fit the data better than empirical models derived from large databases, and the mutational distance between amino acids and the physicochemical distance are found to have strong effects on amino acid substitution rates.
Abstract: Models of amino acid substitution were developed and compared using maximum likelihood. Two kinds of models are considered. "Empirical" models do not explicitly consider factors that shape protein evolution, but attempt to summarize the substitution pattern from large quantities of real data. "Mechanistic" models are formulated at the codon level and separate mutational biases at the nucleotide level from selective constraints at the amino acid level. They account for features of sequence evolution, such as transition-transversion bias and base or codon frequency biases, and make use of physicochemical distances between amino acids to specify nonsynonymous substitution rates. A general approach is presented that transforms a Markov model of codon substitution into a model of amino acid replacement. Protein sequences from the entire mitochondrial genomes of 20 mammalian species were analyzed using different models. The mechanistic models were found to fit the data better than empirical models derived from large databases. Both the mutational distance between amino acids (determined by the genetic code and mutational biases such as the transition-transversion bias) and the physicochemical distance are found to have strong effects on amino acid substitution rates. A significant proportion of amino acid substitutions appeared to have involved more than one codon position, indicating that nucleotide substitutions at neighboring sites may be correlated. Rates of amino acid substitution were found to be highly variable among sites.

386 citations


Journal Article
TL;DR: It is inferred that one of the oldest events in the nested cladistic analysis was a range expansion out of Africa which resulted in the complete replacement of Y chromosomes throughout the Old World, a finding consistent with many versions of the Out of Africa Replacement Model.
Abstract: We surveyed nine diallelic polymorphic sites on the Y chromosomes of 1,544 individuals from Africa, Asia, Europe, Oceania, and the New World. Phylogenetic analyses of these nine sites resulted in a tree for 10 distinct Y haplotypes with a coalescence time of approximately 150,000 years. The 10 haplotypes were unevenly distributed among human populations: 5 were restricted to a particular continent, 2 were shared between Africa and Europe, 1 was present only in the Old World, and 2 were found in all geographic regions surveyed. The ancestral haplotype was limited to African populations. Random permutation procedures revealed statistically significant patterns of geographical structuring of this paternal genetic variation. The results of a nested cladistic analysis indicated that these geographical associations arose through a combination of processes, including restricted, recurrent gene flow (isolation by distance) and range expansions. We inferred that one of the oldest events in the nested cladistic analysis was a range expansion out of Africa which resulted in the complete replacement of Y chromosomes throughout the Old World, a finding consistent with many versions of the Out of Africa Replacement Model. A second and more recent range expansion brought Asian Y chromosomes back to Africa without replacing the indigenous African male gene pool. Thus, the previously observed high levels of Y chromosomal genetic diversity in Africa may be due in part to bidirectional population movements. Finally, a comparison of our results with those from nested cladistic analyses of human mtDNA and beta-globin data revealed different patterns of inferences for males and females concerning the relative roles of population history (range expansions) and population structure (recurrent gene flow), thereby adding a new sex-specific component to models of human evolution.

370 citations


Journal Article
TL;DR: It is hypothesized that one of the crucial processes for the origin of asymmetric and biased base composition of mammalian mitochondrial genomes is the spontaneous deamination of C and A in the H strand during replication.
Abstract: The base composition of 25 complete mammalian mitochondrial (mt) genomes has been analyzed taking into account all three codon positions (P123) and fourfold degenerate sites (P4FD) of H-strand genes. In the nontranscribed L strand, G is the least represented base and A is the most represented one in all cases, while C and T differ among species. H-strand protein-coding genes show an asymmetric distribution of the four bases between the two strands. The asymmetry indexes AT and GC skews on P4FD are much higher than those on P123, suggesting the existence of asymmetrical directional mutation pressure. Relationships between the compositional features and transcription or replication processes have been investigated in order to find a possible mechanism that could explain the origin of this asymmetry. AT and GC skews, the base composition in fourfold degenerate sites, and the number of variable sites for each gene are significantly correlated with the duration of single-stranded state of the H-strand genes during replication. We tested different replication-related hypotheses, such as the existence of biased dNTP pools, gamma DNA polymerase mispairing, and the asymmetric replication itself. Most of them failed to explain the observed results, hydrolytic deaminations being the only one in agreement with our data. Thus, we hypothesize that one of the crucial processes for the origin of asymmetric and biased base composition of mammalian mitochondrial genomes is the spontaneous deamination of C and A in the H strand during replication.

365 citations
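
The AT and GC skews mentioned in the abstract above are commonly computed as (A - T)/(A + T) and (G - C)/(G + C) on one strand. The sketch below assumes that common definition; the paper's exact formula and sign convention are not given in this listing, and the function name is hypothetical.

```python
from collections import Counter


def base_skews(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for a single-stranded DNA sequence.

    Assumed definitions (a common convention, not confirmed by the source):
        AT skew = (A - T) / (A + T)
        GC skew = (G - C) / (G + C)
    Characters other than A, C, G, T are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T and one G/C base")
    return (a - t) / (a + t), (g - c) / (g + c)
```

To mirror the comparison in the abstract, the same function could be applied once to all codon positions and once to only the fourfold degenerate sites of each gene, after extracting those sites separately.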


Journal Article
TL;DR: A significant amount of information about past evolutionary modes can be extracted from DNA sequences, suggesting that process (rates of distinct kinds of nucleotide substitutions) and pattern (the evolutionary tree) can be simultaneously inferred.
Abstract: A nonhomogeneous, nonstationary stochastic model of DNA sequence evolution allowing varying equilibrium G + C contents among lineages is devised in order to deal with sequences of unequal base compositions. A maximum-likelihood implementation of this model for phylogenetic analyses allows handling of a reasonable number of sequences. The relevance of the model and the accuracy of parameter estimates are theoretically and empirically assessed, using real or simulated data sets. Overall, a significant amount of information about past evolutionary modes can be extracted from DNA sequences, suggesting that process (rates of distinct kinds of nucleotide substitutions) and pattern (the evolutionary tree) can be simultaneously inferred. G + C contents at ancestral nodes are quite accurately estimated. The new method appears to be useful for phylogenetic reconstruction when base composition varies among compared sequences. It may also be suitable for molecular evolution studies.

334 citations


Journal Article
TL;DR: A codon-level model of coding sequence evolution in which position-specific amino acid frequencies are free parameters is introduced, which produces linear distance estimates over a wide range of distances, while several alternative models underestimate long distances relative to short distances.
Abstract: Estimation of evolutionary distances from coding sequences must take into account protein-level selection to avoid relative underestimation of longer evolutionary distances. Current modeling of selection via site-to-site rate heterogeneity generally neglects another aspect of selection, namely position-specific amino acid frequencies. These frequencies determine the maximum dissimilarity expected for highly diverged but functionally and structurally conserved sequences, and hence are crucial for estimating long distances. We introduce a codon-level model of coding sequence evolution in which position-specific amino acid frequencies are free parameters. In our implementation, these are estimated from an alignment using methods described previously. We use simulations to demonstrate the importance and feasibility of modeling such behavior; our model produces linear distance estimates over a wide range of distances, while several alternative models underestimate long distances relative to short distances. Site-to-site differences in rates, as well as synonymous/nonsynonymous and first/second/third-codon-position differences, arise as a natural consequence of the site-to-site differences in amino acid frequencies.

317 citations


Journal Article
TL;DR: The combined results of amino acid sequence comparisons, maximum-likelihood analyses, and phylogenetic studies underscore factors that might affect phylogenetic reconstruction.
Department of Biological Sciences, University of Idaho; Department of Organismic and Evolutionary Biology, Harvard University
Abstract: Interest in the use of low-copy nuclear genes for phylogenetic analyses of plants has grown rapidly, because highly repetitive genes such as those commonly used are limited in number. Furthermore, because low-copy genes are subject to different evolutionary processes than are plastid genes or highly repetitive nuclear markers, they provide a valuable source of independent phylogenetic evidence. The gene for granule-bound starch synthase (GBSSI or waxy) exists in a single copy in nearly all plants examined so far. Our study of GBSSI had three parts: (1) Amino acid sequences were compared across a broad taxonomic range, including grasses, four dicotyledons, and the microbial homologs of GBSSI. Inferred structural information was used to aid in the alignment of these very divergent sequences. The informed alignments highlight amino acids that are conserved across all sequences, and demonstrate that structural motifs can be highly conserved in spite of marked divergence in amino acid sequence. (2) Maximum-likelihood (ML) analyses were used to examine exon sequence evolution throughout grasses. Differences in probabilities among substitution types and marked among-site rate variation contributed to the observed pattern of variation. Of the parameters examined in our set of likelihood models, the inclusion of among-site rate variation following a gamma distribution caused the greatest improvement in likelihood score. (3) We performed cladistic parsimony analyses of GBSSI sequences throughout grasses, within tribes, and within genera to examine the phylogenetic utility of the gene. Introns provide useful information among very closely related species, but quickly become difficult to align among more divergent taxa. Exons are variable enough to provide extensive resolution within the family, but with low bootstrap support. The combined results of amino acid sequence comparisons, maximum-likelihood analyses, and phylogenetic studies underscore factors that might affect phylogenetic reconstruction. In this case, accommodation of the variable rate of evolution among sites might be the first step in maximizing the phylogenetic utility of GBSSI.

Journal Article
TL;DR: A population of baker's yeast that underwent 450 generations of glucose-limited growth is analyzed and the existence of multiple tandem duplications involving two highly similar, high-affinity hexose transport loci, HXT6 and HXT7, is revealed.
Abstract: When microbes evolve in a continuous, nutrient-limited environment, natural selection can be predicted to favor genetic changes that give cells greater access to limiting substrate. We analyzed a population of baker's yeast that underwent 450 generations of glucose-limited growth. Relative to the strain used as the inoculum, the predominant cell type at the end of this experiment sustains growth at significantly lower steady-state glucose concentrations and demonstrates markedly enhanced cell yield per mole glucose, significantly enhanced high-affinity glucose transport, and greater relative fitness in pairwise competition. These changes are correlated with increased levels of mRNA hybridizing to probe generated from the hexose transport locus HXT6. Further analysis of the evolved strain reveals the existence of multiple tandem duplications involving two highly similar, high-affinity hexose transport loci, HXT6 and HXT7. Selection appears to have favored changes that result in the formation of more than three chimeric genes derived from the upstream promoter of the HXT7 gene and the coding sequence of HXT6. We propose a genetic mechanism to account for these changes and speculate as to their adaptive significance in the context of gene duplication as a common response of microorganisms to nutrient limitation.

Journal Article
TL;DR: Six out of 12 independent replicate populations of Escherichia coli maintained in long-term glucose-limited continuous culture for up to approximately 1,750 generations evolve polymorphisms maintained by acetate crossfeeding.
Abstract: Six out of 12 independent replicate populations of Escherichia coli maintained in long-term glucose-limited continuous culture for up to approximately 1,750 generations evolve polymorphisms maintained by acetate crossfeeding. In all cases, the acetate-crossfeeding phenotype is associated with semiconstitutive overexpression of acetyl CoA synthetase, which allows for the enhanced uptake of low levels of exogenous acetate. Mutations in the 5' regulatory region of the acetyl CoA synthetase locus are responsible for all the acetate crossfeeding phenotypes found. These changes were either transposable-element insertions or a single T-->A nucleotide substitution at position -93 relative to the acs gene translation start site.

Journal Article
TL;DR: This work presents a maximum-likelihood approach to estimating divergence times that deals explicitly with the problem of rate variation, and presents tests of the accuracy of the method, which show it to be robust to the effects of some modes of rate variations.
Abstract: The ability to date the time of divergence between lineages using molecular data provides the opportunity to answer many important questions in evolutionary biology. However, molecular dating techniques have previously been criticized for failing to adequately account for variation in the rate of molecular evolution. We present a maximum-likelihood approach to estimating divergence times that deals explicitly with the problem of rate variation. This method has many advantages over previous approaches including the following: (1) a rate constancy test excludes data for which rate heterogeneity is detected; (2) date estimates are generated with confidence intervals that allow the explicit testing of hypotheses regarding divergence times; and (3) a range of sequences and fossil dates are used, removing the reliance on a single calculated calibration rate. We present tests of the accuracy of our method, which show it to be robust to the effects of some modes of rate variation. In addition, we test the effect of substitution model and length of sequence on the accuracy of the dating technique. We believe that the method presented here offers solutions to many of the problems facing molecular dating and provides a platform for future improvements to such analyses.

Journal Article
TL;DR: A computer program implementing the algorithm has been developed, and examples with simulated and natural sequences are given to demonstrate the sensitivity and accuracy of the method for identifying recombinant sequences and their recombination junctions as well as detecting hot spots of recombinational activity.
Abstract: Phylogenetic profiles constitute a novel way of graphically displaying the coherence of the sequence relationships over the entire length of a set of aligned homologous sequences. Using a sliding-window technique, this method determines the pairwise distances of all sequences in the windows and evaluates, for each sequence, the degree to which the patterns of distances in these regions agree. This method is suited for exploring data consistency as well as detecting recombinant sequences. A computer program implementing the algorithm has been developed, and examples with simulated and natural sequences are given to demonstrate the sensitivity and accuracy of the method for identifying recombinant sequences and their recombination junctions as well as detecting hot spots of recombinational activity.
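
The sliding-window idea in the abstract above can be sketched roughly as follows. This is a simplified variant and not the published program. It assumes uncorrected p-distances, adjacent windows of equal width on either side of each position, and a Pearson correlation as the measure of agreement. All function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns NaN when either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def distance_profile(alignment: dict[str, str], query: str, window: int) -> list[tuple[int, float]]:
    """For each position i, correlate the query's distances to all other
    sequences in the window left of i with those in the window right of i.
    A low or negative correlation hints at a possible recombination junction."""
    others = [name for name in alignment if name != query]
    length = len(alignment[query])
    profile = []
    for i in range(window, length - window + 1):
        left = [p_distance(alignment[query][i - window:i], alignment[n][i - window:i]) for n in others]
        right = [p_distance(alignment[query][i:i + window], alignment[n][i:i + window]) for n in others]
        profile.append((i, pearson(left, right)))
    return profile
```

Plotting the second element of each tuple against position gives one profile per query sequence. The window width trades resolution against noise and would need tuning for real data.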

Journal Article
TL;DR: The Homoplasy Test is effective in detecting relatively frequent recombination between a set of rather similar strains, in contrast to previous methods which detect rare or unique transfers between more distant strains.
Abstract: In this article, a method is proposed for detecting recombination in the sequences of a gene from a set of closely related organisms. The method, the Homoplasy Test, is appropriate when the sequences are rather similar, differing by 1%-5% of nucleotides. It is effective in detecting relatively frequent recombination between a set of rather similar strains, in contrast to previous methods which detect rare or unique transfers between more distant strains. It is based on the fact that, if there is no recombination and if no repeated mutations have occurred (homoplasy), then the number of polymorphic sites, v, is equal to the number of steps, t, in a most-parsimonious tree. If the number of "apparent homoplasies" in the most-parsimonious tree, h = t-v, is greater than zero, then either homoplasies have occurred by mutation or there has been recombination. An estimate of the distribution of h expected on the null hypothesis of no recombination depends on Se, the "effective site number," defined as follows: if ps is the probability that two independent substitutions in the gene occur at the same site, then Se = 1/ps. Se can be estimated if a suitable outgroup is available. The Homoplasy Test is applied to three bacterial genes and to simulated gene trees with varying amounts of recombination. Methods of estimating the rate, as opposed to the occurrence, of recombination are discussed.
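
The quantities named in the abstract above, v, t, h = t - v, and Se = 1/ps, can be written out in a short sketch. It only counts polymorphic sites and does the arithmetic. The number of steps t in a most-parsimonious tree is taken as an input from a separate parsimony program, and the null distribution of h is not implemented. Function names are hypothetical.

```python
def count_polymorphic_sites(seqs: list[str]) -> int:
    """v: number of alignment columns with more than one character state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def apparent_homoplasies(seqs: list[str], tree_steps: int) -> int:
    """h = t - v, where t (tree_steps) is the length of a most-parsimonious
    tree supplied by the caller and v is the number of polymorphic sites."""
    v = count_polymorphic_sites(seqs)
    h = tree_steps - v
    if h < 0:
        raise ValueError("a tree cannot have fewer steps than polymorphic sites")
    return h


def effective_site_number(ps: float) -> float:
    """Se = 1 / ps, where ps is the probability that two independent
    substitutions in the gene occur at the same site."""
    if not 0 < ps <= 1:
        raise ValueError("ps must be a probability greater than zero")
    return 1.0 / ps
```

A value of h greater than zero does not by itself indicate recombination. As the abstract notes, it has to be compared with the distribution of h expected from repeated mutation alone, which depends on Se.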

Journal Article
TL;DR: Six studies are used to illustrate how phylogenies, site-directed mutagenesis, and a knowledge of protein structure combine to provide much deeper insights into the adaptive process than has hitherto been possible.
Abstract: The study of molecular adaptation has long been fraught with difficulties, not the least of which is identifying out of hundreds of amino acid replacements those few directly responsible for major adaptations. Six studies are used to illustrate how phylogenies, site-directed mutagenesis, and a knowledge of protein structure combine to provide much deeper insights into the adaptive process than has hitherto been possible. Ancient genes can be reconstructed, and the phenotypes can be compared to modern proteins. Out of hundreds of amino acid replacements accumulated over billions of years those few responsible for discriminating between alternative substrates are identified. An amino acid replacement of modest effect at the molecular level causes a dramatic expansion in an ecological niche. These and other topics are creating the emerging field of "paleomolecular biochemistry."

Journal Article
TL;DR: It is argued that, in order to explain the observed distributions, gene families have to behave in a coherent fashion within the genome; i.e., the probabilities of duplications of genes within a gene family are not independent of each other.
Abstract: We compare the frequency distribution of gene family sizes in the complete genomes of six bacteria (Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma pneumoniae, and Synechocystis sp. PCC6803), two Archaea (Methanococcus jannaschii and Methanobacterium thermoautotrophicum), one eukaryote (Saccharomyces cerevisiae), the vaccinia virus, and the bacteriophage T4. The sizes of the gene families versus their frequencies show power-law distributions that tend to become flatter (have a larger exponent) as the number of genes in the genome increases. Power-law distributions generally occur as the limit distribution of a multiplicative stochastic process with a boundary constraint. We discuss various models that can account for a multiplicative process determining the sizes of gene families in the genome. In particular, we argue that, in order to explain the observed distributions, gene families have to behave in a coherent fashion within the genome; i.e., the probabilities of duplications of genes within a gene family are not independent of each other. Likewise, the probabilities of deletions of genes within a gene family are not independent of each other.
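
A power-law relationship between gene family size and frequency, as described in the abstract above, appears as a straight line on log-log axes. The sketch below estimates the exponent by ordinary least squares on the logged values. This is a crude illustrative fit and not necessarily the procedure used in the paper. The function name is hypothetical.

```python
import math


def fit_power_law(sizes: list[float], freqs: list[float]) -> tuple[float, float]:
    """Fit freq = prefactor * size ** exponent by least squares in log-log space.

    Returns (exponent, prefactor). All sizes and frequencies must be positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct family sizes")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return slope, math.exp(intercept)
```

Under this convention the fitted exponent is negative, and a flatter distribution corresponds to an exponent closer to zero, which matches the abstract's description of a larger exponent in genomes with more genes.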

Journal Article
TL;DR: The usefulness of mtDNA for nematode phylogeny reconstruction is examined and data that can be used for a priori character weighting or for parameter specification in models of sequence evolution is provided.
Abstract: Only relatively recently have researchers turned to molecular methods for nematode phylogeny reconstruction. Thus, we lack the extensive literature on evolutionary patterns and phylogenetic usefulness of different DNA regions for nematodes that exists for other taxa. Here, we examine the usefulness of mtDNA for nematode phylogeny reconstruction and provide data that can be used for a priori character weighting or for parameter specification in models of sequence evolution. We estimated the substitution pattern for the mitochondrial ND4 gene from intraspecific comparisons in four species of parasitic nematodes from the family Trichostrongylidae (38-50 sequences per species). The resulting pattern suggests a strong mutational bias toward A and T, and a lower transition/transversion ratio than is typically observed in other taxa. We also present information on the relative rates of substitution at first, second, and third codon positions and on relative rates of saturation of different types of substitutions in comparisons ranging from intraspecific to interordinal. Silent sites saturate extremely quickly, presumably owing to the substitution bias and, perhaps, to an accelerated mutation rate. Results emphasize the importance of using only the most closely related sequences in order to infer patterns of substitution accurately for nematodes or for other taxa having strongly composition-biased DNA. ND4 also shows high amino acid polymorphism at both the intra- and interspecific levels, and in higher level comparisons, there is evidence of saturation at variable amino acid sites. In general, we recommend using mtDNA coding genes only for phylogenetics of relatively closely related nematode species and, even then, using only nonsynonymous substitutions and the more conserved mitochondrial genes (e.g., cytochrome oxidases). 
On the other hand, the high substitution rate in genes such as ND4 should make them excellent for population genetics studies, identifying cryptic species, and resolving relationships among closely related congeners when other markers show insufficient variation.

Journal Article
TL;DR: Two new estimators of admixture proportions based on a coalescent approach that explicitly takes into account molecular information as well as gene frequencies are derived and mY proves to be less biased than conventional estimators over a wide range of situations and especially for microsatellite data.
Abstract: We derive here two new estimators of admixture proportions based on a coalescent approach that explicitly takes into account molecular information as well as gene frequencies. These estimators can be applied to any type of molecular data (such as DNA sequences, restriction fragment length polymorphisms [RFLPs], or microsatellite data) for which the extent of molecular diversity is related to coalescent times. Monte Carlo simulation studies are used to analyze the behavior of our estimators. We show that one of them (mY) appears suitable for estimating admixture from molecular data because of its absence of bias and relatively low variance. We then compare it to two conventional estimators that are based on gene frequencies. mY proves to be less biased than conventional estimators over a wide range of situations and especially for microsatellite data. However, its variance is larger than that of conventional estimators when parental populations are not very differentiated. The variance of mY becomes smaller than that of conventional estimators only if parental populations have been kept separated for about N generations and if the mutation rate is high. Simulations also show that several loci should always be studied to achieve a drastic reduction of variance and that, for microsatellite data, the mean square error of mY rapidly becomes smaller than that of conventional estimators if enough loci are surveyed. We apply our new estimator to the case of admixed wolflike Canid populations tested for microsatellite data.

Journal ArticleDOI
TL;DR: This work estimated the relative difference in microsatellite mutation rate among di-, tri-, and tetranucleotide repeats in the genome of D. melanogaster using a method based on population variation; a positive correlation between repeat unit length and allelic variation suggests that mutation rate increases as the repeat unit lengths of microsatellites increase.
Abstract: In a recent study, we reported that the combined average mutation rate of 10 di-, 6 tri-, and 8 tetranucleotide repeats in Drosophila melanogaster was 6.3 × 10^-6 mutations per locus per generation, a rate substantially below that of microsatellite repeat units in mammals studied to date (range = 10^-2 to 10^-5 per locus per generation). To obtain a more precise estimate of mutation rate for dinucleotide repeat motifs alone, we assayed 39 new dinucleotide repeat microsatellite loci in the mutation accumulation lines from our earlier study. Our estimate of mutation rate for a total of 49 dinucleotide repeats is 9.3 × 10^-6 per locus per generation, only slightly higher than the estimate from our earlier study. We also estimated the relative difference in microsatellite mutation rate among di-, tri-, and tetranucleotide repeats in the genome of D. melanogaster using a method based on population variation, and we found that tri- and tetranucleotide repeats mutate at rates 6.4 and 8.4 times slower than that of dinucleotide repeats, respectively. The slower mutation rates of tri- and tetranucleotide repeats appear to be associated with a relatively short repeat unit length of these repeat motifs in the genome of D. melanogaster. A positive correlation between repeat unit length and allelic variation suggests that mutation rate increases as the repeat unit lengths of microsatellites increase.

Journal ArticleDOI
TL;DR: The entire mitochondrial genome was sequenced in a prostriate tick, Ixodes hexagonus, and a metastriate tick, Rhipicephalus sanguineus.
Abstract: The entire mitochondrial genome was sequenced in a prostriate tick, Ixodes hexagonus, and a metastriate tick, Rhipicephalus sanguineus. Both genomes encode 22 tRNAs, 13 proteins, and two ribosomal RNAs. Prostriate ticks are basal members of Ixodidae and have the same gene order as Limulus polyphemus. In contrast, in R. sanguineus, a block of genes encoding NADH dehydrogenase subunit 1 (ND1), tRNA(Leu)(UUR), tRNA(Leu)(CUN), 16S rDNA, tRNA(Val), 12S rDNA, the control region, and the tRNA(Ile) and tRNA(Gln) have translocated to a position between the tRNA(Glu) and tRNA(Phe) genes. The tRNA(Cys) gene has translocated between the control region and the tRNA(Met) gene, and the tRNA(Leu)(CUN) gene has translocated between the tRNA(Ser)(UCN) gene and the control region. Furthermore, the control region is duplicated, and both copies undergo concerted evolution. Primers that flank these rearrangements confirm that this gene order is conserved in all metastriate ticks examined. Correspondence analysis of amino acid and codon use in the two ticks and in nine other arthropod mitochondrial genomes indicates a strong bias in R. sanguineus towards amino acids encoded by AT-rich codons.

Journal ArticleDOI
TL;DR: There is a large phylogenetic component to the observed size variation: natural isolates from certain subgroups of E. coli have consistently larger chromosomes, suggesting that much of the additional DNA in larger chromosomes is shared through common ancestry.
Abstract: Large-scale variation in chromosome size was analyzed in 35 natural isolates of Escherichia coli by physical mapping with a restriction enzyme whose sites are restricted to rDNA operons. Although the genetic maps and chromosome lengths of the laboratory strains E. coli K12 and Salmonella enterica sv. Typhimurium LT2 are highly congruent, chromosome lengths among natural strains of E. coli can differ by as much as 1 Mb, ranging from 4.5 to 5.5 Mb in length. This variation has been generated by multiple changes dispersed throughout the genome, and these alterations are correlated; i.e., additions to one portion of the chromosome are often accompanied by additions to other chromosomal regions. This pattern of variation is most probably the result of selection acting to maintain equal distances between the replication origin and terminus on each side of the circular chromosome. There is a large phylogenetic component to the observed size variation: natural isolates from certain subgroups of E. coli have consistently larger chromosomes, suggesting that much of the additional DNA in larger chromosomes is shared through common ancestry. There is no significant correlation between genome sizes and growth rates, which counters the view that the streamlining of bacterial genomes is a response to selection for faster growth rates in natural populations.

Journal ArticleDOI
TL;DR: The results strongly suggest that the high rate of DNA loss is a general feature of Drosophila and not a peculiar property of a particular stretch of DNA in a particular species group.
Abstract: We recently proposed that patterns of evolution of non-LTR retrotransposable elements can be used to study patterns of spontaneous mutation. Transposition of non-LTR retrotransposable elements commonly results in creation of 5' truncated, "dead-on-arrival" copies. These inactive copies are effectively pseudogenes and, according to the neutral theory, their molecular evolution ought to reflect rates and patterns of spontaneous mutation. Maximum parsimony can be used to separate the evolution of active lineages of a non-LTR element from the fate of the "dead-on-arrival" insertions and to directly assess the relative frequencies of different types of spontaneous mutations. We applied this approach using a non-LTR element, Helena, in the Drosophila virilis group and have demonstrated a surprisingly high incidence of large deletions and the virtual absence of insertions. Based on these results, we suggested that Drosophila in general may exhibit a high rate of spontaneous large deletions and have hypothesized that such a high rate of DNA loss may help to explain the puzzling dearth of bona fide pseudogenes in Drosophila. We also speculated that variation in the rate of spontaneous deletion may contribute to the divergence of genome size in different taxa by affecting the amount of superfluous "junk" DNA such as, for example, pseudogenes or long introns. In this paper, we extend our analysis to the D. melanogaster subgroup, which last shared a common ancestor with the D. virilis group approximately 40 MYA. In a different region of the same transposable element, Helena, we demonstrate that inactive copies accumulate deletions in species of the D. melanogaster subgroup at a rate very similar to that of the D. virilis group. These results strongly suggest that the high rate of DNA loss is a general feature of Drosophila and not a peculiar property of a particular stretch of DNA in a particular species group.

Journal ArticleDOI
TL;DR: Interestingly, the most primitive taxon within E. coli in terms of branching pattern, i.e., the B2 group, includes highly virulent extraintestinal strains with derived characters (extraintestinal virulence determinants) occurring on its own branch.
Abstract: Molecular phylogeny of the species Escherichia coli using the E. coli reference (ECOR) collection strains has been hampered by (1) the absence of rooting in the commonly used phenogram obtained from multilocus enzyme electrophoresis (MLEE) data and (2) the existence of recombination events between strains that scramble phylogenetic trees reconstructed from the nucleotide sequences of genes. We attempted to determine the phylogeny for E. coli based on the ECOR strain data by extracting from GenBank the nucleotide sequences of 11 chromosomal structural and 2 plasmid genes for which the Salmonella enterica homologous gene sequences were available. For each of the 13 DNA data sets studied, incongruence with a nonnucleotide whole-genome data set including MLEE, random amplified polymorphic DNA, and rrn restriction fragment length polymorphism data was measured using the incongruence length difference (ILD) test of Farris et al. As previously reported, the incongruence observed between the gnd and plasmid gene data and the whole-genome data was multiple, indicating numerous horizontal transfer and/or recombination events. In five cases, the incongruence detected by the ILD test was punctual, and the donor group was identified. Congruence was not rejected for the remaining data sets. The strains responsible for incongruences with the whole-genome data set were removed, leading to a "prior-agreement" approach, i.e., the determination of a phylogeny for E. coli based on several genes, excluding (1) the genes with multiple incongruences with the whole genome data, (2) the strains responsible for punctual incongruences, and (3) the genes incongruent with each other. The obtained phylogeny shows that the most basal group of E. coli strains is the B2 group rather than the A group, as generally thought. The D group then emerges as the sister group of the rest. Finally, the A and B1 groups are sister groups. Interestingly, the most primitive taxon within E. coli in terms of branching pattern, i.e., the B2 group, includes highly virulent extraintestinal strains with derived characters (extraintestinal virulence determinants) occurring on its own branch.

Journal ArticleDOI
TL;DR: The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
Abstract: Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
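The growth in the number of candidate trees can be checked directly. The following minimal Python sketch uses the standard double-factorial counts for strictly bifurcating labeled trees, (2n - 5)!! unrooted and (2n - 3)!! rooted for n taxa. These formulas and the function names are not given in the abstract and are supplied here only as an illustration. Under them, 14 taxa give about 3.2 × 10^11 unrooted and about 7.9 × 10^12 rooted topologies, so the figure of more than seven trillion corresponds to the rooted count for 14 taxa (equivalently, the unrooted count for 15 taxa).

```python
def odd_double_factorial(k: int) -> int:
    """Return 1 * 3 * 5 * ... * k for odd k (1 when k <= 1)."""
    result = 1
    for i in range(3, k + 1, 2):
        result *= i
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, strictly bifurcating labeled trees: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted, strictly bifurcating labeled trees: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    # Tabulate how quickly the search space grows with the number of taxa.
    for n in (5, 10, 14, 15):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Even at this modest number of taxa, exhaustive evaluation under maximum likelihood is impractical, which is the motivation for heuristic strategies such as the genetic algorithm described in the abstract.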

Journal ArticleDOI
TL;DR: The hypothesis that EF-1 alpha underwent parallel gene duplications in the Diptera and the Hymenoptera is supported both by congruence among nucleotide and amino acid data sets and by topology-dependent permutation tail probability tests.
Abstract: We report the complete sequence of a paralogous copy of elongation factor-1 alpha (EF-1 alpha) in the honeybee, Apis mellifera (Hymenoptera: Apidae). This copy differs from a previously described copy in the positions of five introns and in 25% of the nucleotide sites in the coding regions. The existence of two paralogous copies of EF-1 alpha in Drosophila and Apis suggests that two copies of EF-1 alpha may be widespread in the holometabolous insect orders. To distinguish between a single, ancient gene duplication and parallel, independent fly and bee gene duplications, we performed a phylogenetic analysis of hexapod EF-1 alpha sequences. Unweighted parsimony analysis of nucleotide sequences suggests an ancient gene duplication event, whereas weighted parsimony analysis of nucleotides and unweighted parsimony analysis of amino acids suggest the contrary: that EF-1 alpha underwent parallel gene duplications in the Diptera and the Hymenoptera. The hypothesis of parallel gene duplication is supported both by congruence among nucleotide and amino acid data sets and by topology-dependent permutation tail probability (T-PTP) tests. The resulting tree topologies are also congruent with current views on the relationships among the holometabolous orders included in this study (Diptera, Hymenoptera, and Lepidoptera). More sequences, from diverse orders of holometabolous insects, will be needed to more accurately assess the historical patterns of gene duplication in EF-1 alpha.

Journal ArticleDOI
TL;DR: All nine mutations detected in Drosophila melanogaster occurred at the same allele of one locus (DROYANETSB), which is among the longest microsatellites reported in D. melanogaster and has an allele-specific mutation rate within the range of mammalian mutation rates.
Abstract: Within recent years, microsatellites have become one of the most powerful genetic markers in biology. For several mammalian species, microsatellite mutation rates have been estimated on the order of 10^-3 to 10^-5. A recent study, however, demonstrated mutation rates in Drosophila melanogaster of at least one order of magnitude lower than those in mammals. To further test this result, we examined mutation rates of different microsatellite loci using a larger sample size. We screened 24 microsatellite loci in 119 D. melanogaster lines maintained for approximately 250 generations and detected 9 microsatellite mutations. The average mutation rate of 6.3 × 10^-6 is identical to the mutation rate from a previous study. Most interestingly, all nine mutations occurred at the same allele of one locus (DROYANETSB). This hypermutable allele has 28 dinucleotide repeats and is among the longest microsatellites reported in D. melanogaster. The allele-specific mutation rate of 3.0 × 10^-4 per generation is within the range of mammalian mutation rates. Future microsatellite analyses will have to account for the dramatic differences in allele-specific mutation rates.
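The reported average rate can be reproduced approximately from the counts given in the abstract. The sketch below assumes that the denominator is the number of allele-generations, counting two allele copies per locus per line. The abstract does not state how the denominator was formed, so this is an assumption, and the function and parameter names are illustrative only.

```python
def average_mutation_rate(
    n_mutations: int,
    n_loci: int,
    n_lines: int,
    n_generations: int,
    alleles_per_locus: int = 2,  # assumption: diploid, two allele copies counted per line
) -> float:
    """Mutations per allele per generation, pooled over all loci and lines."""
    allele_generations = n_loci * n_lines * n_generations * alleles_per_locus
    return n_mutations / allele_generations


# Counts taken from the abstract: 9 mutations, 24 loci, 119 lines, ~250 generations.
rate = average_mutation_rate(n_mutations=9, n_loci=24, n_lines=119, n_generations=250)
print(f"{rate:.2e}")  # prints 6.30e-06
```

Because the number of generations is given only approximately, agreement with the published 6.3 × 10^-6 should be read as a consistency check rather than as a reconstruction of the authors' calculation.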

Journal ArticleDOI
TL;DR: It is shown that comparing large numbers of species significantly improves the power of the relative-rate test, and results are more accurate when phylogenetic relationships between the investigated sequences are taken into account.
Abstract: Relative-rate tests may be used to compare substitution rates between more than two sequences, which yields two main questions: What influence does the number of sequences have on relative-rate tests and what is the influence of the sampling strategy as characterized by the phylogenetic relationships between sequences? Using both simulations and analysis of real data from murids (APRT and LCAT nuclear genes), we show that comparing large numbers of species significantly improves the power of the test. This effect is stronger if species are more distantly related. On the other hand, it appears to be less rewarding to increase outgroup sampling than to use the single nearest outgroup sequence. Rates may be compared between paraphyletic ingroups and using paraphyletic outgroups, but unbalanced taxonomic sampling can bias the test. We present a simple phylogenetic weighting scheme which takes taxonomic sampling into account and significantly improves the relative-rate test in cases of unbalanced sampling. The answers are thus: (1) large taxonomic sampling of compared groups improves relative-rate tests, (2) sampling many outgroups does not bring significant improvement, (3) the only constraint on sampling strategy is that the outgroup be valid, and (4) results are more accurate when phylogenetic relationships between the investigated sequences are taken into account. Given current limitations of the maximum-likelihood and nonparametric approaches, the relative-rate test generalized to any number of species with phylogenetic weighting appears to be the most general test available to compare rates between lineages.