
Showing papers in "Molecular Biology and Evolution in 2003"


Journal Article
TL;DR: The evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them are reviewed.
Abstract: Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.

1,147 citations


Journal Article
TL;DR: Computer simulation is used to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP).
Abstract: Bayesian Markov chain Monte Carlo sampling has become increasingly popular in phylogenetics as a method for both estimating the maximum likelihood topology and for assessing nodal confidence. Despite the growing use of posterior probabilities, the relationship between the Bayesian measure of confidence and the most commonly used confidence measure in phylogenetics, the nonparametric bootstrap proportion, is poorly understood. We used computer simulation to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP). We simulated the evolution of DNA sequence on 17-taxon topologies under 18 evolutionary scenarios and examined the performance of these methods in assigning confidence to correct monophyletic and incorrect monophyletic groups, and we examined the effects of increasing character number on support value. BMCMC-PP and ML-BP were often strongly correlated with one another but could provide substantially different estimates of support on short internodes. In contrast, BMCMC-PP correlated poorly with MP-BP across most of the simulation conditions that we examined. For a given threshold value, more correct monophyletic groups were supported by BMCMC-PP than by either ML-BP or MP-BP. When threshold values were chosen that fixed the rate of accepting incorrect monophyletic relationship as true at 5%, all three methods recovered most of the correct relationships on the simulated topologies, although BMCMC-PP and ML-BP performed better than MP-BP. BMCMC-PP was usually a less biased predictor of phylogenetic accuracy than either bootstrapping method. BMCMC-PP provided high support values for correct topological bipartitions with fewer characters than was needed for nonparametric bootstrap.

949 citations


Journal Article
TL;DR: Based on the current characterization of a limited number of plant bHLH proteins, it is predicted that this family of TFs has a range of different roles in plant cell and tissue development as well as plant metabolism.
Abstract: Basic helix-loop-helix (bHLH) transcription factors (TFs) belong to a family of transcriptional regulators present in three eukaryotic kingdoms. Many different functions have been identified for these proteins in animals, including the control of cell proliferation and development of specific cell lineages. Their mechanism for controlling gene transcription often involves homodimerization or heterodimerization. In plants, little is known about the bHLH family, but we have determined that there are 133 bHLH genes in Arabidopsis thaliana and have confirmed that at least 113 of them are expressed. The AtbHLH genes constitute one of the largest families of transcription factors in A. thaliana with significantly more members than are found in most animal species and about an equivalent number to those in vertebrates. Comparisons with animal sequences suggest that the majority of plant bHLH genes have evolved from the ancestral group B class of bHLH genes. By studying the AtbHLH genes collectively, twelve subfamilies have been identified. Within each of these main groups, there are conserved amino acid sequence motifs outside the DNA binding domain. Potential gene redundancy among members of smaller subgroups has been analyzed, and the resulting information is presented to provide a simplified visual interpretation of the gene family, identifying related genes that are likely to share similar functions. Based on the current characterization of a limited number of plant bHLH proteins, we predict that this family of TFs has a range of different roles in plant cell and tissue development as well as plant metabolism.

875 citations


Journal Article
TL;DR: The analysis of 2977 pairwise sequence comparisons from 176 nuclear genes reveals a long-term fruit fly mutation clock ticking at a rate of 11.1 mutations per kilobase pair per Myr.
Abstract: Drosophila melanogaster has been a canonical model organism to study genetics, development, behavior, physiology, evolution, and population genetics for nearly a century. Despite this emphasis and the completion of its nuclear genome sequence, the timing of major speciation events leading to the origin of this fruit fly remains elusive because of the paucity of extensive fossil records and biogeographic data. Use of molecular clocks as an alternative has been fraught with non-clock-like accumulation of nucleotide and amino-acid substitutions. Here we present a novel methodology in which genomic mutation distances are used to overcome these limitations and to make use of all available gene sequence data for constructing a fruit fly molecular time scale. Our analysis of 2977 pairwise sequence comparisons from 176 nuclear genes reveals a long-term fruit fly mutation clock ticking at a rate of 11.1 mutations per kilobase pair per Myr. Genomic mutation clock-based timings of the landmark speciation events leading to the evolution of D. melanogaster show that it shared most recent common ancestry 5.4 MYA with D. simulans, 12.6 MYA with D. erecta + D. orena, 12.8 MYA with D. yakuba + D. teissieri, 35.6 MYA with the takahashii subgroup, 41.3 MYA with the montium subgroup, 44.2 MYA with the ananassae subgroup, 54.9 MYA with the obscura group, 62.2 MYA with the willistoni group, and 62.9 MYA with the subgenus Drosophila. These and other estimates are compatible with those known from limited biogeographic and fossil records. The inferred temporal pattern of fruit fly evolution shows correspondence with the cooling patterns of paleoclimate changes and habitat fragmentation in the Cenozoic.

601 citations


Journal Article
TL;DR: A simulation study examining the effect of a recent spatial expansion on the pattern of molecular diversity within a deme finds that the shape of the gene genealogies and the overall pattern of diversity within demes depend not only on the age of the expansion but also on the level of gene flow between neighboring demes, as measured by the product Nm.
Abstract: We report here a simulation study examining the effect of a recent spatial expansion on the pattern of molecular diversity within a deme. We first simulate a range expansion in a virtual world consisting in a two-dimensional array of demes exchanging a given proportion of migrants (m) with their neighbors. The recorded demographic and migration histories are then used under a coalescent approach to generate the genetic diversity in a sample of genes. We find that the shape of the gene genealogies and the overall pattern of diversity within demes depend not only on the age of the expansion but also on the level of gene flow between neighboring demes, as measured by the product Nm, where N is the size of a deme. For small Nm values (< approximately 20 migrants sent outwards per generation), a substantial proportion of coalescent events occur early in the genealogy, whereas with larger levels of gene flow, most coalescent events occur around the time of the onset of the spatial expansion. Gene genealogies are star shaped, and mismatch distributions are unimodal after a range expansion for large Nm values. In contrast, gene genealogies present a mixture of both very short and very long branch lengths, and mismatch distributions are multimodal for small Nm values. It follows that statistics used in tests of selective neutrality like Tajima's D statistic or Fu's F(S) statistic will show very significant negative values after a spatial expansion only in demes with high Nm values. In the context of human evolution, this difference could explain very simply the fact that analyses of samples of mitochondrial DNA sequences reveal multimodal mismatch distributions in hunter-gatherers and unimodal distributions in post-Neolithic populations. 
Indeed, the current simulations show that a recent increase in deme size (resulting in a larger Nm value) is sufficient to prevent recent coalescent events and thus lead to unimodal mismatch distributions, even if deme sizes (and therefore Nm values) were previously much smaller. The fact that molecular diversity within a deme is so dependent on recent levels of gene flow suggests that it should be possible to estimate Nm values from samples drawn from a single deme.

562 citations
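
The mismatch distributions discussed in the abstract above are histograms of pairwise nucleotide differences within a sample. The following is a minimal sketch of how such a histogram could be tallied, assuming equal-length aligned sequences; the function names and the toy sequences are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def pairwise_differences(a: str, b: str) -> int:
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(seqs: List[str]) -> Dict[int, int]:
    """Map each number of differences to the number of sequence pairs showing it."""
    counts = Counter(pairwise_differences(s, t) for s, t in combinations(seqs, 2))
    return dict(sorted(counts.items()))


def mean_pairwise_differences(seqs: List[str]) -> float:
    """Average number of differences over all pairs in the sample."""
    dist = mismatch_distribution(seqs)
    n_pairs = sum(dist.values())
    return sum(k * v for k, v in dist.items()) / n_pairs


# Toy sample (hypothetical data, for illustration only).
sample = ["AAAA", "AAAT", "AATT"]
print(mismatch_distribution(sample))
```

A histogram with a single peak corresponds to the unimodal case described in the abstract; several separated peaks correspond to the multimodal case.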


Journal Article
TL;DR: The nonparametric bootstrap resampling procedure is applied to the Bayesian approach and shows that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices.
Abstract: Owing to the exponential growth of genome databases, phylogenetic trees are now widely used to test a variety of evolutionary hypotheses. Nevertheless, the burden of computation time limits the application of methods such as maximum likelihood nonparametric bootstrap to assess reliability of evolutionary trees. As an alternative, the much faster Bayesian inference of phylogeny, which expresses branch support as posterior probabilities, has been introduced. However, marked discrepancies exist between nonparametric bootstrap proportions and Bayesian posterior probabilities, leading to difficulties in the interpretation of sometimes strongly conflicting results. As an attempt to reconcile these two indices of node reliability, we apply the nonparametric bootstrap resampling procedure to the Bayesian approach. The correlation between posterior probabilities, bootstrap maximum likelihood percentages, and bootstrapped posterior probabilities was studied for eight highly diverse empirical data sets and was also investigated using experimental simulation. Our results show that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices. Moreover, simulations corroborate empirical observations in suggesting that, being more conservative, the bootstrap approach might be less prone to strongly supporting a false phylogenetic hypothesis. Thus, apparent conflicts in topology recovered by the Bayesian approach were reduced after bootstrapping. Both posterior probabilities and bootstrap supports are of great interest to phylogeny as potential upper and lower bounds of node reliability, but they are surely not interchangeable and cannot be directly compared.

501 citations
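
The nonparametric bootstrap referred to in the abstract above resamples alignment columns (characters) with replacement and re-runs the analysis on each pseudo-replicate. The sketch below illustrates only the resampling and tallying steps. The `supports_group` callback is a hypothetical stand-in for whatever tree inference and clade check an analysis would use (maximum likelihood or Bayesian); it is an assumption of this example, not the authors' procedure.

```python
import random
from typing import Callable, List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Build one pseudo-replicate by sampling columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_proportion(
    alignment: List[str],
    supports_group: Callable[[List[str]], bool],  # hypothetical analysis hook
    n_replicates: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of pseudo-replicates in which the group of interest is recovered."""
    rng = random.Random(seed)
    hits = sum(
        bool(supports_group(bootstrap_alignment(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def toy_support(aln: List[str]) -> bool:
    """Toy criterion: sequence 0 is closer to sequence 1 than to sequence 2."""
    d01 = sum(a != b for a, b in zip(aln[0], aln[1]))
    d02 = sum(a != b for a, b in zip(aln[0], aln[2]))
    return d01 < d02


# Hypothetical three-sequence alignment, for illustration only.
toy = ["AAAA", "AAAA", "CCCC"]
print(bootstrap_proportion(toy, toy_support, n_replicates=50))
```

In a real analysis the callback would infer a tree from each pseudo-replicate, which is where the computation time burden mentioned in the abstract arises.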


Journal Article
TL;DR: It is demonstrated that the evolution of seven genes involved in mammalian fertilization is promoted by positive Darwinian selection by using likelihood ratio tests (LRTs).
Abstract: Mammalian fertilization exhibits species specificity, and the proteins mediating sperm-egg interactions evolve rapidly between species. In this study, we demonstrate that the evolution of seven genes involved in mammalian fertilization is promoted by positive Darwinian selection by using likelihood ratio tests (LRTs). Several of these proteins are sperm proteins that have been implicated in binding the mammalian egg coat zona pellucida glycoproteins, which were shown previously to be subjected to positive selection. Taken together, these represent the major candidates involved in mammalian fertilization, indicating positive selection is pervasive amongst mammalian reproductive proteins. A new LRT is implemented to determine if the d(N)/d(S) ratio is significantly greater than one. This is a more refined test of positive selection than the previous LRTs which only identified if there was a class of sites with a d(N)/d(S) ratio >1 but did not test if that ratio was significantly greater than one.

484 citations
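
A likelihood ratio test of the kind mentioned in the abstract above compares the maximized log-likelihoods of two nested models. The sketch below shows the arithmetic only, assuming the statistic is referred to a chi-square distribution; the log-likelihood values and the choice of degrees of freedom are hypothetical and are not taken from the paper.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood difference between alternative and null models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, using closed forms for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Hypothetical log-likelihoods: null fixes the dN/dS ratio of a site class at 1,
# the alternative lets that ratio be estimated freely.
lnl_fixed = -2350.4
lnl_free = -2346.1
stat = lrt_statistic(lnl_fixed, lnl_free)
print(f"2*delta(lnL) = {stat:.2f}, p = {chi2_sf(stat, df=1):.4f}")
```

Because a fixed ratio of one can lie on the boundary of the parameter space, the appropriate reference distribution may differ from a plain chi-square; the plain version here is a simplifying assumption.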


Journal Article
TL;DR: It is advisable to concatenate many gene sequences and use a multigene gamma distance for estimating divergence times rather than using the individual gene approach, and nuclear proteins are generally more suitable than mitochondrial proteins for time estimation.
Abstract: Although the phylogenetic relationships of major lineages of primate species are relatively well established, the times of divergence of these lineages as estimated by molecular data are still controversial. This controversy has been generated in part because different authors have used different types of molecular data, different statistical methods, and different calibration points. We have therefore examined the effects of these factors on the estimates of divergence times and reached the following conclusions: (1) It is advisable to concatenate many gene sequences and use a multigene gamma distance for estimating divergence times rather than using the individual gene approach. (2) When sequence data from many nuclear genes are available, protein sequences appear to give more robust estimates than DNA sequences. (3) Nuclear proteins are generally more suitable than mitochondrial proteins for time estimation. (4) It is important first to construct a phylogenetic tree for a group of species using some outgroups and then estimate the branch lengths. (5) It appears to be better to use a few reliable calibration points rather than many unreliable ones. Considering all these factors and using two calibration points, we estimated that the human lineage diverged from the chimpanzee, gorilla, orangutan, Old World monkey, and New World monkey lineages approximately 6 MYA (with a range of 5-7), 7 MYA (range, 6-8), 13 MYA (range, 12-15), 23 MYA (range, 21-25), and 33 MYA (range 32-36).

376 citations


Journal Article
TL;DR: Extensions are presented to standard phylogenetic models that allow for better handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost, and improve goodness of fit substantially for both coding and noncoding data.
Abstract: Nucleotide substitution in both coding and noncoding regions is context-dependent, in the sense that substitution rates depend on the identity of neighboring bases. Context-dependent substitution has been modeled in the case of two sequences and an unrooted phylogenetic tree, but it has only been accommodated in limited ways with more general phylogenies. In this article, extensions are presented to standard phylogenetic models that allow for better handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost. The new models improve goodness of fit substantially for both coding and noncoding data. Considering context dependence leads to much larger improvements than does using a richer substitution model or allowing for rate variation across sites, under the assumption of site independence. The observed improvements appear to derive from three separate properties of the models: their explicit characterization of context-dependent substitution within N-tuples of adjacent sites, their ability to accommodate overlapping N-tuples, and their rich parameterization of the substitution process. Parameter estimation is accomplished using an expectation maximization algorithm, with a quasi-Newton algorithm for the maximization step; this approach is shown to be preferable to ordinary Newton methods for parameter-rich models. Overlapping tuples are efficiently handled by assuming Markov dependence of the observed bases at each site on those at the N - 1 preceding sites, and the required conditional probabilities are computed with an extension of Felsenstein's algorithm. Estimated substitution rates based on a data set of about 160,000 noncoding sites in mammalian genomes indicate a pronounced CpG effect, but they also suggest a complex overall pattern of context-dependent substitution, comprising a variety of subtle effects. 
Estimates based on about 3 million sites in coding regions demonstrate that amino acid substitution rates can be learned at the nucleotide level, and suggest that context effects across codon boundaries are significant.

369 citations


Journal Article
TL;DR: Analysis of 20,000 genes contained in eight free-living prokaryotic genomes indicates that HGT occurs among organisms that share similar factors, including genome size, genome G/C composition, carbon utilization, and oxygen tolerance.
Abstract: Horizontal gene transfer (HGT) spreads genetic diversity by moving genes across species boundaries. By rapidly introducing newly evolved genes into existing genomes, HGT circumvents the slow step of ab initio gene creation and accelerates genome innovation. However, HGT can only affect organisms that readily exchange genes (exchange communities). In order to define exchange communities and understand the internal and external environmental factors that regulate HGT, we analyzed approximately 20,000 genes contained in eight free-living prokaryotic genomes. These analyses indicate that HGT occurs among organisms that share similar factors. The most significant are genome size, genome G/C composition, carbon utilization, and oxygen tolerance.

332 citations


Journal Article
TL;DR: Scanning the malaria parasite genome for evidence of recent selection may prove an extremely effective way to locate genes underlying recently evolved traits such as drug resistance, as well as providing an opportunity to study the dynamics of selective events that have occurred recently or are currently in progress.
Abstract: Malaria parasites (Plasmodium falciparum) provide an excellent system in which to study the genomic effects of strong selection in a recombining eukaryote because the rapid spread of resistance to multiple drugs during the past 50 years has been well documented, the full genome sequence and a microsatellite map are now available, and haplotype data can be easily generated. We examined microsatellite variation around the dihydrofolate reductase (dhfr) gene on chromosome 4 of P. falciparum. Point mutations in dhfr are known to be responsible for resistance to the antimalarial drug pyrimethamine, and resistance to this drug spread rapidly in Southeast (SE) Asia after its introduction in the 1970s. We genotyped 33 microsatellite markers distributed across chromosome 4 in 61 parasites from a location on the Thailand/Myanmar border. We observed minimal microsatellite length variation in a 12-kb (0.7-cM) region flanking the dhfr gene and diminished variation for approximately 100 kb (6 cM), indicative of a single origin of resistant alleles. Furthermore, we found that the same or similar microsatellite haplotypes flanked resistant dhfr alleles sampled from 11 parasite populations in five SE Asian countries, indicating recent invasion of a single lineage of resistant dhfr alleles in locations 2000 km apart. Three features of these data are of especial interest. (1) Pyrimethamine resistance is generally assumed to have evolved multiple times because the genetic basis is simple and resistance can be selected easily in the laboratory. Yet our data clearly indicate a single origin of resistant dhfr alleles sampled over a large region of SE Asia. (2) The wide valley (approximately 6 cM) of reduced variation around dhfr provides "proof-of-principle" that genome-wide association may be an effective way to locate genes under strong recent selection. (3)
The width of the selective valley is consistent with predictions based on independent measures of recombination, mutation, and selection intensity, suggesting that we have reasonable estimates of these parameters. We conclude that scanning the malaria parasite genome for evidence of recent selection may prove an extremely effective way to locate genes underlying recently evolved traits such as drug resistance, as well as providing an opportunity to study the dynamics of selective events that have occurred recently or are currently in progress.

Journal Article
TL;DR: Evidence from approximately 200,000 nucleotides suggests that polyploidy in Gossypium led to a modest enhancement in rates of nucleotide substitution, suggesting an absence of gene conversion or recombination among homoeologs subsequent to allopolyploid formation.
Abstract: Molecular evolutionary rate variation in Gossypium (cotton) was characterized using sequence data for 48 nuclear genes from both genomes of allotetraploid cotton, models of its diploid progenitors, and an outgroup. Substitution rates varied widely among the 48 genes, with silent and replacement substitution levels varying from 0.018 to 0.162 and from 0.000 to 0.073, respectively, in comparisons between orthologous Gossypium and outgroup sequences. However, about 90% of the genes had silent substitution rates spanning a more narrow threefold range. Because there was no evidence of rate heterogeneity among lineages for any gene and because rates were highly correlated in independent tests, evolutionary rate is inferred to be a property of each gene or its genetic milieu rather than the clade to which it belongs. Evidence from approximately 200,000 nucleotides (40,000 per genome) suggests that polyploidy in Gossypium led to a modest enhancement in rates of nucleotide substitution. Phylogenetic analysis for each gene yielded the topology expected from organismal history, indicating an absence of gene conversion or recombination among homoeologs subsequent to allopolyploid formation. Using the mean synonymous substitution rate calculated across the 48 genes, allopolyploid cotton is estimated to have formed circa 1.5 million years ago (MYA), after divergence of the diploid progenitors about 6.7 MYA.
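
Dating a divergence from a synonymous substitution rate, as in the abstract above, usually rests on the relation T = K / (2r), where K is the pairwise distance between two lineages and r is the per-lineage rate. The sketch below shows that arithmetic under a strict-clock assumption. The numbers are hypothetical and the paper's own calculation may differ in detail.

```python
def divergence_time(distance: float, rate_per_lineage: float) -> float:
    """Time since divergence, assuming both lineages evolve at the same constant rate.

    distance: substitutions per site separating the two sequences (e.g. Ks).
    rate_per_lineage: substitutions per site per unit time along one lineage.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    # The factor 2 accounts for change accumulating along both lineages.
    return distance / (2.0 * rate_per_lineage)


# Hypothetical values: Ks = 0.03 substitutions per site and a rate of
# 0.01 substitutions per site per million years on each lineage.
print(divergence_time(0.03, 0.01), "million years")
```

Averaging the distance across many genes before applying the formula, as the abstract describes for the 48 genes, reduces the influence of any single unusually fast or slow locus.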

Journal Article
TL;DR: The data support VS as the ancestral state in birds and show that UVS has evolved independently at least four times, indicating that interactions between predators and prey may explain the presence of UVS in Laridae and Passeriformes.
Abstract: To gain insights into the evolution and ecology of visually acute animals such as birds, biologists often need to understand how these animals perceive colors. This poses a problem, since the human eye is of a different design than that of most other animals. The standard solution is to examine the spectral sensitivity properties of animal retinas through microspectrophotometry, a procedure that is rather complicated and has therefore allowed examination of only a limited number of species to date. We have developed a faster and simpler molecular method, which can be used to estimate the color sensitivities of a bird by sequencing a part of the gene coding for the ultraviolet or violet absorbing opsin in the avian retina. With our method, there is no need to sacrifice the animal, and it thereby facilitates large screenings, including rare and endangered species beyond the reach of microspectrophotometry. Color vision in birds may be categorized into two classes: one with a short-wavelength sensitivity biased toward violet (VS) and the other biased toward ultraviolet (UVS). Using our method on 45 species from 35 families, we demonstrate that the distribution of avian color vision is more complex than has previously been shown. Our data support VS as the ancestral state in birds and show that UVS has evolved independently at least four times. We found species with the UVS type of color vision in the orders Psittaciformes and Passeriformes, in agreement with previous findings. However, species within the families Corvidae and Tyrannidae did not share this character with other passeriforms. We also found UVS type species within the Laridae and Struthionidae families. Raptors (Accipitridae and Falconidae) are of the violet type, giving them a vision system different from their passeriform prey. Intriguing effects on the evolution of color signals can be expected from interactions between predators and prey. 
Such interactions may explain the presence of UVS in Laridae and Passeriformes.

Journal Article
TL;DR: The hypothesis of a single migration of a polymorphic founding population better fits the expanded database and traced both lineages to a probable ancestral homeland in the vicinity of the Altai Mountains in Southwest Siberia.
Abstract: A total of 63 binary polymorphisms and 10 short tandem repeats (STRs) were genotyped on a sample of 2,344 Y chromosomes from 18 Native American, 28 Asian, and 5 European populations to investigate the origin(s) of Native American paternal lineages. All three of Greenberg's major linguistic divisions (including 342 Amerind speakers, 186 Na-Dene speakers, and 60 Aleut-Eskimo speakers) were represented in our sample of 588 Native Americans. Single-nucleotide polymorphism (SNP) analysis indicated that three major haplogroups, denoted as C, Q, and R, accounted for nearly 96% of Native American Y chromosomes. Haplogroups C and Q were deemed to represent early Native American founding Y chromosome lineages; however, most haplogroup R lineages present in Native Americans most likely came from recent admixture with Europeans. Although different phylogeographic and STR diversity patterns for the two major founding haplogroups previously led to the inference that they were carried from Asia to the Americas separately, the hypothesis of a single migration of a polymorphic founding population better fits our expanded database. Phylogenetic analyses of STR variation within haplogroups C and Q traced both lineages to a probable ancestral homeland in the vicinity of the Altai Mountains in Southwest Siberia. Divergence dates between the Altai plus North Asians versus the Native American population system ranged from 10,100 to 17,200 years for all lineages, precluding a very early entry into the Americas.

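Journal Article
TL;DR: To determine the origin of cultivated rice, subfamily members of the rice retroposon p-SINE1, which show insertion polymorphism in the O. sativa-O. rufipogon population, were identified and used to "bar code" each of 101 cultivated and wild rice strains based on the presence or absence of the p-SINE1 members at the respective loci.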
Abstract: The wild rice species Oryza rufipogon with wide intraspecific variation is thought to be the progenitor of the cultivated rice species Oryza sativa with two ecotypes, japonica and indica. To determine the origin of cultivated rice, subfamily members of the rice retroposon p-SINE1, which show insertion polymorphism in the O. sativa -O. rufipogon population, were identified and used to "bar code" each of 101 cultivated and wild rice strains based on the presence or absence of the p-SINE1 members at the respective loci. A phylogenetic tree constructed based on the bar codes given to the rice strains showed that O. sativa strains were classified into two groups corresponding to japonica and indica, whereas O. rufipogon strains were in four groups, in which annual O. rufipogon strains formed a single group, differing from the perennial O. rufipogon strains of the other three groups. Japonica strains were closely related to the O. rufipogon perennial strains of one group, and the indica strains were closely related to the O. rufipogon annual strains, indicating that O. sativa has been derived polyphyletically from O. rufipogon. The subfamily members of p-SINE1 constitute a powerful tool for studying the classification and relationship of rice strains, even when one has limited knowledge of morphology, taxonomy, physiology, and biochemistry of rice strains.

Journal Article
TL;DR: The results indicate that the effective number of HCV infections in Egypt underwent rapid exponential growth between 1930 and 1955, and it is concluded that HCV is likely to remain prevalent in Egypt for several decades.
Abstract: Hepatitis C virus (HCV) is a leading cause of liver cancer and cirrhosis, and Egypt has possibly the highest HCV prevalence worldwide. In this article we use a newly developed Bayesian inference framework to estimate the transmission dynamics of HCV in Egypt from sampled viral gene sequences, and to predict the public health impact of the virus. Our results indicate that the effective number of HCV infections in Egypt underwent rapid exponential growth between 1930 and 1955. The timing and speed of this spread provides quantitative genetic evidence that the Egyptian HCV epidemic was initiated and propagated by extensive antischistosomiasis injection campaigns. Although our results show that HCV transmission has since decreased, we conclude that HCV is likely to remain prevalent in Egypt for several decades. Our combined population genetic and epidemiological analysis provides detailed estimates of historical changes in Egyptian HCV prevalence. Because our results are consistent with a demographic scenario specified a priori, they also provide an objective test of inference methods based on the coalescent process.

Journal Article
TL;DR: A phylogenetic analysis of the T2R genes suggests that they can be classified into three main groups, which are designated A, B, and C, and reveals that phylogenetically closely related T2R genes are close in their chromosomal locations, demonstrating tandem gene duplication as the primary source of new T2Rs.
Abstract: The diversity and evolution of bitter taste perception in mammals is not well understood. Recent discoveries of bitter taste receptor (T2R) genes provide an opportunity for a genetic approach to this question. We here report the identification of 10 and 30 putative T2R genes from the draft human and mouse genome sequences, respectively, in addition to the 23 and 6 previously known T2R genes from the two species. A phylogenetic analysis of the T2R genes suggests that they can be classified into three main groups, which are designated A, B, and C. Interestingly, while the one-to-one gene orthology between the human and mouse is common to group B and C genes, group A genes show a pattern of species- or lineage-specific duplication. It is possible that group B and C genes are necessary for detecting bitter tastants common to both humans and mice, whereas group A genes are used for species-specific bitter tastants. The analysis also reveals that phylogenetically closely related T2R genes are close in their chromosomal locations, demonstrating tandem gene duplication as the primary source of new T2Rs. For closely related paralogous genes, a rate of nonsynonymous nucleotide substitution significantly higher than the rate of synonymous substitution was observed in the extracellular regions of T2Rs, which are presumably involved in tastant-binding. This suggests the role of positive selection in the diversification of newly duplicated T2R genes. Because many natural poisonous substances are bitter, we conjecture that the mammalian T2R genes are under diversifying selection for the ability to recognize a diverse array of poisons that the organisms may encounter in exploring new habitats and diets.

Journal ArticleDOI
TL;DR: This study estimates microsatellite slippage mutation rates from public sequence databases and uses the least squares method with constraints to estimate expansion and contraction mutation rates, which agree with the length-dependent mutation pattern observed from experimental data.
Abstract: Microsatellite markers are widely used for genetic studies, but the relationship between microsatellite slippage mutation rate and the number of repeat units remains unclear. In this study, microsatellite distributions in the human genome are collected from public sequence databases. We observe that there is a threshold size for slippage mutations. We consider a model of microsatellite mutation consisting of point mutations and single stepwise slippage mutations. From two sets of equations based on two stochastic processes and equilibrium assumptions, we estimate microsatellite slippage mutation rates without assuming any relationship between microsatellite slippage mutation rate and the number of repeat units. We use the least squares method with constraints to estimate expansion and contraction mutation rates. The estimated slippage mutation rate increases exponentially as the number of repeat units increases. When slippage mutations happen, expansion occurs more frequently for short microsatellites and contraction occurs more frequently for long microsatellites. Our results agree with the length-dependent mutation pattern observed from experimental data, and they explain the scarcity of long microsatellites.
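The qualitative pattern described above (slippage rate rising with repeat number, expansion favored in short arrays, contraction favored in long ones) can be illustrated with a small birth-death chain. This is only a sketch under stated assumptions, not the estimation procedure of the study: the functions `slippage_rate` and `expansion_fraction`, their parameter values, and the repeat-number bounds are hypothetical, and point mutations are left out for brevity.

```python
import math

def slippage_rate(n, base=1e-6, growth=0.5):
    """Hypothetical per-generation slippage rate, exponential in repeat count n."""
    return base * math.exp(growth * n)

def expansion_fraction(n, pivot=15, slope=0.3):
    """Hypothetical share of slippage events that add a repeat.

    Logistic in n: above 0.5 for arrays shorter than `pivot`, below 0.5 for longer ones.
    """
    return 1.0 / (1.0 + math.exp(slope * (n - pivot)))

def equilibrium_distribution(min_repeats=2, max_repeats=40):
    """Stationary distribution of a single-step (birth-death) slippage chain.

    Detailed balance gives pi[n + 1] / pi[n] = up(n) / down(n + 1), so the
    weights are built by a running product and normalized at the end.
    """
    sizes = list(range(min_repeats, max_repeats + 1))
    weights = [1.0]
    for n in sizes[:-1]:
        up = slippage_rate(n) * expansion_fraction(n)
        down = slippage_rate(n + 1) * (1.0 - expansion_fraction(n + 1))
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return {n: w / total for n, w in zip(sizes, weights)}

dist = equilibrium_distribution()
most_common = max(dist, key=dist.get)
print("most common repeat count:", most_common)
print("frequency of the longest allowed array:", dist[40])
```

With these illustrative parameters the equilibrium mass sits at intermediate repeat numbers and long arrays are very rare, which is the kind of scarcity the abstract attributes to contraction bias in long microsatellites.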

Journal ArticleDOI
TL;DR: The majority of phylogenetic analyses of protein-coding genes of this chloroplast DNA suggests that Amborella is not the basal angiosperm and not even the most basal among dicots.
Abstract: Phylogenetic analyses based on comparison of a limited number of genes recently suggested that Amborella trichopoda is the most ancient angiosperm. Here we present the complete sequence of the chloroplast genome of this plant. It does not display any of the genes characteristic of chloroplast DNA of the gymnosperm Pinus thunbergii (chlB, chlL, chlN, psaM, and ycf12). The majority of phylogenetic analyses of protein-coding genes of this chloroplast DNA suggests that Amborella is not the basal angiosperm and not even the most basal among dicots.

Journal ArticleDOI
Lior David, Shula Blum, Marcus W. Feldman, Uri Lavi, Jossi Hillel
TL;DR: Phylogenetic analysis of several cyprinid species suggests an evolutionary model for this tetraploidization, with a role for polyploidization in speciation and diversification.
Abstract: Genome duplications may have played a role in the early stages of vertebrate evolution, near the time of divergence of the lamprey lineage. Additional genome duplication, specifically in ray-finned fish, may have occurred before the divergence of the teleosts. The common carp (Cyprinus carpio) has been considered tetraploid because of its chromosome number (2n = 100) and its high DNA content. We studied variation using 59 microsatellite primer pairs to better understand the ploidy level of the common carp. Based on the number of PCR amplicons per individual, about 60% of these primer pairs are estimated to amplify duplicates. Segregation patterns in families suggested a partially duplicated genome structure and disomic inheritance. This could suggest that the common carp is tetraploid and that polyploidy occurred by hybridization (allotetraploidy). From sequences of microsatellite flanking regions, we estimated the difference per base between pairs of alleles and between pairs of paralogs. The distribution of differences between paralogs had two distinct modes suggesting one whole-genome duplication and a more recent wave of segmental duplications. The genome duplication was estimated to have occurred about 12 MYA, with the segmental duplications occurring between 2.3 and 6.8 MYA. At 12 MYA, this would be one of the most recent genome duplications among vertebrates. Phylogenetic analysis of several cyprinid species suggests an evolutionary model for this tetraploidization, with a role for polyploidization in speciation and diversification.

Journal ArticleDOI
TL;DR: Analysis of conserved regions and secondary structures of the ITS region provided no evidence that, in this system, hybrid ITS evolution is predominantly driven in a particular direction, however, two regions in the ITS1 and ITS2, respectively, show higher mutation rates than expected from outgroup comparisons.
Abstract: DNA sequence variation of the internal transcribed spacer (ITS) region of nuclear ribosomal DNA from Arabis holboellii, A. drummondii, and its putative hybrid A. divaricarpa was analyzed to study hybrid speciation in a species system geographically covering nearly the entire North American continent. Based on molecular systematics the investigated species are better combined under the genus Boechera. Multiple intraindividual ITS copies were detected in numerous accessions of A. divaricarpa, and, to a minor extent, in the parental taxa. Comparative phylogenetic analysis demonstrates that reticulate evolution is common. Consequently, concerted evolution of ITS regions resulted in different types of ITS fragments not only in hybrid populations but also in one of the parental taxa, A. holboellii. Hybrid formation often occurred independently at different sites and at different times, which is reflected by ITS copies resampling the original parental sequence variation in different ways. Some biogeographic structuring of genetic diversity is apparent and mirrors postglacial migration routes. Hybridization, reticulation, and apomixis are assumed to be the major forces driving speciation processes in this species complex. Analysis of conserved regions and secondary structures of the ITS region provided no evidence that, in this system, hybrid ITS evolution is predominantly driven in a particular direction. However, two regions in the ITS1 and ITS2, respectively, show higher mutation rates than expected from outgroup comparisons. Strong evidence for the occurrence of apomixis in A. holboellii and A. divaricarpa has come from pollen size measurements and estimations of pollen quality, which favor the hypothesis that A. drummondii served as paternal hybridization partner more frequently than A. holboellii.

Journal ArticleDOI
TL;DR: This analysis reveals that dengue virus generally evolves according to a molecular clock, although some serotype-specific and genotype-specific rate differences were observed, and that its origin is more recent than previously suggested, with the virus appearing approximately 1,000 years ago.
Abstract: Dengue is often referred to as an emerging disease because of the rapid increases in incidence and prevalence that have been observed in recent decades. To understand the rate at which genetic diversification occurs in dengue virus and to infer the time-scale of its evolution, we employed a maximum likelihood method that uses information about times of virus sampling to estimate the rate of molecular evolution in a large number of viral envelope (E) gene sequences and to place bounds around the dates of appearance of all serotypes and specific genotypes. Our analysis reveals that dengue virus generally evolves according to a molecular clock, although some serotype-specific and genotype-specific rate differences were observed, and that its origin is more recent than previously suggested, with the virus appearing approximately 1,000 years ago. Furthermore, we estimate that the zoonotic transfer of dengue from sylvatic (monkey) to sustained human transmission occurred between 125 and 320 years ago, that the current global genetic diversity in the four serotypes of dengue virus only appeared during the past century, and that the recent rise in genetic diversity can be loosely correlated both to human activities such as population growth, urbanization, and mass transport and to the emergence of dengue hemorrhagic fever as a major disease problem.
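The idea of using sampling times to estimate an evolutionary rate can be sketched with a root-to-tip regression. This is a simpler stand-in, not the maximum likelihood method the abstract describes: divergence from the root is regressed on sampling year, the slope is read as the rate, and the x-intercept as the date of the root. The function name `fit_clock` and every number below are hypothetical, synthetic values chosen for illustration; none come from the study.

```python
def fit_clock(sampling_years, root_to_tip):
    """Ordinary least-squares fit of root-to-tip divergence against sampling year.

    Returns (rate, root_year): the slope in substitutions per site per year,
    and the year at which the fitted line crosses zero divergence.
    """
    n = len(sampling_years)
    mean_x = sum(sampling_years) / n
    mean_y = sum(root_to_tip) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_years)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sampling_years, root_to_tip))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_year = -intercept / rate
    return rate, root_year

# Synthetic example: isolates sampled a decade apart, with divergence
# growing perfectly linearly (a strict clock with no noise).
years = [1960, 1970, 1980, 1990, 2000]
divergence = [0.06, 0.07, 0.08, 0.09, 0.10]

rate, root_year = fit_clock(years, divergence)
print("rate per site per year:", rate)
print("estimated root year:", root_year)
```

Real data would scatter around the line, and lineage-specific rate differences of the kind the abstract mentions would appear as systematic departures from it.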

Journal ArticleDOI
TL;DR: It is proved that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates, and it is demonstrated that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths.
Abstract: Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with running time significantly faster than Neighbor-Joining (NJ). This article further explores the properties of the BME principle, and it explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths. Finally, we consider a large simulated data set, with 5,000 100-taxon trees generated by the Aldous beta-splitting distribution encompassing a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings, ranging from Yule-Harding to uniform distributions, and all ranges of maximum pairwise divergence and departure from molecular clock. Interestingly, the covarion parameter has little effect on the tree quality for any of the algorithms. FASTME is freely available on the web.
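The branch length estimation scheme of Pauplin that underlies the BME principle assigns a tree the length Σ 2^(1 − p_ij) · d_ij over all leaf pairs, where p_ij is the number of edges on the path between leaves i and j. The sketch below evaluates that sum for a fixed topology; it is not the FASTME program and performs no tree search. The function names, the adjacency-dictionary representation, and the toy quartet are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations

def edge_counts_from(adjacency, start):
    """Breadth-first search: number of edges from `start` to every node."""
    counts = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in counts:
                counts[neighbor] = counts[node] + 1
                queue.append(neighbor)
    return counts

def balanced_tree_length(adjacency, distances):
    """Pauplin's tree length: sum of 2**(1 - p_ij) * d_ij over leaf pairs."""
    leaves = sorted(n for n, nbrs in adjacency.items() if len(nbrs) == 1)
    counts = {leaf: edge_counts_from(adjacency, leaf) for leaf in leaves}
    total = 0.0
    for a, b in combinations(leaves, 2):
        total += 2.0 ** (1 - counts[a][b]) * distances[frozenset((a, b))]
    return total

# Toy quartet ((a,b),(c,d)) in which every branch has length 1,
# so the pairwise distances are additive on this topology.
true_tree = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
             "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
# Competing topology ((a,c),(b,d)) scored against the same distances.
wrong_tree = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
              "u": ["a", "c", "v"], "v": ["b", "d", "u"]}
distances = {frozenset(pair): d for pair, d in [
    (("a", "b"), 2.0), (("c", "d"), 2.0),
    (("a", "c"), 3.0), (("a", "d"), 3.0),
    (("b", "c"), 3.0), (("b", "d"), 3.0),
]}

print(balanced_tree_length(true_tree, distances))   # 5.0, the sum of the five unit branches
print(balanced_tree_length(wrong_tree, distances))  # 5.5, longer, so minimum evolution rejects it
```

On the generating topology the formula recovers the true total branch length, while the competing topology scores higher, which is the criterion a minimum evolution search minimizes.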

Journal ArticleDOI
TL;DR: It is shown that genome engineering is a feasible strategy for functional analysis of large gene clusters, and that removal of dispensable genomic regions may pave the way toward an optimized Bacillus cell factory.
Abstract: Bacterial genomes contain 250 to 500 essential genes, as suggested by single gene disruptions and theoretical considerations. If this view is correct, the remaining nonessential genes of an organism, such as Bacillus subtilis, have been acquired during evolution in its perpetually changing ecological niches. Notably, approximately 47% of the approximately 4,100 genes of B. subtilis belong to paralogous gene families in which several members have overlapping functions. Thus, essential gene functions will outnumber essential genes. To answer the question to what extent the most recently acquired DNA contributes to the life of B. subtilis under standard laboratory growth conditions, we initiated a "reconstruction" of the B. subtilis genome by removing prophages and AT-rich islands. Stepwise deletion of two prophages (SPbeta, PBSX), three prophage-like regions, and the largest operon of B. subtilis (pks) resulted in a genome reduction of 7.7% and elimination of 332 genes. The resulting strain was phenotypically characterized by metabolic flux analysis, proteomics, and specific assays for protein secretion, competence development, sporulation, and cell motility. We show that genome engineering is a feasible strategy for functional analysis of large gene clusters, and that removal of dispensable genomic regions may pave the way toward an optimized Bacillus cell factory.

Journal ArticleDOI
TL;DR: Although the focus here is evolutionary dependence among codons that is associated with protein structure, the statistical approach is quite general and could be applied to diverse cases of evolutionary dependence where surrogates for sequence fitness can be measured or modeled.
Abstract: Markovian models of protein evolution that relax the assumption of independent change among codons are considered. With this comparatively realistic framework, an evolutionary rate at a site can depend both on the state of the site and on the states of surrounding sites. By allowing a relatively general dependence structure among sites, models of evolution can reflect attributes of tertiary structure. To quantify the impact of protein structure on protein evolution, we analyze protein-coding DNA sequence pairs with an evolutionary model that incorporates effects of solvent accessibility and pairwise interactions among amino acid residues. By explicitly considering the relationship between nonsynonymous substitution rates and protein structure, this approach can lead to refined detection and characterization of positive selection. Analyses of simulated sequence pairs indicate that parameters in this evolutionary model can be well estimated. Analyses of lysozyme c and annexin V sequence pairs yield the biologically reasonable result that amino acid replacement rates are higher when the replacements lead to energetically favorable proteins than when they destabilize the proteins. Although the focus here is evolutionary dependence among codons that is associated with protein structure, the statistical approach is quite general and could be applied to diverse cases of evolutionary dependence where surrogates for sequence fitness can be measured or modeled.

Journal ArticleDOI
TL;DR: It is argued that TEs in these families attain high population frequencies and even reach fixation as a result of low family-wide transposition rates leading to low TE copy numbers and consequently reduced strength of selection acting on individual TE copies.
Abstract: The Drosophila melanogaster genome contains approximately 100 distinct families of transposable elements (TEs). In the euchromatic part of the genome, each family is present in a small number of copies (5-150 copies), with individual copies of TEs often present at very low frequencies in populations. This pattern is likely to reflect a balance between the inflow of TEs by transposition and the removal of TEs by natural selection. The nature of natural selection acting against TEs remains controversial. We provide evidence that selection against chromosome abnormalities caused by ectopic recombination limits the spread of some TEs. We also demonstrate for the first time that some TE families in the Drosophila euchromatin appear to be only marginally affected by purifying selection and contain many copies at high population frequencies. We argue that TEs in these families attain high population frequencies and even reach fixation as a result of low family-wide transposition rates leading to low TE copy numbers and consequently reduced strength of selection acting on individual TE copies. Fixation of TEs in these families should provide an upward pressure on the size of intergenic sequences counterbalancing rapid DNA loss through small deletions. Copy-number-dependent selection on TE families caused by ectopic recombination may also promote diversity among TEs in the Drosophila genome.

Journal ArticleDOI
TL;DR: The potential of hemipteroid insects as a model system for studies of the evolution of animal mt genomes is discussed and some fundamental questions that may be addressed with this system are outlined.
Abstract: To help understand the mechanisms of gene rearrangement in the mitochondrial (mt) genomes of hemipteroid insects, we sequenced the mt genome of the plague thrips, Thrips imaginis (Thysanoptera). This genome is circular, 15,407 bp long, and has many unusual features, including (1) rRNA genes inverted and distant from one another, (2) an extra gene for tRNA-Ser, (3) a tRNA-Val lacking a D-arm, (4) two pseudo-tRNA genes, (5) duplicate control regions, and (6) translocations and/or inversions of 24 of the 37 genes. The mechanism of rRNA gene transcription in T. imaginis may be different from that of other arthropods since the two rRNA genes have inverted and are distant from one another. Further, the rRNA genes are not adjacent or even close to either of the two control regions. Tandem duplication and deletion is a plausible model for the evolution of duplicate control regions and for the gene translocations, but intramitochondrial recombination may account for the gene inversions in T. imaginis. All the 18 genes between control regions #1 and #2 have translocated and/or inverted, whereas only six of the 20 genes outside this region have translocated and/or inverted. Moreover, the extra tRNA gene and the two pseudo-tRNA genes are either in this region or immediately adjacent to one of the control regions. These observations suggest that tandem duplication and deletion may be facilitated by the duplicate control regions and may have occurred a number of times in the lineage leading to T. imaginis. T. imaginis shares two novel gene boundaries with a lepidopsocid species from another order of hemipteroid insects, the Psocoptera. The evidence available suggests that these shared gene boundaries evolved by convergence and thus are not informative for the interordinal phylogeny of hemipteroid insects. We discuss the potential of hemipteroid insects as a model system for studies of the evolution of animal mt genomes and outline some fundamental questions that may be addressed with this system.

Journal ArticleDOI
TL;DR: This study demonstrates viral genetic turnover within a focal population and the potential importance of adaptive evolution in viral epidemic expansion and the role of viral molecular evolution in emergent disease dynamics.
Abstract: In the last four decades the incidence of dengue fever has increased 30-fold worldwide, and over half the world's population is now threatened with infection from one or more of four co-circulating viral serotypes (DEN-1 through DEN-4). To determine the role of viral molecular evolution in emergent disease dynamics, we sequenced 40% of the genome of 82 DEN-4 isolates collected from Puerto Rico over the 20 years since the onset of endemic dengue on the island. Isolates were derived from years with varying levels of DEN-4 prevalence. Over our sampling period there were marked evolutionary shifts in DEN-4 viral populations circulating in Puerto Rico; viral lineages were temporally clustered and the most common genotype at a particular sampling time often arose from a previously rare lineage. Expressed changes in structural genes did not appear to drive this lineage turnover, even though these regions include primary determinants of viral antigenic properties. Instead, recent dengue evolution can be attributed in part to positive selection on the nonstructural gene 2A (NS2A), whose functions may include replication efficiency and antigenicity. During the latest and most severe DEN-4 epidemic in Puerto Rico, in 1998, viruses were distinguished by three amino acid changes in NS2A that were fixed far faster than expected by drift alone. Our study therefore demonstrates viral genetic turnover within a focal population and the potential importance of adaptive evolution in viral epidemic expansion.

Journal ArticleDOI
TL;DR: The inclusion of more strains into the existing phylogenetic trees of the Alexandrium tamarense species complex from large subunit rDNA sequences has confirmed that geographic distribution is consistent with the molecular clades but not with the three morphologically defined species that constitute the complex.
Abstract: The cosmopolitan dinoflagellate genus Alexandrium, and especially the A. tamarense species complex, contain both toxic and nontoxic strains. An understanding of their evolution and paleogeography is a necessary precursor to unraveling the development and spread of toxic forms. The inclusion of more strains into the existing phylogenetic trees of the Alexandrium tamarense species complex from large subunit rDNA sequences has confirmed that geographic distribution is consistent with the molecular clades but not with the three morphologically defined species that constitute the complex. In addition, a new clade has been discovered, representing Mediterranean nontoxic strains. The dinoflagellate fossil record was used to calibrate a molecular clock: key dates used in this calibration are the origins of the Peridiniales (estimated at 190 MYA), Gonyaulacaceae (180 MYA), and Ceratiaceae (145 MYA). Based on the data set analyzed, the origin of the genus Alexandrium was estimated to be around late Cretaceous (77 MYA), with its earliest possible origination in the mid Cretaceous (119 MYA). The A. tamarense species complex potentially diverged around the early Neogene (23 MYA), with a possible first appearance in the late Paleogene (45 MYA). A paleobiogeographic scenario for Alexandrium is based on (1) the calculated possible ages of origination for the genus and its constituent groups; (2) paleogeographic events determined by plate movements, changing ocean configurations and currents, as well as climatic fluctuations; and (3) the present geographic distribution of the various clades of the Alexandrium tamarense species complex.

Journal ArticleDOI
TL;DR: The results support the idea that the stem group of modern cockroaches radiated sometime between the late Jurassic and early Cretaceous, not the Carboniferous, as has been suggested on the basis of roach-like fossils from this epoch.
Abstract: Bacteria of the genus Blattabacterium are intracellular symbionts that reside in specialized cells of cockroaches and the termite Mastotermes darwiniensis. They appear to be obligate mutualists, and are transmitted vertically in the eggs. Such characteristics are expected to lead to equivalent phylogenies for host and symbiont, and we tested this hypothesis using recently accumulated data on relationships among termites and cockroaches and their Blattabacterium spp. Host and symbiont topologies were found to be highly similar, and various tests indicated that they were not statistically different. A close relationship between endosymbionts from termites and members of the wood-feeding cockroach genus Cryptocercus was found, supporting the hypothesis that the former evolved from subsocial, wood-dwelling cockroaches. The majority of the Blattabacterium spp. sequences appear to have undergone similar rates of evolution since their divergence from a common ancestor, and an estimate of this rate was determined based on early Cretaceous host fossils. The results support the idea that the stem group of modern cockroaches radiated sometime between the late Jurassic and early Cretaceous, not the Carboniferous, as has been suggested on the basis of roach-like fossils from this epoch.