
Showing papers in "Molecular Biology and Evolution in 2007"


Journal Article
TL;DR: Version 4 of MEGA software expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses.
Abstract: We announce the release of the fourth version of MEGA software, which expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses. Version 4 includes a unique facility to generate captions, written in figure legend format, in order to provide natural language descriptions of the models and methods used in the analyses. This facility aims to promote a better understanding of the underlying assumptions used in analyses, and of the results generated. Another new feature is the Maximum Composite Likelihood (MCL) method for estimating evolutionary distances between all pairs of sequences simultaneously, with and without incorporating rate variation among sites and substitution pattern heterogeneities among lineages. This MCL method also can be used to estimate transition/transversion bias and nucleotide substitution pattern without knowledge of the phylogenetic tree. This new version is a native 32-bit Windows application with multi-threading and multi-user supports, and it is also available to run in a Linux desktop environment (via the Wine compatibility layer) and on Intel-based Macintosh computers under the Parallels program. The current version of MEGA is available free of charge at (http://www.megasoftware.net).

29,021 citations


Journal Article
TL;DR: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML), which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses.
Abstract: PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (d(S) and d(N), respectively) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at http://abacus.gene.ucl.ac.uk/software/paml.html.

10,773 citations


Journal Article
TL;DR: F(ST) estimation from corrected genotype frequencies performed well when restricted to visible allele sizes, and the use of the genetic distance of Cavalli-Sforza and Edwards (1967) corrected by the conventional method gave better estimates than those obtained without correction.
Abstract: Microsatellite null alleles are commonly encountered in population genetics studies, yet little is known about their impact on the estimation of population differentiation. Computer simulations based on the coalescent were used to investigate the evolutionary dynamics of null alleles, their impact on F(ST) and genetic distances, and the efficiency of estimators of null allele frequency. Further, we explored how the existing method for correcting genotype data for null alleles performed in estimating F(ST) and genetic distances, and we compared this method with a new method proposed here (for F(ST) only). Null alleles were likely to be encountered in populations with a large effective size, with an unusually high mutation rate in the flanking regions, and that have diverged from the population from which the cloned allele state was drawn and the primers designed. When populations were significantly differentiated, F(ST) and genetic distances were overestimated in the presence of null alleles. Frequency of null alleles was estimated precisely with the algorithm presented in Dempster et al. (1977). The conventional method for correcting genotype data for null alleles did not provide an accurate estimate of F(ST) and genetic distances. However, the use of the genetic distance of Cavalli-Sforza and Edwards (1967) corrected by the conventional method gave better estimates than those obtained without correction. F(ST) estimation from corrected genotype frequencies performed well when restricted to visible allele sizes. Both the proposed method and the traditional correction method have been implemented in a program that is available free of charge at http://www.montpellier.inra.fr/URLB/. We used 2 published microsatellite data sets based on original and redesigned pairs of primers to empirically confirm our simulation results.

2,470 citations


Journal Article
TL;DR: This work reimplemented most of the already existing models, including the popular lognormal model, as well as various prior choices for divergence times (birth-death, Dirichlet, uniform), in a common Bayesian statistical framework, and proposes a new autocorrelated model, called the "CIR" process, with well-defined stationary properties.
Abstract: Several models have been proposed to relax the molecular clock in order to estimate divergence times. However, it is unclear which model has the best fit to real data and should therefore be used to perform molecular dating. In particular, we do not know whether rate autocorrelation should be considered or which prior on divergence times should be used. In this work, we propose a general bench mark of alternative relaxed clock models. We have reimplemented most of the already existing models, including the popular lognormal model, as well as various prior choices for divergence times (birth-death, Dirichlet, uniform), in a common Bayesian statistical framework. We also propose a new autocorrelated model, called the "CIR" process, with well-defined stationary properties. We assess the relative fitness of these models and priors, when applied to 3 different protein data sets from eukaryotes, vertebrates, and mammals, by computing Bayes factors using a numerical method called thermodynamic integration. We find that the 2 autocorrelated models, CIR and lognormal, have a similar fit and clearly outperform uncorrelated models on all 3 data sets. In contrast, the optimal choice for the divergence time prior is more dependent on the data investigated. Altogether, our results provide useful guidelines for model choice in the field of molecular dating while opening the way to more extensive model comparisons.

457 citations


Journal Article
TL;DR: It is proposed that natural selection tends to decrease the mitochondrial mutation rate in long-lived species, in agreement with the mitochondrial theory of aging.
Abstract: Mitochondrial DNA (mtDNA) is the most popular marker of molecular diversity in animals, primarily because of its elevated mutation rate. After >20 years of intensive usage, the extent of mitochondrial evolutionary rate variations across species, their practical consequences on sequence analysis methods, and the ultimate reasons for mtDNA hypermutability are still largely unresolved issues. Using an extensive cytochrome b data set, fossil data, and taking advantage of the decoupled dynamics of synonymous and nonsynonymous substitutions, we measure the lineage-specific mitochondrial mutation rate across 1,696 mammalian species and compare it with the nuclear rate. We report an unexpected 2 orders of magnitude mitochondrial mutation rate variation between lineages: cytochrome b third codon positions are renewed every 1-2 Myr, on average, in the fastest evolving mammals, whereas it takes >100 Myr in slow-evolving lineages. This result has obvious implications in the fields of molecular phylogeny, molecular dating, and population genetics. Variations of mitochondrial substitution rate across species are partly explained by body mass, longevity, and age of female sexual maturity. The classical metabolic rate and generation time hypotheses, however, do not fully explain the observed patterns, especially a stronger effect of longevity in long-lived than in short-lived species. We propose that natural selection tends to decrease the mitochondrial mutation rate in long-lived species, in agreement with the mitochondrial theory of aging.

416 citations


Journal Article
TL;DR: Analyzing Rho mRNA expression patterns in mouse tissues shows that recent subfamilies have tissue-specific and low-level expression that supports their implication only in narrow time windows or in differentiated metabolic functions, and provides guides for future structure and evolution studies of other components of Rho signaling pathways, in particular regulators of the RhoGEF family.
Abstract: GTPases of the Rho family are molecular switches that play important roles in converting and amplifying external signals into cellular effects. Originally demonstrated to control the dynamics of the F-actin cytoskeleton, Rho GTPases have been implicated in many basic cellular processes that influence cell proliferation, differentiation, motility, adhesion, survival, or secretion. To elucidate the evolutionary history of the Rho family, we have analyzed over 20 species covering major eukaryotic clades from unicellular organisms to mammals, including platypus and opossum, and have reconstructed the ontogeny and the chronology of emergence of the different subfamilies. Our data establish that the 20 mammalian Rho members are structured into 8 subfamilies, among which Rac is the founder of the whole family. Rho, Cdc42, RhoUV, and RhoBTB subfamilies appeared before Coelomates and RhoJQ, Cdc42 isoforms, RhoDF, and Rnd emerged in chordates. In vertebrates, gene duplications and retrotranspositions increased the size of each chordate Rho subfamily, whereas RhoH, the last subfamily, arose probably by horizontal gene transfer. Rac1b, a Rac1 isoform generated by alternative splicing, emerged in amniotes, and RhoD, only in therians. Analysis of Rho mRNA expression patterns in mouse tissues shows that recent subfamilies have tissue-specific and low-level expression that supports their implication only in narrow time windows or in differentiated metabolic functions. These findings give a comprehensive view of the evolutionary canvas of the Rho family and provide guides for future structure and evolution studies of other components of Rho signaling pathways, in particular regulators of the RhoGEF family.

415 citations


Journal Article
TL;DR: Consistent with the dramatic reduction in nucleotide diversity, a severe domestication bottleneck is detected and the sequence diversity currently found in the rice genome could be explained by a founding population of 1,500 individuals if the initial domestication event occurred over a 3,000-year period.
Abstract: Varying degrees of reduction of genetic diversity in crops relative to their wild progenitors occurred during the process of domestication. Such information, however, has not been available for the Asian cultivated rice (Oryza sativa) despite its importance as a staple food and a model organism. To reveal levels and patterns of nucleotide diversity and to elucidate the genetic relationship and demographic history of O. sativa and its close relatives (Oryza rufipogon and Oryza nivara), we investigated nucleotide diversity data from 10 unlinked nuclear loci in species-wide samples of these species. The results indicated that O. rufipogon and O. nivara possessed comparable levels of nucleotide variation (θ_sil = 0.0077 to 0.0095) compared with the relatives of other crops. In contrast, nucleotide diversity of O. sativa was as low as θ_sil = 0.0024 and even lower (θ_sil = 0.0021 for indica and 0.0011 for japonica), if we consider the 2 subspecies separately. Overall, only 20-10% of the diversity in the wild species was retained in 2 subspecies of the cultivated rice (indica and japonica), respectively. Because statistical tests did not reject the assumption of neutrality for all 10 loci, we further used the coalescent to simulate bottlenecks of various lengths and population sizes to better understand the domestication process. Consistent with the dramatic reduction in nucleotide diversity, we detected a severe domestication bottleneck and demonstrated that the sequence diversity currently found in the rice genome could be explained by a founding population of 1,500 individuals if the initial domestication event occurred over a 3,000-year period. Phylogenetic analyses revealed close genetic relationships and ambiguous species boundary of O. rufipogon and O. nivara, providing additional evidence to treat them as 2 ecotypes of a single species. Lowest linkage disequilibrium (LD) was found in the perennial O. rufipogon where the r^2 value dropped to a negligible level within 400 bp, and the highest in the japonica rice where LD extended to the entirely sequenced region (approximately 900 bp), implying that LD mapping by genome scans may not be feasible in wild rice due to the high density of markers needed.

351 citations


Journal Article
TL;DR: This work investigates whether some genes departed from the empirical distribution of most loci, which would suggest that they were selected during domestication or breeding, and detects a departure from the null model of demographic bottleneck for the hypothetical gene HgA.
Abstract: Several demographic and selective events occurred during the domestication of wheat from the allotetraploid wild emmer (Triticum turgidum ssp. dicoccoides). Cultivated wheat has since been affected by other historical events. We analyzed nucleotide diversity at 21 loci in a sample of 101 individuals representing 4 taxa corresponding to representative steps in the recent evolution of wheat (wild, domesticated, cultivated durum, and bread wheats) to unravel the evolutionary history of cultivated wheats and to quantify its impact on genetic diversity. Sequence relationships are consistent with a single domestication event and identify 2 genetically different groups of bread wheat. The wild group is not highly polymorphic, with only 212 polymorphic sites among the 21,720 bp sequenced, and, during domestication, diversity was further reduced in cultivated forms (by 69% in bread wheat and 84% in durum wheat), with considerable differences between loci, some retaining no polymorphism at all. Coalescent simulations were performed and compared with our data to estimate the intensity of the bottlenecks associated with domestication and subsequent selection. Based on our 21-locus analysis, the average intensity of domestication bottleneck was estimated at about 3, giving a population size for the domesticated form about one third that of wild dicoccoides. The most severe bottleneck, with an intensity of about 6, occurred in the evolution of durum wheat. We investigated whether some of the genes departed from the empirical distribution of most loci, suggesting that they might have been selected during domestication or breeding. We detected a departure from the null model of demographic bottleneck for the hypothetical gene HgA. However, the atypical pattern of polymorphism at this locus might reveal selection on the linked locus Gsp/A, which may affect grain softness, an important trait for end-use quality in wheat.

341 citations


Journal Article
TL;DR: The transcriptional repressive function of insect CRY2 descended from a light-sensitive photolyase-like ancestral gene, probably lacking the ability to repress CLOCK:CYCLE-mediated transcription, providing an evolutionary context for proposing novel circadian clock mechanisms in insects.
Abstract: Cryptochrome (CRY) proteins are components of the central circadian clockwork of metazoans. Phylogenetic analyses show at least 2 rounds of gene duplication at the base of the metazoan radiation, as well as several losses, gave rise to 2 cryptochrome (cry) gene families in insects, a Drosophila-like cry1 gene family and a vertebrate-like cry2 family. Previous studies have shown that insect CRY1 is photosensitive, whereas photo-insensitive CRY2 functions to potently inhibit clock-relevant CLOCK:CYCLE-mediated transcription. Here, we extended the transcriptional repressive function of insect CRY2 to 2 orders: Hymenoptera (the honeybee Apis mellifera and the bumblebee Bombus impatiens) and Coleoptera (the red flour beetle Tribolium castaneum). Importantly, the bee and beetle CRY2 proteins are not light sensitive in culture, in either degradation of protein levels or inhibitory transcriptional response, suggesting novel light input pathways into their circadian clocks as Apis and Tribolium do not have CRY1. By mapping the functional data onto a cryptochrome/6-4 photolyase gene tree, we find that the transcriptional repressive function of insect CRY2 descended from a light-sensitive photolyase-like ancestral gene, probably lacking the ability to repress CLOCK:CYCLE-mediated transcription. These data provide an evolutionary context for proposing novel circadian clock mechanisms in insects.

328 citations


Journal Article
TL;DR: Accumulation of genetic diversity within the radiating lineages of the African Lakes Malawi, Victoria and Barombi Mbo, and Palaeolake Makgadikgadi began around or after the time of lake basin formation, suggesting lakes may have captured preexisting cichlid diversity from multiple sources from which adaptive radiations have evolved.
Abstract: Timing divergence events allows us to infer the conditions under which biodiversity has evolved and gain important insights into the mechanisms driving evolution. Cichlid fishes are a model system for studying speciation and adaptive radiation, yet we have lacked reliable timescales for their evolution. Phylogenetic reconstructions are consistent with cichlid origins prior to Gondwanan landmass fragmentation 121-165 MYA, considerably earlier than the first known fossil cichlids (Eocene). We examined the timing of cichlid evolution using a relaxed molecular clock calibrated with geological estimates for the ages of 1) Gondwanan fragmentation and 2) cichlid fossils. Timescales of cichlid evolution derived from fossil-dated phylogenies of other bony fishes most closely matched those suggested by Gondwanan breakup calibrations, suggesting the Eocene origins and marine dispersal implied by the cichlid fossil record may be due to its incompleteness. Using Gondwanan calibrations, we found accumulation of genetic diversity within the radiating lineages of the African Lakes Malawi, Victoria and Barombi Mbo, and Palaeolake Makgadikgadi began around or after the time of lake basin formation. These calibrations also suggest Lake Tanganyika was colonized independently by the major radiating cichlid tribes that then began to accumulate genetic diversity thereafter. These results contrast with the widely accepted theory that diversification into major lineages took place within the Tanganyika basin. Together, this evidence suggests that ancient lake habitats have played a key role in generating and maintaining diversity within radiating lineages and also that lakes may have captured preexisting cichlid diversity from multiple sources from which adaptive radiations have evolved.

304 citations


Journal Article
TL;DR: Evidence is offered that the genetic architecture of ecological speciation is associated with signatures of selection in nature, providing strong support for the hypothesis that divergent natural selection is currently maintaining adaptive differentiation and promoting ecological speciation in lake whitefish species pairs.
Abstract: Adaptive evolutionary change is contingent on variation and selection; thus, understanding adaptive divergence and ultimately speciation requires information on both the genetic basis of adaptive traits as well as an understanding of the role of divergent natural selection on those traits. The lake whitefish (Coregonus clupeaformis) consists of several sympatric "dwarf" (limnetic) and normal (benthic) species pairs that co-inhabit northern postglacial lakes. These young species pairs have evolved independently and display parallelism in life history, behavioral, and morphological divergence associated with the use of distinct trophic resources. We identified phenotype-environment associations and determined the genetic architecture and the role of selection modulating population genetic divergence in sympatric dwarf and normal lake whitefish. The genetic architecture of 9 adaptive traits was analyzed in 2 hybrid backcrosses individually phenotyped throughout their life history. Significant quantitative trait loci (QTL) were associated with swimming behavior (habitat selection and predator avoidance), growth rate, morphology (condition factor and gill rakers), and life history (onset of maturity and fecundity). Genome scans among 4 natural sympatric pairs, using loci segregating in the map, revealed a signature of selection for 24 loci. Loci exhibiting a signature of selection were associated with QTL relative to other regions of the genome more often than expected by chance alone. Two parallel QTL outliers for growth and condition factor exhibited segregation distortion in both mapping families, supporting the hypothesis that adaptive divergence contributing to parallel reductions of gene flow among natural populations may cause genetic incompatibilities. Overall, these findings offer evidence that the genetic architecture of ecological speciation is associated with signatures of selection in nature, providing strong support for the hypothesis that divergent natural selection is currently maintaining adaptive differentiation and promoting ecological speciation in lake whitefish species pairs.

Journal Article
TL;DR: It is proposed that a large and diverse human population has persisted in eastern Africa and that eastern Africa may have been an ancient source of dispersion of modern humans both within and outside of Africa.
Abstract: Studies of human mitochondrial (mt) DNA genomes demonstrate that the root of the human phylogenetic tree occurs in Africa. Although 2 mtDNA lineages with an African origin (haplogroups M and N) were the progenitors of all non-African haplogroups, macrohaplogroup L (including haplogroups L0-L6) is limited to sub-Saharan Africa. Several L haplogroup lineages occur most frequently in eastern Africa (e.g., L0a, L0f, L5, and L3g), but some are specific to certain ethnic groups, such as haplogroup lineages L0d and L0k that previously have been found nearly exclusively among southern African "click" speakers. Few studies have included multiple mtDNA genome samples belonging to haplogroups that occur in eastern and southern Africa but are rare or absent elsewhere. This lack of sampling in eastern Africa makes it difficult to infer relationships among mtDNA haplogroups or to examine events that occurred early in human history. We sequenced 62 complete mtDNA genomes of ethnically diverse Tanzanians, southern African Khoisan speakers, and Bakola Pygmies and compared them with a global pool of 226 mtDNA genomes. From these, we infer phylogenetic relationships amongst mtDNA haplogroups and estimate the time to most recent common ancestor (TMRCA) for haplogroup lineages. These data suggest that Tanzanians have high genetic diversity and possess ancient mtDNA haplogroups, some of which are either rare (L0d and L5) or absent (L0f) in other regions of Africa. We propose that a large and diverse human population has persisted in eastern Africa and that eastern Africa may have been an ancient source of dispersion of modern humans both within and outside of Africa.

Journal Article
TL;DR: In this paper, the authors focused on E. coli B2 phylogenetic group strains that encompass both commensal and pathogenic (extra- and intraintestinal) strains and quantified extraintestinal virulence using a mouse model of septicemia.
Abstract: The selective pressures leading to the evolution and maintenance of virulence in the case of facultative pathogens are quite unclear. For example, Escherichia coli, a commensal of the gut of warm-blooded animals and humans, can cause severe extraintestinal diseases, such as septicemia and meningitis, which represent evolutionary dead ends for the pathogen as they are associated with rapid host death and poor interhost transmission. Such an infectious process has been linked to the presence of so-called "virulence genes." To understand the evolutionary forces that select and maintain these genes, we focused our study on E. coli B2 phylogenetic group strains that encompass both commensal and pathogenic (extra- and intraintestinal) strains. Multilocus sequence typing (MLST), comparative genomic hybridization of the B2 flexible gene pool, and quantification of extraintestinal virulence using a mouse model of septicemia were performed on a panel of 60 B2 strains chosen for their genetic and ecologic diversity. The phylogenetic history of the strains reconstructed from the MLST data indicates the emergence of at least 9 subgroups of strains. A high polymorphism is observed in the B2 flexible gene pool among the strains with a good correlation between the MLST-inferred phylogenetic history of the strains and the presence/absence of specific genomic regions, indicating coevolution between the chromosomal background and the flexible gene pool. Virulence in the mouse model is a highly prevalent and widespread character present in all subgroups except one. Association studies reveal that extraintestinal virulence is a multigenic process with a common set of "virulence determinants" encompassing genes involved in transcriptional regulation, iron metabolism, adhesion, lipopolysaccharide (LPS) biosynthesis, and the recently reported peptide polyketide hybrid synthesis system. Interestingly, these determinants can also be viewed as intestinal colonization and survival factors linked to commensalism as they can increase the fitness of the strains within the normal gut environment. Altogether, these data argue for an ancestral emergence of the extraintestinal virulence character that is a coincidental by-product of commensalism. Furthermore, the phenotypic and genotypic markers identified in this work will allow further epidemiological studies devoted to test the niche specialization hypothesis for the B2 phylogenetic subgroups.

Journal Article
TL;DR: Maximum likelihood and Bayesian analyses of a phylogenomic data set, including expressed sequence tag data from the ecologically important coccolithophore-forming alga Emiliania huxleyi and the plastid-lacking cryptophyte Goniomonas cf. pacifica, support the monophyly of haptophytes and cryptophytes.
Abstract: Here we use phylogenomics with expressed sequence tag (EST) data from the ecologically important coccolithophore-forming alga Emiliania huxleyi and the plastid-lacking cryptophyte Goniomonas cf. pacifica to establish their phylogenetic positions in the eukaryotic tree. Haptophytes and cryptophytes are members of the putative eukaryotic supergroup Chromalveolata (chromists [cryptophytes, haptophytes, stramenopiles] and alveolates [apicomplexans, ciliates, and dinoflagellates]). The chromalveolates are postulated to be monophyletic on the basis of plastid pigmentation in photosynthetic members, plastid gene and genome relationships, nuclear "host" phylogenies of some chromalveolate lineages, unique gene duplication and replacements shared by these taxa, and the evolutionary history of components of the plastid import and translocation systems. However the phylogenetic position of cryptophytes and haptophytes and the monophyly of chromalveolates as a whole remain to be substantiated. Here we assess chromalveolate monophyly using a multigene dataset of nuclear genes that includes members of all 6 eukaryotic supergroups. An automated phylogenomics pipeline followed by targeted database searches was used to assemble a 16-protein dataset (6,735 aa) from 46 taxa for tree inference. Maximum likelihood and Bayesian analyses of these data support the monophyly of haptophytes and cryptophytes. This relationship is consistent with a gene replacement via horizontal gene transfer of plastid-encoded rpl36 that is uniquely shared by these taxa. The haptophytes + cryptophytes are sister to a clade that includes all other chromalveolates and, surprisingly, two members of the Rhizaria, Reticulomyxa filosa and Bigelowiella natans. The association of the two Rhizaria with chromalveolates is supported by the approximately unbiased (AU)-test and when the fastest evolving amino acid sites are removed from the 16-protein alignment.

Journal Article
TL;DR: New genetic data show that the Sandawe and southern African click speakers share rare mtDNA and Y chromosome haplogroups; however, common ancestry of the 2 populations dates back >35,000 years, which suggests that at the time of the spread of agriculture and pastoralism, the click-speaking populations were already isolated from one another.
Abstract: Little is known about the history of click-speaking populations in Africa. Prior genetic studies revealed that the click-speaking Hadza of eastern Africa are as distantly related to click speakers of southern Africa as are most other African populations. The Sandawe, who currently live within 150 km of the Hadza, are the only other population in eastern Africa whose language has been classified as part of the Khoisan language family. Linguists disagree on whether there is any detectable relationship between the Hadza and Sandawe click languages. We characterized both mtDNA and Y chromosome variation of the Sandawe, Hadza, and neighboring Tanzanian populations. New genetic data show that the Sandawe and southern African click speakers share rare mtDNA and Y chromosome haplogroups; however, common ancestry of the 2 populations dates back >35,000 years. These data also indicate that common ancestry of the Hadza and Sandawe populations dates back >15,000 years. These findings suggest that at the time of the spread of agriculture and pastoralism, the click-speaking populations were already isolated from one another and are consistent with relatively deep linguistic divergence among the respective click languages.

Journal Article
TL;DR: The chloroplast (cp) DNA sequence of Jasminum nudiflorum (Oleaceae-Jasmineae) is completed and compared with the large single-copy region sequences from 6 related species, finding that their genome organization is surprisingly similar despite the distant relationship of these 2 angiosperm families.
Abstract: The chloroplast (cp) DNA sequence of Jasminum nudiflorum (Oleaceae-Jasmineae) is completed and compared with the large single-copy region sequences from 6 related species. The cp genomes of the tribe Jasmineae (Jasminum and Menodora) show several distinctive rearrangements, including inversions, gene duplications, insertions, inverted repeat expansions, and gene and intron losses. The ycf4-psaI region in Jasminum section Primulina was relocated as a result of 2 overlapping inversions of 21,169 and 18,414 bp. The 1st, larger inversion is shared by all members of the Jasmineae indicating that it occurred in the common ancestor of the tribe. Similar rearrangements were also identified in the cp genome of Menodora. In this case, 2 fragments including ycf4 and rps4-trnS-ycf3 genes were moved by 2 additional inversions of 14 and 59 kb that are unique to Menodora. Other rearrangements in the Oleaceae are confined to certain regions of the Jasminum and Menodora cp genomes, including the presence of highly repeated sequences and duplications of coding and noncoding sequences that are inserted into clpP and between rbcL and psaI. These insertions are correlated with the loss of 2 introns in clpP and a serial loss of segments of accD. The loss of the accD gene and clpP introns in both the monocot family Poaceae and the eudicot family Oleaceae are clearly independent evolutionary events. However, their genome organization is surprisingly similar despite the distant relationship of these 2 angiosperm families.

Journal ArticleDOI
TL;DR: Within many of the duplicated pairs, 1 gene is expressed at a higher level across all assayed conditions, which suggests that the subfunctionalization model for duplicate gene preservation provides, at best, only a partial explanation for the patterns of expression divergence between duplicated genes.
Abstract: New genes may arise through tandem duplication, dispersed small-scale duplication, and polyploidy, and patterns of divergence between duplicated genes may vary among these classes. We have examined patterns of gene expression and coding sequence divergence between duplicated genes in Arabidopsis thaliana. Due to the simultaneous origin of polyploidy-derived gene pairs, we can compare covariation in the rates of expression divergence and sequence divergence within this group. Among tandem and dispersed duplicates, much of the divergence in expression profile appears to occur at or shortly after duplication. Contrary to findings from other eukaryotic systems, there is little relationship between expression divergence and synonymous substitutions, whereas there is a strong positive relationship between expression divergence and nonsynonymous substitutions. Because this pattern is pronounced among the polyploidy-derived pairs, we infer that the strength of purifying selection acting on protein sequence and expression pattern is correlated. The polyploidy-derived pairs are somewhat atypical in that they have broader expression patterns and are expressed at higher levels, suggesting differences among polyploidy- and nonpolyploidy-derived duplicates in the types of genes that revert to single copy. Finally, within many of the duplicated pairs, 1 gene is expressed at a higher level across all assayed conditions, which suggests that the subfunctionalization model for duplicate gene preservation provides, at best, only a partial explanation for the patterns of expression divergence between duplicated genes.

Journal ArticleDOI
TL;DR: The surfing effect can lead to deleterious mutations reaching high densities at an expanding front, even when they have substantial negative effects on fitness, and is suggested to have important consequences for rates of spread and the evolution of spatially expanding populations.
Abstract: There is an increasing recognition that evolutionary processes play a key role in determining the dynamics of range expansion. Recent work demonstrates that neutral mutations arising near the edge of a range expansion sometimes surf on the expanding front, leading them to reach a much greater spatial distribution and frequency than expected in stationary populations. Here, we extend this work and examine the surfing behavior of nonneutral mutations. Using an individual-based coupled-map lattice model, we confirm that, regardless of its fitness effects, the probability of survival of a new mutation depends strongly upon where it arises in relation to the expanding wave front. We demonstrate that the surfing effect can lead to deleterious mutations reaching high densities at an expanding front, even when they have substantial negative effects on fitness. Additionally, we highlight that this surfing phenomenon can occur for mutations that impact reproductive rate (i.e., number of offspring produced) as well as mutations that modify juvenile competitive ability. We suggest that these effects are likely to have important consequences for rates of spread and the evolution of spatially expanding populations.

Journal ArticleDOI
TL;DR: Reannotating ISs in 262 prokaryotic genomes shows evidence that IS numbers are controlled by the frequency of highly deleterious insertion targets, and concludes that selection for rapid replication cannot account for the few ISs found in small genomes.
Abstract: Insertion sequences (ISs) are the smallest and most frequent transposable elements in prokaryotes where they play an important evolutionary role by promoting gene inactivation and genome plasticity. Their genomic abundance varies by several orders of magnitude for reasons largely unknown and widely speculated. The current availability of hundreds of genomes renders testable many of these hypotheses, notably that IS abundance correlates positively with the frequency of horizontal gene transfer (HGT), genome size, pathogenicity, nonobligatory ecological associations, and human association. We thus reannotated ISs in 262 prokaryotic genomes and tested these hypotheses showing that when using appropriate controls, there is no empirical basis for IS family specificity, pathogenicity, or human association to influence IS abundance or density. HGT seems necessary for the presence of ISs, but cannot alone explain the absence of ISs in more than 20% of the organisms, some of which show high rates of HGT. Gene transfer is also not a significant determinant of the abundance of IS elements in genomes, suggesting that IS abundance is controlled at the level of transposition and ensuing natural selection and not at the level of infection. Prokaryotes engaging in obligatory associations have fewer ISs when controlled for genome size, but this may be caused by some being sexually isolated. Surprisingly, genome size is the only significant predictor of IS numbers and density. Alone, it explains over 40% of the variance of IS abundance. Because we find that genome size and IS abundance correlate negatively with minimal doubling times, we conclude that selection for rapid replication cannot account for the few ISs found in small genomes. Instead, we show evidence that IS numbers are controlled by the frequency of highly deleterious insertion targets. Indeed, IS abundance increases quickly with genome size, which is the exact inverse trend found for the density of genes under strong selection such as essential genes. Hence, for ISs, the bigger the genome the better.

Journal ArticleDOI
TL;DR: The heads-or-tails (HoT) methodology can be easily implemented for any choice of alignment method and for any subsequent analytical protocol; its utility for phylogenetic reconstruction is demonstrated for the case of 130 sequences belonging to the chemoreceptor superfamily in Drosophila melanogaster, and by analysis of the BaliBASE alignment database.
Abstract: The question of multiple sequence alignment quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for addressing alignment quality issues in real life settings. Here, we present a simple methodology to help identify and quantify the uncertainties in multiple sequence alignments and their effects on subsequent analyses. The proposed methodology is based upon the a priori expectation that sequence alignment results should be independent of the orientation of the input sequences. Thus, for totally unambiguous cases, reversing residue order prior to alignment should yield an exact reversed alignment of that obtained by using the unreversed sequences. Such "ideal" alignments, however, are the exception in real life settings, and the two alignments, which we term the heads and tails alignments, are usually different to a greater or lesser degree. The degree of agreement or discrepancy between these two alignments may be used to assess the reliability of the sequence alignment. Furthermore, any alignment dependent sequence analysis protocol can be carried out separately for each of the two alignments, and the two sets of results may be compared with each other, providing us with valuable information regarding the robustness of the whole analytical process. The heads-or-tails (HoT) methodology can be easily implemented for any choice of alignment method and for any subsequent analytical protocol. We demonstrate the utility of HoT for phylogenetic reconstruction for the case of 130 sequences belonging to the chemoreceptor superfamily in Drosophila melanogaster, and by analysis of the BaliBASE alignment database. Surprisingly, Neighbor-Joining methods of phylogenetic reconstruction turned out to be less affected by alignment errors than maximum likelihood and Bayesian methods.
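
The heads-or-tails idea described above can be sketched for the pairwise case. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy Needleman-Wunsch aligner, its scoring parameters, and the function names `aligned_pairs` and `hot_agreement` are hypothetical stand-ins, and the agreement measure (shared aligned residue pairs divided by all aligned residue pairs) is one simple choice rather than the score used in the paper. Any real alignment program could be substituted for the `aligner` argument.

```python
from typing import Callable, List, Set, Tuple

Alignment = List[str]  # two gapped rows of equal length


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> Alignment:
    """Toy global pairwise aligner (hypothetical stand-in for a real tool)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback with a fixed tie-breaking order: diagonal, up, left.
    # Fixed tie-breaking is what makes the result orientation dependent.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return ["".join(reversed(row_a)), "".join(reversed(row_b))]


def aligned_pairs(aln: Alignment) -> Set[Tuple[int, int]]:
    """Return (index in first sequence, index in second) for each column
    in which both rows hold a residue."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(aln[0], aln[1]):
        if x != "-" and y != "-":
            pairs.add((i, j))
        i += x != "-"
        j += y != "-"
    return pairs


def hot_agreement(a: str, b: str,
                  aligner: Callable[[str, str], Alignment] = needleman_wunsch
                  ) -> float:
    """Fraction of aligned residue pairs shared by the heads alignment and
    the tails alignment (aligned reversed, then flipped back)."""
    heads = aligner(a, b)
    tails = [row[::-1] for row in aligner(a[::-1], b[::-1])]
    hp, tp = aligned_pairs(heads), aligned_pairs(tails)
    if not hp and not tp:
        return 1.0
    return len(hp & tp) / len(hp | tp)
```

A value of 1.0 means the heads and tails alignments pair exactly the same residues; lower values flag regions where the alignment depends on input orientation and downstream results deserve a second look.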

Journal ArticleDOI
TL;DR: This work estimated the first empirical codon model (ECM) for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models.
Abstract: In the past, 2 kinds of Markov models have been considered to describe protein sequence evolution. Codon-level models have been mechanistic with a small number of parameters designed to take into account features, such as transition-transversion bias, codon frequency bias, and synonymous-nonsynonymous amino acid substitution bias. Amino acid models have been empirical, attempting to summarize the replacement patterns observed in large quantities of data and not explicitly considering the distinct factors that shape protein evolution. We have estimated the first empirical codon model (ECM). Previous codon models assume that protein evolution proceeds only by successive single nucleotide substitutions, but our results indicate that model accuracy is significantly improved by incorporating instantaneous doublet and triplet changes. We also find that the affiliations between codons, the amino acid each encodes and the physicochemical properties of the amino acids are main factors driving the process of codon evolution. Neither multiple nucleotide changes nor the strong influence of the genetic code nor amino acids' physicochemical properties form a part of standard mechanistic models and their views of how codon evolution proceeds. We have implemented the ECM for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models. We point out the biological interpretation of our ECM and possible consequences for studies of selection.

Journal ArticleDOI
TL;DR: The 69.2-kbp chloroplast genome of the model chlorarachniophyte Bigelowiella natans is described and concatenated gene phylogenies show a relationship between the B. natans plastid and the ulvophyte-trebouxiophyte-chlorophyte clade of green algae to the exclusion of Euglena.
Abstract: Chlorarachniophytes are amoeboflagellate cercozoans that acquired a plastid by secondary endosymbiosis. Chlorarachniophytes are the last major group of algae for which there is no completely sequenced plastid genome. Here we describe the 69.2-kbp chloroplast genome of the model chlorarachniophyte Bigelowiella natans. The genome is highly reduced in size compared with plastids of other photosynthetic algae and is closer in size to genomes of several nonphotosynthetic plastids. Unlike nonphotosynthetic plastids, however, the B. natans chloroplast genome has not sustained a massive loss of genes, and it retains nearly all of the functional photosynthesis-related genes represented in the genomes of other green algae. Instead, the genome is highly compacted and gene dense. The genes are organized with a strong strand bias, and several unusual rearrangements and inversions also characterize the genome; notably, an inversion in the small-subunit rRNA gene, a translocation of 3 genes in the major ribosomal protein operon, and the fragmentation of the cluster encoding the large photosystem proteins PsaA and PsaB. The chloroplast endosymbiont is known to be a green alga, but its evolutionary origin and relationship to other primary and secondary green plastids has been much debated. A recent hypothesis proposes that the endosymbionts of chlorarachniophytes and euglenids share a common origin (the Cabozoa hypothesis). We inferred phylogenies using individual and concatenated gene sequences for all genes in the genome. Concatenated gene phylogenies show a relationship between the B. natans plastid and the ulvophyte-trebouxiophyte-chlorophyte clade of green algae to the exclusion of Euglena. The B. natans plastid is thus not closely related to that of Euglena, which suggests that plastids originated independently in these 2 groups and the Cabozoa hypothesis is false.

Journal ArticleDOI
TL;DR: Simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.
Abstract: Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is useable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.
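
The distinction between familywise error rate control and false discovery rate control can be illustrated with two standard procedures. This is a minimal sketch, not the code used in the study: the Bonferroni correction is named in the abstract, whereas the Benjamini-Hochberg step-up procedure is assumed here as a representative FDR method (the abstract does not say which FDR procedures were evaluated). The function names and the example p-values are hypothetical; in practice each p-value would come from one branch-site likelihood ratio test.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Controls the FWER: reject test i only if p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float],
                       alpha: float = 0.05) -> List[bool]:
    """Controls the FDR: find the largest rank k such that the k-th
    smallest p-value is <= (k / m) * alpha, then reject the k smallest."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Hypothetical p-values, one per tested branch.
branch_pvalues = [0.001, 0.012, 0.02, 0.03, 0.3]
print(bonferroni(branch_pvalues))          # [True, False, False, False, False]
print(benjamini_hochberg(branch_pvalues))  # [True, True, True, True, False]
```

On these made-up numbers the FDR procedure rejects more null hypotheses than Bonferroni, which matches the abstract's observation that FDR-controlling procedures gain some power at the cost of a higher FWER.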

Journal ArticleDOI
TL;DR: It is proposed that the globally increasing frequency of adamantane resistance is more likely attributable to its interaction with fitness-enhancing mutations at other genomic sites rather than to direct drug selection pressure, which implies that adamantanes may not be useful for treatment and prophylaxis against influenza viruses in the long term.
Abstract: A dramatic rise in the frequency of resistance to adamantane drugs by influenza A (H3N2) viruses has occurred in recent years (from approximately 2% to approximately 90% in multiple countries worldwide) and is associated with a single S31N amino acid replacement in the viral matrix M2 protein. To explore the emergence and spread of these adamantane resistant viruses we performed a phylogenetic analysis of recently sampled complete A/H3N2 genome sequences. Strikingly, all adamantane resistant viruses belonged to a single lineage (the "N-lineage") characterized by 17 amino acid replacements across the viral genome. Further, our analysis revealed that the genesis of the N-lineage was due to a 4+4 segment reassortment event involving 2 distinct lineages of influenza A/H3N2 virus. A subsequent study of hemagglutinin HA1 sequences suggested that the N-lineage was circulating widely in Asia during 2005, and then dominated the Northern hemisphere 2005-2006 season in Japan and the USA. Given the infrequent use of adamantane drugs in many countries, as well as the decades of use in the US associated with little drug resistance, we propose that the globally increasing frequency of adamantane resistance is more likely attributable to its interaction with fitness-enhancing mutations at other genomic sites rather than to direct drug selection pressure. This implies that adamantanes may not be useful for treatment and prophylaxis against influenza viruses in the long term. More generally, these findings illustrate that drug selection pressure is not the sole factor determining the evolution and maintenance of drug resistance in human pathogens.

Journal ArticleDOI
TL;DR: It is indicated that DNA from sediments can still offer a rich source of information on past environments, provided that the risk from vertical migration can be controlled for and that physical remains of organisms or their ejecta need to have been incorporated in the sediments for their DNA to be detected.
Abstract: In recent years, several studies have reported the successful extraction of ancient DNA (aDNA) from both frozen and nonfrozen sediments (even in the absence of macrofossils) in order to obtain genetic “profiles” from past environments. One of the hazards associated with this approach, particularly in nonfrozen environments, is the potential for vertical migration of aDNA across strata. To assess the extent of this problem, we extracted aDNA from sediments up to 3300 years old at 2 cave sites in the North Island of New Zealand. These sites are ideal for this purpose as the presence or absence of DNA from nonindigenous fauna (such as sheep) in sediments deposited prior to European settlement can serve as an indicator of DNA movement. Additionally, these strata are well defined and dated. DNA from sheep was found in strata that also contained moa DNA, indicating that genetic material had migrated downwards. Quantitative polymerase chain reaction analyses demonstrated that the amount of sheep DNA decreased as the age of sediments increased. Our results suggest that sedimentary aDNA is unlikely to be deposited from wind-borne DNA and that physical remains of organisms or their ejecta need to have been incorporated in the sediments for their DNA to be detected. Our study indicates that DNA from sediments can still offer a rich source of information on past environments, provided that the risk from vertical migration can be controlled for.

Journal ArticleDOI
TL;DR: The authors' data confirm the distinctiveness of Miniopterus, and support previous recommendations to elevate these bats to full familial status, and estimate that they diverged from all other bat species approximately 49-38 MYA, which is comparable to most other bat families.
Abstract: The long-fingered bats (Miniopterus sp.) are among the most widely distributed mammals in the world. However, despite recent focus on the systematics of these bats, their taxonomic position has not been resolved. Traditionally, they are considered to be sole members of Miniopterinae, 1 of 5 subfamilies within the largest family of bats, the Vespertilionidae. However, this classification has increasingly been called into question. Miniopterines differ extensively from other vespertilionids in numerous aspects of morphology, embryology, immunology, and, most recently, genetics. Recent molecular studies have proposed that the miniopterines are sufficiently distinct from vespertilionids that Miniopterinae should be elevated to full familial status. However, controversy remains regarding the relationship of the putative family, Miniopteridae to existing Vespertilionidae and to the closely related free-tailed bats, the Molossidae. We report here the first conclusive analysis of the taxonomic position of Miniopterus relative to all other bat families. We generated one of the largest chiropteran data sets to date, incorporating ∼11 kb of sequence data from 16 nuclear genes, from representatives of all bat families and 2 Miniopterus species. Our data confirm the distinctiveness of Miniopterus, and we support previous recommendations to elevate these bats to full familial status. We estimate that they diverged from all other bat species approximately 49-38 MYA, which is comparable to most other bat families. Furthermore, we find very strong support from all phylogenetic methods for a sister group relationship between Miniopteridae and Vespertilionidae. The Molossidae diverged from these lineages approximately 54-43 MYA and form a sister group to the Miniopteridae-Vespertilionidae clade.

Journal ArticleDOI
TL;DR: Haplotypes at nuclear and chloroplast loci for approximately 70 Aegilops and Triticum lines reveal both B and G genomes of polyploid wheats as unique samples of A. speltoides haplotype diversity.
Abstract: The origin of modern wheats involved alloploidization among related genomes. To determine if Aegilops speltoides was the donor of the B and G genomes in AABB and AAGG tetraploids, we used a 3-tiered approach. Using 70 amplified fragment length polymorphism (AFLP) loci, we sampled molecular diversity among 480 wheat lines from their natural habitats encompassing all S genome Aegilops, the putative progenitors of wheat B and G genomes. Fifty-nine Aegilops representatives for S genome diversity were compared at 375 AFLP loci with diploid, tetraploid, and 11 nulli-tetrasomic Triticum aestivum wheat lines. B genome-specific markers allowed pinning the origin of the B genome to S chromosomes of A. speltoides, while excluding other lineages. The outbreeding nature of A. speltoides influences its molecular diversity and bears upon inferences of B and G genome origins. Haplotypes at nuclear and chloroplast loci ACC1, G6PDH, GPT, PGK1, Q, VRN1, and ndhF for approximately 70 Aegilops and Triticum lines (0.73 Mb sequenced) reveal both B and G genomes of polyploid wheats as unique samples of A. speltoides haplotype diversity. These have been sequestered by the AABB Triticum dicoccoides and AAGG Triticum araraticum lineages during their independent origins.

Journal ArticleDOI
TL;DR: The phylogeography of hg H in the populations of the Near East and the Caucasus is described, showing that most of the present-day Near Eastern-Caucasus area variants of hg H started to expand after the last glacial maximum (LGM) and presumably before the Holocene.
Abstract: More than a third of the European pool of human mitochondrial DNA (mtDNA) is fragmented into a number of subclades of haplogroup (hg) H, the most frequent hg throughout western Eurasia. Although there has been considerable recent progress in studying mitochondrial genome variation in Europe at the complete sequence resolution, little data of comparable resolution is so far available for regions like the Caucasus and the Near and Middle East, areas where most European genetic lineages, including hg H, have likely emerged. This gap in our knowledge causes a serious hindrance for progress in understanding the demographic prehistory of Europe and western Eurasia in general. Here we describe the phylogeography of hg H in the populations of the Near East and the Caucasus. We have analyzed 545 samples of hg H at high resolution, including 15 novel complete mtDNA sequences. As in Europe, most of the present-day Near Eastern-Caucasus area variants of hg H started to expand after the last glacial maximum (LGM) and presumably before the Holocene. Yet importantly, several hg H subclades in the Near East and Southern Caucasus region coalesce to the pre-LGM period. Furthermore, irrespective of their common origin, significant differences between the distribution of hg H sub-hgs in Europe and in the Near East and South Caucasus imply limited post-LGM maternal gene flow between these regions. In contrast, the North Caucasus mitochondrial gene pool has received an influx of hg H variants, arriving from the Ponto-Caspian/East European area.

Journal ArticleDOI
TL;DR: A novel supertree-based phylogenetic signal-stripping method is used to recover supertrees of life based on phylogenies for up to 5,741 single gene families distributed across 185 genomes and rejects all but two of the current hypotheses for the origin of eukaryotes.
Abstract: Eukaryotes are traditionally considered to be one of the three natural divisions of the tree of life and the sister group of the Archaebacteria. However, eukaryotic genomes are replete with genes of eubacterial ancestry, and more than 20 mutually incompatible hypotheses have been proposed to account for eukaryote origins. Here we test the predictions of these hypotheses using a novel supertree-based phylogenetic signal-stripping method, and recover supertrees of life based on phylogenies for up to 5,741 single gene families distributed across 185 genomes. Using our signal-stripping method, we show that there are three distinct phylogenetic signals in eukaryotic genomes. In order of strength, these link eukaryotes with the Cyanobacteria, the Proteobacteria, and the Thermoplasmatales, an archaebacterial (euryarchaeotes) group. These signals correspond to distinct symbiotic partners involved in eukaryote evolution: plastids, mitochondria, and the elusive host lineage. According to our whole-genome data, eukaryotes are hardly the sister group of the Archaebacteria, because up to 83% of eukaryotic genes with a prokaryotic homolog have eubacterial, not archaebacterial, origins. The results reject all but two of the current hypotheses for the origin of eukaryotes: those assuming a sulfur-dependent or hydrogen-dependent syntrophy for the origin of mitochondria.

Journal ArticleDOI
TL;DR: This genome-wide survey demonstrates that multifunctional genes are common and illustrates the mechanistic diversity by which their products enhance metabolic robustness and evolvability.
Abstract: Our understanding of the origins of new metabolic functions is based upon anecdotal genetic and biochemical evidence. Some auxotrophies can be suppressed by overexpressing substrate-ambiguous enzymes (i.e., those that catalyze the same chemical transformation on different substrates). Other enzymes exhibit weak but detectable catalytic promiscuity in vitro (i.e., they catalyze different transformations on similar substrates). Cells adapt to novel environments through the evolution of these secondary activities, but neither their chemical natures nor their frequencies of occurrence have been characterized en bloc. Here, we systematically identified multifunctional genes within the Escherichia coli genome. We screened 104 single-gene knockout strains and discovered that many (20%) of these auxotrophs were rescued by the overexpression of at least one noncognate E. coli gene. The deleted gene and its suppressor were generally unrelated, suggesting that promiscuity is a product of contingency. This genome-wide survey demonstrates that multifunctional genes are common and illustrates the mechanistic diversity by which their products enhance metabolic robustness and evolvability.