Showing papers in "Molecular Biology and Evolution in 2006"


Journal ArticleDOI
TL;DR: This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted and outlines the beginnings of a comprehensive statistical framework for applying split network methods.
Abstract: The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

7,273 citations


Journal ArticleDOI
TL;DR: This work provides "hard" minimum and "soft" maximum age constraints for 30 divergences among key genome model organisms; these should contribute to better understanding of the dating of the animal tree of life.
Abstract: The role of fossils in dating the tree of life has been misunderstood. Fossils can provide good "minimum" age estimates for branches in the tree, but "maximum" constraints on those ages are poorer. Current debates about which are the "best" fossil dates for calibration move to consideration of the most appropriate constraints on the ages of tree nodes. Because fossil-based dates are constraints, and because molecular evolution is not perfectly clock-like, analysts should use more rather than fewer dates, but there has to be a balance between many genes and few dates versus many dates and few genes. We provide "hard" minimum and "soft" maximum age constraints for 30 divergences among key genome model organisms; these should contribute to better understanding of the dating of the animal tree of life.

903 citations


Journal ArticleDOI
TL;DR: The exclusive common presence of fungivorous and plant parasitic nematodes supports a long-standing hypothesis that plant parasitic nematodes arose from fungivorous ancestors.
Abstract: Inference of evolutionary relationships between nematodes is severely hampered by their conserved morphology, the high frequency of homoplasy, and the scarcity of phylum-wide molecular data. To study the origin of nematode radiation and to unravel the phylogenetic relationships between distantly related species, 339 nearly full-length small-subunit rDNA sequences were analyzed from a diverse range of nematodes. Bayesian inference revealed a backbone comprising 12 consecutive dichotomies that subdivided the phylum Nematoda into 12 clades. The most basal clade is dominated by the subclass Enoplia, and members of the order Triplonchida occupy positions most close to the common ancestor of the nematodes. Crown Clades 8-12, a group formerly indicated as "Secernentea" that includes Caenorhabditis elegans and virtually all major plant and animal parasites, show significantly higher nucleotide substitution rates than the more basal Clades 1-7. Accelerated substitution rates are associated with parasitic lifestyles (Clades 8 and 12) or short generation times (Clades 9-11). The relatively high substitution rates in the distal clades resulted in numerous autapomorphies that allow in most cases DNA barcode-based species identification. Teratocephalus, a genus comprising terrestrial bacterivores, was shown to be most close to the starting point of Secernentean radiation. Notably, fungal feeding nematodes were exclusively found basal to or as sister taxon next to the 3 groups of plant parasitic nematodes, namely, Trichodoridae, Longidoridae, and Tylenchomorpha. The exclusive common presence of fungivorous and plant parasitic nematodes supports a long-standing hypothesis that states that plant parasitic nematodes arose from fungivorous ancestors.

843 citations


Journal ArticleDOI
TL;DR: This article proposes a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events.
Abstract: The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.

832 citations


Journal ArticleDOI
TL;DR: A Bayesian Markov chain Monte Carlo algorithm for estimating species divergence times that uses heterogeneous data from multiple gene loci and accommodates multiple fossil calibration nodes is implemented, and soft bounds are used so that the probability that the true divergence time is outside the bounds is small but nonzero.
Abstract: We implement a Bayesian Markov chain Monte Carlo algorithm for estimating species divergence times that uses heterogeneous data from multiple gene loci and accommodates multiple fossil calibration nodes. A birth-death process with species sampling is used to specify a prior for divergence times, which allows easy assessment of the effects of that prior on posterior time estimates. We propose a new approach for specifying calibration points on the phylogeny, which allows the use of arbitrary and flexible statistical distributions to describe uncertainties in fossil dates. In particular, we use soft bounds, so that the probability that the true divergence time is outside the bounds is small but nonzero. A strict molecular clock is assumed in the current implementation, although this assumption may be relaxed. We apply our new algorithm to two data sets concerning divergences of several primate species, to examine the effects of the substitution model and of the prior for divergence times on Bayesian time estimation. We also conduct computer simulation to examine the differences between soft and hard bounds. We demonstrate that divergence time estimation is intrinsically hampered by uncertainties in fossil calibrations, and the error in Bayesian time estimates will not go to zero with increased amounts of sequence data. Our analyses of both real and simulated data demonstrate potentially large differences between divergence time estimates obtained using soft versus hard bounds and a general superiority of soft bounds. Our main findings are as follows. (1) When the fossils are consistent with each other and with the molecular data, and the posterior time estimates are well within the prior bounds, soft and hard bounds produce similar results. 
(2) When the fossils are in conflict with each other or with the molecules, soft and hard bounds behave very differently; soft bounds allow sequence data to correct poor calibrations, while poor hard bounds are impossible to overcome by any amount of data. (3) Soft bounds eliminate the need for "safe" but unrealistically high upper bounds, which may bias posterior time estimates. (4) Soft bounds allow more reliable assessment of estimation errors, while hard bounds generate misleadingly high precisions when fossils and molecules are in conflict.

744 citations


Journal ArticleDOI
TL;DR: This work investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model, and determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes.
Abstract: Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.

720 citations
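
The codon position (CP) idea in the entry above can be illustrated with a small Python sketch. It is not the authors' implementation: it only shows how an in-frame coding alignment might be split into three per-position partitions, so that each partition could later receive its own nucleotide substitution parameters. The function names `split_by_codon_position` and `count_variable_sites` are hypothetical, and the sketch assumes sequences of equal length that start at codon position 1 and contain no frame-shifting gaps.

```python
def split_by_codon_position(alignment):
    """Split in-frame aligned coding sequences into three alignments,
    one per codon position (1, 2, 3).

    `alignment` maps sequence names to equal-length strings that are
    assumed to start at codon position 1 (an assumption of this sketch).
    """
    partitions = {1: {}, 2: {}, 3: {}}
    for name, seq in alignment.items():
        for pos in (1, 2, 3):
            # Every third character, starting at offset pos - 1.
            partitions[pos][name] = seq[pos - 1::3]
    return partitions


def count_variable_sites(partition):
    """Count alignment columns in which more than one character occurs."""
    columns = zip(*partition.values())
    return sum(1 for column in columns if len(set(column)) > 1)
```

Counting variable sites per partition is one quick way to see whether the three positions behave differently in a given alignment, which is the situation in which a CP model would be expected to help.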


Journal ArticleDOI
TL;DR: It is found that the probability of survival of a new mutation depends to a large degree on its proximity to the edge of the wave, and a consequence of the surfing phenomenon is to increase the rate of evolution of spatially expanding populations.
Abstract: Many species, including humans, have dramatically expanded their range in the past, and such range expansions had certainly an impact on their genetic diversity. For example, mutations arising in populations at the edge of a range expansion can sometimes surf on the wave of advance and thus reach a larger spatial distribution and a much higher frequency than would be expected in stationary populations. We study here this surfing phenomenon in more detail, by performing extensive computer simulations under a two-dimensional stepping-stone model. We find that the probability of survival of a new mutation depends to a large degree on its proximity to the edge of the wave. Demographic factors such as deme size, migration rate, and local growth rate also influence the fate of these new mutations. We also find that the final spatial and frequency distributions depend on the local deme size of a subdivided population. This latter result is discussed in the light of human expansions in Europe as it should allow one to distinguish between mutations having spread with Paleolithic or Neolithic expansions. By favoring the spread of new mutations, a consequence of the surfing phenomenon is to increase the rate of evolution of spatially expanding populations.

624 citations


Journal ArticleDOI
TL;DR: This paper introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees, together with a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods.
Abstract: Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

485 citations


Journal ArticleDOI
TL;DR: The hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast is supported by the first combined analysis of seven predictors previously reported to have independent influences on protein evolutionary rates.
Abstract: A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.

415 citations
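
Principal component regression, as described in the entry above, regresses the response on the principal components of the predictors rather than on the correlated predictors themselves. The following Python sketch is a simplified illustration under stated assumptions, not the authors' analysis: it extracts only the leading component (by power iteration on the correlation matrix) and reports how much variance in the response a simple regression on that component explains. All function names are hypothetical, and the power iteration assumes the leading component is not orthogonal to the uniform starting vector, which holds when the predictors are positively correlated.

```python
import math


def standardize(column):
    """Center a column to mean 0 and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def first_principal_component(columns, iterations=200):
    """Return (loadings, scores) of the leading principal component.

    `columns` is a list of predictor columns of equal length. The leading
    eigenvector of their correlation matrix is found by power iteration.
    """
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    corr = [[sum(z[a][k] * z[b][k] for k in range(n)) / n for b in range(p)]
            for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # uniform starting vector
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[a] * z[a][k] for a in range(p)) for k in range(n)]
    return v, scores


def r_squared(x, y):
    """Fraction of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

A full principal component regression would repeat the regression step for every component; the single-component version is kept here only to show why a dominant shared component can absorb the signal that correlated predictors split among themselves.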


Journal ArticleDOI
TL;DR: Tests for positive directional selection in 6 pigmentation genes using an empirical F(ST) approach, together with admixture mapping that demonstrates a role for MATP in normal skin pigmentation variation, support a case for the recent convergent evolution of a lighter pigmentation phenotype in Europeans and East Asians.
Abstract: Human skin pigmentation shows a strong positive correlation with ultraviolet radiation intensity, suggesting that variation in skin color is, at least partially, due to adaptation via natural selection. We investigated the evolution of pigmentation variation by testing for the presence of positive directional selection in 6 pigmentation genes using an empirical F(ST) approach, through an examination of global diversity patterns of these genes in the Centre d'Etude du Polymorphisme Humain (CEPH)-Diversity Panel, and by exploring signatures of selection in data from the International HapMap project. Additionally, we demonstrated a role for MATP in determining normal skin pigmentation variation using admixture mapping methods. Taken together (with the results of previous admixture mapping studies), these results point to the importance of several genes in shaping the pigmentation phenotype and a complex evolutionary history involving strong selection. Polymorphisms in 2 genes, ASIP and OCA2, may play a shared role in shaping light and dark pigmentation across the globe, whereas SLC24A5, MATP, and TYR have a predominant role in the evolution of light skin in Europeans but not in East Asians. These findings support a case for the recent convergent evolution of a lighter pigmentation phenotype in Europeans and East Asians.

380 citations
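
The empirical F(ST) approach mentioned in the entry above compares the differentiation at a candidate locus with the distribution observed across many other loci. The Python sketch below is a minimal illustration under stated assumptions and is not the estimator used in the paper: it computes Wright's F(ST) for a single biallelic locus from per-population allele frequencies, weighting populations equally, and then reports the fraction of background loci that are at least as differentiated. The function names are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F(ST) for one biallelic locus.

    `freqs` holds the frequency of one allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # locus is monomorphic everywhere
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_total - h_within) / h_total


def empirical_tail(observed, background):
    """Fraction of background F(ST) values that are >= the observed value."""
    return sum(1 for value in background if value >= observed) / len(background)
```

A small tail fraction means few background loci are as differentiated as the candidate, which is the kind of signal an empirical outlier approach looks for; it does not by itself establish selection.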


Journal ArticleDOI
Michael Lynch
TL;DR: By establishing an essentially permanent change in the population-genetic environment permissive to the genome-wide repatterning of gene structure, the eukaryotic condition also promoted a reliable resource from which natural selection could secondarily build novel forms of organismal complexity.
Abstract: Most of the phenotypic diversity that we perceive in the natural world is directly attributable to the peculiar structure of the eukaryotic gene, which harbors numerous embellishments relative to the situation in prokaryotes. The most profound changes include introns that must be spliced out of precursor mRNAs, transcribed but untranslated leader and trailer sequences (untranslated regions), modular regulatory elements that drive patterns of gene expression, and expansive intergenic regions that harbor additional diffuse control mechanisms. Explaining the origins of these features is difficult because they each impose an intrinsic disadvantage by increasing the genic mutation rate to defective alleles. To address these issues, a general hypothesis for the emergence of eukaryotic gene structure is provided here. Extensive information on absolute population sizes, recombination rates, and mutation rates strongly supports the view that eukaryotes have reduced genetic effective population sizes relative to prokaryotes, with especially extreme reductions being the rule in multicellular lineages. The resultant increase in the power of random genetic drift appears to be sufficient to overwhelm the weak mutational disadvantages associated with most novel aspects of the eukaryotic gene, supporting the idea that most such changes are simple outcomes of semi-neutral processes rather than direct products of natural selection. However, by establishing an essentially permanent change in the population-genetic environment permissive to the genome-wide repatterning of gene structure, the eukaryotic condition also promoted a reliable resource from which natural selection could secondarily build novel forms of organismal complexity. Under this hypothesis, arguments based on molecular, cellular, and/or physiological constraints are insufficient to explain the disparities in gene, genomic, and phenotypic complexity between prokaryotes and eukaryotes.

Journal ArticleDOI
TL;DR: By phylogenetic analysis, the evolutionary origin of 675 Tetraodon duplicated genes assigned to chromosomes are determined, showing that similar mechanisms are at work in fishes as in yeast or plants and provide a framework for future investigation of the consequences of duplication in fishes and other animals.
Abstract: Teleost fishes provide the first unambiguous support for ancient whole-genome duplication in an animal lineage. Studies in yeast or plants have shown that the effects of such duplications can be mediated by a complex pattern of gene retention and changes in evolutionary pressure. To explore such patterns in fishes, we have determined by phylogenetic analysis the evolutionary origin of 675 Tetraodon duplicated genes assigned to chromosomes, using additional data from other species of actinopterygian fishes. The subset of genes, which was retained in double after the genome duplication, is enriched in development, signaling, behavior, and regulation functional categories. The evolutionary rate of duplicate fish genes appears to be determined by 3 forces: 1) fish proteins evolve faster than mammalian orthologs; 2) the genes kept in double after genome duplication represent the subset under strongest purifying selection; and 3) following duplication, there is an asymmetric acceleration of evolutionary rate in one of the paralogs. These results show that similar mechanisms are at work in fishes as in yeast or plants and provide a framework for future investigation of the consequences of duplication in fishes and other animals.

Journal ArticleDOI
Timothy W. Chumley, Jeffrey D. Palmer, Jeffrey P. Mower, H. Matthew Fourcade, Patrick J. Calie, Jeffrey L. Boore, Robert K. Jansen
TL;DR: Chumley et al. sequenced the chloroplast genome of Pelargonium × hortorum, which maps as a circular molecule of 217,942 bp.
Abstract: The chloroplast genome of Pelargonium × hortorum has been completely sequenced. It maps as a circular molecule of 217,942 bp, and is both the largest and most rearranged land plant chloroplast genome yet sequenced. It features two copies of a greatly expanded inverted repeat (IR) of 75,741 bp each, and consequently diminished single copy regions of 59,710 bp and 6,750 bp. It also contains two different associations of repeated elements that contribute about 10 percent to the overall size and account for the majority of repeats found in the genome. They represent hotspots for rearrangements and gene duplications and include a large number of pseudogenes. We propose simple models that account for the major rearrangements with a minimum of eight IR boundary changes and 12 inversions in addition to several insertions of duplicated sequence. The major processes at work (duplication, IR expansion, and inversion) have disrupted at least one and possibly two or three transcriptional operons, and the genes involved in these disruptions form the core of the two major repeat associations. Despite the vast increase in size and complexity of the genome, the gene content is similar to that of other angiosperms, with the exceptions of a large number of pseudogenes as part of the repeat associations, the recognition of two open reading frames (ORF56 and ORF42) in the trnA intron with similarities to previously identified mitochondrial products (ACRS and pvs-trnA), the loss of accD and trnT-GGU, and in particular, the lack of a recognizably functional rpoA. One or all of three similar open reading frames may possibly encode the latter, however.

Journal ArticleDOI
TL;DR: Results from parsimony ratchet and Bayesian analyses recovered little support for the backbone of the phylogeny, suggesting that many lineages of Brassicaceae have undergone rapid radiations that may ultimately be difficult to resolve with any single locus.
Abstract: The Brassicaceae is a large plant family (338 genera and 3,700 species) of major scientific and economic importance. The taxonomy of this group has been plagued by convergent evolution in nearly every morphological feature used to define tribes and genera. Phylogenetic analysis of 746 nrDNA internal transcribed spacer (ITS) sequences, representing 24 of the 25 currently recognized tribes, 146 genera, and 461 species of Brassicaceae, produced the most comprehensive, single-locus-based phylogenetic analysis of the family published to date. Novel approaches to nrDNA ITS analysis and extensive taxonomic sampling offered a test of monophyly for a large complement of the currently recognized tribes and genera of Brassicaceae. In the most comprehensive analysis, tribes Alysseae, Anchonieae plus Hesperideae, Boechereae, Cardamineae, Eutremeae, Halimolobeae, Iberideae, Noccaeeae, Physarieae, Schizopetaleae, Smelowskieae, and Thlaspideae were all monophyletic. Several broadly defined genera (e.g., Draba and Smelowskia) were supported as monophyletic, whereas others (e.g., Sisymbrium and Alyssum) were clearly polyphyletic. Analyses of ITS data identified several problematic sequences attributable to errors in sample identification or database submission. Results from parsimony ratchet and Bayesian analyses recovered little support for the backbone of the phylogeny, suggesting that many lineages of Brassicaceae have undergone rapid radiations that may ultimately be difficult to resolve with any single locus. However, the development of a preliminary supermatrix including the combination of 10 loci for 65 species provides an initial estimate of intertribal relations and suggests that broad application of such a method will provide greater understanding of relationships in the family.

Journal ArticleDOI
TL;DR: This study uses analytical calculations based on coalescent theory and computer simulations to analyze molecular adaptation from recurrent mutation or migration, and derives a robust analytical approximation for the number of ancestral haplotypes and their distribution in a sample from the population.
Abstract: In the classical model of molecular adaptation, a favored allele derives from a single mutational origin. This ignores that beneficial alleles can enter a population recurrently, either by mutation or migration, during the selective phase. In this case, descendants of several of these independent origins may contribute to the fixation. As a consequence, all ancestral haplotypes that are linked to any of these copies will be retained in the population, affecting the pattern of a selective sweep on linked neutral variation. In this study, we use analytical calculations based on coalescent theory and computer simulations to analyze molecular adaptation from recurrent mutation or migration. Under the assumption of complete linkage, we derive a robust analytical approximation for the number of ancestral haplotypes and their distribution in a sample from the population. We find that so-called "soft sweeps," where multiple ancestral haplotypes appear in a sample, are likely for biologically realistic values of mutation or migration rates.

Journal ArticleDOI
TL;DR: It is concluded that selection on synonymous codon use in E. coli is largely due to selection for translational accuracy, to reduce the costs of both missense and nonsense errors.
Abstract: In many organisms, selection acts on synonymous codons to improve translation. However, the precise basis of this selection remains unclear in the majority of species. Selection could be acting to maximize the speed of elongation, to minimize the costs of proofreading, or to maximize the accuracy of translation. Using several data sets, we find evidence that codon use in Escherichia coli is biased to reduce the costs of both missense and nonsense translational errors. Highly conserved sites and genes have higher codon bias than less conserved ones, and codon bias is positively correlated to gene length and production costs, both indicating selection against missense errors. Additionally, codon bias increases along the length of genes, indicating selection against nonsense errors. Doublet mutations or replacement substitutions do not explain our observations. The correlations remain when we control for expression level and for conflicting selection pressures at the start and end of genes. Considering each amino acid by itself confirms our results. We conclude that selection on synonymous codon use in E. coli is largely due to selection for translational accuracy, to reduce the costs of both missense and nonsense errors.

Journal ArticleDOI
TL;DR: The genetic frame of adaptation to a gradient of altitude in the common frog is investigated by means of a genome scan based on 392 amplified fragment length polymorphism markers and the need for confirmation of the selection footprints for the outlier loci is highlighted.
Abstract: Today, with the rapid development of population genomics, the genetic basis of adaptation can be unraveled directly at the genome level, without any prerequisites about the selectively advantageous genes or traits. For nonmodel species, it is now possible to screen many markers randomly scattered across the genome and to distinguish between the neutral genetic background and outlier loci displaying an atypical behavior (e.g., a higher differentiation between populations). This study investigated the genetic frame of adaptation to a gradient of altitude in the common frog (Rana temporaria) by means of a genome scan based on 392 amplified fragment length polymorphism markers. Using two outlier detection methods never applied to dominant data so far, we sought for loci with a genetic differentiation diverging from neutral expectations when comparing populations from different altitudes. All the detected loci were sorted out according to their most probable cause for outlier behavior and classified as false positives, outliers due to local effects, or outliers associated with altitude. Altogether, eight good candidate loci were identified as potentially involved in adaptation to altitude because they were picked out in several independent interaltitude comparisons. This result illustrated the potential of genome-wide surveys to reveal selection signatures along selection gradients, where the association between environmental variables and fitness-related traits may be complex and/or cryptic. In this article, we also underlined the need for confirmation of the selection footprints for the outlier loci. Finally, we provided some preliminary insights into the genetic basis of adaptation along an altitudinal cline in the common frog.

Journal ArticleDOI
TL;DR: A rigorous answer to the question of what the Neighbor-Joining (NJ) method seeks to do has recently been provided by further mathematical investigation, and this note highlights these results and their significance for interpreting NJ.
Abstract: It is nearly 20 years since the landmark paper (Saitou and Nei, 1987) in MBE introducing Neighbor-Joining (NJ). The method has become the most widely-used method for building phylogenetic trees from distances, and the original paper has been cited about 13,000 times (Science Citation Index). Yet the question 'what does the NJ method seek to do?' has until recently proved somewhat elusive, leading to some imprecise claims and misunderstanding. However, a rigorous answer to this question has recently been provided by further mathematical investigation, and the purpose of this note is to highlight these results and their significance for interpreting NJ. The origins of this story lie in a paper by Pauplin (2000), though its continuation has unfolded in more mathematically-inclined literature. Our aim here is to make these findings more widely accessible.
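
For readers unfamiliar with the mechanics of NJ, the Python sketch below shows only the classic pair-selection step of the algorithm: for each pair of taxa it evaluates the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and picks the pair with the smallest value. It does not reproduce the mathematical results the note discusses, and the function name `nj_pick_pair` is hypothetical.

```python
def nj_pick_pair(d):
    """Return the indices (i, j), i < j, of the pair NJ would join first.

    `d` is a symmetric distance matrix (list of lists) with a zero diagonal.
    Ties are broken in favor of the first pair encountered.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2]
```

A complete NJ run would then replace the chosen pair with a new internal node, recompute distances to it, and repeat until the tree is fully resolved.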

Journal ArticleDOI
TL;DR: A computer simulation study has been made of the accuracy of estimates of Theta = 4Neμ from a sample from a single isolated population of finite size, and the accuracies turn out to be well predicted by a formula developed by Fu and Li, who used optimistic assumptions.
Abstract: A computer simulation study has been made of the accuracy of estimates of Theta = 4Neμ from a sample from a single isolated population of finite size. The accuracies turn out to be well predicted by a formula developed by Fu and Li, who used optimistic assumptions. Their formulas are restated in terms of accuracy, defined here as the reciprocal of the squared coefficient of variation. This should be proportional to sample size when the entities sampled provide independent information. Using these formulas for accuracy, the sampling strategy for estimation of Theta can be investigated. Two models for cost have been used, a cost-per-base model and a cost-per-read model. The former would lead us to prefer to have a very large number of loci, each one base long. The latter, which is more realistic, causes us to prefer to have one read per locus and an optimum sample size which declines as costs of sampling organisms increase. For realistic values, the optimum sample size is 8 or fewer individuals. This is quite close to the results obtained by Pluzhnikov and Donnelly for a cost-per-base model, evaluating other estimators of Theta. It can be understood by considering that the resources spent collecting larger samples prevent us from considering more loci. An examination of the efficiency of Watterson's estimator of Theta was also made, and it was found to be reasonably efficient when the number of mutants per generation in the sequence in the whole population is less than 2.5.
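
The notion of accuracy used above (the reciprocal of the squared coefficient of variation) can be illustrated with Watterson's estimator. The sketch below uses the classical variance of Watterson's estimator for a single nonrecombining locus, which is an assumption made here for illustration; it is not the Fu and Li formula that the study evaluates, and the function names are hypothetical.

```python
def harmonic_sums(n):
    """Return (a_n, b_n) = (sum of 1/i, sum of 1/i^2) for i = 1 .. n-1."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i ** 2 for i in range(1, n))
    return a_n, b_n


def watterson_theta(segregating_sites, n):
    """Watterson's estimator: number of segregating sites divided by a_n."""
    a_n, _ = harmonic_sums(n)
    return segregating_sites / a_n


def watterson_accuracy(theta, n):
    """Accuracy = 1 / CV^2 = theta^2 / Var(theta_W).

    Assumes the classical no-recombination variance
    Var(theta_W) = theta / a_n + b_n * theta^2 / a_n^2.
    """
    a_n, b_n = harmonic_sums(n)
    variance = theta / a_n + b_n * theta ** 2 / a_n ** 2
    return theta ** 2 / variance


# Accuracy grows slowly with sample size at a single locus.
acc_small = watterson_accuracy(theta=1.0, n=4)
acc_large = watterson_accuracy(theta=1.0, n=8)
```

Under this assumed variance, doubling the sample from 4 to 8 sequences raises the accuracy by well under a factor of two, which is consistent with the abstract's point that resources are often better spent on more loci than on more individuals.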

Journal ArticleDOI
TL;DR: This study provides the first evidence for possible sexual reproduction in P. brasiliensis S1, but does not rule it out in the other two species.
Abstract: Paracoccidioides brasiliensis is the etiologic agent of paracoccidioidomycosis, a disease confined to Latin America and of marked importance in the endemic areas due to its frequency and severity. This species is considered to be clonal according to mycological criteria and has been shown to vary in virulence. To characterize natural genetic variation and reproductive mode in this fungus, we analyzed P. brasiliensis phylogenetically in search of cryptic species and possible recombination using concordance and nondiscordance of gene genealogies with respect to phylogenies of eight regions in five nuclear loci. Our data indicate that this fungus consists of at least three distinct, previously unrecognized species: S1 (species 1 with 38 isolates), PS2 (phylogenetic species 2 with six isolates), and PS3 (phylogenetic species 3 with 21 isolates). Genealogies of four of the regions studied strongly supported the PS2 clade, composed of five Brazilian and one Venezuelan isolate. The second clade, PS3, composed solely of 21 Colombian isolates, was strongly supported by the alpha-tubulin genealogy. The remaining 38 individuals formed S1. Two of the three lineages of P. brasiliensis, S1 and PS2, are sympatric across their range, suggesting barriers to gene flow other than geographic isolation. Our study provides the first evidence for possible sexual reproduction in P. brasiliensis S1, but does not rule it out in the other two species.

Journal ArticleDOI
TL;DR: Excess divergence specifically indicative of subfunctionalization and/or neofunctionalization contributes to the maintenance of most if not all duplicated regulatory genes in Arabidopsis and it is hypothesized that this results in increasing expression diversity or specificity of regulatory genes after each round of duplication.
Abstract: Gene duplication plays an important role in the evolution of diversity and novel function and is especially prevalent in the nuclear genomes of flowering plants. Duplicate genes may be maintained through subfunctionalization and neofunctionalization at the level of expression or coding sequence. In order to test the hypothesis that duplicated regulatory genes will be differentially expressed in a specific manner indicative of regulatory subfunctionalization and/or neofunctionalization, we examined expression pattern shifts in duplicated regulatory genes in Arabidopsis. A two-way analysis of variance was performed on expression data for 280 phylogenetically identified paralogous pairs. Expression data were extracted from global expression profiles for wild-type root, stem, leaf, developing inflorescence, nearly mature flower buds, and seedpod. Gene, organ, and gene by organ interaction (G x O) effects were examined. Results indicate that 85% of the paralogous pairs exhibited a significant G x O effect indicative of regulatory subfunctionalization and/or neofunctionalization. A significant G x O effect was associated with complementary expression patterns in 45% of pairwise comparisons. No association was detected between a G x O effect and a relaxed evolutionary constraint as detected by the ratio of nonsynonymous to synonymous substitutions. Ancestral gene expression patterns inferred across a Type II MADS-box gene phylogeny suggest several cases of regulatory neofunctionalization and organ-specific nonfunctionalization. Complete linkage clustering of gene expression levels across organs suggests that regulatory modules for each organ are independent or ancestral genes had limited expression. We propose a new classification, regulatory hypofunctionalization, for an overall decrease in expression level in one member of a paralogous pair while still having a significant G x O effect. 
We conclude that expression divergence specifically indicative of subfunctionalization and/or neofunctionalization contributes to the maintenance of most if not all duplicated regulatory genes in Arabidopsis and hypothesize that this results in increasing expression diversity or specificity of regulatory genes after each round of duplication.
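
The gene by organ (G x O) interaction test described above can be sketched for a single paralogous pair as a balanced two-way analysis of variance. This is a minimal illustration under stated assumptions: replicate measurements per gene and organ, equal replication in every cell, and invented expression values. The function name `interaction_f` and the data are hypothetical and do not come from the study.

```python
def interaction_f(data):
    """F statistic for the gene x organ interaction in a balanced design.

    data[g][o] is a list of replicate expression values for gene g in organ o;
    every cell must hold the same number (>= 2) of replicates.
    """
    n_genes, n_organs = len(data), len(data[0])
    reps = len(data[0][0])

    def mean(xs):
        return sum(xs) / len(xs)

    cell = [[mean(data[g][o]) for o in range(n_organs)] for g in range(n_genes)]
    gene_mean = [mean(cell[g]) for g in range(n_genes)]
    organ_mean = [mean([cell[g][o] for g in range(n_genes)])
                  for o in range(n_organs)]
    grand = mean(gene_mean)

    # Interaction sum of squares: what is left of each cell mean after
    # removing the additive gene and organ effects.
    ss_int = reps * sum(
        (cell[g][o] - gene_mean[g] - organ_mean[o] + grand) ** 2
        for g in range(n_genes) for o in range(n_organs)
    )
    # Within-cell (error) sum of squares.
    ss_err = sum(
        (x - cell[g][o]) ** 2
        for g in range(n_genes) for o in range(n_organs) for x in data[g][o]
    )
    df_int = (n_genes - 1) * (n_organs - 1)
    df_err = n_genes * n_organs * (reps - 1)
    return (ss_int / df_int) / (ss_err / df_err)


# Invented values: two paralogs, two organs (root, leaf), two replicates each.
complementary = [[[10, 12], [2, 4]],   # paralog 1: high in root, low in leaf
                 [[2, 4], [10, 12]]]   # paralog 2: the reverse pattern
additive = [[[10, 12], [2, 4]],        # paralog 2 is simply shifted upward,
            [[12, 14], [4, 6]]]        # so there is no interaction
f_complementary = interaction_f(complementary)
f_additive = interaction_f(additive)
```

A complementary expression pattern produces a large interaction F statistic, whereas a uniform shift in expression level produces none. Converting F to a p-value would require an F distribution, for example from a statistics library, which is omitted here.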

Journal ArticleDOI
TL;DR: It is suggested that any further testing of the legitimacy of this taxon should, at the least, include data from opisthokont protists, and the results underline the critical position of these "animal-fungal allies" with respect to the origin and early evolution of animals and fungi.
Abstract: A close evolutionary relationship between Metazoa (animals) and Fungi was proposed over 20 years ago. The name Opisthokonta reflects the presence of a single, posteriorly directed flagellum found on the reproductive cells of most metazoans and some fungi. Subsequent molecular work confirmed this relationship and also identified several protistan groups (including the choanoflagellates, ichthyosporeans and nucleariids) belonging to this clade. In this chapter we review the literature on this group in order to describe the opisthokont lineages and the current thinking about how they are related to each other. The phylogeny of the opisthokonts is far from complete and we will discuss the areas that need to be addressed, as well as current evidence on the possible sister-groups of the opisthokonts.

Journal ArticleDOI
TL;DR: It is demonstrated that these conflicting angiosperm phylogenies are most probably linked to the transitional sites at all codon positions, especially at the third one where the strong base-composition bias and saturation effect take place.
Abstract: Whether the Amborella/Amborella-Nymphaeales or the grass lineage diverged first within the angiosperms has recently been debated. Central to this issue are the artifacts that might result from sampling only grasses within the monocots. We therefore sequenced the entire chloroplast genome (cpDNA) of Phalaenopsis aphrodite, Taiwan moth orchid. The cpDNA is a circular molecule of 148,964 bp with a comparatively short single-copy region (11,543 bp) due to the unusual loss and truncation/scattered deletion of certain ndh subunits. An open reading frame, orf91, located in the complementary strand of the rrn23 was reported for the first time. A comparison of nucleotide substitutions between P. aphrodite and the grasses indicates that only the plastid expression genes have a strong positive correlation between nonsynonymous (Ka) and synonymous (Ks) substitutions per site, providing evidence for a generation time effect, mainly across these genes. Among the intron-containing protein-coding genes of the sampled monocots, the Ks of the genes are significantly correlated to transitional substitutions of their introns. We compiled a concatenated 61 protein-coding gene alignment for the available 20 cpDNAs of vascular plants and analyzed the data set using Bayesian inference, maximum parsimony, and neighbor-joining (NJ) methods. The analyses yielded robust support for the Amborella/Amborella-Nymphaeales-basal hypothesis and for the orchid and grasses together being a monophyletic group nested within the remaining angiosperms. However, the NJ analysis using Ka, the first two codon positions, or amino acid sequences, respectively, supports the monocots-basal hypothesis. We demonstrated that these conflicting angiosperm phylogenies are most probably linked to the transitional sites at all codon positions, especially at the third one where the strong base-composition bias and saturation effect take place.

Journal ArticleDOI
TL;DR: Surprisingly, gradients in the frequency distribution of some NRY/mtDNA haplogroups across Polynesia and a gradual west-to-east decrease of overall NRY/mtDNA diversity are identified, providing evidence for a west-to-east direction of Polynesian settlements but also suggesting that Pacific voyaging was regular rather than haphazard.
Abstract: The human settlement of the Pacific Islands represents one of the most recent major migration events of mankind. Polynesians originated in Asia according to linguistic evidence or in Melanesia according to archaeological evidence. To shed light on the genetic origins of Polynesians, we investigated over 400 Polynesians from 8 island groups, in comparison with over 900 individuals from potential parental populations of Melanesia, Southeast and East Asia, and Australia, by means of Y chromosome (NRY) and mitochondrial DNA (mtDNA) markers. Overall, we classified 94.1% of Polynesian Y chromosomes and 99.8% of Polynesian mtDNAs as of either Melanesian (NRY-DNA: 65.8%, mtDNA: 6%) or Asian (NRY-DNA: 28.3%, mtDNA: 93.8%) origin, suggesting a dual genetic origin of Polynesians in agreement with the "Slow Boat" hypothesis. Our data suggest a pronounced admixture bias in Polynesians toward more Melanesian men than women, perhaps as a result of matrilocal residence in the ancestral Polynesian society. Although dating methods are consistent with somewhat similar entries of NRY/mtDNA haplogroups into Polynesia, haplotype sharing suggests an earlier appearance of Melanesian haplogroups than those from Asia. Surprisingly, we identified gradients in the frequency distribution of some NRY/mtDNA haplogroups across Polynesia and a gradual west-to-east decrease of overall NRY/mtDNA diversity, not only providing evidence for a west-to-east direction of Polynesian settlements but also suggesting that Pacific voyaging was regular rather than haphazard. We also demonstrate that Fiji played a pivotal role in the history of Polynesia: humans probably first migrated to Fiji, and subsequent settlement of Polynesia probably came from Fiji.

Journal ArticleDOI
TL;DR: A test for events around the Late Cretaceous is reported by describing the earliest penguin fossils, analyzing complete mitochondrial genomes from an albatross, a petrel, and a loon, and describing the gradual decline of pterosaurs at the same time modern birds radiate.
Abstract: Testing models of macroevolution, and especially the sufficiency of microevolutionary processes, requires good collaboration between molecular biologists and paleontologists. We report such a test for events around the Late Cretaceous by describing the earliest penguin fossils, analyzing complete mitochondrial genomes from an albatross, a petrel, and a loon, and describing the gradual decline of pterosaurs at the same time modern birds radiate. The penguin fossils comprise four naturally associated skeletons from the New Zealand Waipara Greensand, a Paleocene (early Tertiary) formation just above a well-known Cretaceous/Tertiary boundary site. The fossils, in a new genus (Waimanu), provide a lower estimate of 61-62 Ma for the divergence between penguins and other birds and thus establish a reliable calibration point for avian evolution. Combining fossil calibration points, DNA sequences, maximum likelihood, and Bayesian analysis, the penguin calibrations imply a radiation of modern (crown group) birds in the Late Cretaceous. This includes a conservative estimate that modern sea and shorebird lineages diverged at least by the Late Cretaceous about 74 +/- 3 Ma (Campanian). It is clear that modern birds from at least the latest Cretaceous lived at the same time as archaic birds including Hesperornis, Ichthyornis, and the diverse Enantiornithiformes. Pterosaurs, which also coexisted with early crown birds, show notable changes through the Late Cretaceous. There was a decrease in taxonomic diversity, and small- to medium-sized species disappeared well before the end of the Cretaceous. A simple reading of the fossil record might suggest competitive interactions with birds, but much more needs to be understood about pterosaur life histories.
Additional fossils and molecular data are still required to help understand the role of biotic interactions in the evolution of Late Cretaceous birds and thus to test that the mechanisms of microevolution are sufficient to explain macroevolution.

Journal ArticleDOI
TL;DR: The idea that genome duplication could contribute to species diversity through reduced probability of extinction through functional redundancy, mutational robustness, increased rates of evolution, and adaptation is developed.
Abstract: Gene and genome duplications provide a source of genetic material for mutation, drift, and selection to act upon, making new evolutionary opportunities possible. As a result, many have argued that genome duplication is a dominant factor in the evolution of complexity and diversity. However, a clear correlation between a genome duplication event and increased complexity and diversity is not apparent, and there are inconsistencies in the patterns of diversity invoked to support this claim. Interestingly, several estimates of genome duplication events in vertebrates are preceded by multiple extinct lineages, resulting in preduplication gaps in extant taxa. Here we argue that gen(om)e duplication could contribute to reduced risk of extinction via functional redundancy, mutational robustness, increased rates of evolution, and adaptation. The timeline for these processes to unfold would not predict immediate increases in species diversity after the duplication event. Rather, reduced probabilities of extinction would predict a latent period between a genome duplication and its effect on species diversity or complexity. In this paper, we will develop the idea that genome duplication could contribute to species diversity through reduced probability of extinction.

Journal ArticleDOI
TL;DR: At the present time, no genetic exchange occurs between pathogen populations on wheat and wild grasses, although evidence was found that gene flow may have occurred since genetic differentiation of the populations.
Abstract: The Fertile Crescent represents the center of origin and earliest known place of domestication for many cereal crops. During the transition from wild grasses to domesticated cereals, many host-specialized pathogen species are thought to have emerged. A sister population of the wheat-adapted pathogen Mycosphaerella graminicola was identified on wild grasses collected in northwest Iran. Isolates of this wild grass pathogen from 5 locations in Iran were compared with 123 M. graminicola isolates from the Middle East, Europe, and North America. DNA sequencing revealed a close phylogenetic relationship between the pathogen populations. To reconstruct the evolutionary history of M. graminicola, we sequenced 6 nuclear loci encompassing 464 polymorphic sites. Coalescence analyses indicated a relatively recent origin of M. graminicola, coinciding with the known domestication of wheat in the Fertile Crescent around 8,000-9,000 BC. The sympatric divergence of populations was accompanied by strong genetic differentiation. At the present time, no genetic exchange occurs between pathogen populations on wheat and wild grasses although we found evidence that gene flow may have occurred since genetic differentiation of the populations.

Journal ArticleDOI
TL;DR: Estimating rates of nucleotide substitution in a diverse array of avian influenza viruses and allowing for rate variation among lineages estimated that Influenza A virus exhibits rapid evolutionary dynamics across its host range, consistent with a high background mutation rate and rapid replication.
Abstract: Influenza A viruses from wild aquatic birds, their natural reservoir species, are thought to have reached a form of stasis, characterized by low rates of evolutionary change. We tested this hypothesis by estimating rates of nucleotide substitution in a diverse array of avian influenza viruses (AIV) and allowing for rate variation among lineages. The rates observed were extremely high, at >10^-3 substitutions per site per year, with little difference among wild and domestic host species or viral subtypes, and were similar to those seen in mammalian influenza A viruses. Influenza A virus therefore exhibits rapid evolutionary dynamics across its host range, consistent with a high background mutation rate and rapid replication. Using the same approach, we also estimated that the common ancestors of the hemagglutinin and neuraminidase sequences of AIV arose within the last 3,000 years, with most intrasubtype diversity emerging within the last 100 years and suggestive of a continual selective turnover.

Journal ArticleDOI
TL;DR: The first comprehensive analysis of mitogenomic data of 48 vertebrates, including 35 birds, is performed to derive a Bayesian timescale for avian evolution and to estimate rates of DNA evolution, finding no support for the hypothesis that the molecular clock in birds "ticks" according to a constant rate of substitution per unit of mass-specific metabolic energy rather than per unit of time, as recently suggested.
Abstract: Current understanding of the diversification of birds is hindered by their incomplete fossil record and uncertainty in phylogenetic relationships and phylogenetic rates of molecular evolution. Here we performed the first comprehensive analysis of mitogenomic data of 48 vertebrates, including 35 birds, to derive a Bayesian timescale for avian evolution and to estimate rates of DNA evolution. Our approach used multiple fossil time constraints scattered throughout the phylogenetic tree and accounts for uncertainties in time constraints, branch lengths, and heterogeneity of rates of DNA evolution. We estimated that the major vertebrate lineages originated in the Permian; the 95% credible intervals of our estimated ages of the origin of archosaurs (258 MYA), the amniote-amphibian split (356 MYA), and the archosaur-lizard divergence (278 MYA) bracket estimates from the fossil record. The origin of modern orders of birds was estimated to have occurred throughout the Cretaceous beginning about 139 MYA, arguing against a cataclysmic extinction of lineages at the Cretaceous/Tertiary boundary. We identified fossils that are useful as time constraints within vertebrates. Our timescale reveals that rates of molecular evolution vary across genes and among taxa through time, thereby refuting the widely used mitogenomic or cytochrome b molecular clock in birds. Moreover, the 5-Myr divergence time assumed between 2 genera of geese (Branta and Anser) to originally calibrate the standard mitochondrial clock rate of 0.01 substitutions per site per lineage per Myr (s/s/l/Myr) in birds was shown to be underestimated by about 9.5 Myr. Phylogenetic rates in birds vary between 0.0009 and 0.012 s/s/l/Myr, indicating that many phylogenetic splits among avian taxa also have been underestimated and need to be revised. 
We found no support for the hypothesis that the molecular clock in birds "ticks" according to a constant rate of substitution per unit of mass-specific metabolic energy rather than per unit of time, as recently suggested. Our analysis advances knowledge of rates of DNA evolution across birds and other vertebrates and will, therefore, aid comparative biology studies that seek to infer the origin and timing of major adaptive shifts in vertebrates.
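
The effect of an underestimated calibration date on a clock rate can be sketched with simple proportional rescaling. This is an illustration only: it assumes the per-lineage genetic distance between the two goose genera is held fixed while the divergence time is revised, which is not necessarily how the authors derived their rates. The function name `recalibrated_rate` is hypothetical, and the numeric inputs are the figures quoted in the abstract.

```python
def recalibrated_rate(old_rate, assumed_time_myr, revised_time_myr):
    """Rescale a clock rate when the calibration date is revised.

    The per-lineage distance implied by the old calibration
    (old_rate * assumed_time) is kept fixed and divided by the revised time.
    """
    per_lineage_distance = old_rate * assumed_time_myr
    return per_lineage_distance / revised_time_myr


old_rate = 0.01        # s/s/l/Myr, the standard mitochondrial clock rate
assumed_time = 5.0     # Myr, divergence originally assumed for Branta/Anser
revised_time = assumed_time + 9.5   # Myr, after adding the stated underestimate

new_rate = recalibrated_rate(old_rate, assumed_time, revised_time)
```

Under this assumption, the same observed divergence spread over 14.5 Myr rather than 5 Myr gives a rate roughly one third of the standard value, which lies within the 0.0009 to 0.012 s/s/l/Myr range reported in the abstract.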

Journal ArticleDOI
TL;DR: It is shown that synonymous sites in putative ESEs evolve more slowly than the remaining exonic sequence, and constraints on synonymous evolution within ESEs cause the true mutation rate to be underestimated by not more than approximately 8%.
Abstract: Silent sites in mammals have classically been assumed to be free from selective pressures. Consequently, the synonymous substitution rate (Ks) is often used as a proxy for the mutation rate. Although accumulating evidence demonstrates that the assumption is not valid, the mechanism by which selection acts remains unclear. Recent work has revealed that the presence of exonic splicing enhancers (ESEs) in coding sequence might influence synonymous evolution. ESEs are predominantly located near intron-exon junctions, which may explain the reduced single-nucleotide polymorphism (SNP) density in these regions. Here we show that synonymous sites in putative ESEs evolve more slowly than the remaining exonic sequence. Differential mutabilities of ESEs do not appear to explain this difference. We observe that substitution frequency at fourfold synonymous sites decreases as one approaches the ends of exons, consistent with the existing SNP data. This gradient is at least in part explained by ESEs being more abundant near junctions. Between-gene variation in Ks is hence partly explained by the proportion of the gene that acts as an ESE. Given the relative abundance of ESEs and the reduced rates of synonymous divergence within them, we estimate that constraints on synonymous evolution within ESEs cause the true mutation rate to be underestimated by not more than approximately 8%. We also find that Ks outside of ESEs is much lower in alternatively spliced exons than in constitutive exons, implying that other causes of selection on synonymous mutations exist. Additionally, selection on ESEs appears to affect nonsynonymous sites and may explain why amino acid usage near intron-exon junctions is nonrandom.
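
The logic behind such an underestimate can be sketched with a simple two-class mixture of synonymous sites. This is an assumed toy model, not the authors' calculation: sites outside ESEs are taken to evolve at the neutral rate, sites inside ESEs at a reduced rate, and both input values below are hypothetical placeholders rather than figures from the paper.

```python
def mutation_rate_underestimate(ese_fraction, relative_ese_rate):
    """Fractional shortfall of average Ks relative to the neutral rate.

    ese_fraction:      share of synonymous sites lying in ESEs (0..1)
    relative_ese_rate: substitution rate inside ESEs divided by the
                       neutral rate outside them (0..1)
    """
    observed_over_true = (1 - ese_fraction) + ese_fraction * relative_ese_rate
    return 1 - observed_over_true


# Hypothetical inputs chosen only to show the arithmetic.
shortfall = mutation_rate_underestimate(ese_fraction=0.3, relative_ese_rate=0.8)
```

In this toy model the shortfall is simply the ESE fraction multiplied by the proportional rate reduction inside ESEs, so modest constraint on a minority of sites yields a bias of only a few percent.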