Showing papers in "Genetics in 2010"


Journal ArticleDOI
01 Oct 2010-Genetics
TL;DR: A new class of sequence-specific nucleases created by fusing transcription activator-like effectors (TALEs) to the catalytic domain of the FokI endonuclease is reported.
Abstract: Engineered nucleases that cleave specific DNA sequences in vivo are valuable reagents for targeted mutagenesis. Here we report a new class of sequence-specific nucleases created by fusing transcription activator-like effectors (TALEs) to the catalytic domain of the FokI endonuclease. Both native and custom TALE-nuclease fusions direct DNA double-strand breaks to specific, targeted sites.

1,928 citations


Journal ArticleDOI
01 Oct 2010-Genetics
TL;DR: The increased strength and reliability of these optimized reagents overcome many of the previous limitations of these methods and will facilitate genetic manipulations of greater complexity and sophistication in Drosophila melanogaster.
Abstract: A wide variety of biological experiments rely on the ability to express an exogenous gene in a transgenic animal at a defined level and in a spatially and temporally controlled pattern. We describe major improvements of the methods available for achieving this objective in Drosophila melanogaster. We have systematically varied core promoters, UTRs, operator sequences, and transcriptional activating domains used to direct gene expression with the GAL4, LexA, and Split GAL4 transcription factors and the GAL80 transcriptional repressor. The use of site-specific integration allowed us to make quantitative comparisons between different constructs inserted at the same genomic location. We also characterized a set of PhiC31 integration sites for their ability to support transgene expression of both drivers and responders in the nervous system. The increased strength and reliability of these optimized reagents overcome many of the previous limitations of these methods and will facilitate genetic manipulations of greater complexity and sophistication.

1,033 citations


Journal ArticleDOI
01 Aug 2010-Genetics
TL;DR: A Bayesian method that estimates the empirical pattern of covariance in allele frequencies between populations from a set of markers and then uses this as a null model for a test at individual SNPs, which can be used to identify SNPs with unusually large allele frequency differentiation and offers a powerful alternative to tests based on pairwise or global FST.
Abstract: Loci involved in local adaptation can potentially be identified by an unusual correlation between allele frequencies and important ecological variables or by extreme allele frequency differences between geographic regions. However, such comparisons are complicated by differences in sample sizes and the neutral correlation of allele frequencies across populations due to shared history and gene flow. To overcome these difficulties, we have developed a Bayesian method that estimates the empirical pattern of covariance in allele frequencies between populations from a set of markers and then uses this as a null model for a test at individual SNPs. In our model the sample frequencies of an allele across populations are drawn from a set of underlying population frequencies; a transform of these population frequencies is assumed to follow a multivariate normal distribution. We first estimate the covariance matrix of this multivariate normal across loci using a Monte Carlo Markov chain. At each SNP, we then provide a measure of the support, a Bayes factor, for a model where an environmental variable has a linear effect on the transformed allele frequencies compared to a model given by the covariance matrix alone. This test is shown through power simulations to outperform existing correlation tests. We also demonstrate that our method can be used to identify SNPs with unusually large allele frequency differentiation and offers a powerful alternative to tests based on pairwise or global F(ST). Software is available at http://www.eve.ucdavis.edu/gmcoop/.

687 citations


Journal ArticleDOI
01 Oct 2010-Genetics
TL;DR: An evaluation of parametric and semiparametric models for GS, using wheat and maize data in which different traits were measured in several environmental conditions, indicates that models including marker information had higher predictive ability than pedigree-based models.
Abstract: The availability of dense molecular markers has made possible the use of genomic selection (GS) for plant breeding. However, the evaluation of models for GS in real plant populations is very limited. This article evaluates the performance of parametric and semiparametric models for GS using wheat (Triticum aestivum L.) and maize (Zea mays) data in which different traits were measured in several environmental conditions. The findings, based on extensive cross-validations, indicate that models including marker information had higher predictive ability than pedigree-based models. In the wheat data set, and relative to a pedigree model, gains in predictive ability due to inclusion of markers ranged from 7.7 to 35.7%. Correlation between observed and predictive values in the maize data set achieved values up to 0.79. Estimates of marker effects were different across environmental conditions, indicating that genotype × environment interaction is an important component of genetic variability. These results indicate that GS in plant breeding can be an effective strategy for selecting among lines whose phenotypes have yet to be observed.

676 citations


Journal ArticleDOI
01 Jul 2010-Genetics
TL;DR: The relative accuracy of GBLUP and BayesB for a given number of records and heritability is highly dependent on Me, which is a property of the target genome, as well as on the architecture of the trait (NQTL).
Abstract: The rapid increase in high-throughput single-nucleotide polymorphism data has led to a great interest in applying genome-wide evaluation methods to identify an individual's genetic merit. Genome-wide evaluation combines statistical methods with genomic data to predict genetic values for complex traits. Considerable uncertainty currently exists in determining which genome-wide evaluation method is the most appropriate. We hypothesize that genome-wide methods deal differently with the genetic architecture of quantitative traits and genomes. A genomic linear method (GBLUP), and a genomic nonlinear Bayesian variable selection method (BayesB) are compared using stochastic simulation across three effective population sizes and a wide range of numbers of quantitative trait loci (NQTL). GBLUP had a constant accuracy, for a given heritability and sample size, regardless of NQTL. BayesB had a higher accuracy than GBLUP when NQTL was low, but this advantage diminished as NQTL increased and when NQTL became large, GBLUP slightly outperformed BayesB. In addition, deterministic equations are extended to predict the accuracy of both methods and to estimate the number of independent chromosome segments (Me) and NQTL. The predictions of accuracy and estimates of Me and NQTL were generally in good agreement with results from simulated data. We conclude that the relative accuracy of GBLUP and BayesB for a given number of records and heritability are highly dependent on Me, which is a property of the target genome, as well as the architecture of the trait (NQTL).

665 citations
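
The abstract above mentions deterministic equations for prediction accuracy but does not reproduce them in this listing. As a rough illustration only, the sketch below assumes the commonly cited form r = sqrt(N h² / (N h² + M)), with M set to Me for a GBLUP-like method and to min(NQTL, Me) for a variable-selection method. The function names and the exact form are assumptions, not taken from the listing.

```python
import math


def expected_accuracy(n_records, h2, n_effects):
    """Assumed deterministic form: r = sqrt(N*h2 / (N*h2 + M))."""
    signal = n_records * h2
    return math.sqrt(signal / (signal + n_effects))


def gblup_like_accuracy(n_records, h2, me):
    # Assumption: a linear method has to estimate all Me independent segments.
    return expected_accuracy(n_records, h2, me)


def variable_selection_accuracy(n_records, h2, me, nqtl):
    # Assumption: a variable-selection method estimates min(NQTL, Me) effects.
    return expected_accuracy(n_records, h2, min(nqtl, me))


if __name__ == "__main__":
    for nqtl in (50, 500, 5000):
        print(nqtl,
              round(gblup_like_accuracy(1000, 0.5, 1000), 3),
              round(variable_selection_accuracy(1000, 0.5, 1000, nqtl), 3))
```

Under this assumed form the two accuracies coincide once NQTL reaches Me, so it does not capture the slight GBLUP advantage at large NQTL that the abstract reports from simulation.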


Journal ArticleDOI
01 May 2010-Genetics
TL;DR: The approximation of marginal likelihood using thermodynamic integration in MIGRATE allows the evaluation of complex population genetic models, not only of whether sampling locations belong to a single panmictic population, but also of competing complex structured population models.
Abstract: For many biological investigations, groups of individuals are genetically sampled from several geographic locations. These sampling locations often do not reflect the genetic population structure. We describe a framework using marginal likelihoods to compare and order structured population models, such as testing whether the sampling locations belong to the same randomly mating population or comparing unidirectional and multidirectional gene flow models. In the context of inferences employing Markov chain Monte Carlo methods, the accuracy of the marginal likelihoods depends heavily on the approximation method used to calculate the marginal likelihood. Two methods, modified thermodynamic integration and a stabilized harmonic mean estimator, are compared. With finite Markov chain Monte Carlo run lengths, the harmonic mean estimator may not be consistent. Thermodynamic integration, in contrast, delivers considerably better estimates of the marginal likelihood. The choice of prior distributions does not influence the order and choice of the better models when the marginal likelihood is estimated using thermodynamic integration, whereas with the harmonic mean estimator the influence of the prior is pronounced and the order of the models changes. The approximation of marginal likelihood using thermodynamic integration in MIGRATE allows the evaluation of complex population genetic models, not only of whether sampling locations belong to a single panmictic population, but also of competing complex structured population models.

621 citations
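
To make the idea of thermodynamic integration concrete, here is a minimal sketch on a toy model that has nothing to do with MIGRATE itself: data drawn from a normal distribution with unknown mean, unit variance, and a zero-mean normal prior. The log marginal likelihood is approximated by integrating the expected log-likelihood over power posteriors with exponent beta from 0 to 1. The model, the linear temperature ladder, and all function names are assumptions chosen so the result can be checked against a closed form.

```python
import math
import random


def log_likelihood(mu, data):
    """Log-likelihood of data under Normal(mu, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in data)


def sample_power_posterior(beta, data, prior_var, rng):
    """Draw mu from prior * likelihood**beta (conjugate, so exact sampling)."""
    precision = 1.0 / prior_var + beta * len(data)
    mean = beta * sum(data) / precision
    return rng.gauss(mean, math.sqrt(1.0 / precision))


def thermodynamic_integration(data, prior_var, n_temps=31, n_draws=2000, seed=0):
    """Estimate log marginal likelihood as the integral of E_beta[log L] over beta."""
    rng = random.Random(seed)
    betas = [i / (n_temps - 1) for i in range(n_temps)]
    expected = []
    for beta in betas:
        total = 0.0
        for _ in range(n_draws):
            mu = sample_power_posterior(beta, data, prior_var, rng)
            total += log_likelihood(mu, data)
        expected.append(total / n_draws)
    # Trapezoidal rule over the temperature ladder.
    return sum(0.5 * (expected[i] + expected[i + 1]) * (betas[i + 1] - betas[i])
               for i in range(n_temps - 1))


def exact_log_marginal(data, prior_var):
    """Closed-form log marginal likelihood for this conjugate toy model."""
    n = len(data)
    total = sum(data)
    sum_sq = sum(y * y for y in data)
    shrink = prior_var / (1.0 + n * prior_var)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1.0 + n * prior_var)
            - 0.5 * (sum_sq - shrink * total ** 2))


if __name__ == "__main__":
    toy_data = [0.3, -0.1, 0.8, 1.2, 0.5]
    print(thermodynamic_integration(toy_data, 1.0), exact_log_marginal(toy_data, 1.0))
```

In a real application the power posteriors would be sampled by MCMC rather than drawn exactly, and the spacing of the temperature ladder matters more than it does in this toy case.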


Journal ArticleDOI
01 Nov 2010-Genetics
TL;DR: This review traces the history of the term “pleiotropy,” surveys the different approaches to its study, and reevaluates its current place in the field of genetics.
Abstract: Pleiotropy is defined as the phenomenon in which a single locus affects two or more distinct phenotypic traits. The term was formally introduced into the literature by the German geneticist Ludwig Plate in 1910, 100 years ago. Pleiotropy has had an important influence on the fields of physiological and medical genetics as well as on evolutionary biology. Different approaches to the study of pleiotropy have led to incongruence in the way that it is perceived and discussed among researchers in these fields. Furthermore, our understanding of the term has changed quite a bit since 1910, particularly in light of modern molecular data. This review traces the history of the term “pleiotropy” and reevaluates its current place in the field of genetics.

395 citations


Journal ArticleDOI
01 Jun 2010-Genetics
TL;DR: Using simulations, the benefits of collecting replicated RNA sequencing data according to well known statistical designs that partition the sources of biological and technical variation are demonstrated.
Abstract: Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations we demonstrate the benefits of collecting replicated RNA sequencing data according to well known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.

390 citations
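
The design concepts named in the abstract above (randomization, replication, and blocking) can be sketched as a randomized complete block layout. In this hypothetical example each block stands in for a sequencing lane, every treatment appears exactly once per block, and the order within a block is randomized. The function name, field names, and treatment labels are illustrative assumptions, not the designs presented in the paper.

```python
import random


def randomized_block_design(treatments, n_blocks, seed=0):
    """Assign one replicate of every treatment to each block, in random order."""
    rng = random.Random(seed)
    design = []
    for block in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # randomization within the block
        for position, treatment in enumerate(order):
            design.append({
                "block": block,        # e.g. a lane (blocking factor)
                "position": position,  # slot within the block
                "treatment": treatment,
                "replicate": block,    # one biological replicate per block
            })
    return design


if __name__ == "__main__":
    for row in randomized_block_design(["control", "treated"], n_blocks=3):
        print(row)
```

Because every block contains every treatment, block-to-block (technical) variation can be separated from treatment effects in the downstream model.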


Journal ArticleDOI
01 Jul 2010-Genetics
TL;DR: In this article, the authors used a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.).
Abstract: Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental gradients, thereby making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved with abiotic stress response to temperature and drought. A unique set of 24 loci was identified as FST outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved with a diversity of physiological processes. Identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the use of either method and suggests a pluralistic, yet complementary, approach to the identification of genes underlying ecologically relevant phenotypes.

360 citations


Journal ArticleDOI
01 Sep 2010-Genetics
TL;DR: It is shown that NGS of pools of individuals is often more effective in SNP discovery and provides more accurate allele frequency estimates, even when taking sequencing errors into account.
Abstract: Next generation sequencing (NGS) is about to revolutionize genetic analysis. Currently NGS techniques are mainly used to sequence individual genomes. Due to the high sequence coverage required, the costs for population-scale analyses are still too high to allow an extension to nonmodel organisms. Here, we show that NGS of pools of individuals is often more effective in SNP discovery and provides more accurate allele frequency estimates, even when taking sequencing errors into account. We modify the population genetic estimators Tajima's π and Watterson's θ to obtain unbiased estimates from NGS pooling data. Given the same sequencing effort, the resulting estimators often show a better performance than those obtained from individual sequencing. Although our analysis also shows that NGS of pools of individuals will not be preferable under all circumstances, it provides a cost-effective approach to estimate allele frequencies on a genome-wide scale.

358 citations
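
For reference, the classical individual-sequencing forms of the two estimators named in the abstract above can be sketched as follows. These are the standard uncorrected versions computed from per-site allele counts in a sample of n chromosomes, not the modified pooling estimators the paper derives. The function names and input layout are assumptions.

```python
def tajimas_pi(allele_counts, n_chromosomes):
    """Average pairwise differences, summed over sites.

    allele_counts holds, per site, the number of sampled chromosomes
    carrying one of the two alleles.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    pairs = n_chromosomes * (n_chromosomes - 1)
    return sum(2.0 * k * (n_chromosomes - k) / pairs for k in allele_counts)


def wattersons_theta(allele_counts, n_chromosomes):
    """Number of segregating sites divided by the harmonic number a_n."""
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    segregating = sum(1 for k in allele_counts if 0 < k < n_chromosomes)
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return segregating / a_n


if __name__ == "__main__":
    counts = [1, 2, 3]  # three segregating sites in a sample of four
    print(tajimas_pi(counts, 4), wattersons_theta(counts, 4))
```

Both functions return per-region totals; dividing by the number of sites surveyed gives per-site values.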


Journal ArticleDOI
01 Jun 2010-Genetics
TL;DR: The main aim here is to use whole-genome sequence data for the prediction of genetic values of individuals for complex traits and to explore the accuracy of such predictions, using a Bayesian nonlinear model.
Abstract: Whole-genome resequencing technology has improved rapidly during recent years and is expected to improve further such that the sequencing of an entire human genome sequence for $1000 is within reach. Our main aim here is to use whole-genome sequence data for the prediction of genetic values of individuals for complex traits and to explore the accuracy of such predictions. This is relevant for the fields of plant and animal breeding and, in human genetics, for the prediction of an individual's risk for complex diseases. Here, population history and genomic architectures were simulated under the Wright-Fisher population and infinite-sites mutation model, and prediction of genetic value was by the genomic selection approach, where a Bayesian nonlinear model was used to predict the effects of individual SNPs. The Bayesian model assumed a priori that only few SNPs are causative, i.e., have an effect different from zero. When using whole-genome sequence data, accuracies of prediction of genetic value were >40% increased relative to the use of dense approximately 30K SNP chips. At equal high density, the inclusion of the causative mutations yielded an extra increase of accuracy of 2.5-3.7%. Predictions of genetic value remained accurate even when the training and evaluation data were 10 generations apart. Best linear unbiased prediction (BLUP) of SNP effects does not take full advantage of the genome sequence data, and nonlinear predictions, such as the Bayesian method used here, are needed to achieve maximum accuracy. On the basis of theoretical work, the results could be extended to more realistic genome and population sizes.

Journal ArticleDOI
01 Oct 2010-Genetics
TL;DR: It is concluded that ZFN technology is an efficient and convenient alternative to conventional gene targeting and will greatly facilitate the rapid creation of mouse models and functional genomics research.
Abstract: Homologous recombination-based gene targeting using Mus musculus embryonic stem cells has greatly impacted biomedical research. This study presents a powerful new technology for more efficient and less time-consuming gene targeting in mice using embryonic injection of zinc-finger nucleases (ZFNs), which generate site-specific double strand breaks, leading to insertions or deletions via DNA repair by the nonhomologous end joining pathway. Three individual genes, multidrug resistant 1a (Mdr1a), jagged 1 (Jag1), and notch homolog 3 (Notch3), were targeted in FVB/N and C57BL/6 mice. Injection of ZFNs resulted in a range of specific gene deletions, from several nucleotides to >1000 bp in length, among 20-75% of live births. Modified alleles were efficiently transmitted through the germline, and animals homozygous for targeted modifications were obtained in as little as 4 months. In addition, the technology can be adapted to any genetic background, eliminating the need for generations of backcrossing to achieve congenic animals. We also validated the functional disruption of Mdr1a and demonstrated that the ZFN-mediated modifications lead to true knockouts. We conclude that ZFN technology is an efficient and convenient alternative to conventional gene targeting and will greatly facilitate the rapid creation of mouse models and functional genomics research.

Journal ArticleDOI
01 Nov 2010-Genetics
TL;DR: The results show that all three factors (genetic differentiation/gene flow, genetic diversity, and the sampling scheme) play a role in generating false bottleneck signals, and suggest an ad hoc method to counter this effect.
Abstract: The idea that molecular data should contain information on the recent evolutionary history of populations is rather old. However, much of the work carried out today owes to the work of the statisticians and theoreticians who demonstrated that it was possible to detect departures from equilibrium conditions (e.g., panmictic population/mutation–drift equilibrium) and interpret them in terms of deviations from neutrality or stationarity. During the last 20 years the detection of population size changes has usually been carried out under the assumption that samples were obtained from populations that can be approximated by a Wright–Fisher model (i.e., assuming panmixia, demographic stationarity, etc.). However, natural populations are usually part of spatial networks and are interconnected through gene flow. Here we simulated genetic data at mutation and migration–drift equilibrium under an n-island and a stepping-stone model. The simulated populations were thus stationary and not subject to any population size change. We varied the level of gene flow between populations and the scaled mutation rate. We also used several sampling schemes. We then analyzed the simulated samples using the Bayesian method implemented in MSVAR, the Markov Chain Monte Carlo simulation program, to detect and quantify putative population size changes using microsatellite data. Our results show that all three factors (genetic differentiation/gene flow, genetic diversity, and the sampling scheme) play a role in generating false bottleneck signals. We also suggest an ad hoc method to counter this effect. The confounding effect of population structure and of the sampling scheme has practical implications for many conservation studies. Indeed, if population structure is creating “spurious” bottleneck signals, the interpretation of bottleneck signals from genetic data might be less straightforward than it would seem, and several studies may have overestimated or incorrectly detected bottlenecks in endangered species.

Journal ArticleDOI
01 Sep 2010-Genetics
TL;DR: An extension of the original LK statistic, T(LK), is proposed; named T(F-LK), it uses a phylogenetic estimation of the population's kinship (F) matrix, thereby accounting for historical branching and heterogeneity of genetic drift, and offers one compromise between advanced SNP genetic data acquisition and outlier analyses.
Abstract: Detecting genetic signatures of selection is of great interest for many research issues. Common approaches to separate selective from neutral processes focus on the variance of F(ST) across loci, as does the original Lewontin and Krakauer (LK) test. Modern developments aim to minimize the false positive rate and to increase the power, by accounting for complex demographic structures. Another stimulating goal is to develop straightforward parametric and computationally tractable tests to deal with massive SNP data sets. Here, we propose an extension of the original LK statistic (T(LK)), named T(F-LK), that uses a phylogenetic estimation of the population's kinship (F) matrix, thus accounting for historical branching and heterogeneity of genetic drift. Using forward simulations of single-nucleotide polymorphisms (SNPs) data under neutrality and selection, we confirm the relative robustness of the LK statistic (T(LK)) to complex demographic history but we show that T(F-LK) is more powerful in most cases. This new statistic outperforms also a multinomial-Dirichlet-based model [estimation with Markov chain Monte Carlo (MCMC)], when historical branching occurs. Overall, T(F-LK) detects 15-35% more selected SNPs than T(LK) for low type I errors (P < 0.001). Also, simulations show that T(LK) and T(F-LK) follow a chi-square distribution provided the ancestral allele frequencies are not too extreme, suggesting the possible use of the chi-square distribution for evaluating significance. The empirical distribution of T(F-LK) can be derived using simulations conditioned on the estimated F matrix. We apply this new test to pig breeds SNP data and pinpoint outliers using T(F-LK), otherwise undetected using the less powerful T(LK) statistic. This new test represents one solution for compromise between advanced SNP genetic data acquisition and outlier analyses.
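
The original Lewontin and Krakauer statistic that the abstract above builds on can be sketched in a few lines. For n populations it rescales each locus's F(ST) by the mean F(ST) across loci, T(LK) = (n - 1) F(ST) / mean F(ST), and compares the result to a chi-square distribution with n - 1 degrees of freedom. This sketch covers only T(LK); the T(F-LK) extension additionally needs an estimated kinship matrix and is not shown. The function name and input layout are assumptions.

```python
def lk_statistics(fst_values, n_populations):
    """Return the T(LK) statistic for each locus.

    fst_values: per-locus F(ST) estimates across the same populations.
    Under neutrality each value is compared to a chi-square
    distribution with n_populations - 1 degrees of freedom.
    """
    if n_populations < 2:
        raise ValueError("need at least two populations")
    if not fst_values:
        raise ValueError("need at least one locus")
    mean_fst = sum(fst_values) / len(fst_values)
    if mean_fst <= 0:
        raise ValueError("mean F(ST) must be positive")
    return [(n_populations - 1) * fst / mean_fst for fst in fst_values]


if __name__ == "__main__":
    print(lk_statistics([0.1, 0.2, 0.3], n_populations=5))
```

By construction the statistics average to n - 1 across loci, which is the mean of the reference chi-square distribution.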

Journal ArticleDOI
01 Dec 2010-Genetics
TL;DR: The author summarizes the main events that influenced their thinking about transposable elements as a young scientist, along with the influence and role of these genomic elements in evolution over subsequent years.
Abstract: The idea that some genetic factors are able to move around chromosomes emerged more than 60 years ago when Barbara McClintock first suggested that such elements existed and had a major role in controlling gene expression and that they also have had a major influence in reshaping genomes in evolution. It was many years, however, before the accumulation of data and theories showed that this latter revolutionary idea was correct although, understandably, it fell far short of our present view of the significant influence of what are now known as “transposable elements” in evolution. In this article, I summarize the main events that influenced my thinking about transposable elements as a young scientist and the influence and role of these specific genomic elements in evolution over subsequent years. Today, we recognize that the findings about genomic changes affected by transposable elements have considerably altered our view of the ways in which genomes evolve and work.

Journal ArticleDOI
01 Jan 2010-Genetics
TL;DR: This work proposes a reformulation of the postsampling regression adjustment used in approximate Bayesian computation (ABC) in terms of a general linear model (GLM), which allows integration into the theoretical framework of Bayesian statistics and the use of its methods, including model selection via Bayes factors; the methodology is then applied to the question of population subdivision among western chimpanzees.
Abstract: Until recently, the use of Bayesian inference was limited to a few cases because for many realistic probability models the likelihood function cannot be calculated analytically. The situation changed with the advent of likelihood-free inference algorithms, often subsumed under the term approximate Bayesian computation (ABC). A key innovation was the use of a postsampling regression adjustment, allowing larger tolerance values and as such shifting computation time to realistic orders of magnitude. Here we propose a reformulation of the regression adjustment in terms of a general linear model (GLM). This allows the integration into the sound theoretical framework of Bayesian statistics and the use of its methods, including model selection via Bayes factors. We then apply the proposed methodology to the question of population subdivision among western chimpanzees, Pan troglodytes verus.
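
The postsampling regression adjustment mentioned in the abstract above can be illustrated with a minimal rejection-ABC sketch for a single parameter and a single summary statistic. Accepted parameter draws are shifted along a fitted regression line toward the observed summary. This is a plain unweighted linear adjustment on a toy normal-mean model, not the GLM reformulation the paper proposes. All function names and the toy model are assumptions.

```python
import random
import statistics


def abc_rejection(simulate, prior_draw, s_obs, tolerance, n_sims, rng):
    """Keep (theta, summary) pairs whose summary lies within tolerance of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        summary = simulate(theta, rng)
        if abs(summary - s_obs) <= tolerance:
            accepted.append((theta, summary))
    return accepted


def regression_adjust(accepted, s_obs):
    """Shift each accepted theta by slope * (summary - s_obs)."""
    thetas = [t for t, _ in accepted]
    summaries = [s for _, s in accepted]
    mean_t = statistics.fmean(thetas)
    mean_s = statistics.fmean(summaries)
    sxx = sum((s - mean_s) ** 2 for s in summaries)
    if sxx == 0:
        return thetas  # no spread in summaries, nothing to adjust
    sxy = sum((s - mean_s) * (t - mean_t) for t, s in accepted)
    slope = sxy / sxx
    return [t - slope * (s - s_obs) for t, s in accepted]


def toy_prior(rng):
    return rng.uniform(-5.0, 5.0)


def toy_simulate(theta, rng):
    # Summary statistic: mean of 20 draws from Normal(theta, 1).
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(20))


if __name__ == "__main__":
    rng = random.Random(1)
    kept = abc_rejection(toy_simulate, toy_prior, 1.0, 0.5, 2000, rng)
    adjusted = regression_adjust(kept, 1.0)
    print(len(kept), statistics.fmean(adjusted))
```

The adjustment is what makes a fairly wide tolerance usable: draws accepted far from the observed summary are corrected rather than discarded.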

Journal ArticleDOI
01 Nov 2010-Genetics
TL;DR: This work proposes an alternative in which lambda exonuclease entirely degrades one strand, while leaving the other strand intact as single-stranded DNA, which then recombines via beta recombinase-catalyzed annealing at the replication fork.
Abstract: The phage lambda-derived Red recombination system is a powerful tool for making targeted genetic changes in Escherichia coli, providing a simple and versatile method for generating insertion, deletion, and point mutations on chromosomal, plasmid, or BAC targets. However, despite the common use of this system, the detailed mechanism by which lambda Red mediates double-stranded DNA recombination remains uncertain. Current mechanisms posit a recombination intermediate in which both 5′ ends of double-stranded DNA are recessed by λ exonuclease, leaving behind 3′ overhangs. Here, we propose an alternative in which lambda exonuclease entirely degrades one strand, while leaving the other strand intact as single-stranded DNA. This single-stranded intermediate then recombines via beta recombinase-catalyzed annealing at the replication fork. We support this by showing that single-stranded gene insertion cassettes are recombinogenic and that these cassettes preferentially target the lagging strand during DNA replication. Furthermore, a double-stranded DNA cassette containing multiple internal mismatches shows strand-specific mutations cosegregating roughly 80% of the time. These observations are more consistent with our model than with previously proposed models. Finally, by using phosphorothioate linkages to protect the lagging-targeting strand of a double-stranded DNA cassette, we illustrate how our new mechanistic knowledge can be used to enhance lambda Red recombination frequency. The mechanistic insights revealed by this work may facilitate further improvements to the versatility of lambda Red recombination.

Journal ArticleDOI
01 Mar 2010-Genetics
TL;DR: It is demonstrated that the approach improves the accuracy of allele phasing as well as imputation of missing genotypes, and is computationally effective at handling large data sets based on high-density SNP panels.
Abstract: Faithful reconstruction of haplotypes from diploid marker data (phasing) is important for many kinds of genetic analyses, including mapping of trait loci, prediction of genomic breeding values, and identification of signatures of selection. In human genetics, phasing most often exploits population information (linkage disequilibrium), while in animal genetics the primary source of information is familial (Mendelian segregation and linkage). We herein develop and evaluate a method that simultaneously exploits both sources of information. It builds on hidden Markov models that were initially developed to exploit population information only. We demonstrate that the approach improves the accuracy of allele phasing as well as imputation of missing genotypes. Reconstructed haplotypes are assigned to hidden states that are shown to correspond to clusters of genealogically related chromosomes. We show that these cluster states can directly be used to fine map QTL. The method is computationally effective at handling large data sets based on high-density SNP panels.

Journal ArticleDOI
01 Oct 2010-Genetics
TL;DR: It is argued that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common, and predicted that as more data become available, many more examples of intraspecies parallel adaptation will be uncovered.
Abstract: Models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new environment by multiple mutations of similar phenotypic effect that arise in parallel, at the same locus or different loci. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally, leading to a “soft” sweep in the population. Here we study various models of parallel mutation in a continuous, geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele due to limited dispersal can allow other selected alleles to arise and start to spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically and so initially form a geographic patchwork, a random tessellation, which could be mistaken for a signal of local adaptation. This spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs are captured by a single compound parameter, a characteristic length, which reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. While our knowledge of these parameters is poor, we argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. Thus, we predict that as more data become available, many more examples of intraspecies parallel adaptation will be uncovered.

Journal ArticleDOI
01 Jul 2010-Genetics
TL;DR: This GWA analysis identified the two major polymorphic loci controlling GSL variation (AOP and MAM) in natural populations, within large blocks of positive associations encompassing dozens of genes; these potential sweep blocks are likely linked with the formation of new defensive chemistries that alter plant fitness in natural environments.
Abstract: With the improvement and decline in cost of high-throughput genotyping and phenotyping technologies, genome-wide association (GWA) studies are fast becoming a preferred approach for dissecting complex quantitative traits. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of quantitative traits. GSLs are key defenses against insects in the wild and the relatively large number of cloned quantitative trait locus (QTL) controlling GSL traits allows comparison of GWA to previous QTL analyses. To better understand the specieswide genomic architecture controlling plant-insect interactions and the relative strengths of GWA and QTL studies, we conducted a GWA mapping study using 96 A. thaliana accessions, 43 GSL phenotypes, and ∼230,000 SNPs. Our GWA analysis identified the two major polymorphic loci controlling GSL variation (AOP and MAM) in natural populations within large blocks of positive associations encompassing dozens of genes. These blocks of positive associations showed extended linkage disequilibrium (LD) that we hypothesize to have arisen from balancing or fluctuating selective sweeps at both the AOP and MAM loci. These potential sweep blocks are likely linked with the formation of new defensive chemistries that alter plant fitness in natural environments. Interestingly, this GWA analysis did not identify the majority of previously identified QTL even though these polymorphisms were present in the GWA population. This may be partly explained by a nonrandom distribution of phenotypic variation across population subgroups that links population structure and GSL variation, suggesting that natural selection can hinder the detection of phenotype–genotype associations in natural populations.

Journal ArticleDOI
01 Jul 2010-Genetics
TL;DR: This study develops methods to distinguish neutral from selective hypotheses in equilibrium and nonequilibrium populations, including recurrent selection models, and, alongside classical P-value testing, introduces methods from the machine-learning field.
Abstract: A major goal of population genomics is to reconstruct the history of natural populations and to infer the neutral and selective scenarios that can explain the present-day polymorphism patterns. However, the separation between neutral and selective hypotheses has proven hard, mainly because both may predict similar patterns in the genome. This study focuses on the development of methods that can be used to distinguish neutral from selective hypotheses in equilibrium and nonequilibrium populations. These methods utilize a combination of statistics on the basis of the site frequency spectrum (SFS) and linkage disequilibrium (LD). We investigate the patterns of genetic variation along recombining chromosomes using a multitude of comparisons between neutral and selective hypotheses, such as selection or neutrality in equilibrium and nonequilibrium populations and recurrent selection models. We perform hypothesis testing using the classical P-value approach, but we also introduce methods from the machine-learning field. We demonstrate that the combination of SFS- and LD-based statistics increases the power to detect recent positive selection in populations that have experienced past demographic changes.

Journal ArticleDOI
01 Dec 2010-Genetics
TL;DR: It is argued that a simple overall pattern of diminishing-returns adaptation emerges, despite pervasive epistasis between beneficial mutations, because many beneficial mutations are available, and while the fitness landscape is rugged at the fine scale, it is smooth and regular when the authors consider the average over possible routes to adaptation.
Abstract: Because adaptation depends upon the fixation of novel beneficial mutations, the fitness effects of beneficial mutations that are substituted by selection are key to our understanding of the process of adaptation. In this study, we experimentally investigated the fitness effects of beneficial mutations that are substituted when populations of the pathogenic bacterium Pseudomonas aeruginosa adapt to the antibiotic rifampicin. Specifically, we isolated the first beneficial mutation to be fixed by selection when 96 populations of three different genotypes of P. aeruginosa that vary considerably in fitness in the presence of rifampicin were challenged with adapting to a high dose of this antibiotic. The simple genetics of rifampicin resistance allowed us to determine the genetic basis of adaptation in the majority of our populations. We show that the average fitness effects of fixed beneficial mutations show a simple and clear pattern of diminishing returns, such that selection tends to fix mutations with progressively smaller effects as populations approach a peak on the adaptive landscape. The fitness effects of individual mutations, on the other hand, are highly idiosyncratic across genetic backgrounds, revealing pervasive epistasis. In spite of this complexity of genetic interactions in this system, there is an overall tendency toward diminishing-returns epistasis. We argue that a simple overall pattern of diminishing-returns adaptation emerges, despite pervasive epistasis between beneficial mutations, because many beneficial mutations are available, and while the fitness landscape is rugged at the fine scale, it is smooth and regular when we consider the average over possible routes to adaptation. In the context of antibiotic resistance, these results show that acquiring mutations that confer low levels of antibiotic resistance does not impose any constraint on the ability to evolve high levels of resistance.

Journal ArticleDOI
01 Nov 2010-Genetics
TL;DR: This work uses a haploid, three-locus, binary genetic model to describe the conditions under which indirect associations become stronger than any of the causative associations in the organism—even to the point of representing the only associations present in the data.
Abstract: Genome-wide association mapping is a popular method for using natural variation within a species to generate a genotype–phenotype map. Statistical association between an allele at a locus and the trait in question is used as evidence that variation at the locus is responsible for variation of the trait. Indirect association, however, can give rise to statistically significant results at loci unrelated to the trait. We use a haploid, three-locus, binary genetic model to describe the conditions under which these indirect associations become stronger than any of the causative associations in the organism—even to the point of representing the only associations present in the data. These indirect associations are the result of disequilibrium between multiple factors affecting a single trait. Epistasis and population structure can exacerbate the problem but are not required to create it. From a statistical point of view, indirect associations are true associations rather than the result of stochastic noise: they will not be ameliorated by increasing sampling size or marker density and can be reproduced in independent studies.
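The idea of an indirect association outranking the causal ones can be illustrated with a small numerical sketch. The haplotype frequencies, the additive trait, and the function names below are hypothetical choices made for illustration; they are not the model or parameters used in the paper. Loci A and B are causal, while locus C has no effect on the trait but is in disequilibrium with both.

```python
from math import sqrt

# Hypothetical haploid population: (A, B, C) haplotype -> frequency.
# A and B are causal; C is non-causal but in disequilibrium with both.
HAPLOTYPES = {
    (1, 1, 1): 0.25,
    (1, 0, 0): 0.25,
    (0, 1, 0): 0.25,
    (0, 0, 0): 0.25,
}


def trait(a, b, c):
    """Illustrative additive trait: only A and B contribute, C is ignored."""
    return a + b


def correlation(locus):
    """Population correlation between the allele at `locus` and the trait."""
    ex = sum(f * h[locus] for h, f in HAPLOTYPES.items())
    ey = sum(f * trait(*h) for h, f in HAPLOTYPES.items())
    exy = sum(f * h[locus] * trait(*h) for h, f in HAPLOTYPES.items())
    exx = sum(f * h[locus] ** 2 for h, f in HAPLOTYPES.items())
    eyy = sum(f * trait(*h) ** 2 for h, f in HAPLOTYPES.items())
    return (exy - ex * ey) / sqrt((exx - ex**2) * (eyy - ey**2))


r_a, r_b, r_c = correlation(0), correlation(1), correlation(2)
print(f"causal A: {r_a:.3f}  causal B: {r_b:.3f}  non-causal C: {r_c:.3f}")
```

Because the correlations are computed from population frequencies rather than a finite sample, the stronger signal at C is not sampling noise, which is consistent with the abstract's point that larger samples do not remove such associations.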

Journal ArticleDOI
01 Dec 2010-Genetics
TL;DR: A new method is presented to reconstruct the history of recombination events that affected a given sample of bacterial genomes, based on a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample.
Abstract: Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces an homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.

Journal ArticleDOI
01 Sep 2010-Genetics
TL;DR: This review stresses the importance of studying taxa suitable for testing hypotheses, and the need for phylogenetic studies directed to taxa where the patterns of change can be most reliably inferred, so that testing hypotheses about the selective forces behind changes in sex determination becomes feasible.
Abstract: The ability to identify genetic markers in nonmodel systems has allowed geneticists to construct linkage maps for a diversity of species, and the sex-determining locus is often among the first to be mapped. Sex determination is an important area of study in developmental and evolutionary biology, as well as ecology. Its importance for organisms might suggest that sex determination is highly conserved. However, genetic studies have shown that sex determination mechanisms, and the genes involved, are surprisingly labile. We review studies using genetic mapping and phylogenetic inferences, which can help reveal evolutionary pattern within this lability and potentially identify the changes that have occurred among different sex determination systems. We define some of the terminology, particularly where confusion arises in writing about such a diverse range of organisms, and highlight some major differences between plants and animals, and some important similarities. We stress the importance of studying taxa suitable for testing hypotheses, and the need for phylogenetic studies directed to taxa where the patterns of changes can be most reliably inferred, if the ultimate goal of testing hypotheses regarding the selective forces that have led to changes in such an essential trait is to become feasible.

Journal ArticleDOI
01 Oct 2010-Genetics
TL;DR: Sex-antagonistic alleles can become more strongly associated with pleiotropically dominant sex-determining factors, which may help to explain biases in the rates of transitions between male and female heterogamety.
Abstract: Many animal taxa show frequent and rapid transitions between male heterogamety (XY) and female heterogamety (ZW). We develop a model showing how these transitions can be driven by sex-antagonistic selection. Sex-antagonistic selection acting on loci linked to a new sex-determination mutation can cause it to invade, but when acting on loci linked to the ancestral sex-determination gene will inhibit an invasion. The strengths of the consequent indirect selection on the old and new sex-determination loci are mediated by the strengths of sex-antagonistic selection, linkage between the sex-antagonistic and sex-determination genes, and the amount of genetic variation. Sex-antagonistic loci that are tightly linked to a sex-determining gene have a vastly stronger influence on the balance of selection than more distant loci. As a result, changes in linkage, caused, for example, by an inversion that captures a sex-determination mutation and a gene under sex-antagonistic selection, can trigger transitions between XY and ZW systems. Sex-antagonistic alleles can become more strongly associated with pleiotropically dominant sex-determining factors, which may help to explain biases in the rates of transitions between male and female heterogamety. Deleterious recessive mutations completely linked to the ancestral Y chromosome can prevent invasion of a neo-W chromosome or result in a stable equilibrium at which XY and ZW systems segregate simultaneously at two linkage groups.

Journal ArticleDOI
01 Sep 2010-Genetics
TL;DR: A novel strategy is established that employs whole-genome sequencing to simultaneously map and identify mutations without the need for any prior genetic mapping.
Abstract: Mutant screens have proven powerful for genetic dissection of a myriad of biological processes, but subsequent identification and isolation of the causative mutations are usually complex and time consuming. We have made the process easier by establishing a novel strategy that employs whole-genome sequencing to simultaneously map and identify mutations without the need for any prior genetic mapping.

Journal ArticleDOI
01 Jun 2010-Genetics
TL;DR: The spectrum of mutations induced by three commonly used mutagens: ethyl methanesulfonate (EMS), N-ethyl-N-nitrosourea (ENU), and ultraviolet trimethylpsoralen (UV/TMP) in the nematode Caenorhabditis elegans is described and the strong GC to AT transition bias of EMS is confirmed.
Abstract: Deep sequencing offers an unprecedented view of an organism's genome. We describe the spectrum of mutations induced by three commonly used mutagens: ethyl methanesulfonate (EMS), N-ethyl-N-nitrosourea (ENU), and ultraviolet trimethylpsoralen (UV/TMP) in the nematode Caenorhabditis elegans. Our analysis confirms the strong GC to AT transition bias of EMS. We found that ENU mainly produces A to T and T to A transversions, but also all possible transitions. We found no bias for any specific transition or transversion in the spectrum of UV/TMP-induced mutations. In 10 mutagenized strains we identified 2723 variants, of which 508 are expected to alter or disrupt gene function, including 21 nonsense mutations and 10 mutations predicted to affect mRNA splicing. This translates to an average of 50 informative mutations per strain. We also present evidence of genetic drift among laboratory wild-type strains derived from the Bristol N2 strain. We make several suggestions for best practice using massively parallel short read sequencing to ensure mutation detection.

Journal ArticleDOI
01 Feb 2010-Genetics
TL;DR: The theory connects gene-level relative polymorphism and divergence with the spatial and temporal frequency of environments inducing gene expression, and suggests that null hypotheses for levels of standing genetic variation and sequence divergence must be corrected to account for the frequency of expression of the genes under study.
Abstract: Conditionally expressed genes have the property that every individual in a population carries and transmits the gene, but only a fraction, f, expresses the gene and exposes it to natural selection. We show that a consequence of this pattern of inheritance and expression is a weakening of the strength of natural selection, allowing deleterious mutations to accumulate within and between species and inhibiting the spread of beneficial mutations. We extend previous theory to show that conditional expression in space and time have approximately equivalent effects on relaxing the strength of selection and that the effect holds in a spatially heterogeneous environment even with low migration rates among patches. We support our analytical approximations with computer simulations and delineate the parameter range under which the approximations fail. We model the effects of conditional expression on sequence polymorphism at mutation‐selection‐drift equilibrium, allowing for neutral sites, and show that sequence variation within and between species is inflated by conditional expression, with the effect being strongest in populations with large effective size. As f decreases, more sites are recruited into neutrality, leading to pseudogenization and increased drift load. Mutation accumulation diminishes the degree of adaptation of conditionally expressed genes to rare environments, and the mutational cost of phenotypic plasticity, which we quantify as the plasticity load, is greater for more rarely expressed genes. Our theory connects gene-level relative polymorphism and divergence with the spatial and temporal frequency of environments inducing gene expression. Our theory suggests that null hypotheses for levels of standing genetic variation and sequence divergence must be corrected to account for the frequency of expression of the genes under study.
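A minimal sketch of how conditional expression weakens selection, under simplifying assumptions that are not taken from the paper: a haploid, infinite population, a deleterious allele with cost s, one-way mutation at rate u, and expression that occurs at random in a fraction f of carriers each generation. Under these assumptions the marginal fitness of the mutant allele is 1 - f*s, and the deterministic equilibrium frequency works out to u / (f * s). The function name and parameter values are hypothetical.

```python
def equilibrium_frequency(u, s, f, generations=20_000):
    """Iterate selection then mutation for a deleterious haploid allele.

    u: mutation rate from wild type to the deleterious allele
    s: fitness cost when the gene is expressed
    f: fraction of individuals that express the gene (assumed random)
    """
    q = 0.0  # start with no deleterious alleles
    for _ in range(generations):
        # Selection: mutant marginal fitness is 1 - f*s, wild type is 1.
        q_sel = q * (1.0 - f * s) / (1.0 - f * s * q)
        # Mutation: wild-type alleles mutate at rate u.
        q = q_sel + u * (1.0 - q_sel)
    return q


q_always = equilibrium_frequency(u=1e-5, s=0.01, f=1.0)
q_rare = equilibrium_frequency(u=1e-5, s=0.01, f=0.1)
print(f"f = 1.0: q = {q_always:.5f}   f = 0.1: q = {q_rare:.5f}")
```

In this toy setting, a gene expressed in one tenth of individuals carries roughly ten times the equilibrium frequency of the deleterious allele, which matches the qualitative claim that rarely expressed genes accumulate more mutations.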

Journal ArticleDOI
01 Apr 2010-Genetics
TL;DR: Modulating the in vivo activity of hsp110 or tra1, two hits from the screen, affects neurodegeneration in a dose-dependent manner in a Drosophila model of Huntington's disease, suggesting that other aggregate regulators isolated in this screen may identify additional genes involved in the protein-folding pathway and neurotoxicity.
Abstract: Protein aggregates are a common pathological feature of most neurodegenerative diseases (NDs). Understanding their formation and regulation will help clarify their controversial roles in disease pathogenesis. To date, there have been few systematic studies of aggregate formation in Drosophila, a model organism that has been applied extensively in modeling NDs and screening for toxicity modifiers. We generated transgenic fly lines that express enhanced-GFP-tagged mutant Huntingtin (Htt) fragments with different lengths of polyglutamine (polyQ) tract and showed that these Htt mutants develop protein aggregates in a polyQ-length- and age-dependent manner in Drosophila. To identify central regulators of protein aggregation, we further generated stable Drosophila cell lines expressing these Htt mutants and also established a cell-based quantitative assay that allows automated measurement of aggregates within cells. We then performed a genomewide RNA interference screen for regulators of mutant Htt aggregation and isolated 126 genes involved in diverse cellular processes. Interestingly, although our screen focused only on mutant Htt aggregation, several of the identified candidates were known previously as toxicity modifiers of NDs. Moreover, modulating the in vivo activity of hsp110 (CG6603) or tra1, two hits from the screen, affects neurodegeneration in a dose-dependent manner in a Drosophila model of Huntington's disease. Thus, other aggregate regulators isolated in our screen may identify additional genes involved in the protein-folding pathway and neurotoxicity.