Showing papers in "Genetics in 2018"


Journal ArticleDOI
01 Oct 2018-Genetics
TL;DR: A method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies is proposed, and improved accuracy in samples with low and variable sequencing depth is demonstrated for both simulated and real datasets.
Abstract: We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
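To illustrate what "working directly on genotype likelihoods" can look like, here is a minimal sketch that turns per-site genotype likelihoods into an expected genotype (dosage) using a Hardy-Weinberg prior. It is not the PCAngsd implementation: the function name `expected_dosage` is hypothetical, and it assumes a single population-level allele frequency per site, whereas the abstract describes iteratively estimated individual allele frequencies.

```python
import numpy as np

def expected_dosage(gl, freq):
    """Posterior mean genotype per site (hypothetical helper, illustration only).

    gl   : array of shape (n_sites, 3), likelihoods of the data given
           0, 1, or 2 copies of the alternate allele.
    freq : array of shape (n_sites,), assumed alternate-allele frequency.
    """
    gl = np.asarray(gl, dtype=float)
    f = np.asarray(freq, dtype=float)[:, None]
    # Hardy-Weinberg prior on the three genotypes (an assumption of this sketch).
    prior = np.hstack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2])
    # Bayes' rule: posterior is proportional to likelihood times prior.
    post = gl * prior
    post /= post.sum(axis=1, keepdims=True)
    # Expected number of alternate alleles under the posterior.
    return post @ np.arange(3)
```

With uninformative likelihoods, as at a site with no reads, the dosage falls back to the prior mean of twice the allele frequency. This is one way a model can carry uncertainty forward instead of calling a hard genotype.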

350 citations


Journal ArticleDOI
01 Nov 2018-Genetics
TL;DR: A streamlined and optimized editing protocol for the nematode Caenorhabditis elegans is described, and its efficacy, flexibility, and cost-effectiveness are demonstrated by affinity-tagging 14 Argonaute proteins in C. elegans using ssODN donors.
Abstract: CRISPR-based genome editing using ribonucleoprotein complexes and synthetic single-stranded oligodeoxynucleotide (ssODN) donors can be highly effective. However, reproducibility can vary, and precise, targeted integration of longer constructs, such as green fluorescent protein tags, remains challenging in many systems. Here, we describe a streamlined and optimized editing protocol for the nematode Caenorhabditis elegans. We demonstrate its efficacy, flexibility, and cost-effectiveness by affinity-tagging 14 Argonaute proteins in C. elegans using ssODN donors. In addition, we describe a novel PCR-based, partially single-stranded, “hybrid” donor design that yields high efficiency editing with large (kilobase-scale) constructs. We use these hybrid donors to introduce fluorescent protein tags into multiple loci, achieving editing efficiencies that approach those previously obtained only with much shorter ssODN donors. The principles and strategies described here are likely to translate to other systems, and should allow researchers to reproducibly and efficiently obtain both long and short precision genome edits.

301 citations


Journal ArticleDOI
01 Oct 2018-Genetics
TL;DR: This review summarizes the current knowledge of both the formation and function of the Drosophila melanogaster digestive tract, with a major focus on its main digestive/absorptive portion: the strikingly adaptable adult midgut.
Abstract: The gastrointestinal tract has recently come to the forefront of multiple research fields. It is now recognized as a major source of signals modulating food intake, insulin secretion and energy balance. It is also a key player in immunity and, through its interaction with microbiota, can shape our physiology and behavior in complex and sometimes unexpected ways. The insect intestine had remained, by comparison, relatively unexplored until the identification of adult somatic stem cells in the Drosophila intestine over a decade ago. Since then, a growing scientific community has exploited the genetic amenability of this insect organ in powerful and creative ways. By doing so, we have shed light on a broad range of biological questions revolving around stem cells and their niches, interorgan signaling and immunity. Despite their relatively recent discovery, some of the mechanisms active in the intestine of flies have already been shown to be more widely applicable to other gastrointestinal systems, and may therefore become relevant in the context of human pathologies such as gastrointestinal cancers, aging, or obesity. This review summarizes our current knowledge of both the formation and function of the Drosophila melanogaster digestive tract, with a major focus on its main digestive/absorptive portion: the strikingly adaptable adult midgut.

268 citations


Journal ArticleDOI
17 Aug 2018-Genetics
TL;DR: Although it is found that updates to the orthology-prediction methods significantly changed the landscape of C. elegans–human orthologs predicted by individual programs and—unexpectedly—reduced agreement among them, it is shown that the meta-analysis approach “buffered” against changes in gene content.
Abstract: OrthoList, a compendium of Caenorhabditis elegans genes with human orthologs compiled in 2011 by a meta-analysis of four orthology-prediction methods, has been a popular tool for identifying conserved genes for research into biological and disease mechanisms. However, the efficacy of orthology prediction depends on the accuracy of gene-model predictions, an ongoing process, and orthology-prediction algorithms have also been updated over time. Here we present OrthoList 2 (OL2), a new comparative genomic analysis between C. elegans and humans, and the first assessment of how changes over time affect the landscape of predicted orthologs between two species. Although we find that updates to the orthology-prediction methods significantly changed the landscape of C. elegans–human orthologs predicted by individual programs and, unexpectedly, reduced agreement among them, we also show that our meta-analysis approach “buffered” against changes in gene content. We show that adding results from more programs did not lead to many additions to the list and discuss reasons to avoid assigning “scores” based on support by individual orthology-prediction programs; the treatment of “legacy” genes no longer predicted by these programs; and the practical difficulties of updating due to encountering deprecated, changed, or retired gene identifiers. In addition, we consider what other criteria may support claims of orthology and alternative approaches to find potential orthologs that elude identification by these programs. Finally, we created a new web-based tool that allows for rapid searches of OL2 by gene identifiers, protein domains [InterPro and SMART (Simple Modular Architecture Research Tool)], or human disease associations [OMIM (Online Mendelian Inheritance in Man)], and also includes available RNA-interference resources to facilitate potential translational cross-species studies.

199 citations


Journal ArticleDOI
01 Sep 2018-Genetics
TL;DR: A statistical framework for the simultaneous inference of continuous and discrete patterns of population structure is presented, which addresses the “clines versus clusters” problem in modeling population genetic variation, and remedies some of the overfitting to which nonspatial models are prone.
Abstract: A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure. The method estimates ancestry proportions for each sample from a set of two-dimensional population layers, and, within each layer, estimates a rate at which relatedness decays with distance. This thereby explicitly addresses the "clines versus clusters" problem in modeling population genetic variation, and remedies some of the overfitting to which nonspatial models are prone. The method produces useful descriptions of structure in genetic relatedness in situations where separated, geographically distributed populations interact, as after a range expansion or secondary contact. We demonstrate the utility of this approach using simulations and by applying it to empirical datasets of poplars and black bears in North America.

195 citations


Journal ArticleDOI
01 Mar 2018-Genetics
TL;DR: These data will facilitate a vast number of scientific inquiries into the function of individual TFs in key developmental, metabolic, and defense and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks and globally across the life spans of these two key model organisms.
Abstract: To develop a catalog of regulatory sites in two major model organisms, Drosophila melanogaster and Caenorhabditis elegans, the modERN (model organism Encyclopedia of Regulatory Networks) consortium has systematically assayed the binding sites of transcription factors (TFs). Combined with data produced by our predecessor, modENCODE (Model Organism ENCyclopedia Of DNA Elements), we now have data for 262 TFs identifying 1.23 M sites in the fly genome and 217 TFs identifying 0.67 M sites in the worm genome. Because sites from different TFs are often overlapping and tightly clustered, they fall into 91,011 and 59,150 regions in the fly and worm, respectively, and these binding sites span as little as 8.7 and 5.8 Mb in the two organisms. Clusters with large numbers of sites (so-called high occupancy target, or HOT regions) predominantly associate with broadly expressed genes, whereas clusters containing sites from just a few factors are associated with genes expressed in tissue-specific patterns. All of the strains expressing GFP-tagged TFs are available at the stock centers, and the chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center and also through a simple interface (http://epic.gs.washington.edu/modERN/) that facilitates rapid accessibility of processed data sets. These data will facilitate a vast number of scientific inquiries into the function of individual TFs in key developmental, metabolic, and defense and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks and globally across the life spans of these two key model organisms.

145 citations


Journal ArticleDOI
01 Aug 2018-Genetics
TL;DR: It is argued that it is not sufficient to accommodate developmental bias into evolutionary theory merely as a constraint on evolutionary adaptation, and recent theory on regulatory networks that explains why the influence of genetic and environmental perturbation on phenotypes is typically not uniform, and may even be biased toward adaptive phenotypic variation is described.
Abstract: Phenotypic variation is generated by the processes of development, with some variants arising more readily than others, a phenomenon known as "developmental bias." Developmental bias and natural selection have often been portrayed as alternative explanations, but this is a false dichotomy: developmental bias can evolve through natural selection, and bias and selection jointly influence phenotypic evolution. Here, we briefly review the evidence for developmental bias and illustrate how it is studied empirically. We describe recent theory on regulatory networks that explains why the influence of genetic and environmental perturbation on phenotypes is typically not uniform, and may even be biased toward adaptive phenotypic variation. We show how bias produced by developmental processes constitutes an evolving property able to impose direction on adaptive evolution and influence patterns of taxonomic and phenotypic diversity. Taking these considerations together, we argue that it is not sufficient to accommodate developmental bias into evolutionary theory merely as a constraint on evolutionary adaptation. The influence of natural selection in shaping developmental bias, and conversely, the influence of developmental bias in shaping subsequent opportunities for adaptation, requires mechanistic models of development to be expanded and incorporated into evolutionary theory. A regulatory network perspective on phenotypic evolution thus helps to integrate the generation of phenotypic variation with natural selection, leaving evolutionary biology better placed to explain how organisms adapt and diversify.

140 citations


Journal ArticleDOI
01 Jan 2018-Genetics
TL;DR: An overview of how CRISPR-Cas gene editing has revolutionized genetic analysis in Drosophila is provided and key areas for future advances are highlighted.
Abstract: Drosophila has long been a premier model for the development and application of cutting-edge genetic approaches. The CRISPR-Cas system now adds the ability to manipulate the genome with ease and precision, providing a rich toolbox to interrogate relationships between genotype and phenotype, to delineate and visualize how the genome is organized, to illuminate and manipulate RNA, and to pioneer new gene drive technologies. Myriad transformative approaches have already originated from the CRISPR-Cas system, which will likely continue to spark the creation of tools with diverse applications. Here, we provide an overview of how CRISPR-Cas gene editing has revolutionized genetic analysis in Drosophila and highlight key areas for future advances.

140 citations


Journal ArticleDOI
13 Mar 2018-Genetics
TL;DR: This report describes a set of over 2800 transgenic lines for use with the split-GAL4 intersectional method, in which expression of the transgene only occurs where two different enhancers overlap in their expression patterns, to achieve the desired specificity.
Abstract: The ability to reproducibly target expression of transgenes to small, defined subsets of cells is a key experimental tool for understanding many biological processes. The Drosophila nervous system contains thousands of distinct cell types and it has generally not been possible to limit expression to one or a few cell types when using a single segment of genomic DNA as an enhancer to drive expression. Intersectional methods, in which expression of the transgene only occurs where two different enhancers overlap in their expression patterns, can be used to achieve the desired specificity. This report describes a set of over 2800 transgenic lines for use with the split-GAL4 intersectional method.

138 citations


Journal ArticleDOI
01 Nov 2018-Genetics
TL;DR: It is suggested that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.
Abstract: The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in “deep learning” (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To provide an evaluation of MLPs and CNNs, we used data from distantly related white Caucasian individuals (n ∼100k individuals, m ∼500k SNPs, and k = 1000) of the interim release of the UK Biobank. We analyzed a total of five phenotypes: height, bone heel mineral density, body mass index, systolic blood pressure, and waist–hip ratio, with genomic heritabilities ranging from ∼0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) that were preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the rest of the phenotypes, the performance of some CNNs was comparable to or slightly better than that of linear methods. Performance of MLPs was highly dependent on SNP set and phenotype. In all, over the range of traits evaluated in this study, CNN performance was competitive with linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.

137 citations


Journal ArticleDOI
01 May 2018-Genetics
TL;DR: The goal of this paper is to provide a review of the recent taxonomic changes and phylogenetic relationships in this genus to aid in further comparative studies.
Abstract: Understanding phylogenetic relationships among taxa is key to designing and implementing comparative analyses. The genus Drosophila, which contains over 1600 species, is one of the most important model systems in the biological sciences. For over a century, one species in this group, Drosophila melanogaster, has been key to studies of animal development and genetics, genome organization and evolution, and human disease. As whole-genome sequencing becomes more cost-effective, there is increasing interest in other members of this morphologically, ecologically, and behaviorally diverse genus. Phylogenetic relationships within Drosophila are complicated, and the goal of this paper is to provide a review of the recent taxonomic changes and phylogenetic relationships in this genus to aid in further comparative studies.

Journal ArticleDOI
01 Sep 2018-Genetics
TL;DR: It is shown how the use of an unbiased FST estimator may question the interpretation of population structure inferred from previous analyses, and the robustness of the estimator to model misspecification, such as sequencing errors and uneven contributions of individual DNAs to the pools, is evaluated.
Abstract: The advent of high throughput sequencing and genotyping technologies enables the comparison of patterns of polymorphisms at a very large number of markers. While the characterization of genetic structure from individual sequencing data remains expensive for many nonmodel species, it has been shown that sequencing pools of individual DNAs (Pool-seq) represents an attractive and cost-effective alternative. However, analyzing sequence read counts from a DNA pool instead of individual genotypes raises statistical challenges in deriving correct estimates of genetic differentiation. In this article, we provide a method-of-moments estimator of F-ST for Pool-seq data, based on an analysis-of-variance framework. We show, by means of simulations, that this new estimator is unbiased and outperforms previously proposed estimators. We evaluate the robustness of our estimator to model misspecification, such as sequencing errors and uneven contributions of individual DNAs to the pools. Finally, by reanalyzing published Pool-seq data of different ecotypes of the prickly sculpin Cottus asper, we show how the use of an unbiased F-ST estimator may question the interpretation of population structure inferred from previous analyses.

Journal ArticleDOI
01 Aug 2018-Genetics
TL;DR: It is found that whenever background selection is strong enough to lead to a reduction in genetic diversity, it also results in substantial distortions to the site frequency spectrum, which can mimic the effects of population expansions or positive selection.
Abstract: Purifying selection reduces genetic diversity, both at sites under direct selection and at linked neutral sites. This process, known as background selection, is thought to play an important role in shaping genomic diversity in natural populations. Yet despite its importance, the effects of background selection are not fully understood. Previous theoretical analyses of this process have taken a backward-time approach based on the structured coalescent. While they provide some insight, these methods are either limited to very small samples or are computationally prohibitive. Here, we present a new forward-time analysis of the trajectories of both neutral and deleterious mutations at a nonrecombining locus. We find that strong purifying selection leads to remarkably rich dynamics: neutral mutations can exhibit sweep-like behavior, and deleterious mutations can reach substantial frequencies even when they are guaranteed to eventually go extinct. Our analysis of these dynamics allows us to calculate analytical expressions for the full site frequency spectrum. We find that whenever background selection is strong enough to lead to a reduction in genetic diversity, it also results in substantial distortions to the site frequency spectrum, which can mimic the effects of population expansions or positive selection. Because these distortions are most pronounced in the low and high frequency ends of the spectrum, they become particularly important in larger samples, but may have small effects in smaller samples. We also apply our forward-time framework to calculate other quantities, such as the ultimate fates of polymorphisms or the fitnesses of their ancestral backgrounds.

Journal ArticleDOI
01 Nov 2018-Genetics
TL;DR: It is concluded that the true heritability of human longevity for birth cohorts across the 1800s and early 1900s was well below 10%, and that it has been generally overestimated due to the effect of assortative mating.
Abstract: Human life span is a phenotype that integrates many aspects of health and environment into a single ultimate quantity: the elapsed time between birth and death. Though it is widely believed that long life runs in families for genetic reasons, estimates of life span "heritability" are consistently low (∼15-30%). Here, we used pedigree data from Ancestry public trees, including hundreds of millions of historical persons, to estimate the heritability of human longevity. Although "nominal heritability" estimates based on correlations among genetic relatives agreed with prior literature, the majority of that correlation was also captured by correlations among nongenetic (in-law) relatives, suggestive of highly assortative mating around life span-influencing factors (genetic and/or environmental). We used structural equation modeling to account for assortative mating, and concluded that the true heritability of human longevity for birth cohorts across the 1800s and early 1900s was well below 10%, and that it has been generally overestimated due to the effect of assortative mating.

Journal ArticleDOI
14 Aug 2018-Genetics
TL;DR: The variance explained by previously and newly identified variants decreased with increasing age in the GERA and UKB cohorts, echoed in the variance explained by the entire genome, which also showed gene–age interaction effects.
Abstract: Body mass index (BMI), a proxy measure for obesity, is determined by both environmental (including ethnicity, age, and sex) and genetic factors, with > 400 BMI-associated loci identified to date. However, the impact, interplay, and underlying biological mechanisms among BMI, environment, genetics, and ancestry are not completely understood. To further examine these relationships, we utilized 427,509 calendar year-averaged BMI measurements from 100,418 adults from the single large multiethnic Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. We observed substantial independent ancestry and nationality differences, including ancestry principal component interactions and nonlinear effects. To increase the list of BMI-associated variants before assessing other differences, we conducted a genome-wide association study (GWAS) in GERA, with replication in the Genetic Investigation of Anthropometric Traits (GIANT) consortium combined with the UK Biobank (UKB), followed by GWAS in GERA combined with GIANT, with replication in the UKB. We discovered 30 novel independent BMI loci (P < 5 × 10⁻⁸) that replicated. We then assessed the proportion of BMI variance explained by sex in the UKB using previously identified loci compared to previously and newly identified loci and found slight increases: from 3.0 to 3.3% for males and from 2.7 to 3.0% for females. Further, the variance explained by previously and newly identified variants decreased with increasing age in the GERA and UKB cohorts, echoed in the variance explained by the entire genome, which also showed gene–age interaction effects. Finally, we conducted a tissue expression QTL enrichment analysis, which revealed that GWAS BMI-associated variants were enriched in the cerebellum, consistent with prior work in humans and mice.

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: It is concluded that downstream “omics” can complement genomics for hybrid prediction, and, thereby, contribute to more efficient selection of hybrid candidates.
Abstract: The ability to predict the agronomic performance of single-crosses with high precision is essential for selecting superior candidates for hybrid breeding. With recent technological advances, thousands of new parent lines, and, consequently, millions of new hybrid combinations are possible in each breeding cycle, yet only a few hundred can be produced and phenotyped in multi-environment yield trials. Well established prediction approaches such as best linear unbiased prediction (BLUP) using pedigree data and whole-genome prediction using genomic data are limited in capturing epistasis and interactions occurring within and among downstream biological strata such as transcriptome and metabolome. Because mRNA and small RNA (sRNA) sequences are involved in transcriptional, translational and post-translational processes, we expect them to provide information influencing several biological strata. However, using sRNA data of parent lines to predict hybrid performance has not yet been addressed. Here, we gathered genomic, transcriptomic (mRNA and sRNA) and metabolomic data of parent lines to evaluate the ability of the data to predict the performance of untested hybrids for important agronomic traits in grain maize. We found a considerable interaction for predictive ability between predictor and trait, with mRNA data being a superior predictor for grain yield and genomic data for grain dry matter content, while sRNA performed relatively poorly for both traits. Combining mRNA and genomic data as predictors resulted in high predictive abilities across both traits and combining other predictors improved prediction over that of the individual predictors alone. We conclude that downstream "omics" can complement genomics for hybrid prediction, and, thereby, contribute to more efficient selection of hybrid candidates.

Journal ArticleDOI
01 Dec 2018-Genetics
TL;DR: The metabolism of TAG in the Drosophila model system is reviewed, including the lipolytic processes that mobilize storage TAG to make it metabolically accessible as either an energy source or as a building block for biosynthesis of other lipid classes.
Abstract: Triacylglycerol (TAG) is the most important caloric source with respect to energy homeostasis in animals. In addition to its evolutionarily conserved importance as an energy source, TAG turnover is crucial to the metabolism of structural and signaling lipids. These neutral lipids are also key players in development and disease. Here, we review the metabolism of TAG in the Drosophila model system. Recently, the fruit fly has attracted renewed attention in research due to the unique experimental approaches it affords in studying the tissue-autonomous and interorgan regulation of lipid metabolism in vivo. Following an overview of the systemic control of fly body fat stores, we will cover lipid anabolic, enzymatic, and regulatory processes, which begin with the dietary lipid breakdown and de novo lipogenesis that results in lipid droplet storage. Next, we focus on lipolytic processes, which mobilize storage TAG to make it metabolically accessible as either an energy source or as a building block for biosynthesis of other lipid classes. Since the buildup and breakdown of fat involves various organs, we highlight avenues of lipid transport, which are at the heart of functional integration of organismic lipid metabolism. Finally, we draw attention to some “missing links” in basic neutral lipid metabolism and conclude with a perspective on how fly research can be exploited to study functional metabolic roles of diverse lipids.

Journal ArticleDOI
01 Nov 2018-Genetics
TL;DR: This work draws attention to, and then models, common aspects of NGS data: sequencing error, allelic bias, overdispersion, and outlying observations, and develops novel models to account for preferential pairing of chromosomes, and harnesses these for genotyping.
Abstract: Detecting and quantifying the differences in individual genomes (i.e., genotyping) plays a fundamental role in most modern bioinformatics pipelines. Many scientists now use reduced representation next-generation sequencing (NGS) approaches for genotyping. Genotyping diploid individuals using NGS is a well-studied field, and similar methods for polyploid individuals are just emerging. However, there are many aspects of NGS data, particularly in polyploids, that remain unexplored by most methods. Our contributions in this paper are fourfold: (i) We draw attention to, and then model, common aspects of NGS data: sequencing error, allelic bias, overdispersion, and outlying observations. (ii) Many datasets feature related individuals, and so we use the structure of Mendelian segregation to build an empirical Bayes approach for genotyping polyploid individuals. (iii) We develop novel models to account for preferential pairing of chromosomes, and harness these for genotyping. (iv) We derive oracle genotyping error rates that may be used for read depth suggestions. We assess the accuracy of our method in simulations, and apply it to a dataset of hexaploid sweet potato (Ipomoea batatas). An R package implementing our method is available at https://cran.r-project.org/package=updog.

Journal ArticleDOI
01 Oct 2018-Genetics
TL;DR: The authors constructed genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning).
Abstract: We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9% of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
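As a quick consistency check on the height figures quoted in this abstract, and assuming the reported ∼0.65 is a Pearson correlation between predicted and actual height, the squared correlation gives the proportion of variance explained:

```latex
R^2 = r^2 \approx (0.65)^2 = 0.4225 \approx 0.42
```

This agrees with the stated ∼40% of total variance explained for height.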

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: A method is presented to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time, together with a Markov chain Monte Carlo algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of the graph.
Abstract: An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method—which we call PolyGraph—has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.

Journal ArticleDOI
01 Jul 2018-Genetics
TL;DR: The results of this study support the existence of a relationship between genetic and phenotypic correlations in humans, a finding of specific interest in anthropological studies, which use measured phenotypic correlations to make inferences about the genetics of ancient human populations.
Abstract: Accurate estimation of genetic correlation requires large sample sizes and access to genetically informative data, which are not always available. Accordingly, phenotypic correlations are often assumed to reflect genotypic correlations in evolutionary biology. Cheverud’s conjecture asserts that the use of phenotypic correlations as proxies for genetic correlations is appropriate. Empirical evidence of the conjecture has been found across plant and animal species, with results suggesting that there is indeed a robust relationship between the two. Here, we investigate the conjecture in human populations, an analysis made possible by recent developments in availability of human genomic data and computing resources. A sample of 108,035 British European individuals from the UK Biobank was split equally into discovery and replication datasets. Seventeen traits were selected based on sample size, distribution, and heritability. Genetic correlations were calculated using linkage disequilibrium score regression applied to the genome-wide association summary statistics of pairs of traits, and compared within and across datasets. Strong and significant correlations were found for the between-dataset comparison, suggesting that the genetic correlations from one independent sample were able to predict the phenotypic correlations from another independent sample within the same population. Designating the selected traits as morphological or nonmorphological indicated little difference in correlation. The results of this study support the existence of a relationship between genetic and phenotypic correlations in humans. This finding is of specific interest in anthropological studies, which use measured phenotypic correlations to make inferences about the genetics of ancient human populations.

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: A new method for detecting natural selection on polygenic traits is developed and applied to several human examples in this issue of GENETICS.
Abstract: In this issue of GENETICS, a new method for detecting natural selection on polygenic traits is developed and applied to several human examples (Racimo et al. 2018). By definition, many loci contribute to variation in polygenic traits, and a challenge for evolutionary geneticists has been that

Journal ArticleDOI
01 Mar 2018-Genetics
TL;DR: Advances in genetic and genomic approaches allow a reconsideration of meiotic phenomena such as interference and the centromere effect, which were previously described only by genetic studies.
Abstract: A century of genetic studies of the meiotic process in Drosophila melanogaster females has been greatly augmented by both modern molecular biology and major advances in cytology. These approaches, and the findings they have allowed, are the subject of this review. Specifically, these efforts have revealed that meiotic pairing in Drosophila females is not an extension of somatic pairing, but rather occurs by a poorly understood process during premeiotic mitoses. This process of meiotic pairing requires the function of several components of the synaptonemal complex (SC). When fully assembled, the SC also plays a critical role in maintaining homolog synapsis and in facilitating the maturation of double-strand breaks (DSBs) into mature crossover (CO) events. Considerable progress has been made in elucidating not only the structure, function, and assembly of the SC, but also the proteins that facilitate the formation and repair of DSBs into both COs and noncrossovers (NCOs). The events that control the decision to mature a DSB as either a CO or an NCO, as well as determining which of the two CO pathways (class I or class II) might be employed, are also being characterized by genetic and genomic approaches. These advances allow a reconsideration of meiotic phenomena such as interference and the centromere effect, which were previously described only by genetic studies. In delineating the mechanisms by which the oocyte controls the number and position of COs, it becomes possible to understand the role of CO position in ensuring the proper orientation of homologs on the first meiotic spindle. Studies of bivalent orientation have occurred in the context of numerous investigations into the assembly, structure, and function of the first meiotic spindle. Additionally, studies have examined the mechanisms ensuring the segregation of chromosomes that have failed to undergo crossing over.

Journal ArticleDOI
01 Feb 2018-Genetics
TL;DR: It is demonstrated that the oligo-based FISH techniques are powerful new tools for chromosome identification and karyotyping research, especially for nonmodel plant species.
Abstract: Developing the karyotype of a eukaryotic species relies on identification of individual chromosomes, which has been a major challenge for most nonmodel plant and animal species. We developed a novel chromosome identification system by selecting and labeling oligonucleotides (oligos) located in specific regions on every chromosome. We selected a set of 54,672 oligos (45 nt) based on single copy DNA sequences in the potato genome. These oligos generated 26 distinct FISH signals that can be used as a "bar code" or "banding pattern" to uniquely label each of the 12 chromosomes from both diploid and polyploid (4× and 6×) potato species. Remarkably, the same bar code can be used to identify the 12 homeologous chromosomes among distantly related Solanum species, including tomato and eggplant. Accurate karyotypes based on individually identified chromosomes were established in six Solanum species that have diverged for >15 MY. These six species have maintained a similar karyotype; however, modifications to the FISH signal bar code led to the discovery of two reciprocal chromosomal translocations in Solanum etuberosum and S. caripense. We also validated these translocations by oligo-based chromosome painting. We demonstrate that the oligo-based FISH techniques are powerful new tools for chromosome identification and karyotyping research, especially for nonmodel plant species.

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: Transformations from the genetic effects estimated under the LMM to the OR that rely only on summary statistics are derived and validated, improving the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects.
Abstract: Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure (e.g., a genetic variant) and is readily available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects.
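To illustrate why a per-allele effect on the 0-1 scale and an odds ratio are on different scales, here is a minimal sketch of one simple conversion. It is not the transformation derived in the paper, whose formulas are not given in the abstract. The sketch assumes Hardy-Weinberg proportions, treats the case fraction in the sample as the mean of the 0-1 phenotype, and reads the linear effect as a risk difference per allele copy. All names are hypothetical.

```python
from math import exp


def genotype_risk(copies, effect, allele_freq, case_fraction):
    """Risk implied by a linear 0-1 model for a genotype with `copies` alleles.

    Assumption: mean allele count is 2 * allele_freq, so the model is centered
    such that the average risk equals `case_fraction`.
    """
    return case_fraction + effect * (copies - 2.0 * allele_freq)


def odds(risk):
    """Convert a risk to odds, rejecting values a linear model can overshoot."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must lie strictly between 0 and 1")
    return risk / (1.0 - risk)


def linear_effect_to_or(effect, allele_freq, case_fraction):
    """Odds ratio comparing carriers of one allele copy with non-carriers."""
    r0 = genotype_risk(0, effect, allele_freq, case_fraction)
    r1 = genotype_risk(1, effect, allele_freq, case_fraction)
    return odds(r1) / odds(r0)


def first_order_or(effect, case_fraction):
    """First-order approximation: rescale the effect by k * (1 - k)."""
    return exp(effect / (case_fraction * (1.0 - case_fraction)))


if __name__ == "__main__":
    print(linear_effect_to_or(0.01, allele_freq=0.3, case_fraction=0.1))
    print(first_order_or(0.01, case_fraction=0.1))
```

For small effects the two functions give similar values; they diverge as the effect grows or as the case fraction moves toward 0 or 1, which is where a carefully derived transformation matters most.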

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: It is demonstrated that TRACE can efficiently generate precise single-gene deletion mutants using the ADE2 locus as an example and can also effectively delete multiple genes in a single transformation, as evidenced by the successful generation of quadruple mfα1Δ2Δ3Δ4Δ mutants.
Abstract: Cryptococcus neoformans is a fungal pathogen that claims hundreds of thousands of lives annually. Targeted genetic manipulation through biolistic transformation in C. neoformans drove the investigation of this clinically important pathogen at the molecular level. Although costly and inefficient, biolistic transformation remains the major method for editing the Cryptococcus genome as foreign DNAs introduced by other methods such as electroporation are predominantly not integrated into the genome. Although the majority of DNAs introduced by biolistic transformation are stably inherited, the transformation efficiency and the homologous integration rate (∼1–10%) are low. Here, we developed a Transient CRISPR (clustered regularly interspaced short palindromic repeat)-Cas9 coupled with Electroporation (TRACE) system for targeted genetic manipulations in the C. neoformans species complex. This method took advantage of efficient genome integration due to double-strand breaks created at specific sites by the transient CRISPR-Cas9 system and the high transformation efficiency of electroporation. We demonstrated that TRACE can efficiently generate precise single-gene deletion mutants using the ADE2 locus as an example. This system can also effectively delete multiple genes in a single transformation, as evidenced by the successful generation of quadruple mfα1Δ2Δ3Δ4Δ mutants. In addition to generating gene deletion mutants, we complemented the ade2Δ mutant by integrating a wild-type ADE2 allele at the “safe haven” region (SH2) via homologous recombination using TRACE. Interestingly, introduced DNAs can be inserted at a designated genetic site without any homologous sequences, opening up numerous other applications. We expect that TRACE, an efficient, versatile, and cost-effective gene editing approach, will greatly accelerate research in this field.

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: This review covers the wealth of reagents available to map and manipulate neuronal activity with light in Drosophila, a premier model organism for studying how activity patterns in specific neural circuits coordinate an animal’s behavior.
Abstract: Understanding how activity patterns in specific neural circuits coordinate an animal's behavior remains a key area of neuroscience research. Genetic tools and a brain of tractable complexity make Drosophila a premier model organism for these studies. Here, we review the wealth of reagents available to map and manipulate neuronal activity with light.

Journal ArticleDOI
01 May 2018-Genetics
TL;DR: It is concluded that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm.
Abstract: As one of the world’s most important food crops, the potato (Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including in the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. nonadditive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive (G), digenic dominant (D), and additive × additive epistatic (G#G) effects were calculated using 3895 markers, and the numerator relationship matrix (A) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including nonadditive effects improved accuracy and/or bias for all three traits when predicting total genotypic value. When six F1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm.
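The additive genomic covariance matrix G mentioned above can be sketched from allele dosages as follows. The abstract does not state the formula used, so this assumes a common VanRaden-style construction extended to ploidy 4: dosages are centered by ploidy times the allele frequency and scaled by ploidy times the summed p(1 - p). The function name and the toy data are hypothetical.

```python
def additive_relationship(dosages, ploidy=4):
    """Additive relationship matrix from an individuals x markers dosage table.

    Assumption: allele frequencies are estimated from the same sample, and each
    dosage is an integer between 0 and `ploidy`.
    """
    n = len(dosages)
    m = len(dosages[0])
    # Allele frequency of each marker, estimated from the sample.
    freqs = [sum(row[j] for row in dosages) / (n * ploidy) for j in range(m)]
    scale = ploidy * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    # Center each dosage by its expected value under the allele frequency.
    centered = [[row[j] - ploidy * freqs[j] for j in range(m)]
                for row in dosages]
    return [[sum(a * b for a, b in zip(ri, rk)) / scale for rk in centered]
            for ri in centered]


if __name__ == "__main__":
    # Toy data: 4 tetraploid clones scored at 3 markers (dosage 0..4).
    toy = [[0, 2, 4],
           [1, 2, 3],
           [2, 3, 1],
           [4, 1, 0]]
    for row in additive_relationship(toy):
        print(["%.3f" % value for value in row])
```

Dropping allele dosage information would amount to collapsing the three heterozygous classes (dosages 1 to 3) into one, which is the comparison behind the lower accuracy reported without dosage.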

Journal ArticleDOI
01 Apr 2018-Genetics
TL;DR: It is shown that, as in Drosophila, both high and low temperatures increase meiotic crossovers in Arabidopsis thaliana, and it is found that, in contrast to what has been reported in barley, synaptonemal complex length is negatively correlated with temperature.
Abstract: Meiotic recombination shuffles genetic information from sexual species into gametes to create novel combinations in offspring. Thus, recombination is an important factor in inheritance, adaptation, and responses to selection. However, recombination is not a static parameter; meiotic recombination rate is sensitive to variation in the environment, especially temperature. That recombination rates change in response to both increases and decreases in temperature was reported in Drosophila a century ago, and since then in several other species. But it is still unclear what the underlying mechanism is, and whether low- and high-temperature effects are mechanistically equivalent. Here, we show that, as in Drosophila, both high and low temperatures increase meiotic crossovers in Arabidopsis thaliana. We show that, from a nadir at 18°, both lower and higher temperatures increase recombination through additional class I (interfering) crossovers. However, the increase in crossovers at high and low temperatures appears to be mechanistically at least somewhat distinct, as they differ in their association with the DNA repair protein MLH1. We also find that, in contrast to what has been reported in barley, synaptonemal complex length is negatively correlated with temperature; thus, an increase in chromosome axis length may account for increased crossovers at low temperature in A. thaliana, but cannot explain the increased crossovers observed at high temperature. The plasticity of recombination has important implications for evolution and breeding, and also for the interpretation of observations of recombination rate variation among natural populations.

Journal ArticleDOI
01 Mar 2018-Genetics
TL;DR: The use of RNAi to study gene function in Drosophila with a particular focus on high-throughput screening methods applied in cultured cells is described and the generation and use of genome-scale RNAi libraries for tissue-specific knockdown analysis in vivo are reviewed.
Abstract: In the last decade, RNA interference (RNAi), a cellular mechanism that uses RNA-guided degradation of messenger RNA transcripts, has had an important impact on identifying and characterizing gene function. First discovered in Caenorhabditis elegans, RNAi can be used to silence the expression of genes through introduction of exogenous double-stranded RNA into cells. In Drosophila, RNAi has been applied in cultured cells or in vivo to perturb the function of single genes or to systematically probe gene function on a genome-wide scale. In this review, we will describe the use of RNAi to study gene function in Drosophila with a particular focus on high-throughput screening methods applied in cultured cells. We will discuss available reagent libraries and cell lines, methodological approaches for cell-based assays, and computational methods for the analysis of high-throughput screens. Furthermore, we will review the generation and use of genome-scale RNAi libraries for tissue-specific knockdown analysis in vivo and discuss the differences and similarities with the use of genome-engineering methods such as CRISPR/Cas9 for functional analysis.