
Showing papers in "Genetics in 2019"


Journal ArticleDOI
01 Feb 2019-Genetics
TL;DR: R/qtl2 is an interactive software environment for mapping quantitative trait loci (QTL) in experimental populations, designed to handle modern high-density genotyping data and high-dimensional molecular phenotypes, including gene expression and proteomics.
Abstract: R/qtl2 is an interactive software environment for mapping quantitative trait loci (QTL) in experimental populations. The R/qtl2 software expands the scope of the widely used R/qtl software package to include multiparent populations derived from more than two founder strains, such as the Collaborative Cross and Diversity Outbred mice, heterogeneous stocks, and MAGIC plant populations. R/qtl2 is designed to handle modern high-density genotyping data and high-dimensional molecular phenotypes, including gene expression and proteomics. R/qtl2 includes the ability to perform genome scans using a linear mixed model to account for population structure, and also includes features to impute SNPs based on founder strain genomes and to carry out association mapping. The R/qtl2 software provides all of the basic features needed for QTL mapping, including graphical displays and summary reports, and it can be extended through the creation of add-on packages. R/qtl2, which is free and open source software written in the R and C++ programming languages, comes with a test framework.

282 citations


Journal ArticleDOI
01 Feb 2019-Genetics
TL;DR: This FlyBook chapter presents a survey of the current literature on the development of the hematopoietic system in Drosophila, and develops the tools and mechanisms critical to further the authors' understanding of human hematopoiesis during homeostasis and dysfunction.
Abstract: In this FlyBook chapter, we present a survey of the current literature on the development of the hematopoietic system in Drosophila. The Drosophila blood system consists entirely of cells that function in innate immunity, tissue integrity, wound healing, and various forms of stress response, and are therefore functionally similar to myeloid cells in mammals. The primary cell types are specialized for phagocytic, melanization, and encapsulation functions. As in mammalian systems, multiple sites of hematopoiesis are evident in Drosophila and the mechanisms involved in this process employ many of the same molecular strategies that exemplify blood development in humans. Drosophila blood progenitors respond to internal and external stress by coopting developmental pathways that involve both local and systemic signals. An important goal of these Drosophila studies is to develop the tools and mechanisms critical to further our understanding of human hematopoiesis during homeostasis and dysfunction.

174 citations


Journal ArticleDOI
01 Jan 2019-Genetics
TL;DR: This work shows how to use local PCA to describe this intermediate-scale heterogeneity in patterns of relatedness, and applies the method to genomic data from three species, finding in each that the effect of population structure can vary substantially across only a few megabases.
Abstract: Population structure leads to systematic patterns in measures of mean relatedness between individuals in large genomic data sets, which are often discovered and visualized using dimension reduction techniques such as principal component analysis (PCA). Mean relatedness is an average of the relationships across locus-specific genealogical trees, which can be strongly affected on intermediate genomic scales by linked selection and other factors. We show how to use local PCA to describe this intermediate-scale heterogeneity in patterns of relatedness, and apply the method to genomic data from three species, finding in each that the effect of population structure can vary substantially across only a few megabases. In a global human data set, localized heterogeneity is likely explained by polymorphic chromosomal inversions. In a range-wide data set of Medicago truncatula, factors that produce heterogeneity are shared between chromosomes, correlate with local gene density, and may be caused by linked selection, such as background selection or local adaptation. In a data set of primarily African Drosophila melanogaster, large-scale heterogeneity across each chromosome arm is explained by known chromosomal inversions thought to be under recent selection and, after removing samples carrying inversions, remaining heterogeneity is correlated with recombination rate and gene density, again suggesting a role for linked selection. The visualization method provides a flexible new way to discover biological drivers of genetic variation, and its application to data highlights the strong effects that linked selection and chromosomal inversions can have on observed patterns of genetic variation.

108 citations
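
The idea of local PCA described in this abstract can be illustrated with a small sketch: cut the genome into windows, run PCA on each window separately, and compare windows by how similar their leading components are. The version below is a simplified illustration under stated assumptions, not the authors' implementation: it keeps only the first principal component per window, finds it by power iteration, and scores similarity as the absolute dot product between components. The function names (`top_pc`, `pc_similarity`) and the toy genotype windows are hypothetical.

```python
import math
import random


def top_pc(window, iterations=200, seed=0):
    """Return the leading principal component (over individuals) of one window.

    `window` is a list of SNP rows; each row holds one genotype value per
    individual. The component is found by power iteration on the
    individual-by-individual covariance matrix.
    """
    n = len(window[0])
    # Center each SNP across individuals.
    centered = []
    for snp in window:
        mean = sum(snp) / n
        centered.append([g - mean for g in snp])
    # Unnormalized covariance between individuals, summed over SNPs.
    cov = [[sum(row[i] * row[j] for row in centered) for j in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return [0.0] * n  # window carries no variation
        vec = [x / norm for x in nxt]
    return vec


def pc_similarity(pc_a, pc_b):
    """Sign-invariant similarity between two unit-length components."""
    return abs(sum(a * b for a, b in zip(pc_a, pc_b)))


# Toy data (hypothetical): 8 individuals, 3 SNPs per window.
# Windows a1 and a2 separate individuals 0-3 from 4-7;
# window b separates even-numbered from odd-numbered individuals.
window_a1 = [[0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 1, 0, 2, 2, 2, 2], [0, 0, 0, 0, 2, 1, 2, 2]]
window_a2 = [[2, 2, 2, 2, 0, 0, 0, 0], [2, 2, 2, 1, 0, 0, 0, 0], [0, 1, 0, 0, 2, 2, 2, 2]]
window_b = [[0, 2, 0, 2, 0, 2, 0, 2], [0, 2, 0, 2, 0, 2, 1, 2], [1, 2, 0, 2, 0, 2, 0, 2]]

if __name__ == "__main__":
    pcs = {name: top_pc(w) for name, w in
           [("a1", window_a1), ("a2", window_a2), ("b", window_b)]}
    for first, second in [("a1", "a2"), ("a1", "b")]:
        print(first, second, round(pc_similarity(pcs[first], pcs[second]), 3))
```

Windows that share the same pattern of relatedness score close to 1 and windows with a different pattern score close to 0. A fuller analysis would typically keep several components per window and embed the window-to-window distances in a low-dimensional plot.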


Journal ArticleDOI
01 Aug 2019-Genetics
TL;DR: A comprehensive overview of transgenic methods in C. elegans is provided, with an emphasis on recent advances in transposon-mediated transgenesis, CRISPR/Cas9 gene editing, conditional gene and protein inactivation, and bipartite systems for temporal and spatial control of expression.
Abstract: The power of any genetic model organism is derived, in part, from the ease with which gene expression can be manipulated. The short generation time and invariant developmental lineage have made Caenorhabditis elegans very useful for understanding, e.g., developmental programs, basic cell biology, neurobiology, and aging. Over the last decade, the C. elegans transgenic toolbox has expanded considerably, with the addition of a variety of methods to control expression and modify genes with unprecedented resolution. Here, we provide a comprehensive overview of transgenic methods in C. elegans, with an emphasis on recent advances in transposon-mediated transgenesis, CRISPR/Cas9 gene editing, conditional gene and protein inactivation, and bipartite systems for temporal and spatial control of expression.

107 citations


Journal ArticleDOI
01 Oct 2019-Genetics
TL;DR: This review covers the state of the TOR field in C. elegans, focusing on what has been learned about TOR functions in development, metabolism, and aging.
Abstract: The Target of Rapamycin (TOR or mTOR) is a serine/threonine kinase that regulates growth, development, and behaviors by modulating protein synthesis, autophagy, and multiple other cellular processes in response to changes in nutrients and other cues. Over recent years, TOR has been studied intensively in mammalian cell culture and genetic systems because of its importance in growth, metabolism, cancer, and aging. Through its advantages for unbiased, and high-throughput, genetic and in vivo studies, Caenorhabditis elegans has made major contributions to our understanding of TOR biology. Genetic analyses in the worm have revealed unexpected aspects of TOR functions and regulation, and have the potential to further expand our understanding of how growth and metabolic regulation influence development. In the aging field, C. elegans has played a leading role in revealing the promise of TOR inhibition as a strategy for extending life span, and identifying mechanisms that function upstream and downstream of TOR to influence aging. Here, we review the state of the TOR field in C. elegans, and focus on what we have learned about its functions in development, metabolism, and aging. We discuss knowledge gaps, including the potential pitfalls in translating findings back and forth across organisms, but also describe how TOR is important for C. elegans biology, and how C. elegans work has developed paradigms of great importance for the broader TOR field.

92 citations


Journal ArticleDOI
01 Apr 2019-Genetics
TL;DR: The equation that predicts the expected proportion of variance explained using PRS, and how GWAS sample size is the key factor for maximizing accuracy of prediction in both humans and livestock are explored.
Abstract: In this Review, we focus on the similarity of the concepts underlying prediction of estimated breeding values (EBVs) in livestock and polygenic risk scores (PRS) in humans. Our research spans both fields and so we recognize factors that are very obvious for those in one field, but less so for those in the other. Differences in family size between species is the wedge that drives the different viewpoints and approaches. Large family size achievable in nonhuman species accompanied by selection generates a smaller effective population size, increased linkage disequilibrium and a higher average genetic relationship between individuals within a population. In human genetic analyses, we select individuals unrelated in the classical sense (coefficient of relationship <0.05) to estimate heritability captured by common SNPs. In livestock data, all animals within a breed are to some extent "related," and so it is not possible to select unrelated individuals and retain a data set of sufficient size to analyze. These differences directly or indirectly impact the way data analyses are undertaken. In livestock, genetic segregation variance exposed through samplings of parental genomes within families is directly observable and taken for granted. In humans, this genomic variation is under-recognized for its contribution to variation in polygenic risk of common disease, in both those with and without family history of disease. We explore the equation that predicts the expected proportion of variance explained using PRS, and quantify how GWAS sample size is the key factor for maximizing accuracy of prediction in both humans and livestock. Last, we bring together the concepts discussed to address some frequently asked questions.

90 citations
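
The dependence of PRS accuracy on GWAS sample size can be made concrete with a short calculation. The sketch below assumes a widely used approximation for the expected proportion of phenotypic variance explained, R² = h² / (1 + M_e / (N h²)), where h² is the SNP-based heritability, N the GWAS sample size, and M_e the effective number of independent chromosome segments. This form is an assumption made for illustration and may differ in detail from the equation explored in the Review; the function name and the input values are hypothetical.

```python
def expected_prs_r2(h2_snp, n_gwas, m_eff):
    """Expected variance explained by a PRS under the assumed approximation.

    h2_snp: SNP-based heritability (between 0 and 1)
    n_gwas: GWAS discovery sample size
    m_eff:  effective number of independent chromosome segments
    """
    return h2_snp / (1.0 + m_eff / (n_gwas * h2_snp))


if __name__ == "__main__":
    # Illustrative values only: h2 = 0.5 and M_e = 50,000.
    for n in (10_000, 100_000, 1_000_000):
        print(n, round(expected_prs_r2(0.5, n, 50_000), 4))
```

Under this approximation the expected R² rises with N and approaches h² only when N h² is much larger than M_e. A smaller M_e, as in livestock populations with small effective population size, reaches a given accuracy with fewer records.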


Journal ArticleDOI
01 Feb 2019-Genetics
TL;DR: The results suggest that high mutation rate potentially contributes to high polymorphism and low mutation rate to reduced polymorphism in natural populations, providing insights into mutational inputs in generating natural genetic diversity.
Abstract: Mutations are the ultimate source of all genetic variation. However, few direct estimates of the contribution of mutation to molecular genetic variation are available. To address this issue, we first analyzed the rate and spectrum of mutations in the Arabidopsis thaliana reference accession after 25 generations of single-seed descent. We then compared the mutation profile in these mutation accumulation (MA) lines against genetic variation observed in the 1001 Genomes Project. The estimated haploid single nucleotide mutation (SNM) rate for A. thaliana is 6.95 × 10⁻⁹ (SE ± 2.68 × 10⁻¹⁰) per site per generation, with SNMs having higher frequency in transposable elements (TEs) and centromeric regions. The estimated indel mutation rate is 1.30 × 10⁻⁹ (±1.07 × 10⁻¹⁰) per site per generation, with deletions being more frequent and larger than insertions. Among the 1694 unique SNMs identified in the MA lines, the positions of 389 SNMs (23%) coincide with biallelic SNPs from the 1001 Genomes population, and in 289 (17%) cases the changes are identical. Of the 329 unique indels identified in the MA lines, 96 (29%) overlap with indels from the 1001 Genomes dataset, and 16 indels (5% of the total) are identical. These overlap frequencies are significantly higher than expected, suggesting that de novo mutations are not uniformly distributed and arise at polymorphic sites more frequently than assumed. These results suggest that high mutation rate potentially contributes to high polymorphism and low mutation rate to reduced polymorphism in natural populations, providing insights into mutational inputs in generating natural genetic diversity.

89 citations


Journal ArticleDOI
01 Nov 2019-Genetics
TL;DR: It is concluded that suppression of COs occurs over a narrow region spanning large- and small-scale SVs, representing an influence on the CO landscape in addition to sequence and epigenetic variation along chromosomes.
Abstract: Many environmental, genetic, and epigenetic factors are known to affect the frequency and positioning of meiotic crossovers (COs). Suppression of COs by large, cytologically visible inversions and translocations has long been recognized, but relatively little is known about how smaller structural variants (SVs) affect COs. To examine fine-scale determinants of the CO landscape, including SVs, we used a rapid, cost-effective method for high-throughput sequencing to generate a precise map of >17,000 COs between the Col-0 and Ler-0 accessions of Arabidopsis thaliana. COs were generally suppressed in regions with SVs, but this effect did not depend on the size of the variant region, and was only marginally affected by the variant type. CO suppression did not extend far beyond the SV borders and CO rates were slightly elevated in the flanking regions. Disease resistance gene clusters, which often exist as SVs, exhibited high CO rates at some loci, but there was a tendency toward depressed CO rates at loci where large structural differences exist between the two parents. Our high-density map also revealed in fine detail how CO positioning relates to genetic (DNA motifs) and epigenetic (chromatin structure) features of the genome. We conclude that suppression of COs occurs over a narrow region spanning large- and small-scale SVs, representing an influence on the CO landscape in addition to sequence and epigenetic variation along chromosomes.

87 citations


Journal ArticleDOI
01 Jan 2019-Genetics
TL;DR: While the Y chromosome does not undergo crossing over, high gene conversion rates within and between members of the crystal-Stellate gene family, Su(Ste), and PCKR, compared to genome-wide estimates are observed, suggesting that gene conversion and gene duplication play an important role in the evolution of Y-linked genes.
Abstract: Heterochromatic regions of the genome are repeat-rich and poor in protein coding genes, and are therefore underrepresented in even the best genome assemblies. Sex-limited chromosomes are among the most difficult regions of the genome to assemble. The Drosophila melanogaster Y chromosome is entirely heterochromatic, yet has wide-ranging effects on male fertility, fitness, and genome-wide gene expression. The genetic basis of this phenotypic variation is difficult to study, in part because we do not know the detailed organization of the Y chromosome. To study Y chromosome organization in D. melanogaster, we develop an assembly strategy involving the in silico enrichment of heterochromatic long single-molecule reads and use these reads to create targeted de novo assemblies of heterochromatic sequences. We assigned contigs to the Y chromosome using Illumina reads to identify male-specific sequences. Our pipeline extends the D. melanogaster reference genome by 11.9 Mb, closes 43.8% of the gaps, and improves overall contiguity. The addition of 10.6 Mb of Y-linked sequence permitted us to study the organization of repeats and genes along the Y chromosome. We detected a high rate of duplication to the pericentric regions of the Y chromosome from other regions in the genome. Most of these duplicated genes exist in multiple copies. We detail the evolutionary history of one sex-linked gene family, crystal-Stellate. While the Y chromosome does not undergo crossing over, we observed high gene conversion rates within and between members of the crystal-Stellate gene family, Su(Ste), and PCKR, compared to genome-wide estimates. Our results suggest that gene conversion and gene duplication play an important role in the evolution of Y-linked genes.

84 citations


Journal ArticleDOI
01 Dec 2019-Genetics
TL;DR: This review presents C. elegans as a powerful model for understanding germline stem cells and stem cell biology, and discusses the Notch-regulated genetic network that controls the key decision between the stem cell fate and meiotic development as it occurs under optimal laboratory conditions in adult and larval stages.
Abstract: Stem cell systems regulate tissue development and maintenance. The germline stem cell system is essential for animal reproduction, controlling both the timing and number of progeny through its influence on gamete production. In this review, we first draw general comparisons to stem cell systems in other organisms, and then present our current understanding of the germline stem cell system in Caenorhabditis elegans. In contrast to stereotypic somatic development and cell number stasis of adult somatic cells in C. elegans, the germline stem cell system has a variable division pattern, and the system differs between larval development, early adult peak reproduction and age-related decline. We discuss the cell and developmental biology of the stem cell system and the Notch regulated genetic network that controls the key decision between the stem cell fate and meiotic development, as it occurs under optimal laboratory conditions in adult and larval stages. We then discuss alterations of the stem cell system in response to environmental perturbations and aging. A recurring distinction is between processes that control stem cell fate and those that control cell cycle regulation. C. elegans is a powerful model for understanding germline stem cells and stem cell biology.

80 citations


Journal ArticleDOI
01 Aug 2019-Genetics
TL;DR: Results suggest that heritability on an entry-difference basis is a well-suited alternative for obtaining an overall heritability estimate, and in addition provides one heritability per genotype as well as one per difference between genotypes.
Abstract: In plant breeding, heritability is often calculated (i) as a measure of precision of trials and/or (ii) to compute the response to selection. It is usually estimated on an entry-mean basis, since the phenotype is usually an aggregated value, as genotypes are replicated in trials, which stands in contrast with animal breeding and human genetics. When this was first proposed, assumptions such as balanced data and independent genotypic effects were made that are often violated in modern plant breeding trials/analyses. Due to this, multiple alternative methods have been proposed, aiming to generalize heritability on an entry-mean basis. In this study, we propose an extension of the concept for heritability on an entry-mean to an entry-difference basis, which allows for more detailed insight and is more meaningful in the context of selection in plant breeding, because the correlation among entry means can be accounted for. We show that under certain circumstances our method reduces to other popular generalized methods for heritability estimation on an entry-mean basis. The approach is exemplified via four examples that show different levels of complexity, where we compare six methods for heritability estimation on an entry-mean basis to our approach (example codes: https://github.com/PaulSchmidtGit/Heritability). Results suggest that heritability on an entry-difference basis is a well-suited alternative for obtaining an overall heritability estimate, and in addition provides one heritability per genotype as well as one per difference between genotypes.
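
For readers less familiar with the entry-mean concept that this study generalizes, the classical balanced-data case can be written as H² = σ²_g / (σ²_g + σ²_e / r), where σ²_g is the genotypic variance, σ²_e the error variance, and r the number of replicates per genotype. The sketch below implements only this textbook special case, which relies on the balanced-data and independent-genotype assumptions the abstract says are often violated. It is not the entry-difference method proposed in the paper, and the function name and numbers are hypothetical.

```python
def entry_mean_heritability(var_genotype, var_error, n_reps):
    """Classical heritability on an entry-mean basis for a balanced trial.

    var_genotype: genotypic variance component
    var_error:    residual (plot-level) error variance component
    n_reps:       number of replicates per genotype
    """
    if n_reps <= 0:
        raise ValueError("n_reps must be positive")
    # Averaging over replicates divides the error variance of a mean by n_reps.
    return var_genotype / (var_genotype + var_error / n_reps)


if __name__ == "__main__":
    # Illustrative variance components only.
    for reps in (1, 2, 4):
        print(reps, round(entry_mean_heritability(2.0, 4.0, reps), 3))
```

More replicates shrink the error variance of each entry mean, so the estimate rises toward 1 as r grows. With unbalanced data or correlated genotypic effects this simple ratio no longer applies, which is the situation the generalized methods compared in the study are meant to handle.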

Journal ArticleDOI
01 May 2019-Genetics
TL;DR: This chapter summarizes current knowledge about the sensitivity and response dynamics of individual classes of C. elegans mechano- and thermosensory neurons from in vivo calcium imaging and whole-cell patch-clamp electrophysiology studies, and describes the roles of conserved molecules and signaling pathways in mediating the remarkably sensitive responses of these nematodes to mechanical and thermal cues.
Abstract: Caenorhabditis elegans lives in a complex habitat in which they routinely experience large fluctuations in temperature, and encounter physical obstacles that vary in size and composition. Their habitat is shared by other nematodes, by beneficial and harmful bacteria, and nematode-trapping fungi. Not surprisingly, these nematodes can detect and discriminate among diverse environmental cues, and exhibit sensory-evoked behaviors that are readily quantifiable in the laboratory at high resolution. Their ability to perform these behaviors depends on <100 sensory neurons, and this compact sensory nervous system together with powerful molecular genetic tools has allowed individual neuron types to be linked to specific sensory responses. Here, we describe the sensory neurons and molecules that enable C. elegans to sense and respond to physical stimuli. We focus primarily on the pathways that allow sensation of mechanical and thermal stimuli, and briefly consider this animal's ability to sense magnetic and electrical fields, light, and relative humidity. As the study of sensory transduction is critically dependent upon the techniques for stimulus delivery, we also include a section on appropriate laboratory methods for such studies. This chapter summarizes current knowledge about the sensitivity and response dynamics of individual classes of C. elegans mechano- and thermosensory neurons from in vivo calcium imaging and whole-cell patch-clamp electrophysiology studies. We also describe the roles of conserved molecules and signaling pathways in mediating the remarkably sensitive responses of these nematodes to mechanical and thermal cues. These studies have shown that the protein partners that form mechanotransduction channels are drawn from multiple superfamilies of ion channel proteins, and that signal transduction pathways responsible for temperature sensing in C. elegans share many features with those responsible for phototransduction in vertebrates.

Journal ArticleDOI
01 Apr 2019-Genetics
TL;DR: It is shown that the short generation time and high fecundity of T. urticae can be readily exploited in experimental evolution designs for high-resolution mapping of quantitative traits, and that a limited number of loci could explain quantitative resistance to this compound.
Abstract: Pesticide resistance arises rapidly in arthropod herbivores, as can host plant adaptation, and both are significant problems in agriculture. These traits have been challenging to study as both are often polygenic and many arthropods are genetically intractable. Here, we examined the genetic architecture of pesticide resistance and host plant adaptation in the two-spotted spider mite, Tetranychus urticae, a global agricultural pest. We show that the short generation time and high fecundity of T. urticae can be readily exploited in experimental evolution designs for high-resolution mapping of quantitative traits. As revealed by selection with spirodiclofen, an acetyl-CoA carboxylase inhibitor, in populations from a cross between a spirodiclofen-resistant and a spirodiclofen-susceptible strain, and which also differed in performance on tomato, we found that a limited number of loci could explain quantitative resistance to this compound. These were resolved to narrow genomic intervals, suggesting specific candidate genes, including acetyl-CoA carboxylase itself, clustered and copy variable cytochrome P450 genes, and NADPH cytochrome P450 reductase, which encodes a redox partner for cytochrome P450s. For performance on tomato, candidate genomic regions for response to selection were distinct from those responding to the synthetic compound and were consistent with a more polygenic architecture. In accomplishing this work, we exploited the continuous nature of allele frequency changes across experimental populations to resolve the existing fragmented T. urticae draft genome to pseudochromosomes. This improved assembly was indispensable for our analyses, as it will be for future research with this model herbivore that is exceptionally amenable to genetic studies.

Journal ArticleDOI
01 Aug 2019-Genetics
TL;DR: General concordance of Wolbachia and mitochondrial phylogenies suggests that horizontal transmission is rare, but varying relative rates of molecular divergence complicate chronogram-based statistical tests.
Abstract: Maternally transmitted Wolbachia infect about half of insect species, yet the predominant mode(s) of Wolbachia acquisition remains uncertain. Species-specific associations could be old, with Wolbachia and hosts codiversifying (i.e., cladogenic acquisition), or relatively young and acquired by horizontal transfer or introgression. The three Drosophila yakuba-clade hosts [(D. santomea, D. yakuba) D. teissieri] diverged ∼3 MYA and currently hybridize on the West African islands Bioko and Sao Tome. Each species is polymorphic for nearly identical Wolbachia that cause weak cytoplasmic incompatibility (CI), i.e., reduced egg hatch when uninfected females mate with infected males. D. yakuba-clade Wolbachia are closely related to wMel, globally polymorphic in D. melanogaster. We use draft Wolbachia and mitochondrial genomes to demonstrate that D. yakuba-clade phylogenies for Wolbachia and mitochondria tend to follow host nuclear phylogenies. However, roughly half of D. santomea individuals, sampled both inside and outside of the Sao Tome hybrid zone, have introgressed D. yakuba mitochondria. Both mitochondria and Wolbachia possess far more recent common ancestors than the bulk of the host nuclear genomes, precluding cladogenic Wolbachia acquisition. General concordance of Wolbachia and mitochondrial phylogenies suggests that horizontal transmission is rare, but varying relative rates of molecular divergence complicate chronogram-based statistical tests. Loci that cause CI in wMel are disrupted in D. yakuba-clade Wolbachia; but a second set of loci predicted to cause CI are located in the same WO prophage region. These alternative CI loci seem to have been acquired horizontally from distantly related Wolbachia, with transfer mediated by flanking Wolbachia-specific ISWpi1 transposons.

Journal ArticleDOI
01 Mar 2019-Genetics
TL;DR: Analytical and simulation tools for evolve-and-resequencing experiments are developed and applied to a new study of rapid evolution in Drosophila simulans; SNPs showing strong parallel evolution in the experiment are intermediate in frequency in the natural population, indicative of balancing selection in nature.
Abstract: We develop analytical and simulation tools for evolve-and-resequencing experiments and apply them to a new study of rapid evolution in Drosophila simulans. Likelihood test statistics applied to pooled population sequencing data suggest parallel evolution of 138 SNPs across the genome. This number is reduced by orders of magnitude from previous studies (thousands or tens of thousands), owing to differences in both experimental design and statistical analysis. Whole genome simulations calibrated from Drosophila genetic data sets indicate that major features of the genome-wide response could be explained by as few as 30 loci under strong directional selection with a corresponding hitchhiking effect. Smaller effect loci are likely also responding, but are below the detection limit of the experiment. Finally, SNPs showing strong parallel evolution in the experiment are intermediate in frequency in the natural population (usually 30-70%), indicative of balancing selection in nature. These loci also exhibit elevated differentiation among natural populations of D. simulans, suggesting environmental heterogeneity as a potential balancing mechanism.

Journal ArticleDOI
01 May 2019-Genetics
TL;DR: This paper demonstrates the feasibility and relevance of using penalized regression for PRS computation when large individual-level datasets are available, using the efficient implementation in the R package bigstatsr, and shows its scalability and strong predictive power, even for highly polygenic traits.
Abstract: Polygenic Risk Scores (PRS) combine genotype information across many single-nucleotide polymorphisms (SNPs) to give a score reflecting the genetic risk of developing a disease. PRS might have a major impact on public health, possibly allowing for screening campaigns to identify high-genetic risk individuals for a given disease. The "Clumping+Thresholding" (C+T) approach is the most common method to derive PRS. C+T uses only univariate genome-wide association studies (GWAS) summary statistics, which makes it fast and easy to use. However, previous work showed that jointly estimating SNP effects for computing PRS has the potential to significantly improve the predictive performance of PRS as compared to C+T. In this paper, we present an efficient method for the joint estimation of SNP effects using individual-level data, allowing for practical application of penalized logistic regression (PLR) on modern datasets including hundreds of thousands of individuals. Moreover, our implementation of PLR directly includes automatic choices for hyper-parameters. We also provide an implementation of penalized linear regression for quantitative traits. We compare the performance of PLR, C+T and a derivation of random forests using both real and simulated data. Overall, we find that PLR achieves equal or higher predictive performance than C+T in most scenarios considered, while being scalable to biobank data. In particular, we find that improvement in predictive performance is more pronounced when there are few effects located in nearby genomic regions with correlated SNPs; for instance, in simulations, AUC values increase from 83% with the best prediction of C+T to 92.5% with PLR. We confirm these results in a data analysis of a case-control study for celiac disease where PLR and the standard C+T method achieve AUC values of 89% and of 82.5%. 
Applying penalized linear regression to 350,000 individuals of the UK Biobank, we predict height with a larger correlation than with the best prediction of C+T (∼65% instead of ∼55%), further demonstrating its scalability and strong predictive power, even for highly polygenic traits. Moreover, using 150,000 individuals of the UK Biobank, we are able to predict breast cancer better than C+T, fitting PLR in a few minutes only. In conclusion, this paper demonstrates the feasibility and relevance of using penalized regression for PRS computation when large individual-level datasets are available, thanks to the efficient implementation available in our R package bigstatsr.
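
The AUC values quoted in this abstract compare how well each score separates cases from controls. As a minimal sketch of what the metric measures, the function below computes AUC as the fraction of case-control pairs in which the case receives the higher score, counting ties as one half. It is a generic pairwise calculation, not code from bigstatsr, and the function name and scores are made up for illustration.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Returns the probability that a randomly chosen case has a higher
    score than a randomly chosen control (ties count as 0.5).
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical risk scores for three cases and three controls.
    cases = [0.9, 0.8, 0.4]
    controls = [0.5, 0.3, 0.2]
    print(auc_from_scores(cases, controls))
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable at biobank scale.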

Journal ArticleDOI
01 Jan 2019-Genetics
TL;DR: The virtues of C. elegans as a model system are summarized and the current understanding of centriole duplication, the acquisition of pericentriolar material by centrioles to form centrosomes, the assembly of kinetochores and the mitotic spindle are reviewed.
Abstract: Mitotic cell divisions increase cell number while faithfully distributing the replicated genome at each division. The Caenorhabditis elegans embryo is a powerful model for eukaryotic cell division. Nearly all of the genes that regulate cell division in C. elegans are conserved across metazoan species, including humans. The C. elegans pathways tend to be streamlined, facilitating dissection of the more redundant human pathways. Here, we summarize the virtues of C. elegans as a model system and review our current understanding of centriole duplication, the acquisition of pericentriolar material by centrioles to form centrosomes, the assembly of kinetochores and the mitotic spindle, chromosome segregation, and cytokinesis.

Journal ArticleDOI
01 Jul 2019-Genetics
TL;DR: ROHan, a probabilistic method that substantially improves the estimate of heterozygosity rates both genome-wide and for local genomic windows, is introduced; it combines a local Bayesian model and a hidden Markov model at the genome-wide level and works on both modern and ancient samples.
Abstract: Both the total amount and the distribution of heterozygous sites within individual genomes are informative about the genetic diversity of the population they belong to. Detecting true heterozygous sites in ancient genomes is complicated by the generally limited coverage achieved and the presence of post-mortem damage inflating sequencing errors. Additionally, large runs of homozygosity found in the genomes of particularly inbred individuals and of domestic animals can skew estimates of genome-wide heterozygosity rates. Current computational tools aimed at estimating runs of homozygosity and genome-wide heterozygosity levels are generally sensitive to such limitations. Here, we introduce ROHan, a probabilistic method which substantially improves the estimate of heterozygosity rates both genome-wide and for genomic local windows. It combines a local Bayesian model and a Hidden Markov Model at the genome-wide level and can work both on modern and ancient samples. We show that our algorithm outperforms currently available methods for predicting heterozygosity rates for ancient samples. Specifically, ROHan can delineate large runs of homozygosity (at megabase scales) and produce a reliable confidence interval for the genome-wide rate of heterozygosity outside of such regions from modern genomes with a depth of coverage as low as 5-6× and down to 7-8× for ancient samples showing moderate DNA damage. We apply ROHan to a series of modern and ancient genomes previously published and revise available estimates of heterozygosity for humans, chimpanzees and horses.

Journal ArticleDOI
01 Apr 2019-Genetics
TL;DR: The effects of knockdowns of genes implicated in the initial screen were on average more similar than expected under a null model, and a subset of SNP effects were replicable in an unrelated panel of inbred lines.
Abstract: Due to the complexity of genotype–phenotype relationships, simultaneous analyses of genomic associations with multiple traits will be more powerful and informative than a series of univariate analyses. However, in most cases, studies of genotype–phenotype relationships have been analyzed only one trait at a time. Here, we report the results of a fully integrated multivariate genome-wide association analysis of the shape of the Drosophila melanogaster wing in the Drosophila Genetic Reference Panel. Genotypic effects on wing shape were highly correlated between two different laboratories. We found 2396 significant SNPs using a 5% false discovery rate cutoff in the multivariate analyses, but just four significant SNPs in univariate analyses of scores on the first 20 principal component axes. One quarter of these initially significant SNPs retain their effects in regularized models that take into account population structure and linkage disequilibrium. A key advantage of multivariate analysis is that the direction of the estimated phenotypic effect is much more informative than a univariate one. We exploit this fact to show that the effects of knockdowns of genes implicated in the initial screen were on average more similar than expected under a null model. A subset of SNP effects were replicable in an unrelated panel of inbred lines. Association studies that take a phenomic approach, considering many traits simultaneously, are an important complement to the power of genomics.

Journal ArticleDOI
01 May 2019-Genetics
TL;DR: It is shown that the observed relationships between the rate of crossing over, and the level of synonymous site diversity and rate of adaptive evolution in Drosophila are probably mainly caused by background selection, whereas selective sweeps and population size changes are needed to produce the observed distortions of the site frequency spectrum.
Abstract: Levels of variability and rates of adaptive evolution may be affected by hitchhiking, the effect of selection on evolution at linked sites. Hitchhiking can be caused either by "selective sweeps" or by background selection, involving the spread of new favorable alleles or the elimination of deleterious mutations, respectively. Recent analyses of population genomic data have fitted models where both these processes act simultaneously, to infer the parameters of selection. Here, we investigate the consequences of relaxing a key assumption of some of these studies, that the time occupied by a selective sweep is negligible compared with the neutral coalescent time. We derive a new expression for the expected level of neutral variability in the presence of recurrent selective sweeps and background selection. We also derive approximate integral expressions for the effects of recurrent selective sweeps. The accuracy of the theoretical predictions was tested against multilocus simulations, with selection, recombination, and mutation parameters that are realistic for Drosophila melanogaster. In the presence of crossing over, there is approximate agreement between the theoretical and simulation results. We show that the observed relationships between the rate of crossing over, and the level of synonymous site diversity and rate of adaptive evolution in Drosophila are probably mainly caused by background selection, whereas selective sweeps and population size changes are needed to produce the observed distortions of the site frequency spectrum.

Journal ArticleDOI
01 Jul 2019-Genetics
TL;DR: In this paper, the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua were revealed based on cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis.
Abstract: Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically homomorphic, we estimate that about one-third of the Y chromosome, containing 568 transcripts and spanning 22.3 cM in the corresponding female map, has ceased recombining. Nevertheless, we found limited evidence for Y-chromosome degeneration in terms of gene loss and pseudogenization, and most X- and Y-linked genes appear to have diverged in the period subsequent to speciation between M. annua and its sister species M. huetii, which shares the same sex-determining region. Taken together, our results suggest that the M. annua Y chromosome has at least two evolutionary strata: a small old stratum shared with M. huetii, and a more recent larger stratum that is probably unique to M. annua and that stopped recombining ∼1 MYA. Patterns of gene expression within the nonrecombining region are consistent with the idea that sexually antagonistic selection may have played a role in favoring suppressed recombination.

Journal ArticleDOI
01 May 2019-Genetics
TL;DR: The properties of the RNA polymerase II core promoter in Drosophila indicate that the core promoter is a central component of the transcriptional apparatus that regulates gene expression.
Abstract: Transcription by RNA polymerase II initiates at the core promoter, which is sometimes referred to as the "gateway to transcription." Here, we describe the properties of the RNA polymerase II core promoter in Drosophila. The core promoter is at a strategic position in the expression of genes, as it is the site of convergence of the signals that lead to transcriptional activation. Importantly, core promoters are diverse in terms of their structure and function. They are composed of various combinations of sequence motifs such as the TATA box, initiator (Inr), and downstream core promoter element (DPE). Different types of core promoters are transcribed via distinct mechanisms. Moreover, some transcriptional enhancers exhibit specificity for particular types of core promoters. These findings indicate that the core promoter is a central component of the transcriptional apparatus that regulates gene expression.

Journal ArticleDOI
01 Apr 2019-Genetics
TL;DR: A perspective on R.A. Fisher's 1918 paper, which reconciled Mendelian and biometrical genetics and is now accepted as the main foundation of the field of quantitative genetics, and on how and why it is relevant in today's genome era.
Abstract: The genetics and evolution of complex traits, including quantitative traits and disease, have been hotly debated ever since Darwin. A century ago, a paper from R.A. Fisher reconciled Mendelian and biometrical genetics in a landmark contribution that is now accepted as the main foundation stone of the field of quantitative genetics. Here, we give our perspective on Fisher's 1918 paper in the context of how and why it is relevant in today's genome era. We mostly focus on human trait variation, in part because Fisher did so too, but the conclusions are general and extend to other natural populations, and to populations undergoing artificial selection.

Journal ArticleDOI
01 Feb 2019-Genetics
TL;DR: It is shown that insertions or deletions in coding sequences more reliably cause somatic mutations than DNA excisions induced by two gRNAs, and CRISPR-TRiM efficiently unmasks redundant soluble N-ethylmaleimide–sensitive factor attachment protein receptor gene functions in neurons and epidermal cells.
Abstract: Tissue-specific loss-of-function (LOF) analysis is essential for characterizing gene function. Here, we present a simple, yet highly efficient, clustered regularly interspaced short palindromic repeats (CRISPR)-mediated tissue-restricted mutagenesis (CRISPR-TRiM) method for ablating gene function in Drosophila. This binary system consists of a tissue-specific Cas9 and a ubiquitously expressed multi-guide RNA (gRNA) transgene. We describe convenient toolkits for making enhancer-driven Cas9 lines and multi-gRNAs that are optimized for mutagenizing somatic cells. We demonstrate that insertions or deletions in coding sequences more reliably cause somatic mutations than DNA excisions induced by two gRNAs. We further show that enhancer-driven Cas9 is less cytotoxic yet results in more complete LOF than Gal4-driven Cas9 in larval sensory neurons. Finally, CRISPR-TRiM efficiently unmasks redundant soluble N-ethylmaleimide-sensitive factor attachment protein receptor gene functions in neurons and epidermal cells. Importantly, Cas9 transgenes expressed at different times in the neuronal lineage reveal the extent to which gene products persist in cells after tissue-specific gene knockout. These CRISPR tools can be applied to analyze tissue-specific gene function in many biological processes.

Journal ArticleDOI
01 May 2019-Genetics
TL;DR: A distinction between easy landscapes of traditional theory where local fitness peaks can be found in a moderate number of steps, and hard landscapes where finding local optima requires an infeasible amount of time is introduced.
Abstract: Experiments show that evolutionary fitness landscapes can have a rich combinatorial structure due to epistasis. For some landscapes, this structure can produce a computational constraint that prevents evolution from finding local fitness optima—thus overturning the traditional assumption that local fitness peaks can always be reached quickly if no other evolutionary forces challenge natural selection. Here, I introduce a distinction between easy landscapes of traditional theory where local fitness peaks can be found in a moderate number of steps, and hard landscapes where finding local optima requires an infeasible amount of time. Hard examples exist even among landscapes with no reciprocal sign epistasis; on these semismooth fitness landscapes, strong selection weak mutation dynamics cannot find the unique peak in polynomial time. More generally, on hard rugged fitness landscapes that include reciprocal sign epistasis, no evolutionary dynamics—even ones that do not follow adaptive paths—can find a local fitness optimum quickly. Moreover, on hard landscapes, the fitness advantage of nearby mutants cannot drop off exponentially fast but has to follow a power-law that long-term evolution experiments have associated with unbounded growth in fitness. Thus, the constraint of computational complexity enables open-ended evolution on finite landscapes. Knowing this constraint allows us to use the tools of theoretical computer science and combinatorial optimization to characterize the fitness landscapes that we expect to see in nature. I present candidates for hard landscapes at scales from single genes, to microbes, to complex organisms with costly learning (Baldwin effect) or maintained cooperation (Hankshaw effect). Just how ubiquitous hard landscapes (and the corresponding ultimate constraint on evolution) are in nature becomes an open empirical question.

Journal ArticleDOI
01 Feb 2019-Genetics
TL;DR: These strategies enhance Cas9-targeting efficiency, lend insight into the timing and mechanisms of DSB repair, and establish guidelines for achieving predictable precise and imprecise repair outcomes with high frequency.
Abstract: The targetable DNA endonuclease CRISPR-Cas9 has transformed analysis of biological processes by enabling robust genome editing in model and nonmodel organisms. Although rules directing Cas9 to its target DNA via a guide RNA are straightforward, wide variation occurs in editing efficiency and repair outcomes for both imprecise error-prone repair and precise templated repair. We found that imprecise and precise DNA repair from double-strand breaks (DSBs) is asymmetric, favoring repair in one direction. Using this knowledge, we designed RNA guides and repair templates that increased the frequency of imprecise insertions and deletions and greatly enhanced precise insertion of point mutations in Caenorhabditis elegans. We also devised strategies to insert long (10 kb) exogenous sequences and incorporate multiple nucleotide substitutions at a considerable distance from DSBs. We expanded the repertoire of co-conversion markers appropriate for diverse nematode species. These selectable markers enable rapid identification of Cas9-edited animals also likely to carry edits in desired targets. Lastly, we explored the timing, location, frequency, sex dependence, and categories of DSB repair events by developing loci with allele-specific Cas9 targets that can be contributed during mating from either male or hermaphrodite germ cells. We found a striking difference in editing efficiency between maternally and paternally contributed genomes. Furthermore, imprecise repair and precise repair from exogenous repair templates occur with high frequency before and after fertilization. Our strategies enhance Cas9-targeting efficiency, lend insight into the timing and mechanisms of DSB repair, and establish guidelines for achieving predictable precise and imprecise repair outcomes with high frequency.

Journal ArticleDOI
01 Mar 2019-Genetics
TL;DR: Results suggest that incorporating putatively deleterious variants into genomic models slightly improves prediction accuracy because of extensive linkage, and that knowledge of deleterious variants could be leveraged for sorghum breeding through genome editing and/or conventional breeding that focuses on the selection of progeny with fewer deleterious alleles.
Abstract: Sorghum (Sorghum bicolor L.) is a major food cereal for millions of people worldwide. The sorghum genome, like other species, accumulates deleterious mutations, likely impacting its fitness. The lack of recombination, drift, and the coupling with favorable loci impede the removal of deleterious mutations from the genome by selection. To study how deleterious variants impact phenotypes, we identified putative deleterious mutations among ∼5.5 M segregating variants of 229 diverse biomass sorghum lines. We provide the whole-genome estimate of the deleterious burden in sorghum, showing that ∼33% of nonsynonymous substitutions are putatively deleterious. The pattern of mutation burden varies appreciably among racial groups. Across racial groups, the mutation burden correlated negatively with biomass, plant height, specific leaf area (SLA), and tissue starch content (TSC), suggesting that deleterious burden decreases trait fitness. Putatively deleterious variants explain roughly one-half of the genetic variance. However, there is only moderate improvement in total heritable variance explained for biomass (7.6%) and plant height (average of 3.1% across all stages). There is no advantage in total heritable variance for SLA and TSC. The contribution of putatively deleterious variants to phenotypic diversity therefore appears to be dependent on the genetic architecture of traits. Overall, these results suggest that incorporating putatively deleterious variants into genomic models slightly improves prediction accuracy because of extensive linkage. Knowledge of deleterious variants could be leveraged for sorghum breeding through either genome editing and/or conventional breeding that focuses on the selection of progeny with fewer deleterious alleles.

Journal ArticleDOI
01 Apr 2019-Genetics
TL;DR: Nested CRISPR is developed, a cloning-free ribonucleoprotein-driven method that robustly produces endogenous fluorescent reporters with EGFP, mCherry or wrmScarlet in Caenorhabditis elegans.
Abstract: CRISPR-based genome-editing methods in model organisms are evolving at an extraordinary speed. Whereas the generation of deletion or missense mutants is quite straightforward, the production of endogenous fluorescent reporters is more challenging. We have developed Nested CRISPR, a cloning-free ribonucleoprotein-driven method that robustly produces endogenous fluorescent reporters with EGFP, mCherry or wrmScarlet in Caenorhabditis elegans. This method is based on the division of the fluorescent protein (FP) sequence in three fragments. In the first step, single-stranded DNA (ssDNA) donors (≤200 bp) are used to insert the 5' and 3' fragments of the FP in the locus of interest. In the second step, these sequences act as homology regions for homology-directed repair using a double-stranded DNA (dsDNA) donor (PCR product) containing the middle fragment, thus completing the FP sequence. In Nested CRISPR, the first step involving ssDNA donors is a well-established method that yields high editing efficiencies, and the second step is reliable because it uses universal CRISPR RNAs (crRNAs) and PCR products. We have also used Nested CRISPR in a nonessential gene to produce a deletion mutant in the first step and a transcriptional reporter in the second step. In the search for modifications to optimize the method, we tested synthetic single guide RNAs (sgRNAs), but did not observe a significant increase in efficiency. To streamline the approach, we combined all step 1 and step 2 reagents in a single injection and were successful in three of five loci tested with editing efficiencies of up to 20%. Finally, we discuss the prospects of this method in the future.

Journal ArticleDOI
01 Jan 2019-Genetics
TL;DR: The principles of fluorescence microscopy technologies, from wide-field to super-resolution microscopy, and their application in the Drosophila research field are reviewed.
Abstract: The development of fluorescent labels and powerful imaging technologies in the last two decades has revolutionized the field of fluorescence microscopy, which is now widely used in diverse scientific fields from biology to biomedical and materials science. Fluorescence microscopy has also become a standard technique in research laboratories working on Drosophila melanogaster as a model organism. Here, we review the principles of fluorescence microscopy technologies from wide-field to super-resolution microscopy and their application in the Drosophila research field.

Journal ArticleDOI
01 Dec 2019-Genetics
TL;DR: This model uses explicit forward simulations of a single trait with additive-effect mutations adapting to an “optimum shift” to show how reducing the mutational variance increases the magnitude of hitchhiking patterns.
Abstract: Predictions about the effect of natural selection on patterns of linked neutral variation are largely based on models involving the rapid fixation of unconditionally beneficial mutations. However, when phenotypes adapt to a new optimum trait value, the strength of selection on individual mutations decreases as the population adapts. Here, I use explicit forward simulations of a single trait with additive-effect mutations adapting to an "optimum shift." Detectable "hitchhiking" patterns are only apparent if (i) the optimum shifts are large with respect to equilibrium variation for the trait, (ii) mutation rates to large-effect mutations are low, and (iii) large-effect mutations rapidly increase in frequency and eventually reach fixation, which typically occurs after the population reaches the new optimum. For the parameters simulated here, partial sweeps do not appreciably affect patterns of linked variation, even when the mutations are strongly selected. The contribution of new mutations vs. standing variation to fixation depends on the mutation rate affecting trait values. Given the fixation of a strongly selected variant, patterns of hitchhiking are similar on average for the two classes of sweeps because sweeps from standing variation involving large-effect mutations are rare when the optimum shifts. The distribution of effect sizes of new mutations has little effect on the time to reach the new optimum, but reducing the mutational variance increases the magnitude of hitchhiking patterns. In general, populations reach the new optimum prior to the completion of any sweeps, and the times to fixation are longer for this model than for standard models of directional selection. The long fixation times are due to a combination of declining selection pressures during adaptation and the possibility of interference among weakly selected sites for traits with high mutation rates.