08 Oct 2016-bioRxiv (Cold Spring Harbor Laboratory)-pp 048819
TL;DR: It is demonstrated that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait, and the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests.
Abstract: The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect-sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation.
Risk for complex diseases in humans, such as diabetes and hypertension, is highly heritable yet the causal DNA sequence variants responsible for that risk remain largely unknown.
The authors have declared that no competing interests exist.
This model assigns an effect size to a mutant allele, but formally makes no concrete statement regarding the molecular nature of the allele.
When applied to molecular data, such as SNP genotypes in a GWAS, these models treat the SNPs themselves as the loci of interest.
Further, the model of gene-based recessivity best explains the differences between estimates of additive and dominance variance components from SNP-based methods [27] and from twin studies [28] and is consistent with the distribution of frequencies of significant associations in GWAS [4, 26].
Results and Discussion
The models. As in [36], the authors simulate a 100 kilobase region of the human genome contributing to a complex disease phenotype and fitness.
The expected fitness effect of a mutation is always deleterious because trait effects are sampled from an exponential distribution.
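The link between exponentially distributed trait effects and deleterious fitness effects can be illustrated with a minimal Python sketch. It assumes that λ is the mean of the exponential distribution, that the trait optimum is zero, and that fitness follows a Gaussian function with width `sigma_s`; the function names and parameter values are hypothetical and are not taken from the paper's code.

```python
import math
import random

def sample_effect(lam, rng):
    """Draw one trait effect from an exponential distribution with mean lam (assumed)."""
    return rng.expovariate(1.0 / lam)

def gaussian_fitness(phenotype, optimum=0.0, sigma_s=1.0):
    """Gaussian stabilizing selection: fitness falls off with squared distance from the optimum."""
    return math.exp(-((phenotype - optimum) ** 2) / (2.0 * sigma_s ** 2))

rng = random.Random(1)  # fixed seed so the sketch is reproducible
effects = [sample_effect(0.1, rng) for _ in range(1000)]

# Every effect is non-negative, so a single mutation can only move the
# phenotype away from an optimum at zero and therefore cannot raise fitness.
fitnesses = [gaussian_fitness(e) for e in effects]
```

Because the exponential distribution has no negative support, no sampled mutation yields a fitness above that of the mutation-free genotype under these assumptions.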
Specifically, the authors studied three different genetic models and two different demographic models, holding the fitness model as a constant.
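To make the contrast between the genetic models concrete, the following Python sketch maps a diploid genotype onto a genetic value under each model. The formulas are assumptions made for illustration, since this text does not state them: the additive co-dominant value is taken as the sum of all effects, the gene-based recessive value as the geometric mean of the two haplotype sums, and the multiplicative recessive value as a product over sites with heterozygous effect `h * e` and homozygous effect `2 * e`. All function names are hypothetical.

```python
import math

def additive_codominant(hap1, hap2):
    """AC (assumed form): sum of effect sizes over both haplotypes."""
    return sum(hap1) + sum(hap2)

def gene_based_recessive(hap1, hap2):
    """GBR (assumed form): geometric mean of the summed effects of each haplotype."""
    return math.sqrt(sum(hap1) * sum(hap2))

def multiplicative_recessive(genotypes, h=0.0):
    """MR (assumed form): product over sites; genotypes is a list of (effect, copies)."""
    value = 1.0
    for effect, copies in genotypes:
        if copies == 2:
            value *= 1.0 + 2.0 * effect   # homozygous mutant
        elif copies == 1:
            value *= 1.0 + h * effect     # heterozygote, scaled by dominance h
    return value - 1.0

# One mutation-free haplotype fully masks the other under GBR ...
masked = gene_based_recessive([0.1, 0.2], [])
# ... but two different mutations on opposite haplotypes fail to complement.
compound_het = gene_based_recessive([0.1], [0.4])
```

Under these assumed forms, a compound heterozygote has a non-zero genetic value under GBR even though no single site is homozygous, whereas the site-based recessive model with h = 0 would assign it a value of zero.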
Parameters are briefly described in Table 1.
This reflects the competing forces of increasing average genetic effect and decreasing average allele frequency, which occur as λ increases (S5 Fig).
Yet, when simulating the iMR model, the authors find that an intermediate degree of dominance, h = 0.25, results in a distribution of significant hits that is similar to the GBR results (Fig 4).
The reason the authors emphasize this feature of the data is to demonstrate that models with rare alleles of large effect do not necessarily imply a visual excess of rare significant GWAS hits.
Of the models the authors explored, only the gene-based recessive model with intermediate to large effects is consistent with the difference between twin and SNP based estimates of dominance variance (Fig 2).
Materials and Methods
Forward simulation. Using the fwdpp template library v0.2.8 [87], the authors implemented a forward-in-time, individual-based simulation of a Wright-Fisher population with mutation under the infinitely many sites model [88], recombination, and selection occurring each generation.
For comparison, the authors calculated the true total heritability in the sample as $H^2_{\text{sample}} = V_{G,\text{sample}} / V_{P,\text{sample}}$.
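The ratio of genetic to phenotypic variance in a sample can be sketched in Python as follows. This is an illustration only: the use of the population (rather than unbiased sample) variance and the toy values are assumptions, and the function name is hypothetical.

```python
from statistics import pvariance

def total_heritability(genetic_values, phenotypes):
    """Ratio of the variance in genetic values to the variance in phenotypes."""
    v_g = pvariance(genetic_values)  # V_G in the sample
    v_p = pvariance(phenotypes)      # V_P in the sample
    return v_g / v_p

# Toy sample: phenotype = genetic value + environmental deviate.
genetic_values = [0.0, 1.0, 0.0, 1.0]
environment = [1.0, 1.0, -1.0, -1.0]
phenotypes = [g + e for g, e in zip(genetic_values, environment)]

h2_sample = total_heritability(genetic_values, phenotypes)
```

In this toy sample the genetic and environmental values are uncorrelated, so the phenotypic variance is the sum of the two components (0.25 + 1.0) and the ratio is 0.2.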
Under MSGREMLd, many replicates resulted in numerical errors in GCTA.
For dizygotic (DZ) twins the authors used two child gamete pairs, each with a unique environmental deviate.
S10 Fig. Distribution of significant hits under site based recessive models with incomplete
Horizontal violin plots depict the distribution of minor allele frequencies (MAF) of the most strongly associated single marker in a GWAS.
Moments were calculated using the boost C++ statistical accumulators library.
Data are plotted as the mean across model replicates ± the standard error of the mean.
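The summary statistic plotted here can be computed as in the short Python sketch below. It assumes that the standard error is the unbiased sample standard deviation divided by the square root of the number of replicates; the function name and replicate values are hypothetical.

```python
import math
from statistics import mean, stdev

def mean_and_sem(replicate_values):
    """Return the mean across replicates and its standard error."""
    n = len(replicate_values)
    return mean(replicate_values), stdev(replicate_values) / math.sqrt(n)

# Hypothetical per-replicate estimates of some quantity.
center, sem = mean_and_sem([1.0, 2.0, 3.0, 4.0])
lower, upper = center - sem, center + sem  # the plotted interval
```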
Shown are the additive co-dominant (AC), gene-based (GBR) and complete multiplicative recessive (Mult. recessive (h = 0); cMR) models.
S14 Fig. Regression Based Estimates of Genetic Variance.
S17 Fig. Additive genetic variance explained over allele frequency under the Tennessen et al. [40] model for European demography.
The left column of panels shows how VG changes over time under this model.
The burden ratio [91] is calculated as the ratio of genetic load between simulations with only ancient growth and those with an additional recent bottleneck and growth.
Here load is calculated as the average deviation from optimum fitness due to (left) fixed mutations, segregating mutations and all mutations.
For large effect size models, under which there are relatively more mutations that experience strong selection, the authors see the characteristic drop in the burden ratio following the bottleneck and rebound following re-expansion [91].
S21 Fig. Non-parametric comparison between empirical and simulated GWAS hits.
Non-parametric comparison of the distributions of allele frequencies between simulated and empirical GWAS hits.
In cases where more than one marker was tied for the lowest p-value, one was chosen at random.
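The tie-breaking rule can be sketched in Python as below. The function name and the example p-values are hypothetical; only the rule itself (pick uniformly at random among markers sharing the lowest p-value) comes from the text.

```python
import random

def most_associated_marker(p_values, rng):
    """Index of the marker with the lowest p-value, breaking ties at random."""
    lowest = min(p_values)
    tied = [i for i, p in enumerate(p_values) if p == lowest]
    return rng.choice(tied)

rng = random.Random(7)  # fixed seed for reproducibility
p_values = [0.5, 1e-8, 0.2, 1e-8]  # markers 1 and 3 are tied
top_hit = most_associated_marker(p_values, rng)
```

Choosing at random avoids systematically favoring the first or last tied marker in file order, which could otherwise bias the reported minor allele frequency of the top hit.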
Specific information regarding the empirical data can be obtained in S1 Table.
The dashed lines show the analytical result and the solid curves are empirical cumulative distribution functions based on a sample of 500 mutation effects from an exponential distribution.
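A comparison of this kind can be reproduced with a short Python sketch. It assumes the exponential distribution is parameterized by its mean `lam`; the value of `lam`, the seed, and the function names are hypothetical.

```python
import math
import random

def exponential_cdf(x, lam):
    """Analytical CDF of an exponential distribution with mean lam."""
    return 1.0 - math.exp(-x / lam)

def ecdf(sample):
    """Empirical CDF as a list of (value, cumulative fraction) pairs."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

rng = random.Random(42)
lam = 0.1
sample = [rng.expovariate(1.0 / lam) for _ in range(500)]  # 500 mutation effects

# Largest vertical gap between the empirical step function (evaluated at its
# upper steps) and the analytical curve.
points = ecdf(sample)
max_gap = max(abs(f - exponential_cdf(x, lam)) for x, f in points)
```

With 500 draws the empirical curve should track the analytical one closely, so `max_gap` is expected to be small.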
Acknowledgments
The authors are thankful to Joseph Farran, Harry Mangalam, Adam Brenner, Garr Updegraff, and Edward Xia for administering the University of California, Irvine High Performance Computing cluster.
The authors are thankful to Kirk Lohmueller for helpful detailed comments throughout this project.
The authors would like to thank Peter Andolfatto, Bogdan Pasaniuc and Nick Mancuso for helpful discussion.
TL;DR: For the next few weeks the course is going to be exploring a field that’s actually older than classical population genetics, although the approach it’ll be taking to it involves the use of population genetic machinery.
Abstract: So far in this course we have dealt entirely with the evolution of characters that are controlled by simple Mendelian inheritance at a single locus. There are notes on the course website about gametic disequilibrium and how allele frequencies change at two loci simultaneously, but we didn’t discuss them. In every example we’ve considered we’ve imagined that we could understand something about evolution by examining the evolution of a single gene. That’s the domain of classical population genetics. For the next few weeks we’re going to be exploring a field that’s actually older than classical population genetics, although the approach we’ll be taking to it involves the use of population genetic machinery. If you know a little about the history of evolutionary biology, you may know that after the rediscovery of Mendel’s work in 1900 there was a heated debate between the “biometricians” (e.g., Galton and Pearson) and the “Mendelians” (e.g., de Vries, Correns, Bateson, and Morgan). Biometricians asserted that the really important variation in evolution didn’t follow Mendelian rules. Height, weight, skin color, and similar traits seemed to
TL;DR: Results provide strong empirical support for an important role for incomplete dominance of deleterious alleles in explaining heterosis and demonstrate the utility of incorporating functional annotation in phenotypic prediction and plant breeding.
Abstract: Complementation of deleterious alleles has long been proposed as a major contributor to the hybrid vigor observed in the offspring of inbred parents. We test this hypothesis using evolutionary measures of sequence conservation to ask whether incorporating information about putatively deleterious alleles can inform genomic selection (GS) models and improve phenotypic prediction. We measured a number of agronomic traits in both the inbred parents and hybrids of an elite maize partial diallel population and re-sequenced the parents of the population. Inbred elite maize lines vary for more than 500,000 putatively deleterious sites, but show less genetic load than a comparable set of inbred landraces. Our modeling reveals widespread evidence for incomplete dominance at these loci, and supports theoretical models that more damaging variants are usually more recessive. We identify haplotype blocks using an identity-by-descent (IBD) analysis and perform genomic prediction analyses in which we weight blocks on the basis of segregating putatively deleterious variants. Cross-validation results show that incorporating sequence conservation in genomic selection improves prediction accuracy for yield and several other traits as well as heterosis for those traits. Our results provide strong empirical support for an important role for incomplete dominance of deleterious alleles in explaining heterosis and demonstrate the utility of incorporating functional annotation in phenotypic prediction and plant breeding.
TL;DR: It is argued that increased effort in disease genetics theory, complementing experimental, and statistical efforts, will escalate the unraveling of molecular etiologies of complex diseases.
Abstract: Development of human genetics theoretical models and the integration of those models with experiment and statistical evaluation are critical for scientific progress. This perspective argues that increased effort in disease genetics theory, complementing experimental, and statistical efforts, will escalate the unraveling of molecular etiologies of complex diseases. In particular, the development of new, realistic disease genetics models will help elucidate complex disease pathogenesis, and the predicted patterns in genetic data made by these models will enable the concurrent, more comprehensive statistical testing of multiple aspects of disease genetics predictions, thereby better identifying disease loci. By theoretical human genetics, I intend to encompass all investigations devoted to modeling the heritable architecture underlying disease traits and studies of the resulting principles and dynamics of such models. Hence, the scope of theoretical disease genetics work includes construction and analysis of models describing how disease-predisposing alleles (1) arise, (2) are transmitted across families and populations, and (3) interact with other risk and protective alleles across both the genome and environmental factors to produce disease states. Theoretical work improves insight into viable genetic models of diseases consistent with empirical results from linkage, transmission, and association studies as well as population genetics. Furthermore, understanding the patterns of genetic data expected under realistic disease models will enable more powerful approaches to discover disease-predisposing alleles and additional heritable factors important in common diseases. In spite of the pivotal role of disease genetics theory, such investigation is not particularly vibrant.
9 citations
Additional excerpts
...Importantly, very recent results from simulations appear to favor incomplete recessivity models for complex trait etiologies, demonstrating consistency with both realistic population genetic models, heritability data, and GWAS findings (Sanjak et al., 2016)....
TL;DR: In this article, a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method for detecting protein-coding genes that are functionally interpretable was conducted.
Abstract: The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Studies of genetic cancer predisposition typically identify significant genomic regions based on family-based cohorts or genome-wide association studies (GWAS). However, the results of such studies rarely provide biological insight or functional interpretation. In this study, we conducted a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method for detecting protein-coding genes that are functionally interpretable. Specifically, we conducted proteome-wide association studies (PWAS) to identify genetic associations mediated by alterations to protein function. With PWAS, we identified 110 significant gene-cancer associations in 70 unique genomic regions across nine cancer types and pan-cancer. In 48 of the 110 PWAS associations (44%), estimated gene damage is associated with reduced rather than elevated cancer risk, suggesting a protective effect. Together with standard GWAS, we implicated 145 unique genomic loci with cancer risk. While most of these genomic regions are supported by external evidence, our results also highlight many novel loci. Based on the capacity of PWAS to detect non-additive genetic effects, we found that 46% of the PWAS-significant cancer regions exhibited exclusive recessive inheritance. These results highlight the importance of recessive genetic effects, without relying on familial studies. Finally, we show that many of the detected genes exert substantial cancer risk in the studied cohort determined by a quantitative functional description, suggesting their relevance for diagnosis and genetic consulting.
TL;DR: This work introduces PLINK, an open-source C/C++ WGAS tool set, and describes the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation, focusing on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies.
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
TL;DR: The genetic constitution of a population: Hardy-Weinberg equilibrium and changes in gene frequency: migration mutation, changes of variance, and heritability are studied.
Abstract: Part 1 Genetic constitution of a population: Hardy-Weinberg equilibrium. Part 2 Changes in gene frequency: migration mutation. Part 3 Small populations - changes in gene frequency under simplified conditions. Part 4 Small populations - less simplified conditions. Part 5 Small populations - pedigreed populations and close inbreeding. Part 6 Continuous variation. Part 7 Values and means. Part 8 Variance. Part 9 Resemblance between relatives. Part 10 Heritability. Part 11 Selection - the response and its prediction. Part 12 Selection - the results of experiments. Part 13 Selection - information from relatives. Part 14 Inbreeding and crossbreeding - changes of mean value. Part 15 Inbreeding and crossbreeding - changes of variance. Part 16 Inbreeding and crossbreeding - applications. Part 17 Scale. Part 18 Threshold characters. Part 19 Correlated characters. Part 20 Metric characters under natural selection.
TL;DR: This study has demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest.
Abstract: There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at $P < 5 \times 10^{-7}$: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between $10^{-5}$ and $5 \times 10^{-7}$) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders.
We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.
9,244 citations
"A model of compound heterozygous, l..." refers methods in this paper
...These observations concerning the GBR model are consistent with the finding of [27] that dominance effects of SNPs do not contribute significantly to the heritability for complex traits....
[...]
...For example, influential power studies informing the design of GWAS assign effect sizes directly to SNPs and assume Risch's model of multiplicative epistasis [32]....
[...]
...A weakness of the multiplicative epistasis model [30, 31] when applied to SNPs is that the concept of a gene, defined as a physical region where loss-of-function mutations have the same phenotype [35], is lost....
[...]
...Similarly, the single-marker logistic regression used as the primary analysis of GWAS data typically assumes an additive or recessive model at the level of individual SNPs [33]....
[...]
...Instead, the bias shown for large values of λ is likely due to the presence of substantial non-additive heritability, which is not captured by the dominance effects of SNPs....
TL;DR: Although it is true that most text-books of genetics open with a chapter on biometry, closer inspection will reveal that this has little connexion with the body of the work, and that more often than not it is merely belated homage to a once fashionable study.
Abstract: PROBABLY most geneticists to-day are somewhat sceptical as to the value of the mathematical treatment of their problems. With the deepest respect, and even awe, for that association of complex symbols and human genius that can bring a universe to heel, they are nevertheless content to let it stand at that, believing that in their own particular line it is, after all, plodding that does it. Although it is true that most text-books of genetics open with a chapter on biometry, closer inspection will reveal that this has little connexion with the body of the work, and that more often than not it is merely belated homage to a once fashionable study. The Genetical Theory of Natural Selection. Dr. R. A. Fisher. Pp. xiv + 272 + 2 plates. (Oxford: Clarendon Press; London: Oxford University Press, 1930.) 17s. 6d. net.
7,883 citations
"A model of compound heterozygous, l..." refers background in this paper
...Well-tagged intermediate frequency variants may not reach genome-wide significance in an association study if they have smaller effect sizes [9,10]....