A model of compound heterozygous, loss-of-function alleles is broadly consistent with observations from complex-disease GWAS datasets
Summary
- Risk for complex diseases in humans, such as diabetes and hypertension, is highly heritable yet the causal DNA sequence variants responsible for that risk remain largely unknown.
- The authors have declared that no competing interests exist.
- This model assigns an effect size to a mutant allele, but formally makes no concrete statement regarding the molecular nature of the allele.
- When applied to molecular data, such as SNP genotypes in a GWAS, these models treat the SNPs themselves as the loci of interest.
- Further, the model of gene-based recessivity best explains the differences between estimates of additive and dominance variance components from SNP-based methods and from twin studies, and is consistent with the distribution of frequencies of significant associations in GWAS [4, 26].
Results and Discussion
- The models: As in previous work, the authors simulate a 100 kilobase region of the human genome that contributes to a complex disease phenotype and to fitness.
- The expected fitness effect of a mutation is always deleterious because trait effects are sampled from an exponential distribution.
- Specifically, the authors studied three different genetic models and two different demographic models, holding the fitness model as a constant.
- Parameters are briefly described in Table 1.
- This reflects the competing forces of increasing average genetic effect and decreasing average allele frequency, which occur as λ increases (S5 Fig).
- Yet, when simulating the iMR model, the authors find that an intermediate degree of dominance, h = 0.25, results in a distribution of significant hits that is similar to the GBR results (Fig 4).
- The reason the authors emphasize this feature of the data is to demonstrate that models with rare alleles of large effect do not necessarily imply a visual excess of rare significant GWAS hits.
- Of the models the authors explored, only the gene-based recessive model with intermediate to large effects is consistent with the difference between twin and SNP based estimates of dominance variance (Fig 2).
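
To make the contrast between the genetic models concrete, here is a minimal sketch of how a genetic value could be computed under each one. It is not the authors' fwdpp code. The summary does not state the formulas, so the function forms are assumptions: the GBR value is taken to be the geometric mean of the summed effects on the two haplotypes, and the site-based multiplicative recessive model is taken to scale genotypes as 1, 1 + h·e, 1 + 2·e. The names `sample_effects`, `additive_codominant`, `gene_based_recessive` and `multiplicative_recessive` are hypothetical.

```python
import math
import random


def sample_effects(n, lam, rng):
    """Draw n effect sizes from an exponential distribution with mean lam."""
    return [rng.expovariate(1.0 / lam) for _ in range(n)]


def additive_codominant(hap1, hap2):
    """AC (assumed form): sum of all effect sizes on both haplotypes."""
    return sum(hap1) + sum(hap2)


def gene_based_recessive(hap1, hap2):
    """GBR (assumed form): geometric mean of the two haplotype sums.

    The value is zero unless BOTH haplotypes carry at least one
    mutation, which is what makes compound heterozygotes affected.
    """
    return math.sqrt(sum(hap1) * sum(hap2))


def multiplicative_recessive(genotypes, h):
    """Site-based multiplicative recessive model (assumed scaling).

    genotypes: list of (effect, copies) pairs with copies in {0, 1, 2}.
    h = 0 corresponds to complete recessivity (cMR); 0 < h < 1 to
    incomplete recessivity (iMR).
    """
    value = 1.0
    for effect, copies in genotypes:
        if copies == 1:
            value *= 1.0 + h * effect
        elif copies == 2:
            value *= 1.0 + 2.0 * effect
    return value - 1.0


# One mutation of effect 0.5 on a single haplotype (simple heterozygote)
single_het_ac = additive_codominant([0.5], [])
single_het_gbr = gene_based_recessive([0.5], [])

# Two different mutations, one per haplotype (compound heterozygote)
compound_het_gbr = gene_based_recessive([0.5], [0.5])
```

Under these assumed forms, a carrier of a single mutation has a non-zero genetic value under the AC model but a value of zero under the GBR model, while a compound heterozygote is affected under GBR even though no single site is homozygous. A site-based recessive model with h = 0 would assign that same compound heterozygote a value of zero.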
Materials and Methods
- Forward simulation: Using the fwdpp template library v0.2.8, the authors implemented a forward-in-time, individual-based simulation of a Wright-Fisher population with mutation under the infinitely many sites model, recombination, and selection occurring each generation.
- For comparison, the authors calculated the true total heritability in the sample as $H^2_{\mathrm{sample}} = V_{G,\mathrm{sample}} / V_{P,\mathrm{sample}}$.
- Under MSGREMLd, many replicates resulted in numerical errors in GCTA.
- For dizygotic (DZ) twins the authors used two child gamete pairs, each with a unique environmental deviate.
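
The total sample heritability described above is a ratio of two variances, which can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: each phenotype is taken to be a genetic value plus an environmental deviate, variances are computed as population variances over the sample, and the function name `sample_heritability` is hypothetical.

```python
import statistics


def sample_heritability(genetic_values, phenotypes):
    """Return V_G / V_P for one sample.

    Uses the population variance (divide by n); using the sample
    variance (divide by n - 1) for both terms gives the same ratio.
    """
    v_g = statistics.pvariance(genetic_values)
    v_p = statistics.pvariance(phenotypes)
    return v_g / v_p


# Toy data: four individuals, phenotype = genetic value + environmental deviate
genetic_values = [0.0, 0.0, 1.0, 1.0]
environment = [-1.0, 1.0, -1.0, 1.0]
phenotypes = [g + e for g, e in zip(genetic_values, environment)]

h2 = sample_heritability(genetic_values, phenotypes)
```

In this toy sample the genetic values and environmental deviates are uncorrelated, so the phenotypic variance is the sum of the two components and the ratio is the share contributed by the genetic values.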
S10 Fig. Distribution of significant hits under site based recessive models with incomplete
- Horizontal violin plots depict the distribution of minor allele frequencies (MAF) of the most strongly associated single marker in a GWAS.
- Moments were calculated using the boost C++ statistical accumulators library.
- Data are plotted as the mean across model replicates ± the standard error of the mean.
- Shown are the additive co-dominant (AC), gene-based recessive (GBR) and complete multiplicative recessive (Mult. recessive (h = 0); cMR) models.
S14 Fig. Regression-based estimates of genetic variance.
S17 Fig. Additive genetic variance explained over allele frequency under the Tennessen et al. model for European demography.
- The left column of panels shows how VG changes over time under this model.
- The burden ratio is calculated as the ratio of genetic load between simulations with only ancient growth and those with an additional recent bottleneck and growth.
- Here load is calculated as the average deviation from optimum fitness due to (left) fixed mutations, segregating mutations and all mutations.
- For large effect size models, under which there are relatively more mutations that experience strong selection, the authors see the characteristic drop in the burden ratio following the bottleneck and rebound following re-expansion.
S21 Fig. Non-parametric comparison between empirical and simulated GWAS hits.
- Non-parametric comparison of the distributions of allele frequencies for simulated and empirical GWAS hits.
- In cases where more than one marker was tied for the lowest p-value, one was chosen at random.
- Specific information regarding the empirical data can be obtained in S1 Table.
- The dashed lines show the analytical result and the solid curves are empirical cumulative distribution functions based on a sample of 500 mutation effects from an exponential distribution.
- The authors are thankful to Joseph Farran, Harry Mangalam, Adam Brenner, Garr Updegraff, and Edward Xia for administering the University of California, Irvine High Performance Computing cluster.
- The authors are thankful to Kirk Lohmueller for helpful detailed comments throughout this project.
- The authors would like to thank Peter Andolfatto, Bogdan Pasaniuc and Nick Mancuso for helpful discussion.
...Importantly, very recent results from simulations appear to favor incomplete recessivity models for complex trait etiologies, demonstrating consistency with both realistic population genetic models, heritability data, and GWAS findings (Sanjak et al., 2016)....
"A model of compound heterozygous, l..." refers methods in this paper
...These observations concerning the GBR model are consistent with the finding of  that dominance effects of SNPs do not contribute significantly to the heritability for complex traits....
...For example, influential power studies informing the design of GWAS assign effect sizes directly to SNPs and assume Risch's model of multiplicative epistasis ....
...A weakness of the multiplicative epistasis model [30, 31] when applied to SNPs is that the concept of a gene, defined as a physical region where loss-of-function mutations have the same phenotype , is lost....
...Similarly, the single-marker logistic regression used as the primary analysis of GWAS data typically assumes an additive or recessive model at the level of individual SNPs ....
...Instead, the bias shown for large values of λ is likely due to the presence of substantial non-additive heritability, which is not captured by the dominance effects of SNPs....
"A model of compound heterozygous, l..." refers background in this paper
...Well-tagged intermediate frequency variants may not reach genome-wide significance in an association study if they have smaller effect sizes [9,10]....