A model of compound heterozygous, loss-of-function alleles is broadly consistent with observations from complex-disease GWAS datasets
Summary
Introduction
- Risk for complex diseases in humans, such as diabetes and hypertension, is highly heritable, yet the causal DNA sequence variants responsible for that risk remain largely unknown.
- The authors have declared that no competing interests exist.
- This model assigns an effect size to a mutant allele but formally makes no concrete statement about the molecular nature of the allele.
- When applied to molecular data, such as SNP genotypes in a GWAS, these models treat the SNPs themselves as the loci of interest.
- Further, of the models considered, the gene-based recessive model best explains the difference between estimates of additive and dominance variance components obtained from SNP-based methods [27] and from twin studies [28], and it is consistent with the frequency distribution of significant GWAS associations [4, 26].
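To make gene-based recessivity concrete, the sketch below computes an individual's genetic value under a compound-heterozygous rule: the value is nonzero only when both haplotypes carry at least one causal, loss-of-function mutation in the gene. The function name `gbr_genetic_value`, the input format (one list of effect sizes per haplotype), and combining the two haplotype burdens by their geometric mean are all assumptions made for illustration, not a statement of the authors' exact implementation.

```python
import math


def gbr_genetic_value(hap1_effects, hap2_effects):
    """Hypothetical gene-based recessive (GBR) genetic value.

    Each argument lists the (non-negative) effect sizes of causal mutations
    carried on one haplotype of the gene. If either haplotype is mutation-free,
    one functional copy remains and the value is 0; only homozygotes and
    compound heterozygotes get a nonzero value.
    Assumption: haplotype burdens are combined by their geometric mean.
    """
    burden1 = sum(hap1_effects)
    burden2 = sum(hap2_effects)
    return math.sqrt(burden1 * burden2)
```

For example, `gbr_genetic_value([0.2, 0.3], [0.5])` describes a compound heterozygote (different mutations on the two haplotypes) and is nonzero, whereas a purely site-based recessive rule would score the same individual as unaffected because no single site is homozygous.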
Results and Discussion
- The models: As in [36], the authors simulate a 100-kilobase region of the human genome that contributes to both a complex disease phenotype and fitness.
- The expected fitness effect of a mutation is always deleterious because trait effects are sampled from an exponential distribution and are therefore never negative.
- Specifically, the authors studied three different genetic models and two different demographic models, holding the fitness model constant.
- Parameters are briefly described in Table 1.
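Why exponentially distributed trait effects make new mutations deleterious on average can be checked numerically. The sketch assumes, for illustration only, Gaussian stabilizing selection with the optimum at a phenotype of 0 and width `sigma_s`; `lam` stands for the mean effect size λ. Because an exponential draw is never negative, adding any effect to an individual at the optimum moves it away from the optimum, so fitness can only stay the same or fall.

```python
import math
import random


def gaussian_fitness(phenotype, optimum=0.0, sigma_s=1.0):
    """Gaussian stabilizing selection (illustrative functional form)."""
    return math.exp(-((phenotype - optimum) ** 2) / (2.0 * sigma_s ** 2))


def sample_effects(n, lam, seed=0):
    """Draw n trait effects from an exponential distribution with mean lam."""
    rng = random.Random(seed)
    # random.expovariate takes the rate (1 / mean), not the mean itself.
    return [rng.expovariate(1.0 / lam) for _ in range(n)]


effects = sample_effects(1000, lam=0.1)
# Fitness change from adding each effect to an individual sitting at the optimum.
fitness_changes = [gaussian_fitness(e) - gaussian_fitness(0.0) for e in effects]
mean_change = sum(fitness_changes) / len(fitness_changes)
```

The same logic holds for any optimum at or below the current mean phenotype; the choice of an optimum of 0 here is an assumption of the sketch.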
- This reflects the competing forces of an increasing average genetic effect and a decreasing average allele frequency, both of which occur as λ increases (S5 Fig).
- Yet, when simulating the iMR model, the authors find that an intermediate degree of dominance (h = 0.25) yields a distribution of significant hits similar to that under the GBR model (Fig 4).
- The authors emphasize this feature of the data to demonstrate that models with rare alleles of large effect do not necessarily produce a visible excess of rare significant GWAS hits.
- Of the models the authors explored, only the gene-based recessive model with intermediate to large effects is consistent with the difference between twin-based and SNP-based estimates of dominance variance (Fig 2).
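In the site-based models, dominance acts at each variable site separately rather than on the gene as a whole. A minimal sketch, assuming genotype values of 0, h·a, and a for zero, one, and two copies of a mutation with effect a, and assuming that the multiplicative models combine sites as a product of (1 + value) terms minus 1. The function names and this exact scaling are hypothetical, chosen only to show what the dominance coefficient h does: h = 0 (as in cMR) silences heterozygotes, and h = 0.25 (as in the iMR runs) gives them a partial contribution.

```python
import math


def site_value(copies, effect, h):
    """Genotype value at one site under site-based dominance (illustrative)."""
    if copies == 0:
        return 0.0
    if copies == 1:
        return h * effect  # heterozygote: scaled by the dominance coefficient h
    return effect          # homozygous for the mutation


def multiplicative_value(genotypes, h):
    """Combine sites multiplicatively as prod(1 + value) - 1 (assumed form).

    genotypes: list of (copies, effect) pairs, one per variable site.
    """
    return math.prod(1.0 + site_value(c, a, h) for c, a in genotypes) - 1.0


# A compound heterozygote: one copy each of two different mutations.
compound_het = [(1, 0.4), (1, 0.4)]
```

Under h = 0 this compound heterozygote has a value of exactly 0, while under h = 0.25 it has a small positive value; a gene-based recessive rule would instead treat it as having lost both functional copies of the gene.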
Materials and Methods
- Forward simulation: Using the fwdpp template library v0.2.8 [87], the authors implemented a forward-in-time, individual-based simulation of a Wright-Fisher population with mutation under the infinitely-many-sites model [88], recombination, and selection occurring each generation.
- For comparison, the authors calculated the true total heritability in the sample as $H^2_{\text{sample}} = V_{G,\text{sample}} / V_{P,\text{sample}}$.
- Under MSGREMLd, many replicates resulted in numerical errors in GCTA.
- For dizygotic (DZ) twins the authors used two child gamete pairs, each with a unique environmental deviate.
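The forward-simulation loop described above (Wright-Fisher reproduction with mutation under infinitely many sites, recombination, and selection every generation) can be illustrated with a toy Python sketch. This is not fwdpp and not the authors' code. Assumptions: mutation positions are continuous in [0, 1) so that each new mutation hits a fresh site; crossovers per gamete are Poisson; the trait is the plain sum of exponential effects (no environmental noise); and fitness is Gaussian around an optimum of 0. All names and parameter values are hypothetical.

```python
import bisect
import math
import random


def poisson(mean, rng):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def make_gamete(hap_a, hap_b, rec_rate, rng):
    """Recombine two parental haplotypes (sorted lists of positions in [0, 1))."""
    if rng.random() < 0.5:  # choose which haplotype starts the gamete
        hap_a, hap_b = hap_b, hap_a
    breaks = sorted(rng.random() for _ in range(poisson(rec_rate, rng)))
    # A site comes from hap_a if an even number of crossovers lie to its left.
    gamete = [p for p in hap_a if bisect.bisect(breaks, p) % 2 == 0]
    gamete += [p for p in hap_b if bisect.bisect(breaks, p) % 2 == 1]
    return sorted(gamete)


def trait_value(ind, effects):
    """Additive trait: sum of effects over both haplotypes (assumed model)."""
    return sum(effects[p] for hap in ind for p in hap)


def simulate(n=100, generations=50, mu=0.05, rec_rate=0.05,
             mean_effect=0.1, sigma_s=1.0, seed=1):
    rng = random.Random(seed)
    effects = {}  # position -> trait effect; each site mutates at most once
    pop = [([], []) for _ in range(n)]
    for _ in range(generations):
        # Selection: Gaussian fitness around an optimum of 0 (assumption).
        weights = [math.exp(-trait_value(ind, effects) ** 2 / (2 * sigma_s ** 2))
                   for ind in pop]
        next_pop = []
        for _ in range(n):
            child = []
            # Wright-Fisher: parents sampled with probability proportional to fitness.
            for parent in rng.choices(pop, weights=weights, k=2):
                gamete = make_gamete(parent[0], parent[1], rec_rate, rng)
                for _ in range(poisson(mu, rng)):
                    pos = rng.random()
                    while pos in effects:  # infinitely many sites: never reuse a site
                        pos = rng.random()
                    effects[pos] = rng.expovariate(1.0 / mean_effect)
                    bisect.insort(gamete, pos)
                child.append(gamete)
            next_pop.append(tuple(child))
        pop = next_pop
    return pop, effects
```

Storing haplotypes as sorted position lists keeps recombination a simple parity test on breakpoint counts; a production simulator such as fwdpp uses far more efficient data structures.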
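The sample heritability and the twin design above can be illustrated together with a toy model. The sketch assumes purely additive, unlinked loci with exponentially distributed effects; monozygotic (MZ) twins sharing a single child gamete pair (the MZ construction is an assumption); DZ twins receiving two independent gamete pairs; a separate Gaussian environmental deviate for every twin; and Falconer's estimator H² ≈ 2(r_MZ - r_DZ). The estimator, function names, and parameter values are assumptions for illustration and need not match the authors' procedure.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def twin_study(n_families=4000, n_loci=50, target_h2=0.5, seed=7):
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_loci)]
    effects = [rng.expovariate(1.0) for _ in range(n_loci)]
    # Expected additive genetic variance under Hardy-Weinberg: sum of 2p(1-p)a^2.
    v_g = sum(2 * p * (1 - p) * a * a for p, a in zip(freqs, effects))
    sd_e = (v_g * (1 - target_h2) / target_h2) ** 0.5

    def parent():
        # One (haplotype 1, haplotype 2) allele pair per locus; True = mutant.
        return [(rng.random() < p, rng.random() < p) for p in freqs]

    def gamete(par):
        # Unlinked loci (assumption): each locus transmits one allele independently.
        return [rng.choice(pair) for pair in par]

    def genetic_value(g1, g2):
        return sum(a * (x + y) for a, x, y in zip(effects, g1, g2))

    mz, dz, g_first = [], [], []
    for _ in range(n_families):
        mom, dad = parent(), parent()
        # MZ twins: one shared child gamete pair, separate environmental deviates.
        g = genetic_value(gamete(mom), gamete(dad))
        mz.append((g + rng.gauss(0, sd_e), g + rng.gauss(0, sd_e)))
        g_first.append(g)
        # DZ twins: two child gamete pairs, each with a unique environmental deviate.
        dz.append(tuple(genetic_value(gamete(mom), gamete(dad)) + rng.gauss(0, sd_e)
                        for _ in range(2)))
    p_first = [pair[0] for pair in mz]
    h2_sample = statistics.pvariance(g_first) / statistics.pvariance(p_first)
    r_mz = pearson(*zip(*mz))
    r_dz = pearson(*zip(*dz))
    h2_falconer = 2 * (r_mz - r_dz)  # Falconer's estimator (assumed for illustration)
    return h2_sample, h2_falconer
```

With purely additive effects both quantities should land near the target of 0.5; introducing dominance (for example, a gene-based recessive genetic value) is what drives the twin-based estimate away from the additive SNP-based one.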
S10 Fig. Distribution of significant hits under site-based recessive models with incomplete dominance.
- Horizontal violin plots depict the distribution of minor allele frequencies (MAF) of the most strongly associated single marker in a GWAS.
- Moments were calculated using the Boost C++ statistical accumulators library.
- Data are plotted as the mean across model replicates ± the standard error of the mean.
- Shown are the additive co-dominant (AC), gene-based recessive (GBR), and complete multiplicative recessive (cMR; multiplicative recessive with h = 0) models.
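Summarizing a GWAS replicate by its most strongly associated marker reduces to picking the marker with the smallest p-value and reporting its minor allele frequency, then averaging across replicates with a standard error. A minimal sketch, assuming per-marker p-values and allele frequencies are already available and breaking ties at random as described for S21 Fig. The function names and the toy input values are hypothetical and are not data from the paper.

```python
import math
import random
import statistics


def lead_marker_maf(p_values, allele_freqs, rng):
    """Return the MAF of the marker with the smallest p-value.

    Ties for the smallest p-value are broken uniformly at random.
    allele_freqs holds the frequency of one (arbitrary) allele per marker.
    """
    best = min(p_values)
    tied = [i for i, p in enumerate(p_values) if p == best]
    i = rng.choice(tied)
    return min(allele_freqs[i], 1.0 - allele_freqs[i])


def mean_and_sem(values):
    """Mean across replicates and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(42)
replicates = [  # toy (p-values, allele frequencies) per replicate
    ([1e-9, 0.2, 1e-9], [0.9, 0.4, 0.7]),  # markers 0 and 2 are tied
    ([0.03, 1e-12, 0.5], [0.1, 0.02, 0.3]),
]
mafs = [lead_marker_maf(p, f, rng) for p, f in replicates]
mean_maf, sem_maf = mean_and_sem(mafs)
```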
S14 Fig. Regression-based estimates of genetic variance.
S17 Fig. Additive genetic variance explained over allele frequency under the Tennessen et al. [40] model for European demography.
- The left column of panels shows how VG changes over time under this model.
- The burden ratio [91] is calculated as the ratio of genetic load between simulations with only ancient growth and those with an additional recent bottleneck and growth.
- Here load is calculated as the average deviation from optimum fitness due to (from left to right) fixed mutations, segregating mutations, and all mutations.
- For large effect size models, under which there are relatively more mutations that experience strong selection, the authors see the characteristic drop in the burden ratio following the bottleneck and rebound following re-expansion [91].
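The burden ratio described above is a ratio of two average loads. A minimal sketch, assuming load is measured per individual as the shortfall 1 - w of fitness w from an optimum fitness of 1, and placing the bottleneck-plus-growth simulation in the numerator; the orientation of the ratio, the function names, and the toy fitness values are assumptions for illustration, not simulation output.

```python
import statistics


def mean_load(fitnesses, optimum=1.0):
    """Average deviation of individual fitness from the optimum fitness."""
    return statistics.fmean(optimum - w for w in fitnesses)


def burden_ratio(bottleneck_fitnesses, ancient_only_fitnesses):
    """Load with a recent bottleneck and growth relative to ancient growth only.

    The orientation of the ratio is an assumption of this sketch.
    """
    return mean_load(bottleneck_fitnesses) / mean_load(ancient_only_fitnesses)


# Toy fitness values for one time point (illustrative only).
ancient_only = [0.98, 0.97, 0.99, 0.96]
bottleneck = [0.99, 0.985, 0.99, 0.975]
ratio = burden_ratio(bottleneck, ancient_only)
```

The same function can be applied separately to the fitness contributions of fixed, segregating, or all mutations to obtain the three panels' worth of ratios at each time point.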
S21 Fig. Non-parametric comparison between empirical and simulated GWAS hits.
- (A) Non-parametric comparison of the allele-frequency distributions of simulated and empirical GWAS hits.
- In cases where more than one marker was tied for the lowest p-value, one was chosen at random.
- Specific information regarding the empirical data can be found in S1 Table.
- The dashed lines show the analytical result and the solid curves are empirical cumulative distribution functions based on a sample of 500 mutation effects from an exponential distribution.
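The comparison described above, an empirical CDF built from 500 exponential draws against the analytical CDF F(x) = 1 - exp(-x/λ), can be reproduced in a few lines. The value λ = 0.1 and the use of the largest vertical gap between the two curves (the Kolmogorov-Smirnov statistic) as a one-number summary are assumptions added for illustration.

```python
import math
import random


def ecdf(samples):
    """Return sorted samples and their empirical CDF heights i/n."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def exponential_cdf(x, lam):
    """Analytical CDF of an exponential distribution with mean lam."""
    return 1.0 - math.exp(-x / lam)


def ks_distance(samples, lam):
    """Largest vertical gap between the ECDF and the analytical CDF."""
    xs, heights = ecdf(samples)
    n = len(xs)
    gaps = []
    for i, (x, h) in enumerate(zip(xs, heights)):
        f = exponential_cdf(x, lam)
        gaps.append(max(h - f, f - i / n))  # check just after and just before each step
    return max(gaps)


rng = random.Random(2016)
lam = 0.1
effects = [rng.expovariate(1.0 / lam) for _ in range(500)]
d = ks_distance(effects, lam)
```

With 500 draws, a gap of roughly 0.07 or more would be unusual at the 1% level, so small values of `d` indicate that the sampled effects match the analytical curve.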
Acknowledgments
- The authors are thankful to Joseph Farran, Harry Mangalam, Adam Brenner, Garr Updegraff, and Edward Xia for administering the University of California, Irvine High Performance Computing cluster.
- The authors are thankful to Kirk Lohmueller for helpful detailed comments throughout this project.
- The authors would like to thank Peter Andolfatto, Bogdan Pasaniuc and Nick Mancuso for helpful discussion.
Citations
Additional excerpts
...Importantly, very recent results from simulations appear to favor incomplete recessivity models for complex trait etiologies, demonstrating consistency with both realistic population genetic models, heritability data, and GWAS findings (Sanjak et al., 2016)....
References
...5 (Gaussian function is greater than or equal to its quadratic approximation), which is consistent with recent attempts at estimating that parameter [20, 65]....
[...]
...The exact relationship between rare alleles [4, 17, 26,62,63], and the demographic and/or selective scenarios from which they arose [21, 22, 64], and the genetic architecture of common complex diseases in humans is an active area of research....
[...]
...However, our findings contrast with those of Zuk [24] and agree with those of Lohmueller [22], in that we predict that population expansion will substantially increase the heritability, or portion of genetic variance, that is due to rare variants....
[...]
...In agreement with [22,73], we predict that population growth reduces the power to associate variants in a causal gene region with disease status (Fig 3) when the disease also impacts evolutionary fitness....
[...]
...4 showing that the genetic load is approximately unaffected by changes in population size over time, [21, 22]....
[...]
...However, this simple model allows us to more easily get a sense of the impact of population expansion [21,22]....