
A model of compound heterozygous, loss-of-function alleles is broadly consistent with observations from complex-disease GWAS datasets

08 Oct 2016-bioRxiv (Cold Spring Harbor Laboratory)-pp 048819
TL;DR: It is demonstrated that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait, and the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests.
Abstract: The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation.

Summary (2 min read)

Introduction

  • Risk for complex diseases in humans, such as diabetes and hypertension, is highly heritable yet the causal DNA sequence variants responsible for that risk remain largely unknown.
  • The authors have declared that no competing interests exist.
  • This model assigns an effect size to a mutant allele, but formally makes no concrete statement regarding the molecular nature of the allele.
  • When applied to molecular data, such as SNP genotypes in a GWAS, these models treat the SNPs themselves as the loci of interest.
  • Further, the model of gene-based recessivity best explains the differences between estimates of additive and dominance variance components from SNP-based methods [27] and from twin studies [28] and is consistent with the distribution of frequencies of significant associations in GWAS [4, 26].

Results and Discussion

  • The models: As in [36], the authors simulate a 100 kilobase region of the human genome, contributing to a complex disease phenotype and fitness.
  • The expected fitness effect of a mutation is always deleterious because trait effects are sampled from an exponential distribution.
  • Specifically, the authors studied three different genetic models and two different demographic models, holding the fitness model as a constant.
  • Parameters are briefly described in Table 1.

Parameter Description

  • This reflects the competing forces of increasing average genetic effect and decreasing average allele frequency which occurs as λ increases (S5 Fig).
  • Yet, when simulating the iMR model, the authors find that an intermediate degree of dominance, h = 0.25, results in a distribution of significant hits which is similar to the GBR results (Fig 4).
  • The reason the authors emphasize this feature of the data is to demonstrate that models with rare alleles of large effect do not necessarily imply a visual excess of rare significant GWAS hits.
  • Of the models the authors explored, only the gene-based recessive model with intermediate to large effects is consistent with the difference between twin and SNP based estimates of dominance variance (Fig 2).

Materials and Methods

  • Forward simulation: Using the fwdpp template library v0.2.8 [87], the authors implemented a forward-in-time, individual-based simulation of a Wright-Fisher population with mutation under the infinitely many sites model [88], recombination, and selection occurring each generation.
  • For comparison, the authors calculated the true total heritability in the sample as $H^2_{sample} = V_{G,sample} / V_{P,sample}$.
  • Under MSGREMLd, many replicates resulted in numerical errors in GCTA.
  • For dizygotic (DZ) twins the authors used two child gamete pairs, each with a unique environmental deviate.

S10 Fig. Distribution of significant hits under site based recessive models with incomplete

  • Horizontal violin plots depict the distribution of minor allele frequencies (MAF) of the most strongly associated single marker in a GWAS.
  • Moments were calculated using the boost C++ statistical accumulators library.
  • Data are plotted as the mean across model replicates ± the standard error of the mean.
  • Shown are the additive co-dominant (AC), gene-based (GBR) and complete multiplicative recessive (Mult. recessive (h = 0); cMR) models.
  • S14 Fig. Regression Based Estimates of Genetic Variance.

S17 Fig. Additive genetic variance explained over allele frequency under the Tennessen et al. [40] model for European demography.

  • The left column of panels shows how VG changes over time under this model.
  • The burden ratio [91] is calculated as the ratio of genetic load between simulations with only ancient growth and those with an additional recent bottleneck and growth.
  • Here load is calculated as the average deviation from optimum fitness due to (left) fixed mutations, segregating mutations and all mutations.
  • For large effect size models, under which there are relatively more mutations that experience strong selection, the authors see the characteristic drop in the burden ratio following the bottleneck and rebound following re-expansion [91].

S21 Fig. Non-parametric comparison between empirical and simulated GWAS hits. A

  • Non-parametric comparison between distribution of allele frequencies between simulated and empirical GWAS hits.
  • In cases where more than one marker was tied for the lowest p-value, one was chosen at random.
  • Specific information regarding the empirical data can be obtained in S1 Table.
  • The dashed lines show the analytical result and the solid curves are empirical cumulative distribution functions based on a sample of 500 mutation effects from an exponential distribution.

Acknowledgments

  • The authors are thankful to Joseph Farran, Harry Mangalam, Adam Brenner, Garr Updegraff, and Edward Xia for administering the University of California, Irvine High Performance Computing cluster.
  • The authors are thankful to Kirk Lohmueller for helpful detailed comments throughout this project.
  • The authors would like to thank Peter Andolfatto, Bogdan Pasaniuc and Nick Mancuso for helpful discussion.


RESEARCH ARTICLE
A Model of Compound Heterozygous, Loss-of-Function Alleles Is Broadly Consistent with Observations from Complex-Disease GWAS Datasets
Jaleal S. Sanjak 1,2*, Anthony D. Long 1,2, Kevin R. Thornton 1,2*
1 Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, USA
2 Center for Complex Biological Systems, University of California, Irvine, Irvine, California, USA
* jsanjak@uci.edu (JSS); krthornt@uci.edu (KRT)
Abstract
The genetic component of complex disease risk in humans remains largely unexplained. A
corollary is that the allelic spectrum of genetic variants contributing to complex disease risk
is unknown. Theoretical models that relate population genetic processes to the maintenance
of genetic variation for quantitative traits may suggest profitable avenues for future experi-
mental design. Here we use forward simulation to model a genomic region evolving under a
balance between recurrent deleterious mutation and Gaussian stabilizing selection. We
consider multiple genetic and demographic models, and several different methods for identi-
fying genomic regions harboring variants associated with complex disease risk. We demon-
strate that the model of gene action, relating genotype to phenotype, has a qualitative effect
on several relevant aspects of the population genetic architecture of a complex trait. In par-
ticular, the genetic model impacts genetic variance component partitioning across the allele
frequency spectrum and the power of statistical tests. Models with partial recessivity closely
match the minor allele frequency distribution of significant hits from empirical genome-wide
association studies without requiring homozygous effect sizes to be small. We highlight a
particular gene-based model of incomplete recessivity that is appealing from first principles.
Under that model, deleterious mutations in a genomic region partially fail to complement
one another. This model of gene-based recessivity predicts the empirically observed inconsistency
between twin and SNP-based estimates of dominance heritability. Furthermore,
this model predicts considerable levels of unexplained variance associated with intralocus
epistasis. Our results suggest a need for improved statistical tools for region based genetic
association and heritability estimation.
Author Summary
Gene action determines how mutations affect phenotype. When placed in an evolutionary
context, the details of the genotype-to-phenotype model can impact the maintenance of
genetic variation for complex traits. Likewise, non-equilibrium demographic history may
PLOS Genetics | DOI:10.1371/journal.pgen.1006573 January 19, 2017 1 / 30
OPEN ACCESS
Citation: Sanjak JS, Long AD, Thornton KR (2017)
A Model of Compound Heterozygous, Loss-of-
Function Alleles Is Broadly Consistent with
Observations from Complex-Disease GWAS
Datasets. PLoS Genet 13(1): e1006573.
doi:10.1371/journal.pgen.1006573
Editor: Simon Gravel, McGill University, CANADA
Received: April 18, 2016
Accepted: January 5, 2017
Published: January 19, 2017
Copyright: © 2017 Sanjak et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: Our simulation code and code for downstream analyses are freely available at: http://github.com/ThorntonLab/disease_sims, http://github.com/molpopgen/buRden, http://github.com/molpopgen/fwdpy, and http://github.com/molpopgen/TennessenEAonly.
Funding: This work was supported by NIH grant
R01-GM115564 to KRT. This work was supported
by NIH grant R01-GM115562 to ADL. This material
is based upon work supported by the National
Science Foundation Graduate Research Fellowship
Program under Grant No. DGE-1321846. Any

affect patterns of genetic variation. Here, we explore the impact of genetic model and population
growth on the distribution of genetic variance across the allele frequency spectrum
underlying risk for a complex disease. Using forward-in-time population genetic simula-
tions, we show that the genetic model has important impacts on the composition of
variation for complex disease risk in a population. We explicitly simulate genome-wide
association studies (GWAS) and perform heritability estimation on population samples. A
particular model of gene-based partial recessivity, based on allelic non-complementation,
aligns well with empirical results. This model is congruent with the dominance variance
estimates from both SNPs and twins, and the minor allele frequency distribution of
GWAS hits.
Introduction
Risk for complex diseases in humans, such as diabetes and hypertension, is highly heritable yet
the causal DNA sequence variants responsible for that risk remain largely unknown. Genome-
wide association studies (GWAS) have found many genetic markers associated with disease
risk [1]. However, follow-up studies have shown that these markers explain only a small por-
tion of the total heritability for most traits [2, 3].
There are many hypotheses which attempt to explain the ‘missing heritability’ problem [2–5].
Genetic variance due to epistatic or gene-by-environment interactions is difficult to identify
statistically because of, among other reasons, increased multiple hypothesis testing burden [6,
7], and could artificially inflate estimates of broad-sense heritability [8]. Well-tagged interme-
diate frequency variants may not reach genome-wide significance in an association study if
they have smaller effect sizes [9, 10]. One appealing verbal hypothesis for this ‘missing herita-
bility’ is that there are rare causal alleles of large effect that are difficult to detect [4, 11, 12].
These hypotheses are not mutually exclusive, and it is probable that a combination of models
will be needed to explain all heritable disease risk [13].
The standard GWAS attempts to identify genetic polymorphisms that differ in frequency
between cases and controls. A complementary approach is to estimate the heritability
explained by genotyped (and imputed) markers (SNPs) under different population sampling
schemes [14, 15]. Stratifying markers by minor allele frequency (MAF) prior to performing
SNP-based heritability estimation allows the partitioning of genetic variation across the allele
frequency spectrum to be estimated [16], which is an important summary of the genetic architecture
of a complex trait [16–23]. This approach has inferred a contribution of rare alleles to
genetic variance in both human height and body mass index (BMI) [16], consistent with theo-
retical work showing that rare alleles will have large effect sizes if fitness effects and trait effects
are correlated [18, 20–25]. Yet, simulations of causal loci harboring multiple rare variants with
large additive effects predict an excess of low-frequency significant markers relative to empiri-
cal findings [4, 26].
SNP-based heritability estimates have concluded that there is little missing heritability for
height and BMI, and that the causal loci simply have effect sizes that are too small to reach
genome-wide significance under current GWAS sample sizes [14, 16]. Further, extensions to
these methods decompose genetic variance into additive and dominance components and find
that dominance variance is approximately one fifth of the additive genetic variance on average
across seventy-nine complex traits [27]. When taken into account together with results from
GWAS, these observations can be interpreted as evidence that the genetic architecture of
human traits is best-explained by a model of small additive effects. However, a recent large
opinions, findings, and conclusions or
recommendations expressed in this material are
those of the authors and do not necessarily reflect
the views of the National Science Foundation. The
funders had no role in study design, data collection
and analysis, decision to publish, or preparation of
the manuscript.
Competing Interests: The authors have declared
that no competing interests exist.

twin study found a substantial contribution of dominance variance for fourteen out of eighteen
traits [28]. The reason for this discrepancy in results remains unclear. One possibility is a sta-
tistical artifact; for example, twin studies may be prone to mistakenly infer non-additive effects
when none exist. Another possibility, which we return to later, is that these apparently
contradictory results are expected under a different model of gene action.
The design, analysis, and interpretation of GWAS are heavily influenced by the “standard
model” of quantitative genetics [29]. This model assigns an effect size to a mutant allele, but
formally makes no concrete statement regarding the molecular nature of the allele. Early appli-
cations of this model to the problem of human complex traits include Risch’s work on the
power to detect causal mutations [30, 31] and Pritchard’s work showing that rare alleles under
purifying selection may contribute to heritable variation in complex traits [17]. When applied
to molecular data, such as SNP genotypes in a GWAS, these models treat the SNPs themselves
as the loci of interest. For example, influential power studies informing the design of GWAS
assign effect sizes directly to SNPs and assume Risch’s model of multiplicative epistasis [32].
Similarly, the single-marker logistic regression used as the primary analysis of GWAS data
typically assumes an additive or recessive model at the level of individual SNPs [33]. Finally,
recent methods designed to estimate the heritability of a trait explained by genotyped markers
assign additive and dominance effects directly to SNPs [14, 16, 27, 34]. Naturally, the results
of such analyses are interpreted in light of the assumed model of gene action.
A weakness of the multiplicative epistasis model [30, 31] when applied to SNPs is that the
concept of a gene, defined as a physical region where loss-of-function mutations have the same
phenotype [35], is lost. Specifically, under the standard model, the genetic concept of a failure
to complement is a property of SNPs and not “gene regions” (see [36] for a detailed discussion
of this issue). We have recently introduced an alternative model of gene action, one in which
risk mutations are unconditionally deleterious and fail to complement at the level of a “gene
region” [36]. This model, influenced by the standard operational definition of a gene [35],
gives rise to the sort of allelic heterogeneity typically observed for human Mendelian diseases
[37], and to a distribution of GWAS “hit” minor allele frequencies [4, 26] consistent with
empirical results [36]. In this article, we explore this “gene-based” model under more complex
demographic scenarios as well as its properties with respect to the estimation of variance com-
ponents using SNP-based approaches [34] and twin studies. We also compare this model to
the standard models of strictly additive co-dominant effects, and multiplicative epistasis with
dominance.
We further explore the power of several association tests to detect a causal gene region
under each genetic and demographic model. We find significant heterogeneity in the perfor-
mance of burden tests [36, 38, 39] across models of the trait and demographic history. We find
that population expansion reduces the power to detect causal gene-regions due to an increase
in rare variation, in agreement with work by [22, 23]. The behavior of the tests under different
models provides us with insight as to the circumstances in which each test is best suited.
In total, our results show that modeling gene action is key to modeling GWAS, and thus
plays an important role in both the design and interpretation of such studies. Further, the
model of gene-based recessivity best explains the differences between estimates of additive and
dominance variance components from SNP-based methods [27] and from twin studies [28]
and is consistent with the distribution of frequencies of significant associations in GWAS [4,
26]. Further, the genetic model plays a much more important role than the demographic
model, which is expected based on previous work on additive models showing that the genetic
load is approximately unaffected by changes in population size over time [21, 22]. Consistent
with recent work by [23], we find that rapid population growth in the recent past increases the
contribution of rare variants to total genetic variance. However, we show here that different

models of gene action are qualitatively different with respect to the partitioning of genetic vari-
ance across the allele frequency spectrum. We also show that these conclusions hold under the
more complex demographic models that have been proposed for human populations [21, 40].
Results and Discussion
The models
As in [36], we simulate a 100 kilobase region of the human genome, contributing to a complex
disease phenotype and fitness. The region evolves forward in time subject to neutral and
deleterious mutation, recombination, selection, and drift. To perform genetic association and
heritability estimation studies in silico, we need to impose a trait onto simulated individuals. In
doing so, we introduce strong assumptions about the molecular underpinnings of a trait and
its evolutionary context.
How does the molecular genetic basis of a trait under natural selection influence population
genetic signatures in the genome? This question is very broad, and therefore it was necessary
to restrict ourselves to a small subset of molecular and evolutionary scenarios. We analyzed a
set of approaches to modeling a single gene region experiencing recurrent unconditionally-
deleterious mutation contributing to a quantitative trait subject to Gaussian stabilizing selec-
tion. The expected fitness effect of a mutation is always deleterious because trait effects are
sampled from an exponential distribution. Therefore, we do not allow for compensatory muta-
tions that may occur in more general models of stabilizing selection. Specifically, we studied
three different genetic models and two different demographic models, holding the fitness
model as a constant. Parameters are briefly described in Table 1.
We implemented three disease-trait models of the phenotypic form P = G + E. G is the
genetic component, and $E \sim N(0, \sigma_e^2)$ is the environmental noise expressed as a Gaussian
random variable with mean 0 and variance $\sigma_e^2$. In this context, $\sigma_e^2$ should be thought of as both the
contribution from the environment and from the remaining genetic variance at loci in linkage
equilibrium with the simulated 100kb region. The genetic models are named the additive
co-dominant (AC) model, multiplicative recessive (Mult. recessive; MR) model and the
gene-based recessive (GBR) model. The MR model has a parameter, h, that controls the degree of
Table 1. Description of parameters used in the models.
Parameter           Description
N                   Population size
P                   Phenotype
$P_{opt}$           Optimum phenotype
G                   Genetic contribution to phenotype
E                   Environmental contribution to phenotype
λ                   Mean and standard deviation of trait effects
$c_i$               Specific trait effect of site i
h                   Dominance coefficient for trait effects
w                   Fitness, based on Gaussian function
$\sigma_s^2$        The total inverse selection intensity
$\sigma_e^2$        Environmental variance
$V_A$               Additive genetic variance
$V_D$               Dominance genetic variance
$V_G$               Genetic variance
$V_{A;q \le x}$     Additive variance explained by variance below frequency q
doi:10.1371/journal.pgen.1006573.t001

recessivity; we call this model the complete MR (cMR) when h = 0 and the incomplete MR
(iMR) when h lies between 0 and 1. Here, h = 1 corresponds to co-dominance, which is different from the
typical formulation used when modeling the fitness effects of mutations directly. It is also
important to note that here recessivity is being defined in terms of phenotypic effects; this may
be unusual for those more accustomed to dealing directly with recessivity for fitness effects.
An idealized relationship between dominance for fitness effects and trait effects of a mutation
on an unaffected genetic background is shown in S15 Fig.
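The phenotypic form P = G + E used by all three models can be illustrated with a minimal Python sketch. The function name `phenotype`, the seed, and the value of `sigma_e` are assumptions made only for illustration; they are not taken from the authors' simulation code.

```python
import random


def phenotype(genetic_value, sigma_e, rng):
    """Return P = G + E, with E drawn from a Gaussian with mean 0 and s.d. sigma_e."""
    environmental_noise = rng.gauss(0.0, sigma_e)
    return genetic_value + environmental_noise


rng = random.Random(42)  # fixed seed, chosen only for reproducibility
sigma_e = 0.3            # hypothetical value, not taken from the paper
genetic_values = [0.0, 0.1, 0.5]
phenotypes = [phenotype(g, sigma_e, rng) for g in genetic_values]
```

Because E has mean zero, averaging many phenotypes simulated for the same genetic value recovers G, which is why the choice of genetic model (how G is computed) drives the differences explored in this article.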
The critical conceptual difference between recessive models is whether dominance is a
property of a locus (nucleotide/SNP) in a gene or the gene overall. Mathematically, this
amounts to whether one first determines diploid genotypes at sites (and then multiplies across
sites to get a total genetic effect) or calculates a score for each haplotype (the maternal and
paternal alleles). For completely co-dominant models, this distinction is irrelevant; however,
for a model with arbitrary dominance, one needs to be more specific. As an example, imagine a
compound heterozygote for two biallelic loci, i.e. genotype Ab/aB. In the case of traditional
multiplicative recessivity, the compound heterozygote is wild-type for both loci and therefore
wild-type overall; this implies that these loci are in different genes (or independent functional
units of the same gene) because the mutations are complementary. However, in the case of
gene-based recessivity [36], neither haplotype is wild-type and so the individual is not wild-
type; the failure of mutant alleles to complement defines these loci as being in the same gene
[35].
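The two bookkeeping schemes contrasted above can be sketched in Python for the Ab/aB compound heterozygote. Representing a haplotype as a set of mutated positions, and the function and locus names used here, are illustrative assumptions rather than the authors' data structures.

```python
def site_genotype_counts(haplotype_1, haplotype_2):
    """Site-based view: count heterozygous and homozygous mutant sites.

    Each haplotype is a set of positions that carry a causative mutation.
    """
    heterozygous = len(haplotype_1 ^ haplotype_2)  # mutated on exactly one haplotype
    homozygous = len(haplotype_1 & haplotype_2)    # mutated on both haplotypes
    return heterozygous, homozygous


def both_haplotypes_mutated(haplotype_1, haplotype_2):
    """Gene-based view: does each haplotype carry at least one mutation?"""
    return bool(haplotype_1) and bool(haplotype_2)


# Compound heterozygote Ab/aB: each haplotype carries a mutation at a different locus.
haplotype_1 = {"locus_B"}  # haplotype Ab
haplotype_2 = {"locus_A"}  # haplotype aB

m_Aa, m_aa = site_genotype_counts(haplotype_1, haplotype_2)
fails_to_complement = both_haplotypes_mutated(haplotype_1, haplotype_2)
```

In the site-based view this individual has two heterozygous sites and no homozygous mutant site, so complete recessivity at each site yields a wild-type phenotype. In the gene-based view neither haplotype is wild-type, so the mutations fail to complement.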
For a diploid with $m_i$ causative mutations on the $i$-th haplotype, we may define the additive
model as

$$G_{AC} = \sum_{i=1}^{2} \sum_{j=1}^{m_i} c_{i,j}, \tag{1}$$

where $c_{i,j}$ is the effect size of the $j$-th mutation on the $i$-th haplotype. Each $c_{i,j}$ is sampled from an
exponential distribution with mean of λ, to reflect unconditionally deleterious mutation. In
other words, when a new mutation arises its effect c is drawn from an exponential distribution,
and remains constant throughout its entire sojourn in the population.
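A minimal Python sketch of the additive model follows, assuming each haplotype is represented simply as a list of effect sizes. The function names, the seed, and the value of `lam` are hypothetical and chosen only for illustration.

```python
import random


def draw_effect_size(mean_effect, rng):
    """Draw one effect size from an exponential distribution with the given mean."""
    # random.expovariate is parameterized by the rate, which is 1 / mean.
    return rng.expovariate(1.0 / mean_effect)


def additive_genetic_value(haplotype_1_effects, haplotype_2_effects):
    """Additive co-dominant model: sum all effect sizes over both haplotypes."""
    return sum(haplotype_1_effects) + sum(haplotype_2_effects)


rng = random.Random(2017)  # fixed seed, chosen only for reproducibility
lam = 0.1                  # hypothetical mean effect size
haplotype_1 = [draw_effect_size(lam, rng) for _ in range(2)]
haplotype_2 = [draw_effect_size(lam, rng) for _ in range(1)]
g_ac = additive_genetic_value(haplotype_1, haplotype_2)
```

Because exponential draws are never negative, every mutation can only increase G under this sketch, mirroring the unconditionally deleterious mutations described in the text.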
The GBR model is the geometric mean of the sum of effect sizes on each haplotype [36].
We sum the causal mutation effects on each allele (paternal and maternal) to obtain a haplo-
type score. We then take the square root of the product of the haplotype scores to determine
the total genetic value of the diploid.
$$G_{GBR} = \sqrt{\sum_{j=1}^{m_1} c_{1,j} \times \sum_{j=1}^{m_2} c_{2,j}} \tag{2}$$
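The gene-based recessive calculation can be sketched in Python as follows, again assuming each haplotype is a list of effect sizes. The function name and the example effect sizes are hypothetical.

```python
import math


def gbr_genetic_value(haplotype_1_effects, haplotype_2_effects):
    """Gene-based recessive model: geometric mean of the two haplotype scores."""
    haplotype_1_score = sum(haplotype_1_effects)
    haplotype_2_score = sum(haplotype_2_effects)
    return math.sqrt(haplotype_1_score * haplotype_2_score)


# One haplotype is mutation-free, so the product of scores is zero.
single_carrier = gbr_genetic_value([0.2], [])

# Compound heterozygote: different mutations on each haplotype.
compound_heterozygote = gbr_genetic_value([0.4], [0.1])
```

Under this sketch an individual with a mutation-free haplotype has a genetic value of zero, whereas a compound heterozygote has a non-zero value even though no single site is homozygous for a mutation.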
Finally, the MR model depends on the number of positions for which a diploid is heterozygous
($m_{Aa}$) or homozygous ($m_{aa}$) for causative mutations,

$$G_{MR} = \left( \prod_{j=1}^{m_{Aa}} (1 + h c_j) \right) \left( \prod_{j=1}^{m_{aa}} (1 + 2 c_j) \right) - 1. \tag{3}$$
Thus, h = 0 is a model of multiplicative epistasis with complete recessivity (cMR), and h = 1
closely approximates the additive model when effect sizes are small.
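The multiplicative recessive model and its two limiting cases can be checked with a short Python sketch. The function name and the numerical effect sizes are hypothetical and chosen only to illustrate the behavior described above.

```python
def mr_genetic_value(heterozygous_effects, homozygous_effects, h):
    """Multiplicative recessive model with dominance coefficient h."""
    value = 1.0
    for effect in heterozygous_effects:
        value *= 1.0 + h * effect    # sites carrying one copy of the mutation
    for effect in homozygous_effects:
        value *= 1.0 + 2.0 * effect  # sites carrying two copies of the mutation
    return value - 1.0


# Complete recessivity (h = 0): heterozygous sites contribute nothing.
cmr_value = mr_genetic_value([0.1, 0.2], [], h=0.0)

# h = 1 with small effects: close to the additive sum over both haplotypes.
small_het, small_hom = [0.001, 0.002], [0.003]
mr_value = mr_genetic_value(small_het, small_hom, h=1.0)
additive_value = sum(small_het) + 2.0 * sum(small_hom)
```

The gap between `mr_value` and `additive_value` comes only from products of effect sizes, which is why the approximation holds when effects are small and degrades as they grow.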
Here, phenotypes are subject to Gaussian stabilizing selection with an optimum at zero and
standard deviation of $\sigma_s = 1$ such that the fitness, w, of a diploid is proportional to a Gaussian


9 citations

References
Journal ArticleDOI
TL;DR: The model makes predictions about the shape of the site frequency spectrum at the front of range expansion, and about correlations between heterozygosity and fitness in different parts of the species range, and these predictions provide opportunities to empirically validate theoretical results.
Abstract: Expanding populations incur a mutation burden – the so-called expansion load. Previous studies of expansion load have focused on codominant mutations. An important consequence of this assumption is that expansion load stems exclusively from the accumulation of new mutations occurring in individuals living at the wave front. Using individual-based simulations, we study here the dynamics of standing genetic variation at the front of expansions, and its consequences on mean fitness if mutations are recessive. We find that deleterious genetic diversity is quickly lost at the front of the expansion, but the loss of deleterious mutations at some loci is compensated by an increase of their frequencies at other loci. The frequency of deleterious homozygotes therefore increases along the expansion axis, whereas the average number of deleterious mutations per individual remains nearly constant across the species range. This reveals two important differences to codominant models: (i) mean fitness at the front of the expansion drops much faster if mutations are recessive, and (ii) mutation load can increase during the expansion even if the total number of deleterious mutations per individual remains constant. We use our model to make predictions about the shape of the site frequency spectrum at the front of range expansion, and about correlations between heterozygosity and fitness in different parts of the species range. Importantly, these predictions provide opportunities to empirically validate our theoretical results. We discuss our findings in the light of recent results on the distribution of deleterious genetic variation across human populations and link them to empirical results on the correlation of heterozygosity and fitness found in many natural range expansions.

157 citations

Journal ArticleDOI
TL;DR: An integrated simulation framework is developed, calibrated to empirical data, to enable the systematic evaluation of contradictory hypotheses about features of genetic architecture, including those where rare variants explain either little or most of T2D heritability.
Abstract: The genetic architecture of human diseases governs the success of genetic mapping and the future of personalized medicine. Although numerous studies have queried the genetic basis of common disease, contradictory hypotheses have been advocated about features of genetic architecture (for example, the contribution of rare versus common variants). We developed an integrated simulation framework, calibrated to empirical data, to enable the systematic evaluation of such hypotheses. For type 2 diabetes (T2D), two simple parameters--(i) the target size for causal mutation and (ii) the coupling between selection and phenotypic effect--define a broad space of architectures. Whereas extreme models are excluded by the combination of epidemiology, linkage and genome-wide association studies, many models remain consistent, including those where rare variants explain either little (<25%) or most (>80%) of T2D heritability. Ongoing sequencing and genotyping studies will further constrain the space of possible architectures, but very large samples (for example, >250,000 unselected individuals) will be required to localize most of the heritability underlying T2D and other traits characterized by these models.

154 citations


"A model of compound heterozygous, l..." refers result in this paper

  • ...5 (Gaussian function is greater than or equal to its quadratic approximation), which is consistent with recent attempts at estimating that parameter [20, 65]....

    [...]

Journal ArticleDOI
TL;DR: Simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to a population that did not expand.
Abstract: Population genetic studies have found evidence for dramatic population growth in recent human history. It is unclear how this recent population growth, combined with the effects of negative natural selection, has affected patterns of deleterious variation, as well as the number, frequency, and effect sizes of mutations that contribute risk to complex traits. Because researchers are performing exome sequencing studies aimed at uncovering the role of low-frequency variants in the risk of complex traits, this topic is of critical importance. Here I use simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome. I show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to a population that did not expand. Under a model where a mutation's effect on a trait is correlated with its effect on fitness, rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, when using a single-marker test, for a given false-positive rate and sample size, recent population growth decreases the expected number of significant associations with the trait relative to the number detected in a population that did not expand. However, in a model where there is no correlation between a mutation's effect on fitness and the effect on the trait, common variants account for much of the additive genetic variance, regardless of demography. Moreover, here demography does not affect the number of significant associations detected. These findings suggest recent population history may be an important factor influencing the power of association tests and in accounting for the missing heritability of certain complex traits.

151 citations


"A model of compound heterozygous, l..." refers background or methods or result in this paper

  • ...The exact relationship between rare alleles [4, 17, 26, 62, 63], and the demographic and/or selective scenarios from which they arose [21, 22, 64], and the genetic architecture of common complex diseases in humans is an active area of research....

    [...]

  • ...However, our findings contrast with those of Zuk [24] and agree with those of Lohmueller [22], in that we predict that population expansion will substantially increase the heritability, or portion of genetic variance, that is due to rare variants....

    [...]

  • ...In agreement with [22, 73], we predict that population growth reduces the power to associate variants in a causal gene region with disease status (Fig 3) when the disease also impacts evolutionary fitness....

    [...]

  • ...4 showing that the genetic load is approximately unaffected by changes in population size over time, [21, 22]....

    [...]

  • ...However, this simple model allows us to more easily get a sense of the impact of population expansion [21, 22]....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the authors reported targeted sequencing of 63 known prostate cancer risk regions in a multi-ancestry study of 9,237 men and used the data to explore the contribution of low-frequency variation to disease risk.
Abstract: We report targeted sequencing of 63 known prostate cancer risk regions in a multi-ancestry study of 9,237 men and use the data to explore the contribution of low-frequency variation to disease risk. We show that SNPs with minor allele frequencies (MAFs) of 0.1-1% explain a substantial fraction of prostate cancer risk in men of African ancestry. We estimate that these SNPs account for 0.12 (standard error (s.e.) = 0.05) of variance in risk (∼42% of the variance contributed by SNPs with MAF of 0.1-50%). This contribution is much larger than the fraction of neutral variation due to SNPs in this class, implying that natural selection has driven down the frequency of many prostate cancer risk alleles; we estimate the coupling between selection and allelic effects at 0.48 (95% confidence interval [0.19, 0.78]) under the Eyre-Walker model. Our results indicate that rare variants make a disproportionate contribution to genetic risk for prostate cancer and suggest the possibility that rare variants may also have an outsize effect on other common traits.

149 citations


"A model of compound heterozygous, l..." refers result in this paper

  • ...5 (Gaussian function is greater than or equal to its quadratic approximation), which is consistent with recent attempts at estimating that parameter [20, 65]....

    [...]

Journal ArticleDOI
TL;DR: The perceived dichotomy between ‘common’ and ‘rare’ variants is not only false, but unhelpful in making progress towards increasing the authors' understanding of the genetic basis of psychiatric disorders.
Abstract: In this article, we review some of the data that contribute to our understanding of the genetic architecture of psychiatric disorders. These include results from evolutionary modelling (hence no data), the observed recurrence risk to relatives and data from molecular markers. We briefly discuss the common-disease common-variant hypothesis, the success (or otherwise) of genome-wide association studies, the evidence for polygenic variance and the likely success of exome and whole-genome sequencing studies. We conclude that the perceived dichotomy between ‘common’ and ‘rare’ variants is not only false, but unhelpful in making progress towards increasing our understanding of the genetic basis of psychiatric disorders. Strong evidence has been accumulated that is consistent with the contribution of many genes to risk of disease, across a wide range of allele frequencies and with a substantial proportion of genetic variation in the population in linkage disequilibrium with single-nucleotide polymorphisms (SNPs) on commercial genotyping arrays. At the same time, most causal variants that segregate in the population are likely to be rare and in total these variants also explain a significant proportion of genetic variation. It is the combination of allele frequency, effect size and functional characteristics that will determine the success of new experimental paradigms such as whole exome/genome sequencing to detect such loci. Empirical results suggest that roughly half the genetic variance is tagged by SNPs on commercial genome-wide chips, but that individual causal variants have a small effect size, on average. We conclude that larger experimental sample sizes are essential to further our understanding of the biology underlying psychiatric disorders.

149 citations