Limited statistical evidence for shared genetic effects of eQTLs and autoimmune
disease-associated loci in three major immune cell types
Citation
Chun, Sung, Alexandra Casparino, Nikolaos A Patsopoulos, Damien C Croteau-Chonka,
Benjamin A Raby, Philip L De Jager, Shamil R Sunyaev, and Chris Cotsapas. 2017. “Limited
statistical evidence for shared genetic effects of eQTLs and autoimmune disease-associated loci
in three major immune cell types.” Nature genetics 49 (4): 600-605. doi:10.1038/ng.3795.
http://dx.doi.org/10.1038/ng.3795.
Published Version
doi:10.1038/ng.3795
Permanent link
http://nrs.harvard.edu/urn-3:HUL.InstRepos:34375208
Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available
under the terms and conditions applicable to Other Posted Material, as set forth at
http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Limited statistical evidence for shared genetic effects of eQTLs
and autoimmune disease-associated loci in three major immune
cell types
Sung Chun¹,²,³, Alexandra Casparino⁴, Nikolaos A Patsopoulos³,⁵,⁶, Damien C Croteau-Chonka²,⁷,
Benjamin A Raby²,⁷, Philip L De Jager²,³,⁵,⁶, Shamil R Sunyaev¹,²,³,*, and Chris Cotsapas³,⁴,⁸,*
¹ Division of Genetics, Brigham and Women’s Hospital, Boston MA, USA
² Department of Medicine, Harvard School of Medicine, Boston MA, USA
³ Broad Institute of Harvard and MIT, Cambridge MA, USA
⁴ Department of Neurology, Yale School of Medicine, New Haven CT, USA
⁵ Department of Neurology, Brigham and Women’s Hospital, Boston MA, USA
⁶ Ann Romney Center for Neurological Diseases, Brigham and Women’s Hospital, Boston MA, USA
⁷ Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston MA, USA
⁸ Department of Genetics, Yale School of Medicine, New Haven CT, USA
Most autoimmune disease risk effects identified by genome-wide association studies
(GWAS) localize to open chromatin with gene regulatory activity. GWAS loci are also
enriched for expression quantitative trait loci (eQTLs), suggesting that most risk variants
alter gene expression¹,². However, because causal variants are difficult to identify and
cis-eQTLs occur frequently, it remains challenging to identify specific instances of
disease-relevant changes to gene regulation. Here, we use a novel joint likelihood framework with
higher resolution than previous methods to identify loci where autoimmune disease risk and
an eQTL are driven by a single, shared genetic effect. Using eQTLs from three major
immune subpopulations, we find shared effects in only ~25% of loci. Thus, we uncover a
fraction of gene regulatory changes as strong mechanistic hypotheses for disease risk, but
conclude that most risk mechanisms likely do not involve changes to basal gene expression.
Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research,
subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
* correspondence to: SRS (ssunyaev@rics.bwh.harvard.edu) and CC (cotsapas@broadinstitute.org).
Code availability
The current implementation of JLIM is available from the Cotsapas and Sunyaev labs: http://www.github.com/cotsapaslab/jlim and
http://genetics.bwh.harvard.edu/wiki/sunyaevlab/jlim
Data availability
The publicly available 1000 Genomes genotype data were downloaded from:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The publicly available gEUVADIS LCL eQTL data were accessed via EBI
ArrayExpress site under accession E-GEUV-1. Gene expression data for CD4+ T cell and CD14+ monocytes were accessed via NCBI Gene
Expression Omnibus accession no. GSE56035. Immunochip GWAS summary statistics are available at http://www.immunobase.org.
Author Contributions
SC designed and performed research and authored the manuscript; AC performed research; NP contributed data and approved the
manuscript, DCC contributed data and approved the manuscript; BR contributed data and approved the manuscript; PDJ contributed
data and approved the manuscript; SRS designed and performed research and authored the manuscript; CC designed and performed
research and authored the manuscript.
Competing Financial Interests statement
The authors declare no competing financial interests.
HHS Public Access
Author manuscript
Nat Genet. Author manuscript; available in PMC 2017 August 20.
Published in final edited form as:
Nat Genet. 2017 April; 49(4): 600–605. doi:10.1038/ng.3795.
The autoimmune and inflammatory diseases (AID) are heritable, complex diseases where
loss of tolerance to self-antigens results in either systemic or tissue-specific immune
attack³,⁴. GWAS have identified hundreds of genomic regions mediating risk to several AID.
These associations are primarily non-coding: lead GWAS SNPs are more likely to be
associated with expression levels of neighboring genes than expected by chance¹²,¹³, and the
same lead SNPs are enriched in regulatory regions marked by chromatin accessibility and
modification¹,¹⁴. Fine-mapping reveals enrichment of AID-associated variants in enhancer
elements active in stimulated T cell subpopulations¹⁵, with heritability strongly enriched in
such regulatory regions¹⁶,¹⁷. Collectively, these strands of evidence suggest that the majority
of disease risk is mediated by changes to gene regulation in specific cell subpopulations.
However, these bulk analyses do not formally assess whether expression levels and disease
risk can be attributed to a single underlying variant or to independent effects in a locus¹⁸,¹⁹.
Though several methods have been developed to assess these alternatives using eQTL
data²⁰–²³, they show limited resolution to detect cases where distinct disease and eQTL
causal variants are in linkage disequilibrium. Here, we present an approach to test if a
GWAS risk association and an eQTL are driven by the same underlying genetic effect,
accounting for the LD between causal variants. Using data from ImmunoChip studies of
seven AID comprising >180,000 samples in total (Supplementary Table 1), we test if
associations in 272 known risk loci are consistent with cis-eQTL for genes in each region,
measured in three relevant immune cell populations: lymphoblastoid cell lines (LCLs),
CD4+ T cells and CD14+ monocytes²⁴,²⁵.
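As a point of reference for the LD values discussed in this paper, the following is a minimal sketch of how r² between two biallelic SNPs can be estimated from unphased genotype dosages, as the squared Pearson correlation. The function name `ld_r2` and the toy genotype vectors are hypothetical and are not taken from the paper or from the JLIM code.

```python
import numpy as np

def ld_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.std() == 0 or g2.std() == 0:
        # A monomorphic SNP has no variance, so r^2 is undefined.
        raise ValueError("monomorphic SNP: r^2 is undefined")
    r = np.corrcoef(g1, g2)[0, 1]
    return r * r

# Toy dosages for eight individuals (illustrative values only).
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 0, 1]  # differs from snp_a in two individuals

print(ld_r2(snp_a, snp_a))  # identical SNPs are in perfect LD
print(ld_r2(snp_a, snp_b))  # partially correlated SNPs
```

Two causal variants with a high r² produce nearly indistinguishable association patterns, which is why resolution at high LD is the difficult case.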
When associations to two traits – here, disease trait and eQTL – are driven by the same
underlying causal variant, the joint evidence of association should be maximized at the
markers in tightest LD with the causal variant¹⁹,²⁶. Here, we directly evaluate this joint
likelihood (Supplementary Figure 1), unlike previous approaches that look for similarities in
the shape of the association curve over multiple markers²⁰,²¹,²⁷,²⁸. When the underlying
causal effect is shared, joint likelihood is maximized when we model the same causal variant
in both traits; conversely, when the underlying causal variants are different, we expect
maximum joint likelihood when we model their closest proxies. We empirically derive the
null distribution of the joint likelihood ratio statistic by comparing disease associations to
permuted eQTL data (see Methods, Supplementary Figure 2 and Supplementary Notes). We
thus directly evaluate whether two associations in the same locus, observed in different
cohorts, are due to the same underlying effect.
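The permutation step can be illustrated with a small sketch. This is not the JLIM implementation: the statistic below (`placeholder_stat`, the eQTL chi-square at the lead GWAS SNP) is a deliberately simple stand-in for the joint likelihood ratio, and all function names, sample sizes and effect sizes are assumptions made for illustration. The sketch only shows the general pattern of shuffling expression values across individuals, recomputing eQTL statistics, and comparing the observed statistic to the resulting null distribution.

```python
import numpy as np

def eqtl_z(genotypes, expression):
    """Per-SNP association z-scores: sqrt(n) times the Pearson correlation."""
    g = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return np.sqrt(len(y)) * r

def placeholder_stat(gwas_z, eqtl_zscores):
    """Stand-in statistic (NOT the JLIM statistic): eQTL chi-square at the lead GWAS SNP."""
    lead = np.argmax(np.abs(gwas_z))
    return eqtl_zscores[lead] ** 2

def permutation_p(gwas_z, genotypes, expression, n_perm=1000, seed=0):
    """Empirical p-value from a null built by permuting expression across individuals."""
    rng = np.random.default_rng(seed)
    observed = placeholder_stat(gwas_z, eqtl_z(genotypes, expression))
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(expression)  # breaks any genotype-expression link
        if placeholder_stat(gwas_z, eqtl_z(genotypes, shuffled)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Simulated locus: 200 individuals, 20 independent SNPs, eQTL driven by SNP 5.
rng = np.random.default_rng(1)
n, m, causal = 200, 20, 5
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
expression = 0.8 * genotypes[:, causal] + rng.normal(size=n)
gwas_z = rng.normal(size=m)
gwas_z[causal] = 6.0  # hypothetical GWAS peak at the same SNP

p_shared = permutation_p(gwas_z, genotypes, expression)
print(p_shared)
```

Because the GWAS summary statistics stay fixed and only the expression data are permuted, the null distribution reflects the LD structure of the eQTL cohort, which is the property the permutation scheme described above relies on.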
To assess the performance of our method, we benchmarked it against three recently reported
methods: coloc²⁰, a well-calibrated Bayesian framework that considers spatial similarities in
association data across sets of markers; gwas-pw²⁹, which extends this idea to hierarchical
priors and optimizes model parameters; and HEIDI/SMR²², which applies Mendelian
randomization between traits. We simulated pairs of case-control cohorts with either the
same or distinct causal variants driving association, and find that our approach shows the
best overall performance (Supplementary Tables 2 and 3). When independent causal variants
(i.e. not in LD) drive GWAS and eQTL associations, our own method, coloc and gwas-pw
all had excellent performance. As the LD between the causal variants increases, our method
shows the best performance, maintaining high resolution even when the underlying causal
variants are in strong LD (AUC = 0.883 when 0.7 < r² < 0.8, Supplementary Figures 3 and
4), whereas the other methods show substantial false positive rates, reporting distinct effects
as shared. We also found that our method is robust to within-continent levels of population
structure (Supplementary Figures 5 and 6), and when limiting analysis to a subset of SNPs
for computational efficiency (Supplementary Figure 7; coloc fares similarly, Supplementary
Figure 8). Our method also performs well when multiple independent causal variants affect
one or both traits (Supplementary Figures 9–11). In practice, our resolution becomes limited
at high LD levels (r² > 0.8), where the false positive rate increases dramatically. We also have
limited resolution when the eQTL effect is very weak (p > 0.01, Supplementary Figures 12–15).
Thus, within these limits, we can accurately detect cases of shared genetic effects
between two traits.
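For readers less familiar with the AUC used in this benchmark, the sketch below computes it as the probability that a simulation with a shared causal variant receives a higher score than one with distinct causal variants (the Mann-Whitney formulation). The function name `auc` and the example scores are hypothetical; in practice the score could be any quantity that increases with evidence for sharing, such as one minus a p-value.

```python
def auc(scores_shared, scores_distinct):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    wins = 0.0
    for s in scores_shared:
        for d in scores_distinct:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(scores_shared) * len(scores_distinct))

# Illustrative scores only: higher means stronger evidence for a shared effect.
shared = [0.9, 0.8, 0.7, 0.4]    # simulations with the same causal variant
distinct = [0.6, 0.3, 0.2, 0.1]  # simulations with distinct causal variants

print(auc(shared, distinct))  # 15 of 16 pairs ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of shared from distinct simulations.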
To dissect AID risk loci, we first identified densely genotyped ImmunoChip loci showing
genome-wide significant association, excluding the Major Histocompatibility Locus due to
the extensive LD structure in the region (immunobase.org; Table 1). We next identified
genes in a 1Mb window centered on the most associated variant in each locus. Consistent
with previous observations that eQTLs are frequently found in GWAS loci, we found that
260/272 loci had at least one gene with an eQTL (p < 0.01) in at least one cell type, with
most such effects common across all three tissues (Table 1). We tested if any eQTLs in these
loci appear driven by the same underlying effect as the disease associations. We find
evidence for shared effects for only 77/5,749 pairs in 55/260 (21%) loci across all diseases,
with the proportion varying from 4/34 (12%) for rheumatoid arthritis loci to 6/10 (60%) for
ulcerative colitis loci (false discovery rate < 5%; Tables 1 and 2). Of these 77 shared effects,
45 pass even the more stringent family-wise multiple testing correction (Bonferroni
corrected P < 0.05). Thus, our analysis reveals that in the majority of AID loci, variants
causally involved in disease phenotypes do not overlap variants responsible for eQTL signals
in the three broad cell populations we analyzed, which represent the major arms of the
immune lineages. Overall, we find that >75% of tested disease-eQTL pairs appear associated
to distinct genetic variants in the same locus (Figure 1).
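The two multiple-testing corrections mentioned above can be contrasted with a short sketch. The text does not state which false discovery rate procedure was applied, so the Benjamini-Hochberg step-up procedure used here is an assumption, and the p-values are invented for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Family-wise control: reject only if p <= alpha / number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= alpha * k / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[i] = True
    return reject

# Hypothetical p-values for six disease-eQTL pairs.
pvals = [0.001, 0.008, 0.012, 0.03, 0.2, 0.6]

print(sum(benjamini_hochberg(pvals)))  # 4 pairs pass FDR < 5%
print(sum(bonferroni(pvals)))          # 2 pairs pass Bonferroni-corrected P < 0.05
```

As in the reported results, where 45 of the 77 pairs passing the false discovery rate threshold also pass Bonferroni correction, every test rejected by Bonferroni is also rejected by the step-up procedure, but not the reverse.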
We sought to explain this lack of overlap between disease associations and eQTLs, despite
their frequent co-occurrence in the same loci. In particular, although our method showed
good performance in simulated data (Supplementary Figure 4), we remained concerned that
this lack of overlap may be due to low statistical power in the eQTL data, which come from
cohorts of limited sample size. However, we find that even amongst the most strongly
supported eQTLs (nominal p < 10⁻⁵), <25% show evidence of shared effects with disease
associations. Conversely, we find strong evidence for distinct effects for the majority of
disease-eQTL pairs, with only a subset of comparisons being ambiguous, suggesting that our
method is adequately powered to detect shared effects where they exist (Figure 1a and
Supplementary Figures 16–18). To assess whether power affects the total number of loci,
rather than eQTL, that can be resolved, we looked more deeply at our significance threshold
settings. We find that more liberal thresholds do not increase the number of true positive
results after adjusting for false positive rate, indicating that most loci do not contain any
gene with an eQTL consistent with the disease association (Figure 1b and Supplementary
Figure 19). Cumulatively, our results demonstrate that only a minority of AID risk effects
drive eQTLs in the three cell populations we tested, which are drawn from diverse lineages
of the immune system.
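One simple way to think about adjusting discoveries for the false positive rate is sketched below. It is an assumption made for illustration, not the procedure used for Figure 1b: at a p-value threshold t applied to m tests, about t × m discoveries are expected under the null, so the excess over that expectation gives a rough estimate of true positives. The function name and the constructed p-values are hypothetical.

```python
def estimated_true_positives(pvalues, threshold):
    """Observed discoveries minus the number expected by chance if all tests were null."""
    m = len(pvalues)
    observed = sum(p <= threshold for p in pvalues)
    expected_false = threshold * m  # conservative: treats every test as null
    return max(0.0, observed - expected_false)

# Constructed example: 5 strong signals plus 95 evenly spaced null p-values.
signals = [0.0001] * 5
nulls = [(i + 0.5) / 95 for i in range(95)]
pvals_all = signals + nulls

for t in (0.01, 0.05, 0.2):
    print(t, estimated_true_positives(pvals_all, t))
```

In this constructed example the estimate stays near the five planted signals as the threshold is relaxed, which is the kind of plateau that suggests a liberal threshold adds false positives rather than real shared effects.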
We next focused on the subset of 77 disease/eQTL pairs in 55 loci where we could detect
strong evidence of a shared effect (Table 2). We find that 59/77 (77%) of effects are
restricted to one cell population, indicating that tissue-specific eQTLs are important
components of the molecular underpinnings of disease (Supplementary Figures 20 and 21).
The remaining 18 effects are detected in multiple cell populations; for example, the multiple
sclerosis association at rs10783847 on chromosome 12 is consistent with eQTLs for the
transcript of methyltransferase-like 21B (METTL21B) in both CD4+ T cells and CD14+
monocytes, but not for the remaining 31 genes in the immediate locus (Figure 2). Although
METTL21B is expressed in LCLs, there is no evidence of an eQTL in this tissue within
1Mb from rs10783847. Similarly, for the multiple sclerosis association at rs1966115 on
chromosome 8 and eQTLs for ZC2HC1A, and for the inflammatory bowel disease
association at rs55770741 on chromosome 5 and eQTLs for ERAP2, we detect a shared
effect in all three cell populations. In several cases we find tissue-specific shared effects
despite strong eQTLs for the same gene in other tissues: for ZFP90 and ulcerative colitis risk
at rs889561 on chromosome 16, we also find shared effects in CD4+ and CD14+ but not
LCLs, where we observe a ZFP90 eQTL at p = 0.005 that has a low likelihood of shared
effect with GWAS (joint likelihood P = 0.85). Instead, we find evidence of sharing between
disease risk and an eQTL for NFAT5 in LCLs. Thus, despite the presence of eQTLs for a
gene in multiple tissues, not all these effects are consistent with disease associations,
suggesting that disease-relevant eQTLs are tissue specific.
We also find cases where an eQTL is consistent with associations to multiple diseases. The
ankyrin repeat domain 55 (ANKRD55) transcript encoded on chromosome 5 has an eQTL in
CD4+ T cells that is shared with associations to multiple sclerosis, Crohn disease and
rheumatoid arthritis (Figure 3, all observations are significant after Bonferroni correction).
We also find weaker evidence for shared effects between all three diseases and an eQTL for
interleukin 6 signal transducer (IL6ST) in CD4+ T cells, which passes the false discovery
rate threshold but not Bonferroni correction (Supplementary Figure 22). Similarly, a CD4+
eQTL for ELMO1 on chromosome 7 is consistent with associations to both celiac disease
and multiple sclerosis (Supplementary Figure 23), a CD14+ eQTL for RGS1 on
chromosome 1 is consistent with associations to both celiac disease and multiple sclerosis
(Supplementary Figure 24), and three other eQTLs are consistent with associations in
multiple diseases (Supplementary Figures 25–27). In all cases, these are the only
genome-wide significant disease associations reported in these loci. As we consider each disease
association independently, these results indicate that the same underlying risk variants drive
risk to multiple diseases in these loci by altering gene expression, consistent with
observations of shared effects across diseases⁷.