Systematic identification of genetic influences on methylation across the human life course

Gaunt, T., Shihab, H., Hemani, G., Min, J., Woodward, G., Lyttleton,
O., Zheng, J., Duggirala, A., McArdle, W., Ho, K., Ring, S., Evans, D.,
Davey Smith, G., & Relton, C. (2016). Systematic identification of
genetic influences on methylation across the human life course.
Genome Biology, 17, [61]. https://doi.org/10.1186/s13059-016-0926-z
Publisher's PDF, also known as Version of record
License (if available):
CC BY
Link to published version (if available):
10.1186/s13059-016-0926-z
This is the final published version of the article (version of record). It first appeared online via BioMed Central at
http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0926-z. Please refer to any applicable
terms of use of the publisher.

RESEARCH Open Access
Systematic identification of genetic
influences on methylation across the
human life course
Tom R. Gaunt^1*, Hashem A. Shihab^1, Gibran Hemani^1, Josine L. Min^1, Geoff Woodward^1, Oliver Lyttleton^2,
Jie Zheng^1, Aparna Duggirala^2, Wendy L. McArdle^2, Karen Ho^2, Susan M. Ring^1,2, David M. Evans^1,3,
George Davey Smith^1 and Caroline L. Relton^1,4
Abstract
Background: The influence of genetic variation on complex diseases is potentially mediated through a range of
highly dynamic epigenetic processes exhibiting temporal variation during development and later life. Here we
present a catalogue of the genetic influences on DNA methylation (methylation quantitative trait loci (mQTL)) at
five different life stages in human blood: children at birth, childhood, adolescence and their mothers during
pregnancy and middle age.
Results: We show that genetic effects on methylation are highly stable across the life course and that developmental
change in the genetic contribution to variation in methylation occurs primarily through increases in environmental or
stochastic effects. Though we map a large proportion of the cis-acting genetic variation, a much larger component of
genetic effects influencing methylation are acting in trans. However, only 7 % of discovered mQTL are trans-effects,
suggesting that the trans component is highly polygenic. Finally, we estimate the contribution of mQTL to variation in
complex traits and infer that methylation may have a causal role consistent with an infinitesimal model in which many
methylation sites each have a small influence, amounting to a large overall contribution.
Conclusions: DNA methylation contains a significant heritable component that remains consistent across the lifespan.
Our results suggest that the genetic component of methylation may have a causal role in complex traits. The database
of mQTL presented here provides a rich resource for those interested in investigating the role of methylation in disease.
Keywords: Methylation quantitative trait loci, mQTL, Cohort, Genetic association, DNA methylation
Background
Epigenetic mechanisms play a central role in the regulation
of cellular processes by influencing genomic activity
[1]. DNA methylation, defined as the covalent bonding
of a methyl group to a cytosine in the context of a CpG
dinucleotide, is an important component of these
mechanisms in mammals. Canonically, DNA methylation
typically represses transcription, which can occur by
inhibiting the binding of transcription factors or by
recruiting DNA binding proteins that remodel chro-
matin structure. Consequently, the establishment and
maintenance of DNA methylation patterns are crucial for
normal cellular function and developmental processes and,
indeed, these patterns are highly heterogeneous at different
life stages [2] and between different tissue types [3].
Genome-wide DNA methylation can be considered to
be a large set of measurable traits (one per CpG site) in
which variation can arise from environmental [4],
stochastic [5] or genetic [6] perturbations, and there is
growing evidence that DNA methylation could mediate
the relationship between these processes in influencing
complex diseases [7]. An important step in understanding
the processes underpinning DNA methylation is
mapping the genetic factors that influence its variation.
* Correspondence: tom.gaunt@bristol.ac.uk
Tom R. Gaunt, Hashem A. Shihab, Gibran Hemani, George Davey Smith and
Caroline L. Relton are joint authors.
^1 MRC Integrative Epidemiology Unit (IEU) & School of Social and Community
Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8
2BN, UK
Full list of author information is available at the end of the article
© 2016 Gaunt et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Gaunt et al. Genome Biology (2016) 17:61
DOI 10.1186/s13059-016-0926-z

A recent study estimated that almost 20 % of reliably
assayed variation in blood DNA methylation is heritable
and that 50 % of CpG sites showed evidence of a signifi-
cant genetic component [8]. A different perspective is
provided by a study looking at the most variably methyl-
ated regions between neonates, which found 25 % were
best explained by genotype alone and 75 % by a combination
of genotype and environment [9]. Another large
study showed that DNA methylation variation in adipose
tissue was highly heritable (median h^2 = 0.34) and that
shared environmental effects correlated with metabolic
phenotype-associated CpGs [10]. Heritability of CpG
methylation levels in whole blood in young people has
also been shown to correlate highly with stability of
methylation at the same sites in later life [11]. However,
those studies that have attempted to map genetic effects
influencing DNA methylation (methylation quantitative
trait loci (mQTL)) have so far only explained a small pro-
portion of the genetic variance that is estimated to exist
and it is evident that much larger sample sizes will be re-
quired to map the majority of the predicted genetic effects.
One can address the question of how genetic variation
influences DNA methylation in a number of ways and
here we focus on three specific research avenues. First,
the genetic architecture of methylation variation can
provide information about the level of complexity that
underlies population-level differences. This can be exam-
ined by (a) estimating the proportion of explanatory
common genetic variation (mQTL) that occurs close to
the methylation site (in cis) versus the proportion that
occurs elsewhere in the genome (in trans), and (b) the
level of polygenicity of the genetic component for cis
and trans regions.
Second, in characterising and mapping mQTL it is of
interest to know the extent to which genetic effects are
stable over time. Because epigenetic change is a corner-
stone of mammalian development, elucidating whether
genetic effects have a consistent influence across the life
course or are specific to certain developmental windows
is important for gauging the extent to which mQTL
could be involved in epigenetic restructuring and per-
turbation of developmental trajectories.
Third, a comprehensive catalogue of mQTL can be
used to investigate (a) whether those regions of methyla-
tion that are influenced by genetic variation are likely to
be inert or are involved in cellular function and (b) if
these elements are functional, then to what extent do
mQTL influence complex disease as a consequence of
their influence on DNA methylation.
Here we present a comprehensive genome-wide cis
and trans mQTL longitudinal analysis in blood DNA at
three time points in the life course of a large number of
participants in the Avon Longitudinal Study of Parents
and Children (ALSPAC) [12] and two time points in the
life course of their mothers [13], in the form of an online
searchable database (http://www.mqtldb.org/). We assess
the stability of mQTL across the life course and identify
the biological pathways in which they function. We
evaluate the relationship of mQTL with other down-
stream phenotypes, including gene expression, traits and
diseases, and quantify the contribution made by mQTL
to genetic variance in several common complex diseases
that have previously been the subject of genome-wide
association studies (GWAS).
Results
Cis and trans mQTL mapping
The ARIES dataset [14] represents DNA methylation
levels collected at five different time points across the
life course from individuals in ALSPAC: in young people
we collected samples at birth (cord blood, n = 771),
childhood (n = 834) and adolescence (n = 837); in their
mothers we sampled during pregnancy (n = 764) and in
middle age (n = 742) (Additional file 1: Table S1). We
performed an exhaustive whole-genome mQTL analysis
by testing approximately 8.3 million common single-nucleotide
polymorphisms (SNPs) against each reliable
CpG probe (395,625 out of 485,577) in each time point
(Additional file 1: Table S2). After conservative multiple
testing correction (p < 1 × 10^-14) we identified between
24,262 and 31,729 sentinel associations at each time
point (Table 1; Additional file 1: Table S3). Approxi-
mately 93 % of the mQTL were acting in cis (defined as
within ±1 Mb of the CpG probe on the basis of
a previous report [15] and our own observation of the
distribution of SNP/CpG distances, although definitions
of cis in the literature vary widely from a few hundred
base pairs [16] to 1 Mb [17, 18]).
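The cis/trans definition above can be sketched in code. This is a minimal illustration, not the authors' pipeline: the function name `classify_mqtl`, its argument layout and the coordinates used to call it are hypothetical. Only the ±1 Mb window and the p < 1 × 10^-14 threshold are taken from the text.

```python
CIS_WINDOW_BP = 1_000_000   # ±1 Mb cis window, as defined in the text
P_THRESHOLD = 1e-14         # significance threshold stated in the text


def classify_mqtl(snp_chrom, snp_pos, cpg_chrom, cpg_pos, p_value):
    """Label one SNP-CpG association as 'cis', 'trans' or None.

    None means the association does not pass the significance threshold.
    All argument names are illustrative assumptions.
    """
    if p_value >= P_THRESHOLD:
        return None
    same_chrom = snp_chrom == cpg_chrom
    if same_chrom and abs(snp_pos - cpg_pos) <= CIS_WINDOW_BP:
        return "cis"
    # Different chromosome, or same chromosome but more than 1 Mb away.
    return "trans"
```

Under this sketch a SNP on the same chromosome but more than 1 Mb from the probe counts as trans, which matches the distance-based definition used in the text.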
We also performed conditional analysis which identi-
fied between 2705 and 5446 further mQTL at each time
point that showed secondary, tertiary and quaternary
effects also acting in cis (Additional file 1: Figure S1),
giving 28,946–39,833 mQTL discovered at each time
point influencing a total of 43,897 CpG sites across the
genome (Table 1, Fig. 1a). The effect sizes, expressed as the difference
in median proportion methylated between homozygote
groups, are presented in Additional file 1: Figure S2.
Genetic architecture of methylation variation
DNA methylation may be influenced by both genetic
and environmental factors. To address the question of
the relative contribution of genetic variation we used
genomic restricted maximum likelihood (GREML) [19].
Here we estimated how much of the total variation in
each methylation probe was captured by all 1.1 million
common HapMap3 [20] tagging SNPs (minor allele fre-
quency (MAF) > 0.01) to estimate what is known as the
SNP heritability. Although the standard error is high for
any one probe, when performed on all reliable probes at
five time points this analysis enabled us to estimate the
distribution of genetic contribution to methylation variation
(SNP heritability) and how it varies over time. In
addition, for each probe we partitioned the genetic
variance into two components, the first using cis SNPs
only and the second using trans SNPs only.
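The partition described above can be written down as simple arithmetic on variance components. The sketch below does not perform GREML estimation. It assumes that estimates of the cis, trans and residual variance for one probe are already available from such an analysis, and the function and key names are hypothetical.

```python
def partition_heritability(var_cis, var_trans, var_residual):
    """Convert per-probe variance components into heritability proportions.

    var_cis, var_trans: genetic variance captured by cis and trans SNPs.
    var_residual: remaining (environmental, stochastic or untagged) variance.
    Returns the cis, trans and total SNP heritability for the probe.
    """
    total = var_cis + var_trans + var_residual
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    h2_cis = var_cis / total
    h2_trans = var_trans / total
    return {"h2_cis": h2_cis, "h2_trans": h2_trans, "h2_snp": h2_cis + h2_trans}
```

SNP heritability here is the sum of the cis and trans proportions, so a probe can have a small cis share and still a substantial total if the trans component is large.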
The results demonstrate that although the majority of
mQTL act in cis, the majority of estimated genetic variation
that influences methylation levels is acting in trans
(Fig. 1b). This applies even if we extend the definition of
cis to include the entire chromosome (data not shown).

Table 1 Number of mQTL and associated CpGs reaching the significance threshold for each time point
Counts                        Birth    Childhood  Adolescence  Pregnancy  Middle age
Sentinel mQTL
  Cis^a                       24,262   31,729     30,294       29,038     27,043
  Trans^b                     1979     2658       2442         2394       2144
Conditionally independent^c   2705     5446       5040         4454       3463
Total mQTL                    28,946   39,833     37,776       35,886     32,650
Total unique CpGs             27,387   36,705     34,886       33,344     30,676
^a Number of CpG sites with a cis SNP
^b Number of independent (±1 Mb) trans effects
^c Number of mQTL further detected after performing conditional analysis

Fig. 1 Temporal pattern of mQTL. a The total number of cis and trans mQTL discovered at each time point. b Total bars represent the SNP heritability
at each time point. Each bar is split into genetic variation due to common SNPs acting in cis (blue) and trans (green). Cis and trans variation is further
divided into the proportion that is explained by mapped SNPs (p < 1 × 10^-14). c The proportion of discovered mQTL at a specific time point that
replicate at p < 1 × 10^-7 in each of the other time points. Darker colours correspond to lower replication rates
Variance explained by detected mQTL
Having partitioned the phenotypic variance of each CpG
probe into cis-acting genetic variance, trans-acting gen-
etic variance and environmental variance (which in-
cludes any genetic variance not captured by 1.1 million
HapMap3 SNPs), we then went on to estimate how
much of the genetic variation was explained by the
mQTL that we had detected in our association analyses.
We estimated that the discovery mQTL detected in our
association analysis explained over half of the proportion
of total cis variation, whereas the trans mQTL explained
less than 1 % of total trans variation. This result is con-
sistent with the hypothesis that genetic perturbation
close to the CpG site tends to have large effects whereas
the complex network of regulatory interactions and a
large mutational target size makes the trans component,
on average, highly polygenic.
Stability of genetic effects over time
DNA methylation changes in response to environmental
exposures. We addressed the question of how stable the
mQTL effects were over time by estimating the rate of
replication of discovery mQTL from one time point in all
other time points. Fig. 1c and Additional file 1: Figure S3
show that the proportion that replicated at a threshold of
p < 1 × 10^-7 was typically in excess of 95 %, although the
rate of replication of post-birth discovery mQTL in the
birth time point was consistently lower (84–86 %).
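The replication rate described above is a simple proportion, sketched below under stated assumptions. The data layout (SNP/CpG identifier pairs mapped to p-values) and the function name are hypothetical, and treating a pair that was not tested at the other time point as non-replicating is an assumption of this sketch, not something the text specifies. Only the p < 1 × 10^-7 threshold comes from the text.

```python
REPLICATION_P = 1e-7  # replication threshold stated in the text


def replication_rate(discovery_pairs, p_values_other):
    """Fraction of discovery mQTL that replicate at another time point.

    discovery_pairs: iterable of (snp_id, cpg_id) found at the discovery time point.
    p_values_other: dict mapping (snp_id, cpg_id) to the p-value at the other
        time point. Missing pairs count as non-replicating (an assumption).
    """
    pairs = list(discovery_pairs)
    if not pairs:
        raise ValueError("no discovery mQTL supplied")
    replicated = sum(
        1 for pair in pairs if p_values_other.get(pair, 1.0) < REPLICATION_P
    )
    return replicated / len(pairs)
```

Applying this to every ordered pair of time points would give a discovery-by-replication matrix of the kind summarised in Fig. 1c.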
The distributions of estimated SNP heritabilities
illustrate that average SNP heritability gradually falls
from 0.24 in childhood to 0.21 at middle age (Fig. 1a,
b). A regression of SNP heritability on age indicated a
reduction of heritability of 0.0009 per year from
childhood to adulthood (-0.0009, se = 1.6e-5). There
are two simple explanations for this observation: first,
that the influence of genetic variation is reducing
over time; or second, that the influence of environmental
or stochastic perturbations is increasing over
time. In the former case we would expect the
average coefficient of variation of methylation to decrease
over time due to there being fewer genetic factors,
whereas in the latter case we would expect it to
increase due to more environmental factors or higher
stochasticity. The latter explanation is supported by a
clear increase (3.0 %, standard error 0.015 %) in the
average coefficient of variation of methylation from
childhood to middle age (data not shown).
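The two summary statistics used in this argument, the slope of a regression on age and the coefficient of variation, can be sketched as follows. This is an illustrative sketch using only the Python standard library. The function names are hypothetical, and the text does not say whether the sample or population standard deviation was used, so the sample version here is an assumption.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation")
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return s_xy / s_xx


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return stdev(values) / m
```

With `x` as age and `y` as per-probe SNP heritability, a negative slope corresponds to the reduction per year reported above. Comparing `coefficient_of_variation` of methylation values between time points is the check used to separate the two explanations.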
In addition, while replication of mQTL from middle
age to childhood is high, replication of mQTL from
childhood is lower in later time points (Fig. 1c). To-
gether this suggests that genetic effects are largely stable
and that environmental or stochastic perturbation is
gradually increasing over the life course, leading to lower
SNP heritability estimates and lower power to detect
mQTL as age increases. Two caveats to these observa-
tions are that the later two time points are different indi-
viduals from the earlier ones and they are comprised
exclusively of women.
The stability of genetic effects is surprisingly high given
the observational correlations of DNA methylation be-
tween different time points (Additional file 1: Figure S4).
However, we observe that the mean correlation of methy-
lation probes that have at least one significant mQTL is
substantially higher than the average value (e.g. for all
probes r = 0.09 and for mQTL probes r = 0.31 when
comparing childhood with adolescence; Additional file 1:
Figure S5).
Long range influences of methylation levels
In contrast to what has been seen in expression quantitative
trait loci (eQTL) studies, there is little evidence of
individual mQTL influencing many CpG sites across the
genome, with the vast majority of trans mQTL in our
data just representing a single or small number of
associations (Additional file 1: Figure S6). To gauge the
extent to which methylation levels were influenced by
CpG sites elsewhere in the genome, we performed a
mediation analysis testing for mediation of trans mQTL
effects by cis methylation. Mediation analyses are
particularly susceptible to measurement error in the
mediator (which will attenuate estimates of mediation).
However, we provide some evidence that, amongst
mQTL with both cis and trans effects, a proportion
demonstrate some degree of mediation of the trans
association by cis CpG sites. Additional file 1: Figure S7
presents this mediation analysis comparing a regression
of trans CpG on SNP to a regression of trans CpG on
(SNP + cis CpG). Whilst a large number of sites follow
the y = x diagonal (providing little evidence of mediation),
a proportion deviate from this line, showing that if the
effects of cis methylation are taken into account, the trans
association is attenuated. This non-independence between
cis and trans effects at these loci does not prove a causal
effect of cis methylation on trans methylation. To deter-
mine the likely impact of measurement error on this ana-
lysis, we present simulations of the effect of measurement
error in the cis methylation variable in Additional file 1:
Figure S8, which illustrates that our observed results are
likely to underestimate the true extent of potential medi-
ation in the presence of measurement error.
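The comparison of the two regressions, and the way measurement error in the mediator masks mediation, can be illustrated with a toy simulation. Everything in the sketch below is an assumption made for illustration: the effect sizes (0.8 and 0.5), the uniform genotype distribution, the sample size and the function names. It is not the simulation reported in Additional file 1: Figure S8.

```python
import random
from statistics import mean


def _centred(values):
    m = mean(values)
    return [v - m for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def snp_effect(snp, outcome, covariate=None):
    """OLS coefficient of outcome on SNP, optionally adjusted for one covariate."""
    g, y = _centred(snp), _centred(outcome)
    if covariate is None:
        return _dot(g, y) / _dot(g, g)
    m = _centred(covariate)
    s_gg, s_mm, s_gm = _dot(g, g), _dot(m, m), _dot(g, m)
    # Solve the 2x2 normal equations for the SNP coefficient.
    return (s_mm * _dot(g, y) - s_gm * _dot(m, y)) / (s_gg * s_mm - s_gm ** 2)


def simulate(n=5000, noise_sd=0.0, seed=1):
    """Return (unadjusted, cis-adjusted) SNP effects on trans methylation."""
    rng = random.Random(seed)
    snp = [rng.choice((0, 1, 2)) for _ in range(n)]         # toy genotype dosages
    cis = [0.8 * g + rng.gauss(0, 1) for g in snp]          # true cis methylation
    trans = [0.5 * c + rng.gauss(0, 1) for c in cis]        # fully mediated by cis
    cis_obs = [c + rng.gauss(0, noise_sd) for c in cis]     # measured with error
    return snp_effect(snp, trans), snp_effect(snp, trans, cis_obs)
```

In this toy model the trans effect is entirely mediated by cis methylation, so adjusting for an error-free cis measurement should shrink the SNP coefficient towards zero. Raising `noise_sd` leaves part of the SNP effect unexplained, which is the sense in which measurement error leads to underestimating mediation.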
Functional annotation
If mQTL are functionally important, they are likely to be
distributed differentially across genomic features. To
address this question, we analysed both mQTL and