PDF hosted at the Radboud Repository of the Radboud University
Nijmegen
The following full text is a publisher's version.
For additional information about this publication click this link.
http://hdl.handle.net/2066/129297
Please be advised that this information was generated on 2022-08-09 and may be subject to
change.
Twin Research and Human Genetics
Volume 16 Number 4 pp. 767–781
C
The Authors 2013 doi:10.1017/thg.2013.30
GWAS of DNA Methylation Variation Within
Imprinting Control Regions Suggests
Parent-of-Origin Association
Miguel E. Renter
´
ıa,
1,2
Marcel W. Coolen,
3,4
Aaron L. Statham,
3
R. Seong Min Choi,
1
Wenjia Qu,
3
Megan J. Campbell,
1
Sara Smith,
1
Anjali K. Henders,
1
Grant W. Montgomery,
1
Susan J. Clark,
3
Nicholas G. Martin,
1
and Sarah E. Medland
1
1
Queensland Institute of Medical Research, Brisbane, Queensland, Australia
2
The University of Queensland, School of Psychology, Brisbane, Queensland, Australia
3
Epigenetics Research Group, Cancer Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
4
Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical
Centre, Nijmegen, The Netherlands
Imprinting control regions (ICRs) play a fundamental role in establishing and maintaining the non-random
monoallelic expression of certain genes, via common regulatory elements such as non-coding RNAs and
differentially methylated regions (DMRs) of DNA. We recently surveyed DNA methylation levels within four
ICRs (
H19-
ICR,
IGF2-
DMR, KvDMR, and
NESPAS-
ICR) in whole-blood genomic DNA from 128 monozygotic
(MZ) and 128 dizygotic (DZ) human twin pairs. Our analyses revealed high individual variation and intra-
domain covariation in methylation levels across CpGs and emphasized the interaction between epigenetic
variation and the underlying genetic sequence in a parent-of-origin fashion. Here, we extend our analysis
to conduct two genome-wide screenings of single nucleotide polymorphisms (SNPs) underlying either
intra-domain covariation or parent-of-origin-dependent association with methylation status at individual
CpG sites located within ICRs. Although genome-wide significance was not surpassed due to sample size
limitations, the most significantly associated SNPs found through multiple-trait genome-wide association
(MQFAM) included the previously described rs10732516, which is located in the vicinity of the
H19
-ICR.
Similarly, we identified an association between rs965808 and methylation status within the
NESPAS
-ICR.
This SNP is positioned within an intronic region of the overlapping genes
GNAS
and
GNAS-AS1
,which
are imprinted genes regulated by the
NESPAS
-ICR. Sixteen other SNPs located in regions apart from
the analyzed regions displayed suggestive association with intra-domain methylation. Additionally, we
identified 13 SNPs displaying parent-of-origin association with individual methylation sites through family-
based association testing. In this exploratory study, we show the value and feasibility of using alternative
GWAS approaches in the study of the interaction between epigenetic state and genetic sequence within
imprinting regulatory domains. Despite the relatively small sample size, we identified a number of SNPs
displaying suggestive association either in a domain-wide or in a parent-of-origin fashion. Nevertheless,
these associations will require future experimental validation or replication in larger and independent
samples.
Keywords: imprinting,
IGF2
,
H19
,
NESPAS
, KvDMR, GWAS, SNP, parent-of-origin, methylation
DNA methylation is a fundamental epigenetic modification
implicated in many cellular processes, such as development,
chromatin structure, maintenance of genomic imprinting,
and X chromosome inactivation in females (Chow et al.,
2005; Delaval & Feil, 2004; Robertson, 2005). DNA methy-
lation status across specific loci in the genome also mediates
the regulation of gene expression and maintains genome in-
tegrity (Weber & Schubeler, 2007). DNA methylation pat-
terns are remarkably stable within functionally important
regions (Kaminsky et al., 2009; Ushijima et al., 2003) but
canbehighlyvariableinothergenomicregions(Fragaetal.,
RECEIVED 9 April 2013; ACCEPTED 18 April 2013. First published
online 3 June 2013.
ADDRESS FOR CORRESPONDENCE: Miguel E. Renter
´
ıa, Queensland
Institute of Medical Research, Locked Bag 2000, Royal Bris-
bane Hospital, Brisbane, Queensland 4029, Australia. E-mail:
Miguel.Renteria@qimr.edu.au
767
https://www.cambridge.org/core/terms. https://doi.org/10.1017/thg.2013.30
Downloaded from https://www.cambridge.org/core. Universiteitsbibliotheek Nijmegen, on 08 Apr 2021 at 13:56:19, subject to the Cambridge Core terms of use, available at
Miguel E. Renter
´
ıa et al.
2005; Mill et al., 2006; Petronis et al., 2003), even in ge-
netically identical individuals (Coolen et al., 2011; Fraga
et al., 2005; Heijmans et al., 2007). This suggests a possible
involvement of environmental factors in the modulation
of these changes. Interestingly, there is now increasing ev-
idence that heritable factors may also play a role in the
determination of DNA methylation patterns, at least at the
genome-wide level (Bjornsson et al., 2008; Boks et al., 2009;
Numata et al., 2012).
Imprinting control regions (ICRs) are differentially
methylated regulatory elements located near clusters of im-
printed genes that coordinate preferential gene transcrip-
tion in a parent-of-origin fashion (Constancia et al., 2004;
Li et al., 1993). ICRs entail segments of DNA rich in CpG
dinucleotides, with the cytosine nucleotides displaying dif-
ferential methylation depending upon parent-of-origin al-
leles. Whether the methylated/unmethylated state leads to
silencing or not depends upon the default imprint state
of the region (i.e., methylated or unmethylated; Lee et al.,
2002). Abnormal DNA methylation within ICRs has been
shown to result in loss of appropriate expression of im-
printed genes. It is also associated with a number of dis-
eases, such as Beckwi th–Wiedemann syndrome (Smilinich
et al., 1999; Weksberg et al., 2001, 2002), Silver–Russell syn-
drome (Eggermann et al., 2008; Gicquel et al., 2005), An-
gelman syndrome, and Prader–Willi syndrome (Zeschnigk
et al., 1997). Due to the functional importance of genetic
imprinting, it has been proposed that DNA methylation
within ICRs might be under particular genetic regulation
(Heijmans et al., 2007).
To address the impact of genetic sequence on DNA
methylation variation, we recently performed a detailed
study in a sample comprising whole-blood DNA from 128
pairs of monozygotic (MZ) and 128 pairs of dizygotic (DZ)
human twins (Coolen et al., 2011). Twin studies have been
particularly valuable in genetics research, and now offer
an opportunity for the study of epigenetic variation as
a dynamic quantitative trait. MZ pairs share the entirety
of their genetic sequence and/or DZ pairs share approx-
imately half of their genetic alleles, while both MZ and
DZ pairs are usually subject to similar household condi-
tions. Thus, by studying epigenetic differences in twins, it
is possible to infer the proportion of epigenetic variation
attributable to either environmental or genetic factors. We
recently interrogated DNA methylation status of CpGs lo-
cated within four ICRs (H19-ICR, IGF2-DMR, KvDMR,
and NESPAS-ICR) and the non imprinted gene RUNX1
(Coolen et al., 2011). Details of interrogated domains and
individual CpGs are given in Figure S1 of our previous study
(Coolen et al., 2011). A high degree of variability in indi-
vidual CpG methylation levels, notably at the H19/IGF2
loci, was observed (see Figure 2 of Coolen et al., 2011).
And, overall, a similar pattern of methylation variation
between MZ and DZ twins was found, with imprinted
regions displaying median methylation levels around the
expected 50%, and only a few single nucleotide poly-
morphisms (SNPs) showing increased mean methylation
(H19-ICR
CpG02, H19-ICR CpG03, IGF2-DMR CpG05,
and NESPAS-ICR
CpG14). Part of this DNA methylation
plasticity seems to be clearly attributable to environmental
and sto chastic factors. However, concordant gains or losses
ofmethylationweremorecommoninMZthanDZtwin
pairs, suggesting that de novo and/or maintenance methy-
lation is influenced by the surrounding DNA sequence (see
Table 1 of Coolen et al., 2011). This is in agreement wi th a
previous longitudinal study, which reported familial clus-
tering of methylation variation over time (Bjornsson et al.,
2008). Importantly, we also observed significant intra- but
not inter-domain covariation in methylation state across
ICRs (see Figure 3 in Coolen et al., 2011) and showed that
the rs10732516 (A/G) polymorphism, as well as the co-
inherited rs2839701 (G/C) polymorphism — both located
within the IGF2/H19 locus — are strongly associated with
increased hypermethylation of specific CpG sites in the ma-
ternal H19 allele, which was later confirmed by clonal bisul-
fite sequencing analysis (see Figure 7 in Coolen et al., 2011).
Additionally, we also performed a DNA sequencing analysis
of all interrogated regions to exclude the possibility of bias
introduced by genetic var iation or mutations within primer
binding sequences. Here, we expand our previous analyses
to conduct a genome-wide search for other common poly-
morphisms that may be associated with methylation status
within the ICRs of interest.
The identification of genetic variants that are associated
with gene-specific DNA methylation could open a new av-
enue to the understanding of DNA methylation regulation.
Genome-wide association studies (GWAS) of DNA methy-
lation have been recently published and a number of methy-
lation quantitative trait loci (mQTL) have been identified
(Bell et al., 2011; Gibbs et al., 2010), demonstrating the
usefulness of incorporating this approach into epigenetics
research. However, by treating each individual CpG as an
independent variable, such studies h ave ignored the under-
lying correlation that exists across neighboring methylation
sites.
In order to maximize power to detect common SNPs
associated with DNA methylation within the ICRs of inter-
est, we have used the MQFAM algorithm implementation
for PLINK (Ferreira & Purcell, 2009), a multivariate GWAS
approach that employs canonical correlation analysis to ac-
count for cross-trait covariance (Ferreira & Purcell, 2009).
Multiple-trait GWAS improves power to detect association
signals with pleiotropic effects over sets of correlated traits,
without an increase in the false discovery rate (Bolormaa
et al., 2010), and has been applied to combinations of phe-
notypes such a s blood lipid levels and gene expression (In-
ouye et al., 2010) or obesity, and osteoporosis measures (Liu
et al., 2009).
Furthermore, as our previous analysis (Coolen et al.,
2011) highlighted that SNP rs10732516 appears to have a
768 TWIN RESEARCH AND HUMAN GENETICS
https://www.cambridge.org/core/terms. https://doi.org/10.1017/thg.2013.30
Downloaded from https://www.cambridge.org/core. Universiteitsbibliotheek Nijmegen, on 08 Apr 2021 at 13:56:19, subject to the Cambridge Core terms of use, available at
Parent-of-Origin and Multivariate GWAS of DNA Methylation
substantial impact on the DNA methylation status of the
H19-ICR region only when inherited on the maternal al-
lele, we also conducted genome-wide family-based associa-
tion using the quantitative transmission disequilibrium test
(QTDT) package to test for quantitative association of both
paternally and maternally inherited SNP alleles with the
DNA methylation status at each methylation site (Spielman
& Ewens, 1996).
Materials and Methods
Subjects
The twin samples analyzed are part of a study on moliness
and cognitive function (Bataille et al., 2000; Wright et al.,
2001) and comprise 512 adolescent twins (70 female/female
and 58 male/male MZ twin pairs; 25 female/female,
29 male/male, and 74 opposite sex DZ twin pairs), with
a mean age of 14.15 years (SD = 2.46; range 12–22.85). The
samples are predominantly (>95%) of northern European
origin (mainly Anglo-Celtic). Zygosity of the twins was con-
firmed using microsatellite repeat marker testing. Written
informed consent to participate in this study was given by all
participants and by their parents, legal guardians, or care-
takers when participants were under-aged. The study pro-
tocol was approved by the Queensland Institute of Medical
Research Human Research Ethics Committee (Ethics ap-
provals P193 and P455). Additionally, as part of the study,
parents were asked to complete a series of questionnaires
and to donate blood samples.
Genotyping
DNA was extracted from blood samples and SNP geno-
typing performed with Illumina HumanHap 610W Quad
arrays (http://support.illumina.com/array/array
kits/
human610-quad
beadchip kit/documentation.ilmn) by
deCODE Genetics (Reykjav
´
ık, Finland). Genotype data
were screened through a series of quality control criteria
including Mendelian errors, minor allele frequency (MAF)
≥ 1%, p value of a Hardy–Weinberg equilibrium (HWE)
test ≥ 10
−6
, SNP call r ate > 95%, and Illumina Beadstudio
GenCall score ≥ 0.7. We also screened for ancestral outliers
by using principal component analysis (PCA; Price et al.,
2006), comparing the genotyped data in the discovery
sample with 16 global populations sourced from HapMap
Phase 3 and northern European subpopulations from a
previous study (McEvoy et al., 2009).
DNA Methylation Assay Design
MassCLEAVE
TM
assays against the genomic regions of in-
terest were designed a nd tested using the AmpliconReport
R-script that we described previously (Coolen et al., 2007).
In brief, the method consists of a Polymerase chain reac-
tion (PCR) amplification of bisulfite-converted DNAtagged
with a T7-promoter, which is followed by the generation of
a single-stranded RNA molecule and base-specific cleavage
(3
to either rUTP or rCTP) with RNase A. The result-
ing mixture of cleavage products with different length and
mass is analyzed with MALDI-TOF mass spectrometry. Dif-
ferences in original DNA methylation state are reflected in
changesinnucleotidesequenceafterbisulfitetreatment,
and therefore will produce different fragment masses in the
assay. The abundance of each fragment (signal/noise level
in the spectrum) acts as an indicator of the amount of DNA
methylation in the interrogated sequence (Coolen et al.,
2007). The analyzed regions comprised imprinted regions
H19-ICR (Takai et al., 2001), IGF2-DMR (Cui et al., 2003),
KvDMR (Nakano et al., 2006; Smilinich et al., 1999), and
NESPAS-ICR (Liu et al., 2005), and the promoter of the
non-imprinted RUNX1 gene (see Figure S1 in Coolen et al.,
2011). In this study, we focus our analyses exclusively on
the ICRs.
Genomic Bisulfite Treatment
DNA methylation measurements were perfor m ed on ge-
nomic DNA extracted from whole blood. Bisulfite treat-
ment was carried out u sing the EZ-96 DNA Methylation-
Gold Kit (Zymo Research: Cat No. D5008) according to the
manufacturer’s instructions. In brief, 500 ng DNA was used
in the bisulfite reaction and incubated for 8 hours at 55
◦
C.
After desulfonation and clean up, the bisulfite treated DNA
was resuspended in 50 Lofwhich2Lwasusedineach
PCR.
PCR-Tagging, In Vitro Transcription and Mass
Spectrometry Analysis
The target regions were amplified in t riplicate using the
primer pairs and annealing temperatures (Ta) described
in Table S1 of Coolen et al. (2011). The MassCleave
methylation analysis was performed as described prev iously
(Coolen et al., 2007). In brief, the PCR reactions were car-
ried out in a total volume of 5 L using 200 nM forward and
reverse primer, 200 M Deoxyribonucleotide triphosphate
(dNTP), 1.5 mM MgCl
2
1:100,000 dilution of SYBR Green
(Invitrogen) and 0.35 U Platinum Taq DNApolymerase (In-
vitrogen) in 1 × PCR buffer without magnesium. PCR suc-
cess was determined via melt curve analysis (ABI PRISM
R
7700). Unincorporated dNTPs were dephosphorylated by
incubation at 37
◦
C for 20 minutes in the presence of 1.7
LH
2
O and 0.3 U Shrimp Alkaline Phosphatase (SAP),
followed by a heat-inactivation for 5 minutes at 85
◦
C. The
triplicate PCR samples after SAP treatment were pooled
and of this pool, 2 Lwereusedina7L transcription
reaction, containing 3.14 mM DTT, 2.5 mM dCTP, 1 mM
rUTP,1mMrGTP,1mMrATP,20UT7R&DNApoly-
merase, and 0.09 mg/mL RNase A in 0.64 × T7 polymerase
buffer (all reagents from SEQUENOM, San Diego). Tran-
scription and digestion were performed in the same step at
37
◦
C for 3 hours. After the addition of 20 LH
2
O, con-
ditioning of t he phosphate backbone prior to MALDI-TOF
MS was achieved by the addition of 6 mg CLEAN Resin
(SEQUENOM, San Diego). Twenty-two nanoliters of the
TWIN RESEARCH AND HUMAN GENETICS 769
https://www.cambridge.org/core/terms. https://doi.org/10.1017/thg.2013.30
Downloaded from https://www.cambridge.org/core. Universiteitsbibliotheek Nijmegen, on 08 Apr 2021 at 13:56:19, subject to the Cambridge Core terms of use, available at
Miguel E. Renter
´
ıa et al.
cleavage reactions were robotically dispensed onto silicon
chips preloaded with matr ix (SpectroCHIPs; SEQUENOM,
San Diego). Mass spectra were collected using a MassAR-
RAY mass spectrometer (SEQUENOM).
Calculation of Methylation Ratios
We used a sensitive and high-throughput method for DNA
methylation analysis that is quantitative to 5% methylation
for each informative CpG residue (Coolen et al., 2007). Cal-
culation of the DNA methylation ratios was performed us-
ing the R-script Analyze Sequenom Function (ASF; Coolen
et al., 2007), which is an adaptation of the formula used
by MassCLEAVE
TM
software. A full description of ASF has
been published previously (Coolen et al., 2007). All sta-
tistical calculations were carried out using either Stata 9
(StataCorp LP, Texas, USA) or the free ‘R’ software package
for statistical computing (http://www.R-project.org).
Multiple-Trait Association
Four parallel multiple-trait association were conducted
with the MQFAM (Ferreira & Purcell, 2009) algorithm as
implemented in PLINK (Purcell et al., 2007). Each study
comprised CpGs contained within each ICR domain. H19-
ICR (12 variables), IGF2-DMR (seven variables), KvDMR
(11 v ariables), and NESPAS-ICR (11 variables). Descriptive
statistics for these variables are presented in Table 1. MQ-
FAM uses canonical correlation analysis (CCA) to measure
the association between two sets of variables. To explain
how CCA works, it is possible to make the following anal-
ogy with PCA: PCA is usually applied to one set (X)of
possibly correlated traits to extract a number of indepen-
dent variates (components) that explain as much variance
in the original set, whereas CCA is applied to two sets of
variables (X and Y)toextractanumberofindependent
pairs of variates (U
i
, V
i
) that explain as much covariance
between the two original sets. More details on the method
can be found at the MQFAM support site (http://genepi.
qimr.edu.au/staff/manuelF/multivariate/main.html). Data
were adjusted to correct for any sex, age, age
2
,sex
∗
age,
and sex
∗
age
2
effects. Although PLINK does not account
for family structure, MQFAM offers the possibility of
running permutation testing to correct for relatedness.
However, given that this analysis is slow and computa-
tionally intensive when many traits are analyzed at the
same time, we performed a two-stage analysis in or-
der to reduce computation time. In the first stage, as-
sociation testing was performed on a filtered dataset
which included only one individual per family (N =
256), hence making permutation correction unnecessary,
and allowing us to identify a subset of potentially as-
sociated ‘candidate’ SNPs (cut-off was set at p value <
.0001). Then, in the second stage of the analysis, we tested
this subset of ‘candidate’ SNPs for association in the sample
consisting of all DZ twins (N = 256 individuals from 128
pairs) and one MZ from each pair ( N = 128 individuals),
resulting in a total sample of 384 individuals. We accounted
for relatedness using 1 million permutations at each locus.
Single-Trait Association
DNA methylation status was tested for standard single-trait
association at the individual methylation unit level using
MERLIN (Abecasis et al., 2002). Unlike PLINK, MERLIN
corrects for relatedness by incorporating pedigree data. We
also controlled for any age, sex, age
2
,sex
∗
age, and sex
∗
age
2
effects. The total sample size for single-trait association was
N = 512.
Family-Based Association Testing (Parent-of-
Origin Ass)
We conducted family-based genome-wide testing for quan-
titative total association of both paternally and maternally
inherited SNP alleles with DNA methylation levels at in-
dividual methylation sites across four ICRs using QTDT
(Spielman & Ewens, 1996). QTDT is a software package for
family-based linkage disequilibrium analyses of quantita-
tive and discrete traits that can use all the information in a
pedigree to construct powerful tests of association (Spiel-
man & Ewens, 1996). The total sample size (N = 1,024)
consisted of 512 twins (128 MZ and 128 DZ pairs) and
their parents (N = 512).
Results
Multivariate Intra-Domain GWAS
We performed multivariate GWAS by testing 515,966
genotyped SNPs for association with intra-domain DNA
methylation variation at four ICRs (H19-ICR, IGF2-DMR,
KvDMR, and NESPAS-ICR) using the MQFAM (Ferreira &
Purcell, 2009) algorithm implementation for PLINK (Pur-
cell et al., 2007). MQFAM performs CCA to measure the
association between two sets of variables by extracting the
linear combination of traits that explain the largest possi-
ble amount of the covariation between the marker (SNP)
and all traits. As detailed elsewhere (Coolen et al., 2011),
the number of analyzed DNA methylation sites by region
was 14 (H19-ICR), 8 (IGF2-DMR), 15 (KvDMR), and 17
(NESPAS-ICR).Datawereadjustedforanyageorsexef-
fects (and their interactions) prior to association analysis.
Due to method restrictions (PLINK is unable to account
for relatedness of individuals in the sample), we structured
our analysis in two stages. First, we tested for association in
a sample subset containing only one individual per family
selected at random (N = 256) and identified possible ‘can-
didate’ SNPs (those with p value < .0001). The candidate
SNPs were then tested for association in a larger dataset con-
sisting of 384 individuals (all DZ pairs, N = 256; and one
MZ twin per family, N = 128; this is because although MQ-
FAM permutation can account for DZ zygosity, it cannot
adjust for genetically identical individuals).
After per mutation correction, multivariate GWAS re-
sults (shown in Figure 1) revealed no SNPs with p values
770 TWIN RESEARCH AND HUMAN GENETICS
https://www.cambridge.org/core/terms. https://doi.org/10.1017/thg.2013.30
Downloaded from https://www.cambridge.org/core. Universiteitsbibliotheek Nijmegen, on 08 Apr 2021 at 13:56:19, subject to the Cambridge Core terms of use, available at