Journal ArticleDOI

Epigenome-wide association studies for common human diseases

TL;DR: This work discusses EWAS design, cohort and sample selections, statistical significance and power, confounding factors and follow-up studies, and how integration of EWASs with GWASs can help to dissect complex GWAS haplotypes for functional analysis.
Abstract: Despite the success of genome-wide association studies (GWASs) in identifying loci associated with common diseases, a substantial proportion of the causality remains unexplained. Recent advances in genomic technologies have placed us in a position to initiate large-scale studies of human disease-associated epigenetic variation, specifically variation in DNA methylation. Such epigenome-wide association studies (EWASs) present novel opportunities but also create new challenges that are not encountered in GWASs. We discuss EWAS design, cohort and sample selections, statistical significance and power, confounding factors and follow-up studies. We also discuss how integration of EWASs with GWASs can help to dissect complex GWAS haplotypes for functional analysis.


Citations
Journal ArticleDOI
TL;DR: Describes a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data, including methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale.
Abstract: Motivation: The recently released Infinium HumanMethylation450 array (the '450k' array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

2,961 citations

Journal ArticleDOI
15 Jan 2015-Nature
TL;DR: Genomic maps of DNA methylation indicate that the underlying DNA sequence largely accounts for local patterns of methylation; as a result, this mark is highly informative when studying gene regulation in normal and diseased cells, and it can potentially function as a biomarker.
Abstract: Cytosine methylation is a DNA modification generally associated with transcriptional silencing. Factors that regulate methylation have been linked to human disease, yet how they contribute to malignancies remains largely unknown. Genomic maps of DNA methylation have revealed unexpected dynamics at gene regulatory regions, including active demethylation by TET proteins at binding sites for transcription factors. These observations indicate that the underlying DNA sequence largely accounts for local patterns of methylation. As a result, this mark is highly informative when studying gene regulation in normal and diseased cells, and it can potentially function as a biomarker. Although these findings challenge the view that methylation is generally instructive for gene silencing, several open questions remain, including how methylation is targeted and recognized and in what context it affects genome readout.

1,564 citations

Journal ArticleDOI
01 Oct 2011-Genomics
TL;DR: The ability to determine genome-wide methylation patterns will rapidly advance methylation research.

1,552 citations


Cites background from "Epigenome-wide association studies ..."

  • ...And while important questions pertaining to study design remain, the potential value of epigenome-wide association studies as well as the integration of genotype and methylation data across sample populations has already begun to be explored [13,37,38]....


Journal ArticleDOI
TL;DR: A novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes is proposed.
Abstract: Motivation: The Illumina Infinium 450 k DNA Methylation Beadchip is a prime candidate technology for Epigenome-Wide Association Studies (EWAS). However, a difficulty associated with these beadarrays is that probes come in two different designs, characterized by widely different DNA methylation distributions and dynamic range, which may bias downstream analyses. A key statistical issue is therefore how best to adjust for the two different probe designs. Results: Here we propose a novel model-based intra-array normalization strategy for 450 k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. The strategy involves application of a three-state beta-mixture model to assign probes to methylation states, subsequent transformation of probabilities into quantiles and finally a methylation-dependent dilation transformation to preserve the monotonicity and continuity of the data. We validate our method on cell-line data, fresh frozen and paraffin-embedded tumour tissue samples and demonstrate that BMIQ compares favourably with two competing methods. Specifically, we show that BMIQ improves the robustness of the normalization procedure, reduces the technical variation and bias of type2 probe values and successfully eliminates the type1 enrichment bias caused by the lower dynamic range of type2 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450 k platform. Availability: BMIQ is freely available from http://code.google.com/p/bmiq/. Contact: a.teschendorff@ucl.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.

1,257 citations

Journal ArticleDOI
25 Jul 2012-PLOS ONE
TL;DR: In healthy male blood donors there is important variation in the methylation profiles of whole blood, mononuclear cells, granulocytes, and cells from seven selected purified lineages, indicating that whole blood methylation results might be unintelligible.
Abstract: Methylation of cytosines at CpG sites is a common epigenetic DNA modification that can be measured by a large number of methods, now even in a genome-wide manner for hundreds of thousands of sites. The application of DNA methylation analysis is becoming widely popular in complex disorders, for example, to understand part of the “missing heritability”. The DNA samples most readily available for methylation studies are derived from whole blood. However, blood consists of many functionally and developmentally distinct cell populations in varying proportions. We studied whether such variation might affect the interpretation of methylation studies based on whole blood DNA. We found in healthy male blood donors there is important variation in the methylation profiles of whole blood, mononuclear cells, granulocytes, and cells from seven selected purified lineages. CpG methylation between mononuclear cells and granulocytes differed for 22% of the 8252 probes covering the selected 343 genes implicated in immune-related disorders by genome-wide association studies, and at least one probe was differentially methylated for 85% of the genes, indicating that whole blood methylation results might be unintelligible. For individual genes, even if the overall methylation patterns might appear similar, a few CpG sites in the regulatory regions may have opposite methylation patterns (i.e., hypo/hyper) in the main blood cell types. We conclude that interpretation of whole blood methylation profiles should be performed with great caution and for any differences implicated in a disorder, the differences resulting from varying proportions of white blood cell types should be considered.

932 citations


Cites background from "Epigenome-wide association studies ..."

  • ...Results of genome wide association studies together with the marked increase in the prevalence of several complex diseases during the last decades, for example asthma and allergy, suggests that other mechanisms such as epigenetics, including DNA methylation, may also be involved [12,13]....


References
Journal ArticleDOI
Paul Burton, David Clayton, Lon R. Cardon, Nicholas John Craddock and 192 more authors (4 institutions)
07 Jun 2007-Nature
TL;DR: This study has demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest.
Abstract: There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ∼2,000 individuals for each of 7 major diseases and a shared set of ∼3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10⁻⁵ and 5 × 10⁻⁷) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

9,244 citations

Journal ArticleDOI
John W. Belmont, Andrew Boudreau, Suzanne M. Leal, Paul Hardenbol and 229 more authors (40 institutions)
27 Oct 2005
TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

5,479 citations


"Epigenome-wide association studies ..." refers to background in this paper

  • ...The single most useful resource empowering GWASs was the availability of a detailed SNP map of the human genom...


Journal ArticleDOI
15 May 2009-Science
TL;DR: It is shown here that TET1, a fusion partner of the MLL gene in acute myeloid leukemia, is a 2-oxoglutarate (2OG)- and Fe(II)-dependent enzyme that catalyzes conversion of 5mC to 5-hydroxymethylcytosine (hmC) in cultured cells and in vitro.
Abstract: DNA cytosine methylation is crucial for retrotransposon silencing and mammalian development. In a computational search for enzymes that could modify 5-methylcytosine (5mC), we identified TET proteins as mammalian homologs of the trypanosome proteins JBP1 and JBP2, which have been proposed to oxidize the 5-methyl group of thymine. We show here that TET1, a fusion partner of the MLL gene in acute myeloid leukemia, is a 2-oxoglutarate (2OG)- and Fe(II)-dependent enzyme that catalyzes conversion of 5mC to 5-hydroxymethylcytosine (hmC) in cultured cells and in vitro. hmC is present in the genome of mouse embryonic stem cells, and hmC levels decrease upon RNA interference–mediated depletion of TET1. Thus, TET proteins have potential roles in epigenetic regulation through modification of 5mC to hmC.

5,155 citations

Journal ArticleDOI
18 Oct 2007-Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations

Journal ArticleDOI
19 Nov 2009-Nature
TL;DR: The first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts, along with comparative analysis of messenger RNA and small RNA components of the transcriptome, several histone modifications, and sites of DNA-protein interaction for several key regulatory factors were presented in this article.
Abstract: DNA cytosine methylation is a central epigenetic modification that has essential roles in cellular processes including genome regulation, development and disease. Here we present the first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts, along with comparative analysis of messenger RNA and small RNA components of the transcriptome, several histone modifications, and sites of DNA-protein interaction for several key regulatory factors. Widespread differences were identified in the composition and patterning of cytosine methylation between the two genomes. Nearly one-quarter of all methylation identified in embryonic stem cells was in a non-CG context, suggesting that embryonic stem cells may use different methylation mechanisms to affect gene regulation. Methylation in non-CG contexts showed enrichment in gene bodies and depletion in protein binding sites and enhancers. Non-CG methylation disappeared upon induced differentiation of the embryonic stem cells, and was restored in induced pluripotent stem cells. We identified hundreds of differentially methylated regions proximal to genes involved in pluripotency and differentiation, and widespread reduced methylation levels in fibroblasts associated with lower transcriptional activity. These reference epigenomes provide a foundation for future studies exploring this key epigenetic modification in human disease and development.

4,266 citations
