A data-driven approach to preprocessing Illumina 450K methylation array data

doi:10.1186/1471-2164-14-293

Ruth Pidsley, +6 more

- 01 May 2013 -

BMC Genomics

- Vol. 14, Iss: 1, pp 293-293

TLDR

It is demonstrated that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics, and that careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes.

Abstract:

As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development, and in disease pathogenesis is not yet well characterized. Thus, there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis, but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets.

The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100), where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive. Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes.

For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor, compatible with the existing methylumi, minfi and IMA packages, that allows others to utilize the same normalization methods and data quality tests on 450K data.
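
The preprocessing order described in the abstract (quantile-normalize M and U separately, within each assay type, then compute β) can be illustrated with a minimal sketch. This is not the wateRmelon implementation, which is written in R; the function names `quantile_normalize`, `beta`, and `normalized_betas`, and the boolean `is_type_one` probe annotation, are hypothetical and introduced here only for illustration. Intensities are assumed to be arranged as probes (rows) by samples (columns).

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (sample) of x the same empirical distribution.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are not averaged in this simple sketch.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")      # rank order per sample
    sorted_x = np.take_along_axis(x, order, axis=0)   # each column sorted
    reference = sorted_x.mean(axis=1)                 # shared target quantiles
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def beta(m, u, offset=100.0):
    """Methylation index from the abstract: beta = M / (M + U + 100)."""
    return m / (m + u + offset)


def normalized_betas(m, u, is_type_one):
    """Normalize M and U separately, and Type I / Type II probes separately,
    then compute beta from the normalized intensities."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    is_type_one = np.asarray(is_type_one, dtype=bool)
    m_norm = np.empty_like(m)
    u_norm = np.empty_like(u)
    for mask in (is_type_one, ~is_type_one):
        if mask.any():
            m_norm[mask] = quantile_normalize(m[mask])
            u_norm[mask] = quantile_normalize(u[mask])
    return beta(m_norm, u_norm)


# Toy data (made up for illustration): 6 probes x 3 samples.
rng = np.random.default_rng(0)
m_raw = rng.uniform(100.0, 5000.0, size=(6, 3))
u_raw = rng.uniform(100.0, 5000.0, size=(6, 3))
probe_is_type_one = np.array([True, True, True, False, False, False])
betas = normalized_betas(m_raw, u_raw, probe_is_type_one)
```

Normalizing the intensities before forming the ratio, rather than normalizing β itself, mirrors the ordering that the abstract reports as superior. The offset of 100 keeps β defined when both intensities are near zero.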
