A data-driven approach to preprocessing Illumina 450K methylation array data
Ruth Pidsley, Chloe C. Y. Wong, Manuela Volta, Katie Lunnon, Jonathan Mill, Leonard C. Schalkwyk
TL;DR: It is demonstrated that quantile normalization methods produce a marked improvement by all three quality metrics, even in highly consistent data, and that careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for detecting the small absolute DNA methylation changes likely to be associated with complex disease phenotypes.
Abstract:
As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid and cost effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets. The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100) where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive. Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes. 
For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor and compatible with the existing methylumi, minfi and IMA packages, which allows others to apply the same normalization methods and data quality tests to 450K data.
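The abstract's central recommendation is to quantile-normalize the methylated (M) and unmethylated (U) intensities separately, and to treat Type I and Type II probes separately, before computing β = M/(M + U + 100), rather than normalizing the betas themselves. A minimal Python/NumPy sketch of that ordering follows. It is an illustration under stated assumptions, not the wateRmelon implementation: intensities are assumed to be probes × samples arrays, `is_type_ii` is a hypothetical boolean probe-type annotation, plain quantile normalization stands in for the specific methods evaluated in the paper, and background correction is omitted.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (samples) of a probes x samples matrix.

    Each column is mapped onto a common reference distribution: the mean,
    across samples, of the k-th smallest value in every column. Ties are
    broken by position (a simplification; some implementations average
    tied ranks instead).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")  # row indices that sort each column
    reference = np.sort(x, axis=0).mean(axis=1)    # mean of the k-th smallest values
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference            # k-th smallest value gets reference[k]
    return out


def beta_values(m, u, offset=100.0):
    """beta = M / (M + U + offset); the offset keeps beta defined when M and U are both ~0."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    return m / (m + u + offset)


def normalize_then_beta(m, u, is_type_ii, offset=100.0):
    """Quantile-normalize M and U separately, within Type I and Type II probes
    separately, and only then compute beta (instead of normalizing betas)."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    is_type_ii = np.asarray(is_type_ii, dtype=bool)  # hypothetical annotation vector
    m_norm = np.empty_like(m)
    u_norm = np.empty_like(u)
    for mask in (is_type_ii, ~is_type_ii):
        if mask.any():
            m_norm[mask] = quantile_normalize(m[mask])  # M within one probe type
            u_norm[mask] = quantile_normalize(u[mask])  # U within the same probe type
    return beta_values(m_norm, u_norm, offset)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_probes, n_samples = 1000, 4
    # Simulated intensities with per-sample scale differences (illustrative only).
    scale = np.array([1.0, 1.3, 0.8, 1.1])
    m = rng.gamma(2.0, 1500.0, size=(n_probes, n_samples)) * scale
    u = rng.gamma(2.0, 1500.0, size=(n_probes, n_samples)) * scale
    is_type_ii = rng.random(n_probes) < 0.7  # hypothetical probe-type split
    betas = normalize_then_beta(m, u, is_type_ii)
    print(betas.shape, float(betas.min()), float(betas.max()))
```

Because M and U are normalized before β is formed, every sample shares one intensity distribution per channel and per probe type, and β is derived from those matched intensities. The offset of 100 keeps β defined and pulls it toward 0 for probes whose M and U signals are both weak.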
Citations
Journal Article
Newborn DNA-methylation, childhood lung function, and the risks of asthma and COPD across the life course.
Herman T. den Dekker, Kimberley Burrows, Janine F. Felix, Lucas A. Salas, Ivana Nedeljkovic, Jin Yao, Sheryl L. Rifas-Shiman, Carlos Ruiz-Arenas, Najaf Amin, Mariona Bustamante, Dawn L. DeMeo, A. John Henderson, Caitlin G. Howe, Marie-France Hivert, M. Arfan Ikram, Johan C. de Jongste, Lies Lahousse, Pooja R. Mandaviya, Joyce B. J. van Meurs, Mariona Pinart, Gemma C. Sharp, Lisette Stolk, André G. Uitterlinden, Josep M. Antó, Augusto A. Litonjua, Carrie V. Breton, Guy Brusselle, Jordi Sunyer, George Davey Smith, Caroline L. Relton, Vincent W. V. Jaddoe, Liesbeth Duijts
TL;DR: The findings suggest that the epigenetic status of the newborn affects respiratory health and disease across the life course; up to 30% of the identified CpG sites were associated with later-life asthma and COPD.
Journal Article
Bigmelon: tools for analysing large DNA methylation datasets.
T. J. Gorrie-Stone, Melissa C. Smart, Ayden Saffari, Karim Malki, Eilis Hannon, Joe Burrage, Jonathan Mill, Meena Kumari, Leonard C. Schalkwyk
TL;DR: The bigmelon R package is introduced, which provides a memory-efficient workflow that enables users to perform the complex, large-scale analyses required in epigenome-wide association studies (EWAS) without requiring large amounts of RAM.
Journal Article
An epigenome-wide association study of Alzheimer's disease blood highlights robust DNA hypermethylation in the HOXB6 gene
Janou A. Y. Roubroeks, Adam Smith, Rebecca G. Smith, Ehsan Pishva, Zina M. Ibrahim, Martina Sattlecker, Eilis Hannon, Iwona Kłoszewska, Patrizia Mecocci, Hilkka Soininen, Magda Tsolaki, Bruno Vellas, Lars-Olof Wahlund, Dag Aarsland, Petroula Proitsi, Angela Hodges, Simon Lovestone, Stephen J. Newhouse, Richard Dobson, Jonathan Mill, Daniel L. A. van den Hove, Katie Lunnon
TL;DR: This study represents the first large-scale epigenome-wide association study of Alzheimer's disease and mild cognitive impairment using blood, and highlights the differences in various loci and pathways in early disease, suggesting that these patterns relate to cognitive decline at an early stage.
Journal Article
Epigenetic prediction of major depressive disorder
Miruna C. Barbu, Xueyi Shen, Rosie M. Walker, David M. Howard, Kathryn L. Evans, Heather C. Whalley, David J. Porteous, Stewart W. Morris, Ian J. Deary, Yanni Zeng, Riccardo E. Marioni, Toni-Kim Clarke, Andrew M. McIntosh
TL;DR: Testing the association of methylation risk scores (MRS) with 61 behavioural phenotypes found that, whilst polygenic risk scores (PRS) were associated with psychosocial and mental health phenotypes, MRS were more strongly associated with lifestyle and sociodemographic factors.
Posted Content
Disease variants alter transcription factor levels and methylation of their binding sites
Marc Jan Bonder, René Luijk, Daria V. Zhernakova, Matthijs Moed, Patrick Deelen, Martijn Vermaat, Maarten van Iterson, Freerk van Dijk, Michiel van Galen, Jan Bot, Roderick C. Slieker, P. Mila Jhamai, Michael Verbiest, H. Eka D. Suchiman, Marijn Verkerk, Ruud van der Breggen, Jeroen van Rooij, Nico Lakenberg, Wibowo Arindrarto, Szymon M. Kielbasa, Iris Jonkers, Peter van 't Hof, Irene Nooren, Marian Beekman, Joris Deelen, Diana van Heemst, Alexandra Zhernakova, Ettje F. Tigchelaar, Morris A. Swertz, Albert Hofman, André G. Uitterlinden, René Pool, Jenny van Dongen, Jouke J. Hottenga, Coen D. A. Stehouwer, Carla J. H. van der Kallen, Casper G. Schalkwijk, Leonard H. van den Berg, Erik W. van Zwet, Hailiang Mei, Mathieu Lemire, Thomas J. Hudson, P. Eline Slagboom, Cisca Wijmenga, Jan H. Veldink, Marleen M. J. van Greevenbroek, Cornelia M. van Duijn, Dorret I. Boomsma, Aaron Isaacs, Rick Jansen, Joyce B. J. van Meurs, Peter A. C. 't Hoen, Lude Franke, Bastiaan T. Heijmans
TL;DR: It is shown that disease variants have wide-spread effects on DNA methylation in trans that likely reflect the downstream effects on binding sites of cis-regulated transcription factors.