Journal Article

Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis

TL;DR: DNA methylation is a potential mediator of genetic risk for rheumatoid arthritis. The study corrected for cellular heterogeneity by estimating and adjusting for cell-type proportions in blood-derived DNA samples, and used mediation analysis to filter out associations likely to be a consequence of disease.
Abstract: Epigenetic mechanisms integrate genetic and environmental causes of disease, but comprehensive genome-wide analyses of epigenetic modifications have not yet demonstrated robust association with common diseases. Using Illumina HumanMethylation450 arrays on 354 anti-citrullinated protein antibody-associated rheumatoid arthritis cases and 337 controls, we identified two clusters within the major histocompatibility complex (MHC) region whose differential methylation potentially mediates genetic risk for rheumatoid arthritis. To reduce confounding factors that have hampered previous epigenome-wide studies, we corrected for cellular heterogeneity by estimating and adjusting for cell-type proportions in our blood-derived DNA samples and used mediation analysis to filter out associations likely to be a consequence of disease. Four CpGs also showed an association between genotype and variance of methylation. The associations for both clusters replicated at least one CpG (P < 0.01), with the rest showing suggestive association, in monocyte cell fractions in an independent cohort of 12 cases and 12 controls. Thus, DNA methylation is a potential mediator of genetic risk.
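
The cell-type adjustment described in the abstract can be illustrated with a toy regression. The sketch below is not the authors' pipeline: it assumes a single CpG, a single hypothetical cell-type proportion (`mono`), noise-free made-up methylation values, and a small helper `ols` written in plain Python so the example has no dependencies. It only shows why including the proportion as a covariate changes the estimated case/control difference.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical toy data: 4 controls (0) and 4 cases (1).
case = [0, 0, 0, 0, 1, 1, 1, 1]
# Assumed proportion of one cell type; it is higher in cases, so it confounds.
mono = [0.2, 0.3, 0.2, 0.3, 0.4, 0.5, 0.4, 0.5]
# Methylation built with a true case effect of 0.05 and a cell-type effect of 0.3.
beta_values = [0.5 + 0.05 * c + 0.3 * m for c, m in zip(case, mono)]

# Model 1: methylation ~ case (cell composition ignored).
unadjusted = ols([[1.0, c] for c in case], beta_values)[1]
# Model 2: methylation ~ case + cell-type proportion.
adjusted = ols([[1.0, c, m] for c, m in zip(case, mono)], beta_values)[1]

print(f"unadjusted case effect: {unadjusted:.3f}")
print(f"adjusted case effect:   {adjusted:.3f}")
```

In this constructed example the unadjusted coefficient is about 0.11, because it absorbs the difference in cell composition between cases and controls, while the adjusted coefficient recovers the 0.05 built into the data. Real analyses would estimate several proportions from the methylation data and fit the model at every CpG.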

Citations
Journal Article
TL;DR: It is proposed that DNA methylation age measures the cumulative effect of an epigenetic maintenance system, and can be used to address a host of questions in developmental biology, cancer and aging research.
Abstract: It is not yet known whether DNA methylation levels can be used to accurately predict age across a broad spectrum of human tissues and cell types, nor whether the resulting age prediction is a biologically meaningful measure. I developed a multi-tissue predictor of age that allows one to estimate the DNA methylation age of most tissues and cell types. The predictor, which is freely available, was developed using 8,000 samples from 82 Illumina DNA methylation array datasets, encompassing 51 healthy tissues and cell types. I found that DNA methylation age has the following properties: first, it is close to zero for embryonic and induced pluripotent stem cells; second, it correlates with cell passage number; third, it gives rise to a highly heritable measure of age acceleration; and, fourth, it is applicable to chimpanzee tissues. Analysis of 6,000 cancer samples from 32 datasets showed that all of the considered 20 cancer types exhibit significant age acceleration, with an average of 36 years. Low age-acceleration of cancer tissue is associated with a high number of somatic mutations and TP53 mutations, while mutations in steroid receptors greatly accelerate DNA methylation age in breast cancer. Finally, I characterize the 353 CpG sites that together form an aging clock in terms of chromatin states and tissue variance. I propose that DNA methylation age measures the cumulative effect of an epigenetic maintenance system. This novel epigenetic clock can be used to address a host of questions in developmental biology, cancer and aging research.

4,233 citations


Cites background from "Epigenome-wide association data imp..."

  • ...Data set 44 consists of whole blood from [64]....

    [...]

Journal Article
TL;DR: All of the major steps in RNA-seq data analysis are reviewed, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping.
Abstract: RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.

1,963 citations


Cites methods from "Epigenome-wide association data imp..."

  • ...General linear models [141–143], logistic regression models [143] and empirical Bayes model [144] have been attempted among other modeling approaches....

    [...]

Journal Article
TL;DR: This work examines data from five previously published studies, and finds strong evidence of cell composition change across age in blood, and demonstrates that, in these studies, cellular composition explains much of the observed variability in DNA methylation.
Abstract: Epigenome-wide association studies of human disease and other quantitative traits are becoming increasingly common. A series of papers reporting age-related changes in DNA methylation profiles in peripheral blood have already been published. However, blood is a heterogeneous collection of different cell types, each with a very different DNA methylation profile. Using a statistical method that permits estimating the relative proportion of cell types from DNA methylation profiles, we examine data from five previously published studies, and find strong evidence of cell composition change across age in blood. We also demonstrate that, in these studies, cellular composition explains much of the observed variability in DNA methylation. Furthermore, we find high levels of confounding between age-related variability and cellular composition at the CpG level. Our findings underscore the importance of considering cell composition variability in epigenetic studies based on whole blood and other heterogeneous tissue sources. We also provide software for estimating and exploring this composition confounding for the Illumina 450k microarray.

920 citations


Cites background or methods from "Epigenome-wide association data imp..."

  • ...With many of these studies completed, few disease-associated loci have been reported outside of cancer [3], type 1 diabetes [4], and rheumatoid arthritis [5]....

    [...]

  • ...There were five publicly available datasets on the Illumina 450k platform [5,8,10,11,26] performed on blood samples in the Gene Expression Omnibus (GEO) available through the National Center for Biotechnology Information (NCBI) as of February 2013 [27]....

    [...]

  • ...A simple linear regression model including the cell composition percentages as covariates has been suggested as a way to adjust for the confounding [5]....

    [...]

Journal Article
TL;DR: The emerging approaches for data integration — including meta-dimensional and multi-staged analyses — which aim to deepen the understanding of the role of genetics and genomics in complex outcomes are explored.
Abstract: Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration - including meta-dimensional and multi-staged analyses - which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.

825 citations

Journal Article
TL;DR: The analyses suggest that DNA methylation changes may have a role in the onset of AD given that they were observed in presymptomatic subjects and that six of the validated genes connect to a known AD susceptibility gene network.
Abstract: We used a collection of 708 prospectively collected autopsied brains to assess the methylation state of the brain's DNA in relation to Alzheimer's disease (AD). We found that the level of methylation at 71 of the 415,848 interrogated CpGs was significantly associated with the burden of AD pathology, including CpGs in the ABCA7 and BIN1 regions, which harbor known AD susceptibility variants. We validated 11 of the differentially methylated regions in an independent set of 117 subjects. Furthermore, we functionally validated these CpG associations and identified the nearby genes whose RNA expression was altered in AD: ANK1, CDH23, DIP2A, RHBDF2, RPL13, SERPINF1 and SERPINF2. Our analyses suggest that these DNA methylation changes may have a role in the onset of AD given that we observed them in presymptomatic subjects and that six of the validated genes connect to a known AD susceptibility gene network.

720 citations


Cites background from "Epigenome-wide association data imp..."

  • ...It is too early to confidently differentiate between three possibilities that could explain these modest but robust changes in methylation that occur in relation to AD pathology: (1) a fraction of the constituent cortical cells are changing, such as activated astrocytes in the vicinity of neuritic plaques that overexpress CDH23, (2) the relative proportion of the constituent cell populations of the cortex is changing as some populations such as neurons are lost, or (3) there is a modest influx of immune cells from the systemic circulation that alters the relative abundance of the different cortical cell populations....

    [...]

References
Book
17 Jan 2008
TL;DR: This volume introduces the statistical, methodological, and conceptual aspects of mediation analysis. Applications from health, social, and developmental psychology, sociology, communication, exercise science, and epidemiology are emphasized throughout, and single-mediator, multilevel, and longitudinal models are reviewed.
Abstract: This volume introduces the statistical, methodological, and conceptual aspects of mediation analysis. Applications from health, social, and developmental psychology, sociology, communication, exercise science, and epidemiology are emphasized throughout. Single-mediator, multilevel, and longitudinal models are reviewed. The author's goal is to help the reader apply mediation analysis to their own data and understand its limitations. Each chapter features an overview, numerous worked examples, a summary, and exercises (with answers to the odd-numbered questions). The accompanying CD contains outputs described in the book from SAS, SPSS, LISREL, EQS, MPLUS, and CALIS, and a program to simulate the model. The notation used is consistent with existing literature on mediation in psychology. The book opens with a review of the types of research questions the mediation model addresses. Part II describes the estimation of mediation effects, including assumptions, statistical tests, and the construction of confidence limits. Advanced models, including mediation in path analysis, longitudinal models, multilevel data, categorical variables, and mediation in the context of moderation, are then described. The book closes with a discussion of the limits of mediation analysis, additional approaches to identifying mediating variables, and future directions. Introduction to Statistical Mediation Analysis is intended for researchers and advanced students in health, social, clinical, and developmental psychology as well as communication, public health, nursing, epidemiology, and sociology. Some exposure to a graduate-level research methods or statistics course is assumed. The overview of mediation analysis and the guidelines for conducting a mediation analysis will be appreciated by all readers.

4,473 citations
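
As a rough illustration of the single-mediator model this volume covers, the sketch below estimates an indirect effect by the product-of-coefficients approach on simulated data. It is a generic textbook-style example, not the filtering procedure used in the rheumatoid arthritis study: the variable names (`genotype`, `methylation`, `liability`), the effect sizes, the continuous outcome, and the plain-Python `ols` helper are all assumptions made for illustration.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Simulated, hypothetical data: X = genotype, M = methylation, Y = outcome.
rng = random.Random(0)
n = 200
genotype = [rng.choice([0, 1, 2]) for _ in range(n)]
methylation = [0.5 - 0.05 * g + rng.gauss(0, 0.02) for g in genotype]
liability = [0.1 * g - 2.0 * m + rng.gauss(0, 0.1)
             for g, m in zip(genotype, methylation)]

# Path a: effect of X on the mediator (M ~ X).
a = ols([[1.0, g] for g in genotype], methylation)[1]
# Paths c' (direct) and b (mediator -> outcome) from Y ~ X + M.
_, c_prime, b = ols([[1.0, g, m] for g, m in zip(genotype, methylation)],
                    liability)
# Total effect c from Y ~ X.
c = ols([[1.0, g] for g in genotype], liability)[1]

indirect = a * b  # product-of-coefficients estimate of the mediated effect
print(f"total={c:.3f} direct={c_prime:.3f} indirect={indirect:.3f}")
```

With ordinary least squares and a continuous outcome, the total effect decomposes exactly as c = c' + a·b. For a binary outcome such as case/control status fitted by logistic regression, that identity no longer holds exactly, and confidence limits for the indirect effect need the methods the book discusses.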

Journal Article
TL;DR: This work presents a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations using DNA methylation signatures, in combination with a previously obtained external validation set consisting of signatures from purified leukocyte samples.
Abstract: Background: There has been a long-standing need in biomedical research for a method that quantifies the normally mixed composition of leukocytes beyond what is possible by simple histological or flow cytometric assessments. The latter is restricted by the labile nature of protein epitopes, requirements for cell processing, and timely cell analysis. In a diverse array of diseases and following numerous immune-toxic exposures, leukocyte composition will critically inform the underlying immuno-biology to most chronic medical conditions. Emerging research demonstrates that DNA methylation is responsible for cellular differentiation, and when measured in whole peripheral blood, serves to distinguish cancer cases from controls. Results: Here we present a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) using DNA methylation signatures, in combination with a previously obtained external validation set consisting of signatures from purified leukocyte samples. We validate the fundamental idea in a cell mixture reconstruction experiment, then demonstrate our method on DNA methylation data sets from several studies, including data from a Head and Neck Squamous Cell Carcinoma (HNSCC) study and an ovarian cancer study. Our method produces results consistent with prior biological findings, thereby validating the approach. Conclusions: Our method, in combination with an appropriate external validation set, promises new opportunities for large-scale immunological studies of both disease states and noxious exposures.

2,431 citations

Journal Article
TL;DR: This timeline traces the field from its conception to the present day and addresses the genetic basis of epigenetic changes — an emerging area that promises to unite cancer genetics and epigenetics, and might serve as a model for understanding the epigenetic basis of human disease more generally.
Abstract: Since its discovery in 1983, the epigenetics of human cancer has been in the shadows of human cancer genetics. But this area has become increasingly visible with a growing understanding of specific epigenetic mechanisms and their role in cancer, including hypomethylation, hypermethylation, loss of imprinting and chromatin modification. This timeline traces the field from its conception to the present day. It also addresses the genetic basis of epigenetic changes — an emerging area that promises to unite cancer genetics and epigenetics, and might serve as a model for understanding the epigenetic basis of human disease more generally.

2,240 citations


"Epigenome-wide association data imp..." refers background in this paper

  • ...This may be due in part to several limitations to such studies, including (1) the cellular heterogeneity of the sample material, and (2) the potential for methylation changes that are a consequence of disease rather than part of the etiology. Here, we apply a series of ad hoc filtering steps that address these issues to identify CpG methylation that likely mediates genetic risk for rheumatoid arthritis (RA) from genome-wide epigenetic and genetic data....

    [...]

Journal Article
TL;DR: This work introduces “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies and shows that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Abstract: It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.

1,779 citations

Journal Article
TL;DR: It is argued that batch effects (as well as other technical and biological artefacts) are widespread and critical to address and experimental and computational approaches for doing so are reviewed.
Abstract: High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.

1,768 citations