Posted Content

Bayesian reassessment of the epigenetic architecture of complex traits

22 Oct 2018-bioRxiv (Cold Spring Harbor Laboratory)-pp 450288

TL;DR: A statistical approach that estimates associations between disease risk and all measured epigenetic probes jointly, automatically controlling for both data structure and correlations among probes facilitates better understanding of the underlying epigenetic architecture of complex common disease and is applicable to any kind of genomics data.
Abstract: Epigenetic DNA modification is partly under genetic control, and occurs in response to a wide range of environmental exposures. Linking epigenetic marks to clinical outcomes may provide greater insight into underlying molecular processes of disease, assist in the identification of therapeutic targets, and improve risk prediction. Here, we present a statistical approach, based on Bayesian inference, that estimates associations between disease risk and all measured epigenetic probes jointly, automatically controlling for both data structure (including cell-count effects, relatedness, and experimental batch effects) and correlations among probes. We benchmark our approach in a simulation study, finding improved estimation of probe associations across a wide range of scenarios over existing approaches. Our method estimates the total proportion of disease risk captured by epigenetic probe variation. Applying it to measures of body mass index (BMI) and cigarette consumption behaviour in 5,101 individuals, we find that 66.7% (95% CI 60.0-72.8) of the variation in BMI and 67.7% (95% CI 58.4-76.9) of the variation in cigarette consumption can be captured by methylation array data from whole blood, independent of the variation explained by single nucleotide polymorphism markers. We find novel associations, including one between smoking behaviour and a methylation probe at MNDA (>95% posterior inclusion probability), a myeloid cell nuclear differentiation antigen gene previously implicated as a biomarker for inflammation and non-Hodgkin lymphoma risk. We conduct unique genome-wide enrichment analyses, identifying blood cholesterol, lipid transport and sterol metabolism pathways for BMI, and response to xenobiotic stimulus and negative regulation of RNA polymerase II promoter transcription for smoking, all with >95% posterior inclusion probability of having methylation probes with associations >1.5 times larger than the average. Finally, we improve phenotypic prediction in two independent cohorts by 28.7% and 10.2% for BMI and smoking respectively over a LASSO model. These results imply that probe measures may capture large amounts of variance because they are likely a consequence of the phenotype rather than a cause. As a result, omics data may enable accurate characterization of disease progression and identification of individuals who are on a path to disease. Our approach facilitates better understanding of the underlying epigenetic architecture of complex common disease and is applicable to any kind of genomics data.
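
The abstract reports several findings in terms of posterior inclusion probability (PIP). As a rough illustration only, and not the authors' implementation, a PIP can be estimated from sampler output as the fraction of posterior draws in which a probe's effect is non-zero. The sketch below assumes a sparse mixture prior under which excluded probes have an effect of exactly zero in a given draw; the function name, the toy draws, and the 0.95 threshold variable are all hypothetical.

```python
def posterior_inclusion_probabilities(draws):
    """Estimate per-probe PIPs from posterior draws.

    draws: list of posterior samples, each a list of per-probe effects,
    where an effect of exactly 0.0 means the probe was excluded in that draw
    (an assumption of this sketch).
    """
    n_draws = len(draws)
    n_probes = len(draws[0])
    return [
        sum(1 for draw in draws if draw[j] != 0.0) / n_draws
        for j in range(n_probes)
    ]


# Toy draws for three hypothetical probes (illustrative numbers only).
draws = [
    [0.20, 0.0, 0.0],
    [0.30, 0.1, 0.0],
    [0.25, 0.0, 0.0],
    [0.10, -0.2, 0.0],
]
pips = posterior_inclusion_probabilities(draws)

# Probes whose PIP exceeds the 95% threshold quoted in the abstract.
pip_threshold = 0.95
selected = [j for j, pip in enumerate(pips) if pip > pip_threshold]
```

In this toy case the first probe is included in every draw, the second in half of them, and the third in none, so only the first would pass a 95% threshold.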


Author Correction: Bayesian reassessment of the epigenetic architecture of complex traits
Daniel Trejo Banos, Daniel L. McCartney, Marion Patxot, Lucas Anchieri, Thomas Battram, Colette Christiansen, Ricardo Costeira, Rosie M. Walker, Stewart W. Morris, Archie Campbell, Qian Zhang, David J. Porteous, Allan F. McRae, Naomi R. Wray, Peter M. Visscher, Chris S. Haley, Kathryn L. Evans, Ian J. Deary, Andrew M. McIntosh, Gibran Hemani, Jordana T. Bell, Riccardo E. Marioni & Matthew R. Robinson
Correction to: Nature Communications https://doi.org/10.1038/s41467-020-16520-1, published online 8 June 2020.
The original version of this Article contains an error in Fig. 3 in which panel B was inadvertently duplicated from panel A. This has
been corrected in both the PDF and HTML versions of the Article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2020
https://doi.org/10.1038/s41467-020-19099-9
Nature Communications (2020) 11:5186
Citations

Journal Article
TL;DR: This work has shown that genetic variants associated with intermediate traits, termed molecular quantitative trait loci (molQTLs), can be used as instrumental variables in a Mendelian randomization approach to identify the causal features and mechanisms of complex traits.
Abstract: Large genome-wide association studies (GWAS) have identified loci that are associated with complex traits and diseases, but index variants are often not causal and reside in non-coding regions of the genome. To gain a better understanding of the relevant biological mechanisms, intermediate traits such as gene expression and protein levels are increasingly being investigated because these are likely mediators between genetic variants and disease outcome. Genetic variants associated with intermediate traits, termed molecular quantitative trait loci (molQTLs), can then be used as instrumental variables in a Mendelian randomization (MR) approach to identify the causal features and mechanisms of complex traits. Challenges such as pleiotropy and the non-specificity of molQTLs remain, and further approaches and methods need to be developed.

15 citations


Journal Article
Ryan Langdon, Rhona Beynon, Kate Ingarfield, +12 more; Institutions (6)
TL;DR: In the context of a clinical cohort of individuals with OPC, DNAm predictors for smoking, alcohol consumption, educational attainment and BMI exhibit similar predictive values for all-cause mortality compared to self-reported data.
Abstract: DNA methylation (DNAm) variation is an established predictor for several traits. In the context of oropharyngeal cancer (OPC), where 5-year survival is ~65%, DNA methylation may act as a prognostic biomarker. We examined the accuracy of DNA methylation biomarkers of 4 complex exposure traits (alcohol consumption, body mass index [BMI], educational attainment and smoking status) in predicting all-cause mortality in people with OPC. DNAm predictors of alcohol consumption, BMI, educational attainment and smoking status were applied to 364 individuals with OPC in the Head and Neck 5000 cohort (HN5000; 19.6% of total OPC cases in the study), followed up for a median of 3.9 years (inter-quartile range [IQR] 3.3 to 5.2 years; time to event: death or censoring). The proportion of phenotypic variance explained in each trait was as follows: 16.5% for alcohol consumption, 22.7% for BMI, 0.4% for educational attainment and 51.1% for smoking. We then assessed the relationship between each DNAm predictor and all-cause mortality using Cox proportional-hazard regression analysis. DNAm prediction of smoking was most consistently associated with mortality risk (hazard ratio [HR], 1.38 per standard deviation (SD) increase in smoking DNAm score; 95% confidence interval [CI] 1.04 to 1.83; P = 0.025, in a model adjusted for demographic, lifestyle, health and biological variables). Finally, we examined the accuracy of each DNAm predictor of mortality. DNAm predictors explained similar levels of variance in mortality to self-reported phenotypes. Receiver operator characteristic (ROC) curves for the DNAm predictors showed a moderate discrimination of alcohol consumption (area under the curve [AUC] 0.63), BMI (AUC 0.61) and smoking (AUC 0.70) when predicting mortality. The DNAm predictor for education showed poor discrimination (AUC 0.57). Z tests comparing AUCs between self-reported phenotype ROC curves and DNAm score ROC curves did not show evidence for difference between the two (alcohol consumption P = 0.41, BMI P = 0.62, educational attainment P = 0.49, smoking P = 0.19). In the context of a clinical cohort of individuals with OPC, DNAm predictors for smoking, alcohol consumption, educational attainment and BMI exhibit similar predictive values for all-cause mortality compared to self-reported data. These findings may have translational utility in prognostic model development, particularly where phenotypic data are not available.
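
For readers unfamiliar with the AUC values quoted in this abstract, the sketch below shows one standard way to compute an area under the ROC curve: as the probability that a randomly chosen individual who experienced the event has a higher score than a randomly chosen individual who did not, with ties counted as one half. It treats mortality as a plain binary label and ignores censoring, which is an assumption of the sketch; the abstract does not say how the study itself computed its ROC curves. The function name and the toy data are hypothetical.

```python
def roc_auc(scores, events):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predictor values (for example, a DNAm score per individual).
    events: 1 if the outcome occurred, 0 otherwise (binary, no censoring).
    """
    positives = [s for s, e in zip(scores, events) if e == 1]
    negatives = [s for s, e in zip(scores, events) if e == 0]
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0   # event case ranked above non-event case
            elif p == q:
                concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))


# Toy example: four individuals, two of whom experienced the event.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_events = [0, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_events)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which is the scale on which the reported values of 0.57 to 0.70 should be read.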

4 citations


Cites methods from "Bayesian reassessment of the epigen..."

  • ...[17] EWAS (MethylationEPIC) were conducted using a Bayesian framework....

    [...]

  • ...Where available, the Bayesian-derived DNAm risk scores for BMI and smoking [17] (BMI 24....

    [...]


Posted Content
19 Feb 2020-bioRxiv
Abstract: The molecular factors which control circulating levels of inflammatory proteins are not well understood. Furthermore, association studies between molecular probes and human traits are often performed by linear model-based methods which may fail to account for complex structure and interrelationships within molecular datasets. Therefore, in this study, we perform genome- and epigenome-wide association studies (GWAS/EWAS) on the levels of 70 plasma-derived inflammatory protein biomarkers in healthy older adults (Lothian Birth Cohort 1936; n = 876; Olink® inflammation panel). We employ a Bayesian framework (BayesR+) which can account for issues pertaining to data structure and unknown confounding variables (with sensitivity analyses using ordinary least squares- (OLS) and mixed model-based approaches). We identified 13 SNPs associated with 13 proteins (n = 1 SNP each) concordant across OLS and Bayesian methods. We identified three CpG sites spread across three proteins (n = 1 CpG each) that were concordant across OLS, mixed-model and Bayesian analyses. Tagged genetic variants accounted for up to 45% of variance in protein levels (for MCP2, 36% of variance alone attributable to one polymorphism). Methylation data accounted for up to 46% of variation in protein levels (for CXCL10). Up to 66% of variation in protein levels (for VEGFA) was explained using genetic and epigenetic data combined. We demonstrated putative causal relationships between CD6 and IL18R1 with inflammatory bowel disease, and between IL12B and Crohn’s disease. Our data may aid understanding of the molecular regulation of the circulating inflammatory proteome as well as causal relationships between inflammatory mediators and disease.

2 citations


Posted Content
Carmen Amador, Yanni Zeng, Rosie M. Walker, +8 more; Institutions (2)
09 Oct 2020-bioRxiv
TL;DR: The results indicate that exploiting omic measures as proxies for environmental variation can improve models for complex traits such as obesity and can be used as a substitute for environmental measures when they are not available or jointly to improve their accuracy.
Abstract: Variation in complex traits related to obesity, such as body weight and body mass index, has a genetic basis with heritabilities between 40 and 70%. Nonetheless, the so-called global obesity pandemic is usually associated with environmental changes related to diet, lifestyle, and sociocultural and socioeconomic changes. However, most genetic studies do not include all relevant environmental covariates, so their contribution, alongside genetics, to variation in obesity-related traits cannot be assessed. Similarly, some studies have described interactions between a few individual genes linked to obesity and different environmental variables, but the total contribution to differences between individuals is unknown. In this study we explored the effect of smoking and gene-by-smoking interactions on obesity-related traits from a genome-wide perspective to estimate the amount of variance they explain by modelling them using self-reported data and a proxy created using methylation data. Our results indicate that exploiting omic measures as proxies for environmental variation can improve our models for complex traits such as obesity and can be used as a substitute for environmental measures when they are not available or jointly to improve their accuracy.

1 citation


References

Journal Article
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
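
The constrained problem described in this abstract can be written out as follows. The notation is an assumption chosen for illustration: $y_i$ is the response for observation $i$, $x_{ij}$ the value of predictor $j$, $\beta_0$ an intercept, and $t \ge 0$ the constant bounding the sum of absolute coefficients.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

Smaller values of $t$ tighten the constraint and push more coefficients to exactly zero, which is the source of the interpretability the abstract mentions.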

36,018 citations


Journal Article
TL;DR: The focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, and the results are derived as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations.
Abstract: The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.
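
The multiple-sequence strategy in this abstract underlies the R̂ convergence criterion that the main paper cites. As a hedged sketch, the commonly used simplified form of that statistic compares between-chain and within-chain variance for one scalar estimand. The degrees-of-freedom correction from the original paper is deliberately omitted here, and the function name and toy chains are hypothetical. The sketch assumes at least two chains of equal length, each with at least two draws and non-zero within-chain variance.

```python
import math


def gelman_rubin_rhat(chains):
    """Simplified potential scale reduction factor for one scalar parameter.

    chains: list of m >= 2 chains, each a list of n >= 2 draws (equal lengths).
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [sum(chain) / n for chain in chains]
    grand_mean = sum(chain_means) / m

    # B: between-chain variance, scaled by the chain length n.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)

    # W: average of the within-chain sample variances.
    within = sum(
        sum((x - mu) ** 2 for x in chain) / (n - 1)
        for chain, mu in zip(chains, chain_means)
    ) / m

    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Chains that agree give a value near (here slightly below) 1.
rhat_agree = gelman_rubin_rhat([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])

# Chains stuck in different regions give a value well above 1.
rhat_disagree = gelman_rubin_rhat([[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]])
```

Values close to 1 suggest the chains have mixed; values well above 1 suggest that running the simulation longer would sharpen the estimate.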

12,022 citations


"Bayesian reassessment of the epigen..." refers methods in this paper

  • ...We assessed the convergence of the hyperparameters $\sigma^2$, $\sigma^2_G$, $\sigma^2_\phi$ through the Geweke test [33] and the R̂ criteria [34], with the help of the R package "ggmcmc" [35]....

    [...]


Journal Article
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include $\ell_1$ (the lasso), $\ell_2$ (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
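
To make the cyclical coordinate descent idea concrete, here is a minimal pure-Python sketch for the linear-regression case with an elastic-net penalty at a single penalty value. It is an illustration under stated assumptions, not the glmnet implementation: there is no intercept (the response and predictors are assumed centered), no regularization path, no convergence check, and every predictor column is assumed to be non-zero. The names `soft_threshold`, `elastic_net_cd`, `lam`, `alpha` and `n_sweeps` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_sweeps=100):
    """Cyclical coordinate descent for
    (1 / 2N) * RSS + lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2).

    X: list of N rows, each a list of p centered predictor values.
    y: list of N centered responses.
    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty.
    """
    n = len(y)
    p = len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            column = [row[j] for row in X]
            # Covariance of predictor j with the partial residual that
            # excludes predictor j's current contribution.
            rho = sum(
                column[i] * (residual[i] + column[i] * beta[j])
                for i in range(n)
            ) / n
            denom = sum(c * c for c in column) / n + lam * (1.0 - alpha)
            updated = soft_threshold(rho, lam * alpha) / denom
            delta = updated - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the new coefficient.
                for i in range(n):
                    residual[i] -= column[i] * delta
            beta[j] = updated
    return beta


# Toy data with one centered predictor and an exact slope of 2.
X_toy = [[-1.0], [0.0], [1.0]]
y_toy = [-2.0, 0.0, 2.0]
beta_unpenalized = elastic_net_cd(X_toy, y_toy, lam=0.0, alpha=1.0)
beta_heavy_lasso = elastic_net_cd(X_toy, y_toy, lam=10.0, alpha=1.0)
beta_ridge = elastic_net_cd(X_toy, y_toy, lam=1.0, alpha=0.0)
```

With no penalty the update recovers the least-squares slope, a large lasso penalty sets the coefficient to exactly zero, and a ridge penalty shrinks it without zeroing it.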

10,799 citations


"Bayesian reassessment of the epigen..." refers methods in this paper

  • ...We ran LASSO and ridge regression with 10-fold cross validation using the default settings of package glmnet [18] version 2....

    [...]

  • ...This is further evidenced by comparing LASSO and ridge regression with latent factors implemented in LFMM [17], to LASSO and ridge regression without latent factors as implemented in glmnet [18], where we find that power is increased, phenotype-probe associations are better estimated, and cell-type confounding is controlled by the models that do not fit latent factors (fig....

    [...]

  • ...We compared our approach to LASSO and Ridge regression implemented in glmnet [18], with a baseline of single marker regression (GWAS) where we first adjusted the phenotype by the first ten principal components of the genotype matrix and then regressed the residuals against the scaled methylation matrix....

    [...]


Journal Article
Rudolf Jaenisch, Adrian Bird; Institutions (2)
TL;DR: Advances in the understanding of the mechanism and role of DNA methylation in biological processes are reviewed, showing that epigenetic mechanisms seem to allow an organism to respond to the environment through changes in gene expression.
Abstract: Cells of a multicellular organism are genetically homogeneous but structurally and functionally heterogeneous owing to the differential expression of genes. Many of these differences in gene expression arise during development and are subsequently retained through mitosis. Stable alterations of this kind are said to be 'epigenetic', because they are heritable in the short term but do not involve mutations of the DNA itself. Research over the past few years has focused on two molecular mechanisms that mediate epigenetic phenomena: DNA methylation and histone modifications. Here, we review advances in the understanding of the mechanism and role of DNA methylation in biological processes. Epigenetic effects by means of DNA methylation have an important role in development but can also arise stochastically as animals age. Identification of proteins that mediate these effects has provided insight into this complex process and diseases that occur when it is perturbed. External influences on epigenetic processes are seen in the effects of diet on long-term diseases such as cancer. Thus, epigenetic mechanisms seem to allow an organism to respond to the environment through changes in gene expression. The extent to which environmental effects can provoke epigenetic responses represents an exciting area of future research.

5,377 citations


"Bayesian reassessment of the epigen..." refers background in this paper

  • ...Epigenetic marks reflect a wide range of environmental exposures and genetic influences, are critical for regulating gene and non-coding RNA expression [1], and have been shown to be associated either as a cause or consequence with disease [2]....

    [...]


Journal Article
18 Jul 2011-PLOS ONE
TL;DR: REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures.
Abstract: Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.

3,799 citations


"Bayesian reassessment of the epigen..." refers methods in this paper

  • ...We sorted significantly enriched terms by their mean enrichment and generated a tree map of the terms using REVIGO [22]....

    [...]
