
Showing papers by "Laura M. Raffield" published in 2022


Journal ArticleDOI
TL;DR: In this article, the authors show that common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes.
Abstract: Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes¹. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel²) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.

110 citations


Journal ArticleDOI
TL;DR: The results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.

72 citations


Journal ArticleDOI
Tetsushi Nakao, Alexander G. Bick, Margaret A. Taub, Seyedeh M. Zekavat, Mesbah Uddin, Abhishek Niroula, Cara L. Carty, John Lane, Michael C. Honigberg, Joshua S. Weinstock, Akhil Pampana, Christopher J. Gibson, Gabriel K. Griffin, Shoa L. Clarke, Romit Bhattacharya, Themistocles L. Assimes, Leslie S. Emery, Adrienne M. Stilp, Quenna Wong, Jai G. Broome, Cecelia A. Laurie, Alyna T. Khan, Thomas W. Blackwell, Veryan Codd, Christopher P. Nelson, Zachary T. Yoneda, Juan M. Peralta, Donald W. Bowden, Marguerite R. Irvin, Meher Preethi Boorgula, Wei Zhao, Lisa R. Yanek, Kerri L. Wiggins, James E. Hixson, C. Charles Gu, Gina M. Peloso, Dan M. Roden, Muagututi‘a Sefuiva Reupena, Chii-Min Hwu, Dawn L. DeMeo, Kari E. North, Shannon Jeanne Kelly, Solomon K. Musani, Joshua C. Bis, Donald M. Lloyd-Jones, Jill M. Johnsen, Michael Preuss, Russell P. Tracy, Patricia A. Peyser, Dandi Qiao, Pinkal Desai, Joanne E. Curran, Barry I. Freedman, Hemant K. Tiwari, Sameer Chavan, Jennifer A. Smith, Nicholas L. Smith, Tanika N. Kelly, Bertha Hidalgo, L. Adrienne Cupples, Daniel E. Weeks, Nicola L. Hawley, Ryan L. Minster, Ranjan Deka, Take Toleafoa Naseri, L. de las Fuentes, Laura M. Raffield, Alanna C. Morrison, Paul S. de Vries, Christie M. Ballantyne, Eimear E. Kenny, Stephen S. Rich, Eric A. Whitsel, Michael Chopp, M. Benjamin Shoemaker, Betty S. Pace, John Blangero, Nicholette D. Palmer, Braxton D. Mitchell, Alan R. Shuldiner, Kathleen C. Barnes, Susan Redline, Sharon L.R. Kardia, Gonçalo R. Abecasis, Lewis C. Becker, Susan R. Heckbert, Jiang He, Wendy S. Post, Donna K. Arnett, Ramachandran S. Vasan, Dawood Darbar, Scott T. Weiss, Stephen T. McGarvey, Mariza de Andrade, Yii-Der Ida Chen, Robert C. Kaplan, Deborah A. Meyers, Brian Custer, Adolfo Correa, Bruce M. Psaty, Myriam Fornage, JoAnn E. Manson, Eric Boerwinkle, Barbara A. Konkle, Ruth J. F. Loos, Jerome I. Rotter, Edwin K. Silverman, Charles Kooperberg, John Danesh, Nilesh J. Samani, Siddhartha Jaiswal, Peter Libby, Patrick T. Ellinor, Nathan Pankratz, Benjamin L. Ebert, Alexander P. Reiner, Rasika A. Mathias, Ron Do, Pradeep Natarajan
TL;DR: The relationship between CHIP, LTL, and CAD in the Trans-Omics for Precision Medicine (TOPMed) program and UK Biobank is investigated to promote an understanding of potential causal relationships across CHIP and LTL toward prevention of CAD.
Abstract: Human genetic studies support an inverse causal relationship between leukocyte telomere length (LTL) and coronary artery disease (CAD), but directionally mixed effects for LTL and diverse malignancies. Clonal hematopoiesis of indeterminate potential (CHIP), characterized by expansion of hematopoietic cells bearing leukemogenic mutations, predisposes both hematologic malignancy and CAD. TERT (which encodes telomerase reverse transcriptase) is the most significantly associated germline locus for CHIP in genome-wide association studies. Here, we investigated the relationship between CHIP, LTL, and CAD in the Trans-Omics for Precision Medicine (TOPMed) program (n = 63,302) and UK Biobank (n = 47,080). Bidirectional Mendelian randomization studies were consistent with longer genetically imputed LTL increasing propensity to develop CHIP, but CHIP then, in turn, hastens to shorten measured LTL (mLTL). We also demonstrated evidence of modest mediation between CHIP and CAD by mLTL. Our data promote an understanding of potential causal relationships across CHIP and LTL toward prevention of CAD.

23 citations


Journal ArticleDOI
Margaret A. Taub, Matthew P. Conomos, Rebecca Keener, Kruthika R. Iyer, Joshua S. Weinstock, Lisa R. Yanek, John Lane, Tyne W. Miller-Fleming, Jennifer A. Brody, Laura M. Raffield, Caitlin P. McHugh, Deepti Jain, Stephanie M. Gogarten, Cecelia A. Laurie, Ali R. Keramati, Marios Arvanitis, Benjamin D. Heavner, Lucas Barwick, Lewis C. Becker, Joshua C. Bis, John Blangero, Eugene R. Bleecker, Esteban G. Burchard, Juan C. Celedón, Yen Pei C. Chang, Brian Custer, Dawood Darbar, L. de las Fuentes, Dawn L. DeMeo, Barry I. Freedman, Melanie E. Garrett, Mark T. Gladwin, Susan R. Heckbert, Bertha Hidalgo, Marguerite R. Irvin, Talat Islam, W. Craig Johnson, Stefan Kaab, Lenore J. Launer, Jiwon Lee, Simin Liu, Arden Moscati, Kari E. North, Patricia A. Peyser, Nicholas Rafaels, Christine E. Seidman, Daniel E. Weeks, Fayun Wen, Marsha M. Wheeler, L. Keoki Williams, Ivana V. Yang, Wei Zhao, Stella Aslibekyan, Paul L. Auer, Donald W. Bowden, Brian E. Cade, Zhanghua Chen, Michael Chopp, L. Adrienne Cupples, Joanne E. Curran, Michelle Daya, Ranjan Deka, Celeste Eng, Tasha E. Fingerlin, Xiuqing Guo, Lifang Hou, Shih-Jen Hwang, Jill M. Johnsen, Eimear E. Kenny, Albert M. Levin, Chunyu Liu, Ryan L. Minster, Take Naseri, Mehdi Nouraie, Muagututi‘a Sefuiva Reupena, Ester Cerdeira Sabino, Jennifer A. Smith, Nicholas L. Smith, Jessica Lasky-Su, James G. Taylor, Marilyn J. Telen, Hemant K. Tiwari, Russell P. Tracy, Marquitta J. White, Yingze Zhang, Kerri L. Wiggins, Scott T. Weiss, Ramachandran S. Vasan, Kent D. Taylor, Moritz F. Sinner, Edwin K. Silverman, M. Benjamin Shoemaker, Wayne H-H Sheu, Frank C. Sciurba, David A. Schwartz, Jerome I. Rotter, Dan Roden, Susan Redline, Benjamin A. Raby, Bruce M. Psaty, Juan M. Peralta, Nicholette D. Palmer, Sergei Nekhai, Courtney G. Montgomery, Braxton D. Mitchell, Deborah A. Meyers, Stephen T. McGarvey, Angel C.Y. Mak, Ruth J. F. Loos, Rajesh P C Kumar, Charles Kooperberg, Barbara A. Konkle, Shannon Jeanne Kelly, Sharon L.R. Kardia, Robert Kaplan, Jiang He, Hongsheng Gui, Frank D. Gilliland, Bruce D. Gelb, Myriam Fornage, Patrick T. Ellinor, Mariza de Andrade, Adolfo Correa, Yii-Der Ida Chen, Eric Boerwinkle, Kathleen C. Barnes, Allison E. Ashley-Koch, Donna K. Arnett, Christine A Albert, Cathy C. Laurie, Gonçalo R. Abecasis, Deborah A. Nickerson, James F. Wilson, Stephen S. Rich, Daniel Levy, Ingo Ruczinski, Abraham Aviv, Thomas W. Blackwell, Timothy A. Thornton, Jeffrey R. O'Connell, Nancy J. Cox, James A. Perry, Mary Armanios, Alexis Battle, Nathan Pankratz, Alexander P. Reiner, Rasika A. Mathias
TL;DR: This article reported the first sequencing-based association study for TL across ancestrally-diverse individuals (European, African, Asian and Hispanic/Latino) from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program.
Abstract: Genetic studies on telomere length are important for understanding age-related diseases. Prior GWAS for leukocyte TL have been limited to European and Asian populations. Here, we report the first sequencing-based association study for TL across ancestrally-diverse individuals (European, African, Asian and Hispanic/Latino) from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. We used whole genome sequencing (WGS) of whole blood for variant genotype calling and the bioinformatic estimation of telomere length in n=109,122 individuals. We identified 59 sentinel variants (p-value < 5×10⁻⁹) in 36 loci associated with telomere length, including 20 newly associated loci (13 were replicated in external datasets). There was little evidence of effect size heterogeneity across populations. Fine-mapping at OBFC1 indicated the independent signals colocalized with cell-type specific eQTLs for OBFC1 (STN1). Using a multi-variant gene-based approach, we identified two genes newly implicated in telomere length, DCLRE1B (SNM1B) and PARN. In PheWAS, we demonstrated our TL polygenic trait scores (PTS) were associated with increased risk of cancer-related phenotypes.

22 citations


Journal ArticleDOI
TL;DR: In this article, the authors used an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects between single nucleotide polymorphisms (SNPs).
Abstract: Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.

17 citations
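
The percentages quoted in the PRS/XGBoost abstract above are relative increases in variance explained, not absolute ones. A minimal sketch of that arithmetic follows; the function name `relative_increase` and the R² values are hypothetical and are not taken from the paper.

```python
def relative_increase(r2_baseline, r2_new):
    """Relative gain in variance explained of a new model over a baseline.

    Both arguments are proportions of variance explained (R^2) in (0, 1].
    """
    if r2_baseline <= 0:
        raise ValueError("baseline R^2 must be positive")
    return (r2_new - r2_baseline) / r2_baseline


# Hypothetical values: a linear PRS explaining 10% of variance and a
# PRS + boosted-tree model explaining 20% is a 100% relative increase.
gain = relative_increase(0.10, 0.20)
print(f"relative increase: {gain:.0%}")
```

Under this reading, a 100% relative increase means the variance explained doubled, which can still be a small absolute gain when the baseline R² is low.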


Journal ArticleDOI
TL;DR: TOP-LD is an online tool to explore LD inferred with high-coverage (∼30×) WGS data from 15,578 individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program.
Abstract: Current publicly available tools that allow rapid exploration of linkage disequilibrium (LD) between markers (e.g., HaploReg and LDlink) are based on whole-genome sequence (WGS) data from 2,504 individuals in the 1000 Genomes Project. Here, we present TOP-LD, an online tool to explore LD inferred with high-coverage (∼30×) WGS data from 15,578 individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. TOP-LD provides a significant upgrade compared to current LD tools, as the TOPMed WGS data provide a more comprehensive representation of genetic variation than the 1000 Genomes data, particularly for rare variants and in the specific populations that we analyzed. For example, TOP-LD encompasses LD information for 150.3, 62.2, and 36.7 million variants for European, African, and East Asian ancestral samples, respectively, offering 2.6- to 9.1-fold increase in variant coverage compared to HaploReg 4.0 or LDlink. In addition, TOP-LD includes tens of thousands of structural variants (SVs). We demonstrate the value of TOP-LD in fine-mapping at the GGT1 locus associated with gamma glutamyltransferase in the African ancestry participants in UK Biobank. Beyond fine-mapping, TOP-LD can facilitate a wide range of applications that are based on summary statistics and estimates of LD. TOP-LD is freely available online.

15 citations
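
As a rough illustration of the kind of quantity an LD tool such as TOP-LD reports, the sketch below computes the squared correlation r² between two biallelic markers from phased haplotypes. The haplotype vectors and the function name `ld_r2` are hypothetical; this is not TOP-LD's implementation.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic markers from phased haplotypes.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one per haplotype.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # allele frequency at marker A
    p_b = sum(hap_b) / n                                   # allele frequency at marker B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denom


# Hypothetical phased haplotypes for eight chromosomes.
hap_a = [1, 1, 0, 0, 1, 0, 1, 0]
hap_b = [1, 1, 0, 0, 1, 0, 0, 1]
print(ld_r2(hap_a, hap_b))
```

The monomorphic check matters for rare variants: in a small reference panel a rare allele may be absent entirely, which is one reason a larger panel can report LD for more variants.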


Journal ArticleDOI
TL;DR: In this paper, the authors derived GWAS for annual eGFR decline and meta-analyzed 62 longitudinal studies with eGFR assessed twice over time in all 343,339 individuals and in high-risk groups.

13 citations


Journal ArticleDOI
TL;DR: This paper conducted genome-wide association meta-analyses for estimated glomerular filtration rate (GFR) based on serum creatinine (eGFR), separately for individuals with or without DM (n_DM = 178,691, n_noDM = 1,296,113).
Abstract: Reduced glomerular filtration rate (GFR) can progress to kidney failure. Risk factors include genetics and diabetes mellitus (DM), but little is known about their interaction. We conducted genome-wide association meta-analyses for estimated GFR based on serum creatinine (eGFR), separately for individuals with or without DM (n_DM = 178,691, n_noDM = 1,296,113). Our genome-wide searches identified (i) seven eGFR loci with significant DM/noDM-difference, (ii) four additional novel loci with suggestive difference and (iii) 28 further novel loci (including CUBN) by allowing for potential difference. GWAS on eGFR among DM individuals identified 2 known and 27 potentially responsible loci for diabetic kidney disease. Gene prioritization highlighted 18 genes that may inform reno-protective drug development. We highlight the existence of DM-only and noDM-only effects, which can inform about the target group, if respective genes are advanced as drug targets. Largely shared effects suggest that most drug interventions to alter eGFR should be effective in DM and noDM.

10 citations
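
One common way to test whether a variant's effect differs between two strata, such as the DM and noDM groups in the entry above, is a z-test on the difference of the stratum-specific estimates. The sketch below assumes the two strata are independent and that this is a reasonable stand-in for the difference test; the abstract does not specify the exact statistic, and the effect sizes shown are hypothetical.

```python
import math


def difference_test(beta_dm, se_dm, beta_nodm, se_nodm):
    """Z-test for a difference between two independent effect estimates.

    Returns (z, two_sided_p). Assumes the estimates are uncorrelated,
    which holds when the strata share no individuals.
    """
    z = (beta_dm - beta_nodm) / math.sqrt(se_dm ** 2 + se_nodm ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical per-allele effects on log(eGFR) in each stratum.
z, p = difference_test(beta_dm=0.03, se_dm=0.003, beta_nodm=0.01, se_nodm=0.004)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Because the noDM stratum is much larger, its standard errors are typically smaller, so the precision of the difference is driven mainly by the DM stratum.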


Journal ArticleDOI
TL;DR: The authors performed a whole genome association study of 2,291 metabolite peaks (known and unknown features) in 2,466 Black individuals from the Jackson Heart Study and identified 519 locus-metabolite associations for 427 metabolites and validated their findings in two multi-ethnic cohorts.
Abstract: Integrating genetic information with metabolomics has provided new insights into genes affecting human metabolism. However, gene-metabolite integration has been primarily studied in individuals of European Ancestry, limiting the opportunity to leverage genomic diversity for discovery. In addition, these analyses have principally involved known metabolites, with the majority of the profiled peaks left unannotated. Here, we perform a whole genome association study of 2,291 metabolite peaks (known and unknown features) in 2,466 Black individuals from the Jackson Heart Study. We identify 519 locus-metabolite associations for 427 metabolite peaks and validate our findings in two multi-ethnic cohorts. A significant proportion of these associations are in ancestry specific alleles including findings in APOE, TTR and CD36. We leverage tandem mass spectrometry to annotate unknown metabolites, providing new insight into hereditary diseases including transthyretin amyloidosis and sickle cell disease. Our integrative omics approach leverages genomic diversity to provide novel insights into diverse cardiometabolic diseases.

9 citations


Journal ArticleDOI
13 Jan 2022
TL;DR: In this article, a systematic variant-to-function study is presented, prioritizing the most likely functional elements of the genome for experimental follow-up, for >148,000 variants identified for hematological traits.
Abstract: Genome-wide association studies (GWASs) have identified hundreds of thousands of genetic variants associated with complex diseases and traits. However, most variants are noncoding and not clearly linked to genes, making it challenging to interpret these GWAS signals. We present a systematic variant-to-function study, prioritizing the most likely functional elements of the genome for experimental follow-up, for >148,000 variants identified for hematological traits. Specifically, we developed VAMPIRE: Variant Annotation Method Pointing to Interesting Regulatory Effects, an interactive web application implemented in R Shiny. This tool efficiently integrates and displays information from multiple complementary sources, including epigenomic signatures from blood-cell-relevant tissues or cells, functional and conservation summary scores, variant impact on protein and gene expression, chromatin conformation information, as well as publicly available GWAS and phenome-wide association study (PheWAS) results. Leveraging data generated from independently performed functional validation experiments, we demonstrate that our prioritized variants, genes, or variant-gene links are significantly more likely to be experimentally validated. This study not only has important implications for systematic and efficient revelation of functional mechanisms underlying GWAS variants for hematological traits but also provides a prototype that can be adapted to many other complex traits, paving the path for efficient variant-to-function (V2F) analyses.

8 citations


Journal ArticleDOI
TL;DR: The genetic architecture of the ACE2 protein is mapped, providing a useful resource for further biological and clinical studies on this coronavirus receptor, and it is detected that plasma ACE2 was genetically correlated with vascular diseases, severe COVID-19, and a wide range of human complex diseases and medications.
Abstract: Background: SARS-CoV-2, the causal agent of COVID-19, enters human cells using the ACE2 (angiotensin-converting enzyme 2) protein as a receptor. ACE2 is thus key to the infection and treatment of the coronavirus. ACE2 is highly expressed in the heart and respiratory and gastrointestinal tracts, playing important regulatory roles in the cardiovascular and other biological systems. However, the genetic basis of the ACE2 protein levels is not well understood. Methods: We have conducted the largest genome-wide association meta-analysis of plasma ACE2 levels in >28 000 individuals of the SCALLOP Consortium (Systematic and Combined Analysis of Olink Proteins). We summarize the cross-sectional epidemiological correlates of circulating ACE2. Using the summary statistics–based high-definition likelihood method, we estimate relevant genetic correlations with cardiometabolic phenotypes, COVID-19, and other human complex traits and diseases. We perform causal inference of soluble ACE2 on vascular disease outcomes and COVID-19 severity using mendelian randomization. We also perform in silico functional analysis by integrating with other types of omics data. Results: We identified 10 loci, including 8 novel, capturing 30% of the heritability of the protein. We detected that plasma ACE2 was genetically correlated with vascular diseases, severe COVID-19, and a wide range of human complex diseases and medications. An X-chromosome cis–protein quantitative trait loci–based mendelian randomization analysis suggested a causal effect of elevated ACE2 levels on COVID-19 severity (odds ratio, 1.63 [95% CI, 1.10–2.42]; P=0.01), hospitalization (odds ratio, 1.52 [95% CI, 1.05–2.21]; P=0.03), and infection (odds ratio, 1.60 [95% CI, 1.08–2.37]; P=0.02). Tissue- and cell type–specific transcriptomic and epigenomic analysis revealed that the ACE2 regulatory variants were enriched for DNA methylation sites in blood immune cells. 
Conclusions: Human plasma ACE2 shares a genetic basis with cardiovascular disease, COVID-19, and other related diseases. The genetic architecture of the ACE2 protein is mapped, providing a useful resource for further biological and clinical studies on this coronavirus receptor.
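
The odds ratios in the ACE2 abstract above are reported with 95% confidence intervals and P values, and the three are linked on the log scale. The sketch below back-calculates an approximate standard error, z statistic, and two-sided P value from a published odds ratio and interval, assuming a normal approximation on the log-odds scale; the function name is hypothetical, and rounding of the published figures means the result will only approximately match the reported P value.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover (se, z, p) on the log-odds scale from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale (Wald-type CI).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# COVID-19 severity estimate from the abstract: OR 1.63 [95% CI, 1.10-2.42].
se, z, p = z_and_p_from_or(1.63, 1.10, 2.42)
print(f"se = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

For the severity estimate this gives a P value between 0.01 and 0.02, in line with the reported P=0.01 once rounding of the interval bounds is taken into account.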

Posted ContentDOI
22 Mar 2022-medRxiv
TL;DR: Blood DNAm biomarkers for fitness parameters gait speed, hand grip strength, forced expiratory volume in one second (FEV1), and maximal oxygen uptake (VO2max) are developed and used to construct DNAmFitAge, a new biological age indicator that incorporates physical fitness with epigenetic mortality risk estimators.
Abstract: Physical fitness is a well-known correlate of health and the aging process. DNA methylation (DNAm) data lend themselves for estimating chronological and biological age through epigenetic clocks. However, current epigenetic clocks did not yet use measures of mobility, strength, lung, or endurance physical fitness parameters in their construction. Here, we develop blood DNAm biomarkers for fitness parameters gait speed (walking speed), hand grip strength, forced expiratory volume in one second (FEV1), and maximal oxygen uptake (VO2max). We then use these DNAm biomarkers to construct DNAmFitAge, a new biological age indicator that incorporates physical fitness with epigenetic mortality risk estimators. Adjusting DNAmFitAge for chronological age generates a novel measure of epigenetic age acceleration, FitAgeAcceleration, which is informative for physical activity level (p=1.2E-12), mortality risk (p=5.9E-13), coronary heart disease risk (p=0.0051), comorbidities (p=9.0E-9), and disease-free status (p=1.1E-6) across several large validation datasets. These newly constructed DNAm biomarkers and DNAmFitAge provide researchers and physicians a new method to incorporate physical fitness into epigenetic clocks and emphasizes the effect of lifestyle on the aging process.
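
"Adjusting for chronological age" in epigenetic clock work is often done by taking the residual from a regression of the DNAm-based age on chronological age. The sketch below shows that idea with ordinary least squares on hypothetical data; it assumes a simple linear adjustment and is not necessarily the exact procedure used to define FitAgeAcceleration.

```python
def age_acceleration(chron_age, dnam_age):
    """Residuals of dnam_age regressed on chron_age by ordinary least squares.

    A positive residual means the DNAm-based age is higher than expected
    for that chronological age.
    """
    n = len(chron_age)
    if n != len(dnam_age) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(chron_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


# Hypothetical ages in years for four individuals.
chron = [40, 50, 60, 70]
dnam = [42, 49, 63, 70]
acc = age_acceleration(chron, dnam)
print([round(a, 2) for a in acc])
```

By construction the residuals average to zero and are uncorrelated with chronological age, which is what lets the acceleration measure be tested against outcomes such as mortality without age itself driving the association.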

Journal ArticleDOI
TL;DR: In this article, the authors utilized whole genome sequencing data in ancestrally diverse participants of the NHLBI Trans Omics for Precision Medicine program (N = 50,675) to detect structural variants associated with hematologic traits.
Abstract: Genome-wide association studies have identified thousands of single nucleotide variants and small indels that contribute to variation in hematologic traits. While structural variants are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of structural variants to quantitative blood cell trait variation is unknown. Here we utilized whole genome sequencing data in ancestrally diverse participants of the NHLBI Trans Omics for Precision Medicine program (N = 50,675) to detect structural variants associated with hematologic traits. Using single variant tests, we assessed the association of common and rare structural variants with red cell-, white cell-, and platelet-related quantitative traits and observed 21 independent signals (12 common and 9 rare) reaching genome-wide significance. The majority of these associations (N = 18) replicated in independent datasets. In genome-editing experiments, we provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.

Journal ArticleDOI
TL;DR: MetaSTAAR is a powerful and resource-efficient rare variant meta-analysis framework for large-scale whole genome sequencing/whole exome sequencing (WGS/WES) studies.
Abstract: Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.

Journal ArticleDOI
TL;DR: A transcriptome-wide association study of 29 hematological traits in 399 835 UK Biobank participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals discovered 557 gene-trait associations distinct from previously reported GWAS variants in European populations.
Abstract: Previous genome-wide association studies (GWAS) of hematological traits have identified over 10 000 distinct trait-specific risk loci. However, at these loci, the underlying causal mechanisms remain incompletely characterized. To elucidate novel biology and better understand causal mechanisms at known loci, we performed a transcriptome-wide association study (TWAS) of 29 hematological traits in 399 835 UK Biobank (UKB) participants of European ancestry using gene expression prediction models trained from whole blood RNA-seq data in 922 individuals. We discovered 557 gene-trait associations for hematological traits distinct from previously reported GWAS variants in European populations. Among the 557 associations, 301 were available for replication in a cohort of 141 286 participants of European ancestry from the Million Veteran Program (MVP). Of these 301 associations, 108 replicated at a strict Bonferroni adjusted threshold ($\alpha$ = 0.05/301). Using our TWAS results, we systematically assigned 4261 out of 16 900 previously identified hematological trait GWAS variants to putative target genes. Compared to coloc, our TWAS results show reduced specificity and increased sensitivity in external datasets to assign variants to target genes.
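
The replication threshold in the TWAS abstract above is a Bonferroni correction: the family-wise α of 0.05 is divided by the 301 associations tested. A small sketch of that calculation follows; the gene names and P values are hypothetical and only illustrate how the threshold would be applied.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 301)
print(f"threshold = {threshold:.2e}")

# Hypothetical replication P values for three gene-trait pairs.
replication_p = {"GENE_A": 1e-5, "GENE_B": 2e-4, "GENE_C": 0.03}
replicated = [gene for gene, p in replication_p.items() if p < threshold]
print(replicated)
```

The threshold works out to roughly 1.7 × 10⁻⁴, so an association with P = 2 × 10⁻⁴ would narrowly miss replication under this criterion.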

Journal ArticleDOI
TL;DR: In this paper, the association of sCD163 levels with cardiovascular disease events and mortality was examined using Cox regression models; sCD163 was found to be associated with all-cause mortality (hazard ratio [HR], 1.08 [95% CI, 1.04-1.12] per SD increase), cardiovascular disease mortality, and incident coronary heart disease.
Abstract: Background Monocytes/macrophages participate in cardiovascular disease. CD163 (cluster of differentiation 163) is a monocyte/macrophage receptor, and the shed sCD163 (soluble CD163) reflects monocyte/macrophage activation. We examined the association of sCD163 with incident cardiovascular disease events and performed a genome‐wide association study to identify sCD163‐associated variants. Methods and Results We measured plasma sCD163 in 5214 adults (aged ≥65 years, 58.7% women, 16.2% Black) of the CHS (Cardiovascular Health Study). We used Cox regression models (associations of sCD163 with incident events and mortality); median follow‐up was 26 years. Genome‐wide association study analyses were stratified on race. Adjusted for age, sex, and race and ethnicity, sCD163 levels were associated with all‐cause mortality (hazard ratio [HR], 1.08 [95% CI, 1.04–1.12] per SD increase), cardiovascular disease mortality (HR, 1.15 [95% CI, 1.09–1.21]), incident coronary heart disease (HR, 1.10 [95% CI, 1.04–1.16]), and incident heart failure (HR, 1.18 [95% CI, 1.12–1.25]). When further adjusted (eg, cardiovascular disease risk factors), only incident coronary heart disease lost significance. In European American individuals, genome‐wide association studies identified 38 variants on chromosome 2 near MGAT5 (top result rs62165726, P=3.3×10⁻¹⁸), 19 variants near chromosome 17 gene ASGR1 (rs55714927, P=1.5×10⁻¹⁴), and 18 variants near chromosome 11 gene ST3GAL4. These regions replicated in the European ancestry ADDITION‐PRO cohort, a longitudinal cohort study nested in the Danish arm of the Anglo‐Danish‐Dutch study of Intensive Treatment In peOple with screeN‐detected Diabetes in Primary Care. In Black individuals, we identified 9 variants on chromosome 6 (rs3129781, P=7.1×10⁻⁹) in the HLA region, and 3 variants (rs115391969, P=4.3×10⁻⁸) near the chromosome 16 gene MYLK3.
Conclusions Monocyte function, as measured by sCD163, may be predictive of overall and cardiovascular‐specific mortality and incident heart failure.

Posted ContentDOI
08 Oct 2022-bioRxiv
TL;DR: GAUDI is a penalized-regression-based method specifically designed for admixed individuals by explicitly modeling ancestry-specific effects and jointly estimating ancestry-shared effects.
Abstract: Polygenic risk scores (PRS) have shown successes in clinics, but most PRS methods have focused only on individuals with one primary continental ancestry, thus poorly accommodating recently-admixed individuals. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals by explicitly modeling ancestry-specific effects and jointly estimating ancestry-shared effects. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real data analyses.

Journal ArticleDOI
Daniel DiCorpo, Sheila M. Gaynor, Emily M. Russell, Kenneth Westerman, Laura M. Raffield, Timothy D. Majarian, Peitao Wu, Chloé Sarnowski, Heather M. Highland, Anne U. Jackson, Natalie R Hasbani, Paul S. de Vries, Jennifer A. Brody, Bertha Hidalgo, Xiuqing Guo, James A. Perry, Jeffrey R. O'Connell, Samantha Lent, May E. Montasser, Brian E. Cade, Deepti Jain, Heming Wang, Ricardo D’Oliveira Albanus, Arushi Varshney, Lisa R. Yanek, Leslie A. Lange, Nicholette D. Palmer, Marcio Almeida, Juan M. Peralta, Stella Aslibekyan, Abigail S. Baldridge, Alain G. Bertoni, Lawrence F. Bielak, Chung-Shiuan Chen, Yii-Der Ida Chen, Won Jung Choi, Mark O. Goodarzi, James S. Floyd, Marguerite R. Irvin, Rita Kalyani, Tanika N. Kelly, Seonwook Lee, Ching-Ti Liu, Douglas Loesch, JoAnn E. Manson, Ryan L. Minster, Take Naseri, James S. Pankow, Laura J. Rasmussen-Torvik, Alexander P. Reiner, Muagututi‘a Sefuiva Reupena, Elizabeth Selvin, Jennifer A. Smith, Daniel E. Weeks, Huichun Xu, Jie Yao, Wei Zhao, Stephen C. J. Parker, Álvaro Alonso, Donna K. Arnett, John Blangero, Eric Boerwinkle, Adolfo Correa, L. Adrienne Cupples, Joanne E. Curran, Ravindranath Duggirala, Jiang He, Susan R. Heckbert, Sharon L.R. Kardia, Ryan W. Kim, Charles Kooperberg, Simin Liu, Rasika A. Mathias, Stephen T. McGarvey, Braxton D. Mitchell, Alanna C. Morrison, Patricia A. Peyser, Bruce M. Psaty, Susan Redline, Alan R. Shuldiner, Kent D. Taylor, Ramachandran S. Vasan, Karine A. Viaud-Martinez, JC Florez, James F. Wilson, Robert Sladek, Stephen S. Rich, Jerome I. Rotter, Xihong Lin, Josée Dupuis, James B. Meigs, Jennifer Wessel, Alisa K. Manning 
28 Jul 2022
TL;DR: The genetic determinants of fasting glucose (FG) and fasting insulin (FI) have been studied mostly through genome arrays, resulting in over 100 associated variants; the authors extended this work with high-coverage whole genome sequencing analyses from fifteen cohorts in NHLBI's Trans-Omics for Precision Medicine (TOPMed) program.
Abstract: The genetic determinants of fasting glucose (FG) and fasting insulin (FI) have been studied mostly through genome arrays, resulting in over 100 associated variants. We extended this work with high-coverage whole genome sequencing analyses from fifteen cohorts in NHLBI's Trans-Omics for Precision Medicine (TOPMed) program. Over 23,000 non-diabetic individuals from five race-ethnicities/populations (African, Asian, European, Hispanic and Samoan) were included. Eight variants were significantly associated with FG or FI across previously identified regions MTNR1B, G6PC2, GCK, GCKR and FOXA2. We additionally characterize suggestive associations with FG or FI near previously identified SLC30A8, TCF7L2, and ADCY5 regions as well as APOB, PTPRT, and ROBO1. Functional annotation resources including the Diabetes Epigenome Atlas were compiled for each signal (chromatin states, annotation principal components, and others) to elucidate variant-to-function hypotheses. We provide a catalog of nucleotide-resolution genomic variation spanning intergenic and intronic regions creating a foundation for future sequencing-based investigations of glycemic traits.

Journal ArticleDOI
TL;DR: The authors conducted a whole-exome sequencing (WES) study leveraging large cohorts well-phenotyped for chronic kidney disease (CKD) and diabetes to identify rare variants for CKD.
Abstract: Diabetic kidney disease (DKD) is recognized as an important public health challenge. However, its genomic mechanisms are poorly understood. To identify rare variants for DKD, we conducted a whole-exome sequencing (WES) study leveraging large cohorts well-phenotyped for chronic kidney disease (CKD) and diabetes. Our two-stage whole-exome sequencing study included 4372 European and African ancestry participants from the Chronic Renal Insufficiency Cohort (CRIC) and Atherosclerosis Risk in Communities (ARIC) studies (stage-1) and 11 487 multi-ancestry Trans-Omics for Precision Medicine (TOPMed) participants (stage-2). Generalized linear mixed models, which accounted for genetic relatedness and adjusted for age, sex, and ancestry, were used to test associations between single variants and DKD. Gene-based aggregate rare variant analyses were conducted using an optimized sequence kernel association test (SKAT-O) implemented within our mixed model framework. We identified four novel exome-wide significant DKD-related loci through initiating diabetes. In single variant analyses, participants carrying a rare, in-frame insertion in the DIS3L2 gene (rs141560952) exhibited a 193-fold increased odds (95% confidence interval: 33.6, 1105) of DKD compared with non-carriers (P = 3.59 × 10⁻⁹). Likewise, each copy of a low-frequency KRT6B splice-site variant (rs425827) conferred a 5.31-fold higher odds (95% confidence interval: 3.06, 9.21) of DKD (P = 2.72 × 10⁻⁹). Aggregate gene-based analyses further identified ERAP2 (P = 4.03 × 10⁻⁸) and NPEPPS (P = 1.51 × 10⁻⁷), which are both expressed in the kidney and implicated in renin-angiotensin-aldosterone system modulated immune response. In the largest WES study of DKD, we identified novel rare variant loci attaining exome-wide significance. These findings provide new insights into the molecular mechanisms underlying DKD.

Journal ArticleDOI
TL;DR: Concentrations of nicotine, cotinine, and hydroxycotinine were measured by mass spectrometry (MS) in supernatants of induced sputum obtained from participants in the SubPopulations and Intermediate Outcome Measures In COPD Study (SPIROMICS), an ongoing observational study that included never smokers, former smokers, and current smokers with and without chronic obstructive pulmonary disease (COPD).
Abstract: Nicotine from cigarette smoke is a biologically active molecule that has pleiotropic effects in the airway, which could play a role in smoking induced lung disease. However, whether nicotine and its metabolites reach sustained, physiologically relevant concentrations on airway surfaces of smokers is not well defined. To address these issues, concentrations of nicotine, cotinine, and hydroxycotinine were measured by mass spectrometry (MS) in supernatants of induced sputum obtained from participants in the SubPopulations and Intermediate Outcome Measures In COPD Study (SPIROMICS), an ongoing observational study that included never smokers, former smokers, and current smokers with and without chronic obstructive pulmonary disease (COPD). A total of 980 sputum supernatants were analyzed from 77 healthy never smokers, 494 former smokers (233 with COPD), and 396 active smokers (151 with COPD). Sputum nicotine, cotinine, and hydroxycotinine concentrations corresponded to self-reported smoking status and were strongly correlated to urine measures. A cutoff of ~8-10 ng/mL of sputum cotinine distinguished never smokers from active smokers. Accounting for sample dilution during processing, active smokers had airway nicotine concentrations in the 70-850 ng/mL (~0.5 to 5 µM) range, and concentrations remained elevated even in current smokers who had not smoked within 24 hours. This study demonstrates that airway nicotine and its metabolites are readily measured in sputum supernatants and can serve as biological markers of smoke exposure. In current smokers, nicotine is present at physiologically relevant concentrations for prolonged periods, supporting a contribution to cigarette induced airways disease.
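The abstract above reports airway nicotine as 70-850 ng/mL and gives an approximate molar range of ~0.5 to 5 µM. A minimal sketch of that unit conversion follows; the molar mass of nicotine (about 162.23 g/mol for C10H14N2) and the function name are assumptions introduced for illustration and are not taken from the study.

```python
# Illustrative unit conversion only; not code from the study.
# Assumption: nicotine (C10H14N2) has a molar mass of about 162.23 g/mol.
NICOTINE_MOLAR_MASS_G_PER_MOL = 162.23


def ng_per_ml_to_micromolar(conc_ng_per_ml, molar_mass_g_per_mol=NICOTINE_MOLAR_MASS_G_PER_MOL):
    """Convert a mass concentration in ng/mL to a molar concentration in µM.

    1 ng/mL equals 1 µg/L, and µg/L divided by g/mol gives µmol/L (µM).
    """
    return conc_ng_per_ml / molar_mass_g_per_mol


# Bounds of the range quoted in the abstract.
low_um = ng_per_ml_to_micromolar(70)
high_um = ng_per_ml_to_micromolar(850)
print(f"{low_um:.2f} to {high_um:.2f} µM")
```

Under this assumption the bounds come out near 0.4 and 5.2 µM, in line with the rounded "~0.5 to 5 µM" range quoted in the abstract.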

Journal ArticleDOI
TL;DR: MagicalRsq is a machine-learning-based method that integrates variant-level imputation and population genetics statistics to provide a better calibrated imputation quality metric, particularly for lower-frequency variants.
Abstract: Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK BioBank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R2 than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where net gains were gained numbers of correctly distinguished variants by MagicalRsq over standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from those poorly imputed. MagicalRsq is freely available on GitHub.

Journal ArticleDOI
TL;DR: The results may provide insights into pathways that influence the adverse effects of thiazide diuretic use and highlight the large sample sizes needed to detect modest pharmacogenetic effects.
Abstract: Introduction: Although thiazide diuretics are common therapies for the treatment of hypertension, evidence suggests that they are also associated with elevated blood lipid concentrations after their initiation. Pharmacogenomics studies could help identify potential biological pathways underlying this phenomenon, but few well-powered studies have been conducted. Methods: In participants of European ancestry from the UK Biobank, we conducted a genome-wide association study (GWAS) to examine whether common variants (minor allele frequency [MAF] >1%) modified the effect of thiazide diuretic use on three lipid measures: low density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride (TG) concentrations (mmol/L). TG concentrations were natural log transformed, and LDL-C concentrations were adjusted for statin use. Given the conflicting evidence about the impact of loop diuretics on blood lipid concentrations, unexposed participants were defined as those taking neither thiazides nor loop diuretics. GWAS of UK10K plus 1000-Genomes Phase 3 imputed data were performed using SAIGE, adjusting for age, sex, fasting status, body mass index, study center, and the first 10 ancestral principal components. Results: Approximately 7.1% (n=27,971) of participants (n=395,058; mean age=57 years; 54% female [n=214,334]) used thiazides at study baseline. The thiazide-LDL-C GWAS (n=9,680,686 variants) identified one chromosome 19 locus harboring several genome-wide significant variants that modified the effect of thiazide diuretic use on LDL-C. The lead variant rs2199576 (MAF = 0.18) had an effect estimate (standard error) of 0.159 (0.013) among participants exposed to thiazides, and an effect estimate of 0.083 (0.003) among participants unexposed to any diuretic, yielding a significant variant-by-thiazide interaction effect estimate of 0.076 (0.013) (p = 1.62 × 10⁻⁹). These effect estimates are comparable to previously identified main effects of blood lipids loci. The rs2199576 variant resides near the PVRL2 locus, a cholesterol-responsive gene that is located within 100 kb of TOMM40 and the APO cluster, which have both been associated with blood lipid concentrations and Alzheimer’s disease. Conclusion: Our results may provide insights into pathways that influence the adverse effects of thiazide diuretic use and highlight the large sample sizes needed to detect modest pharmacogenetic effects.
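The interaction estimate quoted in the abstract above can be checked arithmetically from the two stratum-specific estimates. The sketch below treats the exposed and unexposed strata as independent and uses a simple Wald-style calculation; this is an assumption made for illustration and is not the SAIGE analysis described by the authors, so the resulting test statistic should not be expected to reproduce the reported p-value exactly. All variable names are hypothetical.

```python
import math

# Stratum-specific estimates for rs2199576 on LDL-C, as quoted in the abstract.
beta_exposed, se_exposed = 0.159, 0.013      # participants exposed to thiazides
beta_unexposed, se_unexposed = 0.083, 0.003  # participants unexposed to any diuretic

# Interaction effect as the difference between the stratum-specific effects.
interaction = beta_exposed - beta_unexposed

# Assumption: strata are independent, so the variances of the two estimates add.
se_interaction = math.sqrt(se_exposed**2 + se_unexposed**2)

# Wald z statistic for the difference (approximation only).
z_interaction = interaction / se_interaction

print(f"interaction = {interaction:.3f}, SE = {se_interaction:.3f}")
```

The difference and its approximate standard error round to 0.076 and 0.013, matching the interaction estimate and standard error reported in the abstract.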

Journal ArticleDOI
TL;DR: The authors' GWAS of obesity subclasses demonstrated unique genetic and functional properties between groups, supporting the contention that CVD risk may differ across obese individuals.
Abstract: Introduction: Despite decades of research linking obesity with cardiovascular disease (CVD) and its risk factors, major research gaps remain, with few studies evaluating the often-described but poorly understood heterogeneity in CVD risk observed within obese populations. One plausible, but largely unexplored source of heterogeneity in obesity-associated CVD risk is the impact of body fat distribution (central versus overall obesity), obesity duration and severity. Longitudinally assessed anthropometric measures and GWAS derived from these exposures may improve understanding of functional properties of loci associated with obesity and downstream influences on CVD. Methods: We respond to these research gaps by using 25 years of Atherosclerosis Risk in Communities (N=14,514, mean baseline age=54 years; 55% Female; 25.73% African-American) data and latent class mixed models to derive four longitudinal measures of obesity distribution, duration, and severity, operationalized as obesity subtypes. Genome wide association studies on the subclasses were conducted using logistic multi-variate regression models to test for associations between variants and BMI. Meta-analyses across race and sex were conducted in METAL. Functional properties of single nucleotide polymorphisms (SNP) were assessed in FUMA. Results: The four obesity subclasses were: decline (4.1%), stable/slow decline (67.8%) [referent], moderate increase (24.6%), and rapid increase (3.6%). The lead SNP from the decline group was known, on chromosome 7 (rs1196508, beta (SE) = -0.44 (0.089), p = 6.84 × 10⁻⁷, minor allele frequency (MAF) = 0.36), and nearest to a tyrosine phosphatase receptor, PTPRZ1, in the central nervous system. We combined moderate and rapid increase for analyses; they shared a common top, unknown, SNP on chromosome 5 (rs76956550, beta (SE) = -0.37 (0.074), MAF=0.12), nearest to PGAM5P1. The top SNP within the rapid increase group was unknown, on chromosome 14 (rs580888, beta (SE) = -0.57 (0.11), MAF=0.35), and nearest to the OTX2 gene, a transcription factor involved in production of dopaminergic neurons. Mutations of this gene are associated with pituitary hormone deficiency. Conclusion: Our GWAS of obesity subclasses demonstrated unique genetic and functional properties between groups, supporting the contention that CVD risk may differ across obese individuals. We intend to replicate our novel association of OTX2 with BMI, waist-to-hip ratio, and waist circumference in people who rapidly gained weight in other large longitudinal cohorts.

Journal ArticleDOI
TL;DR: This article identified low frequency and rare genetic variants within previously reported linkage regions on chromosomes 1 and 19 in African American families from the Trans-Omics for Precision Medicine (TOPMed) program.
Abstract: While large genome-wide association studies have identified nearly one thousand loci associated with variation in blood pressure, rare variant identification is still a challenge. In family-based cohorts, genome-wide linkage scans have been successful in identifying rare genetic variants for blood pressure. This study aims to identify low frequency and rare genetic variants within previously reported linkage regions on chromosomes 1 and 19 in African American families from the Trans-Omics for Precision Medicine (TOPMed) program. Genetic association analyses weighted by linkage evidence were completed with whole genome sequencing data within and across TOPMed ancestral groups consisting of 60,388 individuals of European, African, East Asian, Hispanic, and Samoan ancestries. Associations of low frequency and rare variants in RCN3 and multiple other genes were observed for blood pressure traits in TOPMed samples. The association of low frequency and rare coding variants in RCN3 was further replicated in UK Biobank samples (N = 403,522), and reached genome-wide significance for diastolic blood pressure (p = 2.01 × 10⁻⁷). Low frequency and rare variants in RCN3 contribute to blood pressure variation. This study demonstrates that focusing association analyses in linkage regions greatly reduces multiple-testing burden and improves power to identify novel rare variants associated with blood pressure traits.

Journal ArticleDOI
David Stacey, Lingyan Chen, Paulina J Stanczyk, Joanna M. M. Howson, Amy M. Mason, Stephen Burgess, Stephen MacDonald, Jonathan Langdown, Harriet McKinney, Kate Downes, Neda Farahi, James E. Peters, Saonli Basu, James S. Pankow, Weihong Tang, Nathan Pankratz, Maria Sabater-Lleal, Paul S. de Vries, Nicholas L. Smith, Abbas Dehghan, Adam S. Heath, Alanna C. Morrison, Alexander P. Reiner, Andrew D. Johnson, Anne Richmond, Annette Peters, Astrid van Hylckama Vlieg, Barbara McKnight, Bruce M. Psaty, Caroline Hayward, Cavin K. Ward-Caviness, Christopher J. O'Donnell, Daniel I. Chasman, David P. Strachan, David-Alexandre Trégouët, Dennis O. Mook-Kanamori, Dipender Gill, Florian Thibord, Folkert W. Asselbergs, Frank W.G. Leebeek, Frits R. Rosendaal, Gail Davies, Georg Homuth, Gerard Temprano, Harry Campbell, Herman A. Taylor, Jan Bressler, Jennifer E. Huffman, Jerome I. Rotter, Jie Yao, James F. Wilson, Joshua C. Bis, Julie Hahn, Karl C. Desch, Kerri L. Wiggins, Laura M. Raffield, Lawrence F. Bielak, Lisa R. Yanek, Marcus E. Kleber, Martina Mueller, Maryam Kavousi, Massimo Mangino, Matthew P. Conomos, Melissa X. Liu, Min A. Jhun, Ming-Huei Chen, Moniek P.M. de Maat, Patricia A. Peyser, Paul Elliot, Peng Wei, Philipp S. Wild, Pierre-Emmanuel Morange, P. Van der Harst, Qiong Yang, Ngoc Quynh Le, Riccardo E. Marioni, Ruifang Li, Scott M. Damrauer, Simon R. Cox, Stella Trompet, Stephan B. Felix, Uwe Völker, Wolfgang Koenig, J. Wouter Jukema, Xiuqing Guo, Amy D. Gelinas, Daniel J. Schneider, Nebojsa Janjic, Nilesh J. Samani, Shu Ye, Charlotte Summers, Edwin R. Chilvers, John Danesh, Dirk S. Paul 
TL;DR: The authors present an integrative approach to the PROCR locus, which is associated with lower coronary artery disease (CAD) risk but higher venous thromboembolism (VTE) risk, and identify PROCR-p.Ser219Gly as the likely causal variant at the locus and protein C as a causal factor.
Abstract: Many individual genetic risk loci have been associated with multiple common human diseases. However, the molecular basis of this pleiotropy often remains unclear. We present an integrative approach to reveal the molecular mechanism underlying the PROCR locus, associated with lower coronary artery disease (CAD) risk but higher venous thromboembolism (VTE) risk. We identify PROCR-p.Ser219Gly as the likely causal variant at the locus and protein C as a causal factor. Using genetic analyses, human recall-by-genotype and in vitro experimentation, we demonstrate that PROCR-219Gly increases plasma levels of (activated) protein C through endothelial protein C receptor (EPCR) ectodomain shedding in endothelial cells, attenuating leukocyte-endothelial cell adhesion and vascular inflammation. We also associate PROCR-219Gly with an increased pro-thrombotic state via coagulation factor VII, a ligand of EPCR. Our study, which links PROCR-219Gly to CAD through anti-inflammatory mechanisms and to VTE through pro-thrombotic mechanisms, provides a framework to reveal the mechanisms underlying similar cross-phenotype associations.

Posted ContentDOI
08 Nov 2022-bioRxiv
TL;DR: The authors adapted an existing mCA calling algorithm for application to WGS data and observed higher sensitivity with WGS data than with array-based data in uncovering mCAs at low mutant cell fractions.
Abstract: Mosaic mutations in blood are common with increasing age and are prognostic markers for cancer, cardiovascular dysfunction and other diseases. This group of acquired mutations include megabase-scale mosaic chromosomal alterations (mCAs). These large mutations have mainly been surveyed using SNP array data from individuals of European (EA) or Japanese genetic ancestry. To gain a better understanding of mCA rates and associated risk factors in genetically diverse populations, we surveyed whole genome sequencing data from 67,390 individuals, including 20,132 individuals of African ancestry (AA), and 7,608 of Hispanic ancestry (HA) with deep (30X) whole genome sequencing data from the NHLBI Trans Omics for Precision Medicine (TOPMed) program. We adapted an existing mCA calling algorithm for application to WGS data, and observed higher sensitivity with WGS data, compared with array-based data, in uncovering mCAs at low mutant cell fractions. As in previous reports, we observed a strong association with age and a non-uniform distribution of mCAs across the genome. The presence of autosomal (but not chromosome X) mCAs was associated with an increased risk of both lymphoid and myeloid malignancies. After adjusting for age, we found that individuals of European ancestry have the highest rates of autosomal mCAs, mirroring the higher rate of leukemia in this group. Our analysis also uncovered higher rates of chromosome X mCAs in AA and HA compared to EA, again after adjusting for age. Germline variants in ATM and MPL showed strong associations with mCAs in cis, including ancestry specific variants. And rare variant gene-burden analysis confirmed the association of putatively protein altering variants in ATM and MPL with mCAs in cis. Individual rare variants in DCPS, ADM17, PPP1R16B, and TET2 were all associated with autosomal mCAs and rare variants in OR4C16 were associated with chromosome X mCAs in females. There was significant enrichment of co-occurrence of CHIP mutations and mCAs both altering cancer associated genes TET2, DNMT3A, JAK2, CUX1, and TP53. Overall, our study demonstrates that rates of mCAs differ across populations and that rare inherited germline variants are strongly associated with mCAs across genetically diverse populations. These results strongly motivate further studies of mCAs in under-represented populations to better understand the causes and consequences of this class of somatic variation.

Posted ContentDOI
16 Apr 2022-medRxiv
TL;DR: An eQTM resource of CpG-transcript pairs is developed that can help inform future functional studies that seek to understand the molecular basis of disease and share genetic regulation between CpGs and transcripts associated with these cardiometabolic traits.
Abstract: Background. Expression quantitative trait methylation (eQTM) analysis identifies DNA CpG sites at which methylation is associated with gene expression and may reveal molecular mechanisms of disease. The present study describes an eQTM resource of CpG-transcript pairs. Methods. DNA methylation was measured in blood samples from 1,045 Framingham Heart Study (FHS) participants using the Illumina 450K BeadChip and in 1,070 FHS participants using the Illumina EPIC array. Blood gene expression data were collected from all 2,115 participants using RNA sequencing (RNA-seq). The association between DNA methylation and gene expression was quantified for all cis (i.e., within 1Mb) and trans (>1Mb) CpG-transcript pairs. Significant results (p<1E-7 for cis and <1E-14 for trans) were subsequently tested for enrichment of biological pathways and of clinical traits. Results. We identified 70,047 significant cis CpG-transcript pairs where the top most significant eGenes (i.e., gene transcripts associated with a CpG) were enriched in biological pathways related to cell signaling, and for 1,208 clinical traits (enrichment false discovery rate [FDR] ≤ 0.05). We also identified 246,667 significant trans CpG-transcript pairs where the top most significant eGenes were enriched in biological pathways related to activation of the immune response, and for 1,191 clinical traits (enrichment FDR ≤ 0.05). Using significant cis CpG-transcript pairs, we identified significant mediation of the association between CpG sites and cardiometabolic traits through gene expression and identified shared genetic regulation between CpGs and transcripts associated with these cardiometabolic traits. Conclusions. We developed a robust and powerful resource of eQTM CpG-transcript pairs that can help inform future functional studies that seek to understand the molecular basis of disease.


Journal ArticleDOI
David Stacey, Lingyan Chen, Paulina J Stanczyk, Joanna M. M. Howson, Amy M. Mason, Stephen Burgess, Stephen MacDonald, Jonathan Langdown, Harriet McKinney, Kate Downes, Neda Farahi, James E. Peters, Saonli Basu, James S. Pankow, Weihong Tang, Nathan James Pankratz, Maria Sabater-Lleal, Paul S. de Vries, Nicholas L. Smith, Abbas Dehghan, Adam S. Heath, Alanna C. Morrison, Alexander P. Reiner, Andrew Johnson, Anne Richmond, Annette Peters, Astrid van Hylckama Vlieg, Barbara McKnight, Bruce M. Psaty, Caroline Hayward, Cavin K. Ward-Caviness, Christopher J. O'Donnell, Daniel I. Chasman, David P. Strachan, David-Alexandre Trégouët, Dennis O. Mook-Kanamori, Dipender Gill, Florian Thibord, Folkert W. Asselbergs, Frank W.G. Leebeek, Frits R. Rosendaal, Gail Davies, Georg Homuth, Gerard Temprano, Harry Campbell, Herman A. Taylor, Jan Bressler, Jennifer E. Huffman, Jerome I. Rotter, Jie Yao, James F. Wilson, Joshua C. Bis, Julie Hahn, Karl C. Desch, Kerri L. Wiggins, Laura M. Raffield, Lawrence F. Bielak, Lisa R. Yanek, Marcus E. Kleber, Martina Mueller, Maryam Kavousi, Massimo Mangino, Matthew P. Conomos, Melissa X. Liu, Min A. Jhun, Ming-Huei Chen, Moniek P.M. de Maat, Patricia A. Peyser, Paul Elliot, Peng Wei, Philipp S. Wild, Pierre-Emmanuel Morange, P. Van der Harst, Qiong Yang, Ngoc Quynh Le, Riccardo E. Marioni, Ruifang Li, Scott M. Damrauer, Simon R. Cox, Stella Trompet, Stéphane Félix, Uwe Völker, Wolfgang Koenig, J. Wouter Jukema, Xiuqing Guo, Amy D. Gelinas, Daniel J. Schneider, Nebojsa Janjic, Nilesh J. Samani, Shu Ye, Charlotte Summers, Edwin R. Chilvers, John Danesh, Dirk S. Paul