
Showing papers by "Gonçalo R. Abecasis" published in 2019


Posted Content
Daniel Taliun, Daniel N. Harris, Michael D. Kessler, Jedidiah Carlson +191 more · Institutions (61)
06 Mar 2019-bioRxiv
TL;DR: The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation; the paper also describes TOPMed resources and early insights from the sequence data.
Abstract: The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency …

662 citations


Journal Article
TL;DR: Genome-wide association analyses based on whole-genome sequencing and imputation identify 40 new risk variants for colorectal cancer, including a strongly protective low-frequency variant at CHD1 and loci implicating signaling and immune function in disease etiology.
Abstract: To further dissect the genetic architecture of colorectal cancer (CRC), we performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls. Findings were followed up in an additional 23,262 cases and 38,296 controls. We discovered a strongly protective 0.3% frequency variant signal at CHD1. In a combined meta-analysis of 125,478 individuals, we identified 40 new independent signals at P < 5 × 10⁻⁸, bringing the number of known independent signals for CRC to ~100. New signals implicate lower-frequency variants, Krüppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs and somatic drivers, and support a role for immune function. Heritability analyses suggest that CRC risk is highly polygenic, and larger, more comprehensive studies enabling rare variant analysis will improve understanding of biology underlying this risk and influence personalized screening strategies and drug development.

324 citations


Journal Article
TL;DR: Transcriptome-wide association analysis identifies three additional genes (RLBP1, HIC1 and PARP12) after Bonferroni correction, and the Eye Genotype Expression (EyeGEx) database is established as a resource for post-GWAS interpretation of multifactorial ocular traits.
Abstract: Genome-wide association studies (GWAS) have identified genetic variants at 34 loci contributing to age-related macular degeneration (AMD) [1-3]. We generated transcriptional profiles of postmortem retinas from 453 controls and cases at distinct stages of AMD and integrated retinal transcriptomes, covering 13,662 protein-coding and 1,462 noncoding genes, with genotypes at more than 9 million common SNPs for expression quantitative trait loci (eQTL) analysis of a tissue not included in Genotype-Tissue Expression (GTEx) and other large datasets [4,5]. Cis-eQTL analysis identified 10,474 genes under genetic regulation, including 4,541 eQTLs detected only in the retina. Integrated analysis of AMD-GWAS with eQTLs ascertained likely target genes at six reported loci. Using transcriptome-wide association analysis (TWAS), we identified three additional genes, RLBP1, HIC1 and PARP12, after Bonferroni correction. Our studies expand the genetic landscape of AMD and establish the Eye Genotype Expression (EyeGEx) database as a resource for post-GWAS interpretation of multifactorial ocular traits.

181 citations


Posted Content
09 Mar 2019-bioRxiv
TL;DR: The first tranche of large-scale exome sequence data for 49,960 study participants is described, revealing approximately 4 million coding variants and 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants.
Abstract: The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency […]) and 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.

123 citations


01 Jan 2019
TL;DR: Exome-sequencing analyses of a large cohort of individuals with type 2 diabetes and control individuals without diabetes from five ancestries identify gene-level associations of rare variants with type 2 diabetes.
Abstract: Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10⁻³) and candidate genes from knockout mice (P = 5.2 × 10⁻³). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000–185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.

107 citations


Journal Article
David M. Brazel, Yu Jiang, Jordan M. Hughey, Valérie Turcot +182 more · Institutions (28)
TL;DR: Fine-mapping genome-wide association study loci identifies specific variants contributing to the biological etiology of substance use behavior, including nonsynonymous/loss-of-function coding variants.

60 citations


Journal Article
TL;DR: A GWAS meta-analysis identifies 53 novel loci for eGFR, a subset of which are associated with other common diseases, such as diabetes and hypertension, in a PheWAS; genetic risk scores built from the associated variants are generally predictive of CKD but offer only modest improvements in detection compared with other known clinical risk factors.
Abstract: Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.

47 citations


Journal Article
Stephanie A. Bien, Yu Ru Su +147 more · Institutions (38)
TL;DR: PrediXcan uses cis-regulatory variant predictors to impute expression and perform gene-level association tests in GWAS without directly measured transcriptomes; applied to colorectal cancer, it yields statistically significant associations using colon transcriptome models with TRIM4 and PYGL.
Abstract: Genome-wide association studies have reported 56 independently associated colorectal cancer (CRC) risk variants, most of which are non-coding and believed to exert their effects by modulating gene expression. The computational method PrediXcan uses cis-regulatory variant predictors to impute expression and perform gene-level association tests in GWAS without directly measured transcriptomes. In this study, we used reference datasets from colon (n = 169) and whole blood (n = 922) transcriptomes to test CRC association with genetically determined expression levels in a genome-wide analysis of 12,186 cases and 14,718 controls. Three novel associations were discovered from colon transverse models at FDR ≤ 0.2 and further evaluated in an independent replication including 32,825 cases and 39,933 controls. After adjusting for multiple comparisons, we found statistically significant associations using colon transcriptome models with TRIM4 (discovery P = 2.2 × 10⁻⁴, replication P = 0.01), and PYGL (discovery P = 2.3 × 10⁻⁴, replication P = 6.7 × 10⁻⁴). Interestingly, both genes encode proteins that influence redox homeostasis and are related to cellular metabolic reprogramming in tumors, implicating a novel CRC pathway linked to cell growth and proliferation. Defining CRC risk regions as one megabase up- and downstream of one of the 56 independent risk variants, we defined 44 non-overlapping CRC-risk regions. Among these risk regions, we identified genes associated with CRC (P < 0.05) in 34/44 CRC-risk regions. Importantly, CRC association was found for two genes in the previously reported 2q25 locus, CXCR1 and CXCR2, which are potential cancer therapeutic targets. These findings provide strong candidate genes to prioritize for subsequent laboratory follow-up of GWAS loci.
This study is the first to implement PrediXcan in a large colorectal cancer study and findings highlight the utility of integrating transcriptome data in GWAS for discovery of, and biological insight into, risk loci.
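A minimal sketch of the gene-level idea described above, assuming per-individual allele dosages and a set of hypothetical cis-eQTL weights: predicted expression is a weighted sum of dosages, and association is then assessed by regressing the trait on that prediction. The function names, variant IDs, and weights are illustrative only; PrediXcan's actual models and test statistics are not reproduced, and a real case-control analysis would use a covariate-adjusted model such as logistic regression rather than the plain least-squares slope shown here.

```python
from statistics import mean


def predict_expression(dosages, weights):
    """Genetically predicted expression for each individual.

    dosages: list of per-individual dicts {variant_id: allele dosage in [0, 2]}
    weights: dict {variant_id: cis-eQTL weight} (hypothetical values here)
    Assumption: a variant missing from an individual's dict contributes 0.
    """
    return [sum(w * d.get(v, 0.0) for v, w in weights.items()) for d in dosages]


def ols_slope(x, y):
    """Least-squares slope of y on x, a stand-in for the gene-level association test."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy example: two hypothetical variants, four individuals, 0/1 case status.
weights = {"rs1": 0.5, "rs2": -0.2}
dosages = [{"rs1": 0, "rs2": 2}, {"rs1": 1, "rs2": 1},
           {"rs1": 2, "rs2": 0}, {"rs1": 2, "rs2": 1}]
expression = predict_expression(dosages, weights)  # approximately [-0.4, 0.3, 1.0, 0.8]
slope = ols_slope(expression, [0, 0, 1, 1])  # positive: cases have higher predicted expression
```

The point of the design is that no transcriptome is measured in the GWAS samples: only genotypes and externally trained weights are needed.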

36 citations


Journal Article
TL;DR: This paper examines the three most common skin cancer subtypes in the USA and conducts a phenome-wide association study within the MGI data to evaluate PRS associations with secondary traits, and develops an accompanying visual catalog called PRSweb that allows users to directly compare different PRS construction methods.
Abstract: Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
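As a rough illustration of one PRS construction choice, the sketch below computes a P-value-threshold score: a sum of risk-allele dosages weighted by GWAS effect sizes, restricted to variants that pass a significance cutoff. The summary statistics, variant IDs, and the default cutoff are hypothetical, and the sketch omits LD clumping and the other construction methods compared in PRSweb.

```python
def polygenic_risk_score(dosages, summary_stats, p_threshold=5e-8):
    """Weighted sum of risk-allele dosages over variants with p <= p_threshold.

    dosages: {variant_id: risk-allele dosage (0-2)} for one individual
    summary_stats: {variant_id: (beta, p_value)} from an external GWAS (hypothetical)
    Variants absent from the individual's dosages are skipped.
    """
    score = 0.0
    for variant, (beta, p) in summary_stats.items():
        if p <= p_threshold and variant in dosages:
            score += beta * dosages[variant]
    return score


stats = {"rs1": (0.2, 1e-10), "rs2": (0.1, 1e-3), "rs3": (-0.3, 1e-9)}
person = {"rs1": 2, "rs2": 1, "rs3": 1}
strict = polygenic_risk_score(person, stats)                   # rs1 and rs3 only
lenient = polygenic_risk_score(person, stats, p_threshold=1e-2)  # adds rs2
```

Changing the threshold changes which variants enter the score, which is one reason different construction strategies can yield different PheWAS associations.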

30 citations


Posted Content
Alexander G. Bick, Joshua S. Weinstock, Satish K. Nandakumar, Charles P. Fulco, Matthew Leventhal, Erik L. Bao, Joseph Nasser, Seyedeh M. Zekavat, Mindy D. Szeto, Cecelia A. Laurie, Margaret A. Taub, Braxton D. Mitchell, Kathleen C. Barnes, Arden Moscati, Myriam Fornage, Susan Redline, Bruce M. Psaty, Edwin K. Silverman, Scott T. Weiss, Nicholette D. Palmer, Ramachandran S. Vasan, Esteban G. Burchard, Sharon L.R. Kardia, Jiang He, Robert C. Kaplan, Nicholas L. Smith, Donna K. Arnett, David A. Schwartz, Adolfo Correa, Mariza de Andrade, Xiuqing Guo, Barbara A. Konkle, Brian Custer, Juan M. Peralta, Hongsheng Gui, Deborah A. Meyers, Stephen T. McGarvey, Ida Yii-Der Chen, M. Benjamin Shoemaker, Patricia A. Peyser, Jai G. Broome, Stephanie M. Gogarten, Fei Fei Wang, Quenna Wong, May E. Montasser, Michelle Daya, Eimear E. Kenny, Kari E. North, Lenore J. Launer, Brian E. Cade, Joshua C. Bis, Michael H. Cho, Jessica Lasky-Su, Donald W. Bowden, L. Adrienne Cupples, A.C.Y. Mak, Lewis C. Becker, Jennifer A. Smith, Tanika N. Kelly, Stella Aslibekyan, Susan R. Heckbert, Hemant K. Tiwari, Ivana V. Yang, John A. Heit, Steven A. Lubitz, Stephen S. Rich, Jill M. Johnsen, Joanne E. Curran, Sally E. Wenzel, Daniel E. Weeks, Dabeeru C. Rao, Dawood Darbar, Jee-Young Moon, Russell P. Tracy, Erin J. Buth, Nicholas Rafaels, Ruth J. F. Loos, Lifang Hou, Jiwon Lee, Priyadarshini Kachroo, Barry I. Freedman, Daniel Levy, Lawrence F. Bielak, James E. Hixson, James S. Floyd, Eric A. Whitsel, Patrick T. Ellinor, Marguerite R. Irvin, Tasha E. Fingerlin, Laura M. Raffield, Sebastian M. Armasu, Jerome I. Rotter, Marsha M. Wheeler, Ester Cerdeira Sabino, John Blangero, L. Keoki Williams, Bruce D. Levy, Wayne Huey-Herng Sheu, Dan M. Roden, Eric Boerwinkle, JoAnn E. Manson, Rasika A. Mathias, Pinkal Desai, Kent D. Taylor, Andrew D. Johnson, Paul L. Auer, Charles Kooperberg, Cathy C. Laurie, Thomas W. Blackwell, Albert V. Smith, Hongyu Zhao, Ethan M. Lange, Leslie A. Lange, James G. Wilson, Eric S. Lander, Jesse M. Engreitz, Benjamin L. Ebert, Alexander P. Reiner, Vijay G. Sankaran, Sidd Jaiswal, Gonçalo R. Abecasis, Pradeep Natarajan, Sekar Kathiresan
27 Sep 2019-bioRxiv
TL;DR: Overall, it is observed that germline genetic variation altering hematopoietic stem cell function and the fidelity of DNA-damage repair increase the likelihood of somatic mutations leading to CHIP.
Abstract: Age is the dominant risk factor for most chronic human diseases; yet the mechanisms by which aging confers this risk are largely unknown [1]. Recently, the age-related acquisition of somatic mutations in regenerating hematopoietic stem cell populations was associated with both hematologic cancer incidence [2–4] and coronary heart disease prevalence [5]. Somatic mutations with leukemogenic potential may confer selective cellular advantages leading to clonal expansion, a phenomenon termed ‘Clonal Hematopoiesis of Indeterminate Potential’ (CHIP) [6]. Simultaneous germline and somatic whole genome sequence analysis now provides the opportunity to identify root causes of CHIP. Here, we analyze high-coverage whole genome sequences from 97,691 participants of diverse ancestries in the NHLBI TOPMed program and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid, and inflammatory traits specific to different CHIP genes. Association of a genome-wide set of germline genetic variants identified three genetic loci associated with CHIP status, including one locus at TET2 that was African ancestry specific. In silico-informed in vitro evaluation of the TET2 germline locus identified a causal variant that disrupts a TET2 distal enhancer. Aggregates of rare germline loss-of-function variants in CHEK2, a DNA damage repair gene, predisposed to CHIP acquisition. Overall, we observe that germline genetic variation altering hematopoietic stem cell function and the fidelity of DNA-damage repair increase the likelihood of somatic mutations leading to CHIP.

22 citations


01 Jan 2019
TL;DR: A transancestral exome-wide association study for body-fat distribution identifies protein-coding variants that are significantly associated with waist-to-hip ratio adjusted for body mass index.

Journal Article
TL;DR: The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education and has replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation.
Abstract: The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.

01 Jan 2019
TL;DR: The authors performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls.
Abstract: To further dissect the genetic architecture of colorectal cancer (CRC), we performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls. Findings were followed up in an additional 23,262 cases and 38,296 controls. We discovered a strongly protective 0.3% frequency variant signal at CHD1. In a combined meta-analysis of 125,478 individuals, we identified 40 new independent signals at P < 5 × 10⁻⁸, bringing the number of known independent signals for CRC to ~100. New signals implicate lower-frequency variants, Krüppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs and somatic drivers, and support a role for immune function. Heritability analyses suggest that CRC risk is highly polygenic, and larger, more comprehensive studies enabling rare variant analysis will improve understanding of biology underlying this risk and influence personalized screening strategies and drug development.

Journal Article
TL;DR: This study is the first to implement PrediXcan in a large colorectal cancer study and findings highlight the utility of integrating transcriptome data in GWAS for discovery of, and biological insight into, risk loci.
Abstract: Every author was erroneously assigned to affiliation "62"; affiliation 62 belongs only to the author Graham Casey.

Posted Content
N Ng, Sara M. Willems, Juan P. Fernandez, Rebecca S. Fine +261 more · Institutions (68)
TL;DR: Functional studies demonstrated that a novel FG/FI association at the liver-enriched G6PC transcript was driven by multiple rare loss-of-function variants, including two alleles within the same codon with divergent effects on glucose levels, highlighting the value of integrating genomic and functional data to maximize biological inference.
Abstract: bioRxiv preprint doi: https://doi.org/10.1101/790618; this version posted October 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Journal Article
TL;DR: In the version of this article initially published, in Supplementary Data 5, the logFC, FC, P value and adjusted P value for advanced AMD versus control (DE 4/1) without age correction did not correspond to the correct gene IDs.
Abstract: In the version of this article initially published, in Supplementary Data 5, the logFC, FC, P value and adjusted P value for advanced AMD versus control (DE 4/1) without age correction did not correspond to the correct gene IDs. The errors have been corrected in the HTML version of the article.

Journal Article
TL;DR: emeraLD (Efficient Methods for Estimation and Random Access of LD) is a computational tool that leverages sparsity and haplotype structure to estimate LD up to 2 orders of magnitude faster than current tools.
Abstract: Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies. Large genetic datasets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD up to 2 orders of magnitude faster than current tools. Availability and implementation: emeraLD is implemented in C++, and is open source under GPLv3. Source code and documentation are freely available at http://github.com/statgen/emeraLD. Supplementary information: Supplementary data are available at Bioinformatics online.
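The sparsity idea can be sketched, under simplifying assumptions, by storing each biallelic variant as the set of phased haplotypes that carry its alternate allele. The joint frequency needed for D = p_AB − p_A·p_B then comes from a set intersection whose cost scales with the number of carriers, which is small for rare variants, and r² = D² / (p_A(1 − p_A)p_B(1 − p_B)). This is a hypothetical Python illustration of the standard r² formula, not emeraLD's C++ implementation or API.

```python
def ld_r2(carriers_a, carriers_b, n_haplotypes):
    """r^2 between two biallelic variants from phased haplotypes.

    carriers_a / carriers_b: sets of haplotype indices carrying the alternate allele.
    Storing only carriers keeps memory proportional to the number of alternate
    alleles rather than to the total number of haplotypes.
    """
    p_a = len(carriers_a) / n_haplotypes
    p_b = len(carriers_b) / n_haplotypes
    p_ab = len(carriers_a & carriers_b) / n_haplotypes  # joint alt-alt frequency
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic variant makes r^2 undefined
    return d * d / denom


perfect = ld_r2({0, 1, 2, 3}, {0, 1, 2, 3}, 8)  # identical carrier sets
independent = ld_r2({0, 1, 2, 3}, {0, 1, 4, 5}, 8)  # p_AB equals p_A * p_B
```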

Posted Content
20 Mar 2019-bioRxiv
TL;DR: It is shown that SAIGE-GENE can efficiently analyze large sample data (N > 400,000) with type I error rates well controlled and is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples.
Abstract: With very large sample sizes, population-based cohorts and biobanks provide an exciting opportunity to identify genetic components of complex traits. To analyze rare variants, gene or region-based multiple variant aggregate tests are commonly used to increase association test power. However, due to the substantial computation cost, existing region-based rare variant tests cannot analyze hundreds of thousands of samples while accounting for confounders, such as population stratification and sample relatedness. Here we propose a scalable generalized mixed model region-based association test that can handle large sample sizes. This method, SAIGE-GENE, utilizes state-of-the-art optimization strategies to reduce computational and memory cost, and hence is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples. Through the analysis of the HUNT study of 69,716 Norwegian samples and the UK Biobank data of 408,910 White British samples, we show that SAIGE-GENE can efficiently analyze large sample data (N > 400,000) with type I error rates well controlled.
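To make the aggregate-test idea concrete, here is a minimal burden-test sketch: rare minor alleles in a region (MAF below an assumed 1% cutoff) are collapsed into one count per individual, and a score statistic tests that count against a quantitative trait. It deliberately omits covariates, sample relatedness, and the mixed-model machinery that SAIGE-GENE adds; the function names and toy data are hypothetical.

```python
from math import sqrt


def burden_scores(genotypes, mafs, maf_cutoff=0.01):
    """Per-individual count of minor alleles at variants with MAF < maf_cutoff.

    genotypes: list of per-individual lists of minor-allele counts (0/1/2),
               one entry per variant in the gene or region
    mafs: minor-allele frequency of each variant, in the same order
    """
    keep = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [sum(g[j] for j in keep) for g in genotypes]


def burden_score_z(burden, phenotype):
    """Score z-statistic for a quantitative trait, with no covariates or relatedness.

    Assumes the burden is not constant across individuals (otherwise variance is 0).
    """
    n = len(burden)
    gbar = sum(burden) / n
    ybar = sum(phenotype) / n
    u = sum((g - gbar) * (y - ybar) for g, y in zip(burden, phenotype))
    var_y = sum((y - ybar) ** 2 for y in phenotype) / n
    var_u = var_y * sum((g - gbar) ** 2 for g in burden)
    return u / sqrt(var_u)


# Toy region with three variants; the common one (MAF 0.2) is excluded.
geno = [[1, 2, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
burden = burden_scores(geno, [0.005, 0.2, 0.001])
z = burden_score_z(burden, [2.0, 1.0, 2.5, 0.5])
```

Collapsing many rare variants into one score is what gives aggregate tests more power than testing each rare variant individually.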

Dataset
Mengzhen Liu, Yu Jiang, Robbee Wedow, Yue Li, David M. Brazel, Fang Chen, Gargi Datta, Jose Davila-Velderrain, Daniel McGuire, Chao Tian, Xiaowei Zhan, HUNT All-In Psychiatry, Hélène Choquet, Anna R. Docherty, Jessica D. Faul, Johanna R. Foerster, Lars G. Fritsche, Maiken Elvestad Gabrielsen, Scott D. Gordon, Jeffrey Haessler, Jouke-Jan Hottenga, Hongyan Huang, Seon-Kyeong Jang, Philip R. Jansen, Yueh Ling, Reedik Mägi, Nana Matoba, George McMahon, Antonella Mulas, Valeria Orrù, Teemu Palviainen, Anita Pandit, Gunnar W. Reginsson, Anne Heidi Skogholt, Jennifer A. Smith, Amy E. Taylor, Constance Turman, Gonneke Willemsen, Hannah Young, Kendra A. Young, Gregory J.M. Zajac, Wei Zhao, Wei Zhou, Gyda Bjornsdottir, Jason D. Boardman, Michael Boehnke, Dorret I. Boomsma, Chu Chen, Francesco Cucca, Gareth E. Davies, Charles B. Eaton, Marissa A. Ehringer, Tõnu Esko, Edoardo Fiorillo, Nathan A. Gillespie, Daniel F. Gudbjartsson, Toomas Haller, Kathleen Mullan Harris, Andrew Heath, John K. Hewitt, Ian B. Hickie, John E. Hokanson, Christian J. Hopfer, David J. Hunter, William G. Iacono, Eric O. Johnson, Yoichiro Kamatani, Sharon L.R. Kardia, Matthew C. Keller, Manolis Kellis, Charles Kooperberg, Peter Kraft, Kenneth Krauter, Markku Laakso, Penelope A. Lind, Anu Loukola, Sharon M. Lutz, Pamela A. F. Madden, Nicholas G. Martin, Matt McGue, Matthew B. McQueen, Sarah E. Medland, Andres Metspalu, Karen L. Mohlke, Jonas B. Nielsen, Yukinori Okada, Ulrike Peters, Tinca J. C. Polderman, Danielle Posthuma, Alexander P. Reiner, John P. Rice, Eric B. Rimm, Richard J. Rose, Valgerdur Runarsdottir, Michael C. Stallings, Alena Stančáková, Hreinn Stefansson, Khanh K. Thai, Hilary A. Tindle, Thorarinn Tyrfingsson, Tamara L. Wall, David R. Weir, Constance Weisner, John Whitfield, Bendik Slagsvold Winsvold, Jie Yin, Luisa Zuccolo, Laura J. Bierut, Kristian Hveem, James J. Lee, Marcus R. Munafò, Nancy L. Saccone, Cristen J. Willer, Marilyn C. Cornelis, Sean P. David, David A. Hinds, Eric Jorgenson, Jaakko Kaprio, Jerry A. Stitzel, Kari Stefansson, Thorgeir E. Thorgeirsson, Gonçalo R. Abecasis, Dajiang J. Liu, Scott Vrieze
16 Jan 2019
TL;DR: Files include summary statistics for associations with each phenotype: Drinks per week, Cigarettes per day, Smoking initiation, Smoking cessation, and Age of initiation.
Abstract: Files include summary statistics for associations with each phenotype: Drinks per week, Cigarettes per day, Smoking initiation, Smoking cessation, and Age of initiation. Details for each file can be found in the readme file or in the article's Supplementary Text.

Journal Article
TL;DR: Meta-MultiSKAT, a meta-analysis framework that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest, is developed; it can improve the power of detection from 23% to 38% on average over single-phenotype meta-analysis approaches.
Abstract: The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare variants and the combined effects of both common and rare variants. To achieve robust power, within Meta-MultiSKAT, we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes, and heterogeneity across studies. In addition, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type-I error rate at the exome-wide level of 2.5 × 10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve the power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.
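A simplified, single-phenotype sketch of the kind of variance-component statistic involved: per-variant score statistics from each study are summed (which assumes homogeneous effects across studies) and combined as Q = Σ_j w_j² U_j². Meta-MultiSKAT's phenotype and heterogeneity kernels are not shown, nor is the P-value calculation, which needs the covariance of the scores and a mixture-of-chi-squares distribution. The inputs and function name are hypothetical. The exome-wide level of 2.5 × 10⁻⁶ corresponds to a Bonferroni correction of 0.05 over roughly 20,000 genes.

```python
# Conventional exome-wide threshold: 0.05 / ~20,000 genes.
EXOME_WIDE_ALPHA = 0.05 / 20_000


def meta_skat_q(score_stats_by_study, weights):
    """SKAT-type statistic Q = sum_j w_j^2 * U_j^2 after summing per-variant scores.

    score_stats_by_study: list (one entry per study) of per-variant score statistics U_j
    weights: per-variant weights w_j (e.g. derived from minor-allele frequency)
    Summing U across studies assumes homogeneous effects; Meta-MultiSKAT instead
    models heterogeneity and multiple phenotypes through kernel matrices.
    """
    n_variants = len(weights)
    u_meta = [sum(study[j] for study in score_stats_by_study) for j in range(n_variants)]
    return sum((w * u) ** 2 for w, u in zip(weights, u_meta))


q = meta_skat_q([[1.0, -2.0], [0.5, 1.0]], weights=[1.0, 2.0])
```

Because Q squares the scores, variants with opposite effect directions do not cancel, which is the usual motivation for variance-component tests over burden tests.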

Posted Content
28 Aug 2019-bioRxiv
TL;DR: A statistical framework and computational tool is presented to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations; incorporating heterogeneous annotations in gene-based association analysis is found to increase power and performance in identifying causal genes.
Abstract: Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based analysis increases power and performance identifying causal genes.
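For orientation, a single-annotation gene-based test from summary statistics can be sketched with the widely used weighted-z form z_gene = w·z / √(wᵀRw), where z holds per-variant GWAS z-scores, w holds annotation weights (for a TWAS-style test, eQTL effect sizes), and R is the LD correlation matrix. This is an assumed illustration of one of the single-annotation tests such a framework compares against, not the authors' method for integrating heterogeneous annotations; the function name and inputs are hypothetical.

```python
from math import sqrt


def weighted_gene_z(z_scores, weights, ld):
    """Gene-level z from GWAS summary statistics: (w . z) / sqrt(w' R w).

    z_scores: per-variant GWAS z-scores
    weights: per-variant annotation weights (e.g. eQTL effect sizes)
    ld: LD correlation matrix R between the variants (list of lists)
    """
    k = len(weights)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    return numerator / sqrt(variance)


identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]
z_indep = weighted_gene_z([3.0, 1.0], [1.0, 1.0], identity)
z_ld = weighted_gene_z([3.0, 1.0], [1.0, 1.0], correlated)
```

The LD term in the denominator matters: the same z-scores give a smaller gene-level statistic when the variants are correlated, because they carry partly redundant evidence.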

Journal ArticleDOI
TL;DR: Compared to existing methods, this approach can estimate the proportion of contaminating DNA more accurately, eliminate the need for external databases of allele frequencies, and provide contamination estimates that are more robust to the ancestral origin of the contaminating sample.
Abstract: Array genotyping is a cost-effective and widely used tool that enables assessment of up to millions of genetic markers in hundreds of thousands of individuals. Genotyping array data are typically highly accurate but sensitive to mixing of DNA samples from multiple individuals before or during genotyping. Contaminated samples can lead to genotyping errors and consequently cause false positive signals or reduce the power of association analyses. Here, we propose a new method to identify contaminated samples and the sources of contamination within a genotyping batch. Through analysis of array intensity and genotype data from intentionally mixed samples and 22,366 samples of the Michigan Genomics Initiative, an ongoing biobank-based study, we show that our method can reliably estimate contamination. We also show that identifying sources of contamination can implicate problematic sample processing steps and guide process improvements. Compared to existing methods, our approach can estimate the proportion of contaminating DNA more accurately, eliminate the need for external databases of allele frequencies, and provide contamination estimates that are more robust to the ancestral origin of the contaminating sample.

Posted ContentDOI
03 Oct 2019-bioRxiv
TL;DR: In this paper, the authors investigated associations of exome-array variants in up to 144,060 individuals of multiple ancestries without diabetes, and found that a novel FG/FI association at the liver-enriched G6PC transcript was driven by multiple rare loss-of-function variants.
Abstract: Metabolic dysregulation in multiple tissues alters glucose homeostasis and influences risk for type 2 diabetes (T2D). To identify pathways and tissues influencing T2D-relevant glycemic traits (fasting glucose [FG], fasting insulin [FI], two-hour glucose [2hGlu] and glycated hemoglobin [HbA1c]), we investigated associations of exome-array variants in up to 144,060 individuals of multiple ancestries without diabetes. Single-variant analyses identified novel associations at 21 coding variants in 18 novel loci, whilst gene-based tests revealed signals at two genes, TF (HbA1c) and G6PC (FG, FI). Pathway and tissue enrichment analyses of trait-associated transcripts confirmed the importance of liver and kidney for FI and pancreatic islets for FG regulation, implicated adipose tissue in FI and the gut in 2hGlu, and suggested a role for the non-endocrine pancreas in glucose homeostasis. Functional studies demonstrated that a novel FG/FI association at the liver-enriched G6PC transcript was driven by multiple rare loss-of-function variants. The FG/HbA1c-associated, islet-specific G6PC2 transcript also contained multiple rare functional variants, including two alleles within the same codon with divergent effects on glucose levels. Our findings highlight the value of integrating genomic and functional data to maximize biological inference.
Highlights:
- 23 novel coding variant associations (single-point and gene-based) for glycemic traits
- 51 effector transcripts highlighted different pathway/tissue signatures for each trait
- The exocrine pancreas and gut influence fasting and 2h glucose, respectively
- Multiple variants in liver-enriched G6PC and islet-specific G6PC2 influence glycemia

Journal ArticleDOI
TL;DR: Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel whose causality and mechanism had previously been demonstrated had a posterior probability of 0.999.
Abstract: It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than the more abundant single-nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well-phenotyped individuals from an isolated valley in Sardinia. We assessed association with 120 traits, resulting in 89 nonoverlapping associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel whose causality and mechanism had previously been demonstrated (rs200748895:TGCTG/T) had a posterior probability of 0.999. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci.
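Credible sets and per-variant posterior probabilities, such as the 0.999 reported above, are commonly derived from approximate Bayes factors. The sketch below assumes Wakefield's approximate Bayes factor, a single causal variant per locus, equal prior probability for every variant, and a prior effect-size standard deviation of 0.2. These are illustrative assumptions, not necessarily the choices made in the SardiNIA analysis, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_abf(z: float, se: float, prior_sd: float = 0.2) -> float:
    """Wakefield log approximate Bayes factor (association vs. null)."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # assumed prior variance of the true effect
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def posterior_probs(zs: Sequence[float], ses: Sequence[float],
                    prior_sd: float = 0.2) -> List[float]:
    """Posterior probability that each variant is the single causal one."""
    logs = [log_abf(z, s, prior_sd) for z, s in zip(zs, ses)]
    top = max(logs)
    scaled = [math.exp(x - top) for x in logs]   # subtract max to avoid overflow
    total = sum(scaled)
    return [x / total for x in scaled]


def credible_set(names: Sequence[str], probs: Sequence[float],
                 coverage: float = 0.99) -> List[str]:
    """Smallest set of top-ranked variants whose probabilities reach coverage."""
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for name, p in ranked:
        chosen.append(name)
        cumulative += p
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    names = ["indel_1", "snp_1", "snp_2"]         # hypothetical variant labels
    pp = posterior_probs([9.0, 7.5, 3.0], [0.05, 0.05, 0.05])
    print(dict(zip(names, (round(p, 4) for p in pp))))
    print(credible_set(names, pp))
```

Enrichment of a variant class, as tested in the abstract, would then compare how often that class appears in such sets against its background frequency.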

Posted ContentDOI
Jonas B. Nielsen1, Oren Rom1, Ida Surakka1, Sarah E. Graham1, Wei Zhou2, Wei Zhou3, Wei Zhou1, Lars G. Fritsche4, Lars G. Fritsche1, Sarah A Gagliano Taliun1, C Sidore5, Yuhao Liu1, Maiken Elvestad Gabrielsen4, Anne Heidi Skogholt4, Brooke N. Wolford1, William Overton1, Whitney E. Hornsby1, Akua Acheampong1, Austen Grooms1, Tanmoy Roychowdhury1, Amanda Schaefer1, Gregory J.M. Zajac3, Luis Villacorta1, Jifeng Zhang1, Ben Michael Brumpton4, Mari Løset4, Vivek Rai1, Kent D. Taylor6, Nicholette D. Palmer7, Yii Der Chen6, Seung Hoan Choi3, Steven A. Lubitz2, Steven A. Lubitz3, Patrick T. Ellinor3, Patrick T. Ellinor2, Kathleen C. Barnes8, Michelle Daya8, Nicholas M. Rafaels8, Scott T. Weiss2, Jessica Lasky-Su2, Russell P. Tracy9, Ramachandran S. Vasan10, Ramachandran S. Vasan11, L. Adrienne Cupples11, L. Adrienne Cupples10, Rasika A. Mathias12, Lisa R. Yanek12, Lewis C. Becker12, Patricia A. Peyser1, Lawrence F. Bielak1, Jennifer A. Smith1, Stella Aslibekyan13, Bertha A. Hidalgo13, Donna K. Arnett14, Marguerite R. Irvin13, James G. Wilson15, Solomon K. Musani15, Adolfo Correa15, Stephen S. Rich16, Xiuqing Guo6, Jerome I. Rotter6, Barbara A. Konkle17, Jill M. Johnsen17, Allison E. Ashley-Koch18, Marilyn J. Telen18, Vivien A. Sheehan19, John Blangero20, Joanne E. Curran20, Juan M. Peralta20, Courtney G. Montgomery21, Wayne Huey-Herng Sheu, Ren-Hua Chung22, Karen Schwander23, Seyed Mehdi Nouraie24, Victor R. Gordeuk25, Yingze Zhang24, Charles Kooperberg26, Alexander P. Reiner26, Alexander P. Reiner17, Rebecca D. Jackson27, Eugene R. Bleecker28, Deborah A. Meyers28, Xingnan Li28, Sayantan Das1, Ketian Yu1, Jonathon LeFaive1, Albert V. Smith1, Thomas W. Blackwell1, Daniel Taliun1, Sebastian Zöllner1, Lukas Forer29, Sebastian Schoenherr29, Christian Fuchsberger1, Anita Pandit1, Matthew Zawistowski1, Sachin Kheterpal1, Chad M. Brummett1, Pradeep Natarajan3, Pradeep Natarajan2, David Schlessinger11, Seunggeun Lee1, Hyun Min Kang1, Francesco Cucca4, Francesco Cucca5, Oddgeir L. Holmen4, Bjørn Olav Åsvold4, Michael Boehnke1, Sekar Kathiresan3, Sekar Kathiresan2, Gonçalo R. Abecasis30, Gonçalo R. Abecasis1, Y. Eugene Chen1, Cristen J. Willer4, Cristen J. Willer1, Kristian Hveem4
02 Apr 2019-bioRxiv
TL;DR: It is demonstrated that simultaneous consideration of multiple phenotypes and a focus on rare protein-altering variants may identify promising therapeutic targets in cardiovascular diseases.
Abstract: Cardiovascular diseases (CVD), and in particular cerebrovascular and ischemic heart diseases, are leading causes of death globally [1]. Lowering circulating lipids is an important treatment strategy to reduce risk [2,3]. However, some pharmaceutical mechanisms of reducing CVD risk may increase the risk of fatty liver disease or other metabolic disorders [4,5,6]. To identify potential novel therapeutic targets that may reduce risk of CVD without increasing risk of metabolic disease, we focused on the simultaneous evaluation of quantitative traits related to liver function and CVD. Using a combination of low-coverage (5×) whole-genome sequencing and targeted genotyping, deep genotype imputation based on the TOPMed reference panel [7], and genome-wide association study (GWAS) meta-analysis, we analyzed 12 liver-related blood traits (including liver enzymes, blood lipids, and markers of iron metabolism) in up to 203,476 people from three population-based cohorts of different ancestries. We identified 88 likely causal protein-altering variants that were associated with one or more liver-related blood traits. We identified several loss-of-function (LoF) variants reducing low-density lipoprotein cholesterol (LDL-C) or risk of CVD without increased risk of liver disease or diabetes, including variants in known lipid genes (e.g. APOB, LPL). A novel LoF variant, ZNF529:p.K405X, was associated with decreased levels of LDL-C (P=1.3×10−8) but demonstrated no association with liver enzymes or non-fasting blood glucose levels. Silencing of ZNF529 in human hepatocytes resulted in upregulation of LDL receptor (LDLR) and increased LDL-C uptake in the cells, suggesting that inhibition of ZNF529 or its gene product could be used for treating hypercholesterolemia and hence reduce the risk of CVD. Taken together, we demonstrate that simultaneous consideration of multiple phenotypes and a focus on rare protein-altering variants may identify promising therapeutic targets.

Posted ContentDOI
19 Apr 2019-bioRxiv
TL;DR: These findings provide tentative evidence that daytime napping may reduce AD risk; however, the findings should be replicated using independent samples.
Abstract: INTRODUCTION: It is established that Alzheimer’s disease (AD) patients experience sleep disruption. However, it remains unknown whether disruption in the quantity, quality or timing of sleep is a risk factor for the onset of AD. METHODS: Mendelian randomization (MR) was used to estimate the causal effect of self-reported and accelerometer-measured sleep parameters (chronotype, duration, fragmentation, insomnia, daytime napping and daytime sleepiness) on AD risk. RESULTS: Overall, there was little evidence that sleep traits affect the risk of AD. There was some evidence to suggest that self-reported daytime napping was associated with lower AD risk (odds ratio [OR]: 0.70, 95% confidence interval [CI]: 0.50 to 0.99). Some other sleep traits (accelerometer-measured eveningness and sleep duration, and self-reported daytime sleepiness) had ORs for AD risk of a similar magnitude to daytime napping, but were less precisely estimated. DISCUSSION: Our findings provide tentative evidence that daytime napping may reduce AD risk. However, findings should be replicated using independent samples.
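The abstract does not state which MR estimator was used. As a hedged illustration, the sketch below implements the standard fixed-effect inverse-variance-weighted (IVW) estimator, which combines per-variant summary statistics as β̂ = Σ βXᵢβYᵢ/σYᵢ² ÷ Σ βXᵢ²/σYᵢ², with standard error 1/√(Σ βXᵢ²/σYᵢ²). It assumes outcome effects are on the log-odds scale, so exponentiating gives an odds ratio comparable in form to the 0.70 reported above. Function names and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("input sequences must have equal length")
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_with_ci(beta: float, se: float,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds effect into an odds ratio with a normal-approximation CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Illustrative per-SNP summary statistics, not data from the study.
    bx = [0.10, 0.08, 0.05]
    by = [-0.030, -0.020, -0.020]
    se_y = [0.015, 0.012, 0.010]
    beta, se = ivw_estimate(bx, by, se_y)
    print("OR = %.2f (95%% CI %.2f to %.2f)" % odds_ratio_with_ci(beta, se))
```

With a single variant the estimator reduces to the Wald ratio βY/βX with standard error σY/|βX|, which is a quick sanity check on the implementation.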

Posted ContentDOI
30 Mar 2019-bioRxiv
TL;DR: Meta-MultiSKAT is a meta-analysis framework that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest; its utility and improved power are demonstrated in the meta-analyses of four white blood cell subtype traits.
Abstract: The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare variants and for the combined effects of both common and rare variants. To achieve robust power, we developed fast and accurate omnibus tests within Meta-MultiSKAT that combine different models of genetic effects, functional genomic annotations, multiple correlated phenotypes and heterogeneity across studies. Additionally, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type-I error rate at the exome-wide level of 2.5 × 10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.

Posted ContentDOI
19 Feb 2019-bioRxiv
TL;DR: It is demonstrated that the human single nucleotide mutation rate is similar across numerous human ancestries and populations, and a reduced mutation rate in the Amish founder population is discovered, which shows that mutation rates can shift rapidly.
Abstract: De novo mutations (DNMs), or mutations that appear in an individual but are absent from both parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Using high-coverage whole genome sequencing data generated as part of the Trans-Omics for Precision Medicine (TOPMed) program, we directly estimate and analyze DNM counts, rates, and spectra from 1,465 trios across an array of diverse human populations. Using the resulting call set of 86,865 single nucleotide DNMs, we find a significant positive correlation between local recombination rate and local DNM rate, which together can explain up to 35.5% of the genome-wide variation in population-level rare genetic variation from 41K unrelated TOPMed samples. While genome-wide heterozygosity does correlate weakly with DNM count, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. Interestingly, however, we do find significantly fewer DNMs in Amish individuals compared with other Europeans, even after accounting for parental age and sequencing center. Specifically, we find significant reductions in the number of T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculate near-zero estimates of narrow sense heritability (h2), which suggest that variation in DNM rate is significantly shaped by non-additive genetic effects and/or the environment, and that a less mutagenic environment may be responsible for the reduced DNM rate in the Amish. Significance: Here we provide one of the largest and most diverse human de novo mutation (DNM) call sets to date, and use it to quantify the genome-wide relationship between local mutation rate and population-level rare genetic variation. 
While we demonstrate that the human single nucleotide mutation rate is similar across numerous human ancestries and populations, we also discover a reduced mutation rate in the Amish founder population, which shows that mutation rates can shift rapidly. Finally, we find that variation in mutation rates is not heritable, which suggests that the environment may influence mutation rates more significantly than previously realized.

Posted ContentDOI
03 Jun 2019-bioRxiv
TL;DR: In this paper, the authors reported the first whole-genome sequence analysis of sleep-disordered breathing (SDB) through the NHLBI Trans-Omics for Precision Medicine (TOPMed) program.
Abstract: Sleep-disordered breathing (SDB) is a common disorder associated with significant morbidity. Through the NHLBI Trans-Omics for Precision Medicine (TOPMed) program we report the first whole-genome sequence analysis of SDB. We identified 4 rare gene-based associations with SDB traits in 7,988 individuals of diverse ancestry and 4 replicated common variant associations with inclusion of additional samples (n=13,257). We identified a multi-ethnic set-based rare-variant association (p = 3.48 × 10−8) on chromosome X with ARMCX3. Transcription factor binding site enrichment identified associations with genes implicated in respiratory and craniofacial traits. Results highlighted associations in genes that modulate lung development, inflammation, respiratory rhythmogenesis and HIF1A-mediated hypoxic response.