Showing papers by "Michael Boehnke" published in 2020
••
University of North Carolina at Chapel Hill1, University of Massachusetts Amherst2, University of Oxford3, The Chinese University of Hong Kong4, Kyoto University5, Seoul National University Hospital6, Vanderbilt University Medical Center7, National University of Singapore8, Academia Sinica9, Nagoya University10, University of California, Los Angeles11, Los Angeles Biomedical Research Institute12, Hallym University13, University of Texas Health Science Center at Houston14, University of Michigan15, Wake Forest University16, Genentech17, Imperial College London18, London North West Healthcare NHS Trust19, University of Manchester20, University of Liverpool21, Kyushu University22, Peking Union Medical College23, National Taiwan University24, University of Minnesota25, Chinese National Human Genome Center26, National Defense Medical Center27, Tri-Service General Hospital28, Taipei Veterans General Hospital29, National Yang-Ming University30, Jichi Medical University31, Heidelberg University32, University of Tokyo33, Osaka University34, Agency for Science, Technology and Research35, University of the Ryukyus36, Ehime University37, Yonsei University38, Samsung Medical Center39, University of San Carlos40, Peking University41, Macau University of Science and Technology42, China Medical University (Taiwan)43, Shanghai Jiao Tong University44, Kurume University45, University of Pittsburgh46, Capital Medical University47, New Generation University College48, Seoul National University49
TL;DR: A meta-analysis of genome-wide association study data from 77,418 individuals of East Asian ancestry with type 2 diabetes identifies novel variants associated with increased risk of type 2 diabetes in both East Asian and European populations.
Abstract: Meta-analyses of genome-wide association studies (GWAS) have identified more than 240 loci that are associated with type 2 diabetes (T2D)1,2; however, most of these loci have been identified in analyses of individuals with European ancestry. Here, to examine T2D risk in East Asian individuals, we carried out a meta-analysis of GWAS data from 77,418 individuals with T2D and 356,122 healthy control individuals. In the main analysis, we identified 301 distinct association signals at 183 loci, and across T2D association models with and without consideration of body mass index and sex, we identified 61 loci that are newly implicated in predisposition to T2D. Common variants associated with T2D in both East Asian and European populations exhibited strongly correlated effect sizes. Previously undescribed associations include signals in or near GDAP1, PTF1A, SIX3, ALDH2, a microRNA cluster, and genes that affect the differentiation of muscle and adipose cells3. At another locus, expression quantitative trait loci at two overlapping T2D signals affect two genes, NKX6-3 and ANK1, in different tissues4-6. Association studies in diverse populations identify additional loci and elucidate disease-associated genes, biology, and pathways.
218 citations
••
University Medical Center Groningen1, Erasmus University Rotterdam2, Katholieke Universiteit Leuven3, Chinese Academy of Sciences4, University of Surrey5, King's College London6, University of Toronto7, Mount Sinai Hospital8, Avera Health9, Karolinska Institutet10, Saint Petersburg State University of Information Technologies, Mechanics and Optics11, University of Copenhagen12, Greifswald University Hospital13, University of Kiel14, Albert Einstein College of Medicine15, Sungkyunkwan University16, University of Tartu17, Weizmann Institute of Science18, Copenhagen University Hospital19, University of Texas Health Science Center at Houston20, University of Alabama at Birmingham21, Stockholm University22, University of Michigan23, VU University Amsterdam24, University of Oxford25, University of Bristol26, University of Amsterdam27, Maastricht University28, University of California, San Diego29, University of Eastern Finland30, National Institutes of Health31, University of California, Berkeley32, University of Milan33, Harvard University34, Radboud University Nijmegen35, University of North Carolina at Chapel Hill36, Ewha Womans University37, Fred Hutchinson Cancer Research Center38, National Research Council39
TL;DR: A phenome-wide association study and Mendelian randomization identified enrichment of microbiome trait loci in the metabolic, nutrition and environment domains and suggested the microbiome has causal effects in ulcerative colitis and rheumatoid arthritis.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 out of 410 genera were detected in more than 95% of samples. A genome-wide association study (GWAS) of host genetic variation in relation to microbial taxa identified 31 loci affecting the microbiome at a genome-wide significant (P
210 citations
••
TL;DR: Genomic feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways; trans-ancestry studies, with their improved power and resolution, increase understanding of diabetes pathophysiology.
Abstract: Glycaemic traits are used to diagnose and monitor type 2 diabetes, and cardiometabolic health. To date, most genetic studies of glycaemic traits have focused on individuals of European ancestry. Here, we aggregated genome-wide association studies in up to 281,416 individuals without diabetes (30% non-European ancestry) with fasting glucose, 2h-glucose post-challenge, glycated haemoglobin, and fasting insulin data. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P
158 citations
••
Broad Institute1, Harvard University2, King's College London3, University of California, San Francisco4, University of California, Los Angeles5, Semel Institute for Neuroscience and Human Behavior6, University of Michigan7, University of Alabama at Birmingham8, Cincinnati Children's Hospital Medical Center9, Wake Forest University10, Brigham and Women's Hospital11, Howard Hughes Medical Institute12, SUNY Downstate Medical Center13, Genentech14
TL;DR: Sexual dimorphism in genetic vulnerability to schizophrenia, systemic lupus erythematosus and Sjögren’s syndrome is linked to differential protein abundance from alleles of complement component 4, implicating the complement system as a source of sexual dimorphism in vulnerability to diverse illnesses.
Abstract: Many common illnesses, for reasons that have not been identified, differentially affect men and women. For instance, the autoimmune diseases systemic lupus erythematosus (SLE) and Sjogren's syndrome affect nine times more women than men1, whereas schizophrenia affects men with greater frequency and severity relative to women2. All three illnesses have their strongest common genetic associations in the major histocompatibility complex (MHC) locus, an association that in SLE and Sjogren's syndrome has long been thought to arise from alleles of the human leukocyte antigen (HLA) genes at that locus3-6. Here we show that variation of the complement component 4 (C4) genes C4A and C4B, which are also at the MHC locus and have been linked to increased risk for schizophrenia7, generates 7-fold variation in risk for SLE and 16-fold variation in risk for Sjogren's syndrome among individuals with common C4 genotypes, with C4A protecting more strongly than C4B in both illnesses. The same alleles that increase risk for schizophrenia greatly reduce risk for SLE and Sjogren's syndrome. In all three illnesses, C4 alleles act more strongly in men than in women: common combinations of C4A and C4B generated 14-fold variation in risk for SLE, 31-fold variation in risk for Sjogren's syndrome, and 1.7-fold variation in schizophrenia risk among men (versus 6-fold, 15-fold and 1.26-fold variation in risk among women, respectively). At a protein level, both C4 and its effector C3 were present at higher levels in cerebrospinal fluid and plasma8,9 in men than in women among adults aged between 20 and 50 years, corresponding to the ages of differential disease vulnerability. Sex differences in complement protein levels may help to explain the more potent effects of C4 alleles in men, women's greater risk of SLE and Sjogren's syndrome and men's greater vulnerability to schizophrenia. 
These results implicate the complement system as a source of sexual dimorphism in vulnerability to diverse illnesses.
130 citations
••
TL;DR: PheWeb is an easy-to-use open-source web-based tool for visualizing, navigating and sharing GWAS and PheWAS results; it has been used to explore association results for large datasets such as the UK Biobank and the Michigan Genomics Initiative, and it organizes relationships between traits on the basis of pairwise genetic correlations.
Abstract: To the Editor: Advances in genotyping and sequencing technologies, the growing availability of electronic health records for research use and the emergence of population-scale cohorts are enabling large studies to collect copious amounts of both phenotype and genotype data. Studies can collect thousands of traits measured across hundreds of thousands of individuals, each assessed at millions of genetic variants. These resources enable genome- and phenome-wide association studies (GWAS and PheWAS, respectively) at increasing scales and can generate high-dimensional results that can provide insights into many aspects of human genetics and biology. However, navigating these association results can be challenging and cumbersome. To aid in generating and testing hypotheses for the mechanisms underlying complex traits, these results should be organized in an intuitive, easily navigable manner. The current standards in the field are to use Manhattan1 and LocusZoom2 plots to review single-trait results and to use PheWAS3,4 plots to summarize results across many traits. The ability of investigators to explore their own data by alternating between these two view types is an increasingly common feature of large-scale association analyses. Therefore, we developed PheWeb, an easy-to-use open-source web-based tool for visualizing, navigating and sharing GWAS and PheWAS results. We have used PheWeb to explore association results for large datasets such as the UK Biobank5 (http://pheweb.org/MGI-freeze2/) and the Michigan Genomics Initiative (MGI; http://pheweb.sph.umich.edu/MGI-freeze2). The PheWeb instance populated with the UK Biobank summary statistics displays 28 million genetic markers assessed across 1,403 binary traits for 408,961 white British participants6 (Supplementary Note).
Others have used PheWeb to explore large sets of association results, such as the Oxford Brain Imaging Genetics Project (http://big.stats.ox.ac.uk) and the new computationally efficient association tool fastGWA (http://fastgwa.info/ukbimp). PheWeb provides automated data processing and an interactive web interface for exploratory analysis. The data-processing pipeline loads and harmonizes association summary statistics (recipes are provided for the output of many common tools; Supplementary Note); organizes relationships between traits on the basis of pairwise genetic correlations; and annotates the variants. The web interface provides intuitive visualizations at three levels of granularity: genome-wide summaries at the trait level, and regional (LocusZoom)2 and phenome-wide summaries at the variant level (Supplementary Note and Supplementary Fig. 1). PheWeb links to relevant public databases (for example, the NHGRI-EBI GWAS Catalog7 and ClinVar8) to provide further information on a particular variant. Association results can be queried by trait, variant or gene. To facilitate collaboration, PheWeb visualizations can be shared through the URLs, and we are exploring the opportunity to enable collaborative annotation on each results page. PheWeb can help make meaningful discoveries. To illustrate this potential, Fig. 1 and Supplementary Figs. 2 and 3 show different views of genetic association signals and key variants for bladder cancer in the UK Biobank association results in PheWeb. From the Manhattan plot (Fig. 1a), the strongest association on chromosome 5 is at rs4975616 (P = 9.9 × 10−11), which is located near the CLPTM1L gene. The regional view (Supplementary Fig. 4) highlights that rs4975616 and several of its proxies are in the GWAS Catalog and are associated with various cancers (for example, lung cancer, pancreatic cancer and basal cell carcinoma), thus suggesting a broad role of the locus in cancer susceptibility.
The PheWAS view (Supplementary Fig. 5) further supports the association of the locus with a variety of cancers, including cancers of the skin and lung, and it links to multiple PubMed entries supporting the potential role of rs4975616 in lung and other cancers9. Interestingly, for the other top loci (both on chromosome 8), the regional and PheWAS views (rs2976384 in JRK and PSCA, Fig. 1b,c; rs10094872 near MYC, Supplementary Figs. 6 and 7) distinctly convey that these loci are not associated with skin and lung cancers, but are instead associated with gastric and urinary traits such as duodenal ulcer, urinary tract infection and pancreatic cancer. The quantile–quantile plot (Supplementary Fig. 2) shows that the significant associations for bladder cancer are driven by common (minor allele frequency >2%) variants. To help make connections among traits, PheWeb optionally displays pairwise genetic correlations across traits (Supplementary Fig. 3). For example, bladder cancer shows the strongest genetic correlation with cancer of urinary organs (r = 0.84, P = 7.6 × 10−7 from cross-trait linkage-disequilibrium-score regression10) and weaker correlations with tobacco-use disorder (r = 0.39, P = 0.006), an observation consistent with the role of smoking as a major risk factor for bladder cancer11. In our view, describing the spectrum of traits at each locus helps to identify loci that influence disease through similar mechanisms and to expose connections among traits, whether expected or unexpected. Clearly, further research is needed to make conclusive statements regarding the roles of these or any loci, including fine-mapping and colocalization approaches as well as refinement of phenotype definitions for the traits. To this end, PheWeb is designed to allow for gradual updating of result sets as new analyses are completed and refined. 
Although we believe that making data explorations and large sets of results broadly accessible and intuitive is extremely valuable, doing so does not obviate the need for further analysis, experimentation and biological follow-up. Thus, PheWeb is as useful as the data and results behind it, but we expect that these results will be much more useful when they are accessible. When interpreting the visualizations, users must consider existing biases in the underlying GWAS data, including but not limited to analyses conducted in restricted (ancestry) sets of individuals and suboptimal phenotype definitions. We welcome user feedback and feature requests through the PheWeb GitHub repository (https://github.com/statgen/pheweb), which helps us enhance and tailor PheWeb to meet the needs of the research community. This repository includes a walk-through demonstration and easy-to-follow instructions for creating a PheWeb for one’s own data. The PheWeb codebase is not exclusive to displaying variant–trait associations, and it can be used to display other types of genome-wide
112 citations
••
A. Mesut Erzurumluoglu1, Mengzhen Liu2, Victoria E. Jackson1, Victoria E. Jackson3 +203 more•Institutions (65)
TL;DR: The novel loci will facilitate understanding the genetic aetiology of smoking behaviour and may lead to the identification of potential drug targets for smoking prevention and/or cessation.
Abstract: Smoking is a major heritable and modifiable risk factor for many diseases, including cancer, common respiratory disorders and cardiovascular diseases. Fourteen genetic loci have previously been ass ...
82 citations
••
University of Michigan1, University of Oxford2, Lund University3, National Institutes of Health4, Statens Serum Institut5, University of Helsinki6, Swiss Institute of Bioinformatics7, University of Geneva8, University of North Carolina at Chapel Hill9, University of Alberta10, Genentech11, University of Connecticut12
TL;DR: The relationship between genetic variants influencing predisposition to type 2 diabetes and related glycemic traits, and human pancreatic islet transcription is explored using data from 420 donors to illustrate the advantages of performing functional and regulatory studies in disease relevant tissues.
Abstract: Most signals detected by genome-wide association studies map to non-coding sequence and their tissue-specific effects influence transcriptional regulation. However, key tissues and cell-types required for functional inference are absent from large-scale resources. Here we explore the relationship between genetic variants influencing predisposition to type 2 diabetes (T2D) and related glycemic traits, and human pancreatic islet transcription using data from 420 donors. We find: (a) 7741 cis-eQTLs in islets with a replication rate across 44 GTEx tissues between 40% and 73%; (b) marked overlap between islet cis-eQTL signals and active regulatory sequences in islets, with reduced eQTL effect size observed in the stretch enhancers most strongly implicated in GWAS signal location; (c) enrichment of islet cis-eQTL signals with T2D risk variants identified in genome-wide association studies; and (d) colocalization between 47 islet cis-eQTLs and variants influencing T2D or glycemic traits, including DGKB and TCF7L2. Our findings illustrate the advantages of performing functional and regulatory studies in disease relevant tissues.
81 citations
••
TL;DR: The results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest the existence of a shared genetic etiology between SA and known risk factors that is not mediated by psychiatric disorders.
Abstract: Suicide is a leading cause of death worldwide and non-fatal suicide attempts, which occur far more frequently, are a major source of disability and social and economic burden. Both are known to have a substantial genetic etiology, which is partially shared and partially distinct from that of related psychiatric disorders. We conducted a genome-wide association study (GWAS) of 29,782 suicide attempt (SA) cases and 519,961 controls in the International Suicide Genetics Consortium and conditioned the results on psychiatric disorders using GWAS summary statistics, to investigate their shared and divergent genetic architectures. Two loci reached genome-wide significance for SA: the major histocompatibility complex and an intergenic locus on chromosome 7, which remained associated after conditioning and has previously been implicated in risk-taking, smoking, and insomnia. SA showed strong genetic correlation with psychiatric disorders, particularly major depression, and also with smoking, lower socioeconomic status, pain, lower educational attainment, reproductive traits, risk-taking, sleep disturbances, and poorer overall general health. After conditioning, the genetic correlations between SA and psychiatric disorders decreased, whereas those with non-psychiatric traits remained largely unchanged. Our results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest the existence of a shared genetic etiology between SA and known risk factors that is not mediated by psychiatric disorders.
68 citations
••
Norwegian University of Science and Technology1, Medical Research Council2, University of Michigan3, deCODE genetics4, Erasmus University Rotterdam5, Greifswald University Hospital6, University Medical Center Groningen7, University of Freiburg8, University of Pennsylvania9, Cardiff University10, University of Bristol11, University of Iceland12, Ohio State University13, University of Texas MD Anderson Cancer Center14, Radboud University Nijmegen15, University of Colorado Hospital16, University of Buenos Aires17, Nord-Trøndelag Hospital Trust18
TL;DR: A GWAS and two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter.
Abstract: Thyroid stimulating hormone (TSH) is critical for normal development and metabolism. To better understand the genetic contribution to TSH levels, we conduct a GWAS meta-analysis at 22.4 million genetic markers in up to 119,715 individuals and identify 74 genome-wide significant loci for TSH, of which 28 are previously unreported. Functional experiments show that the thyroglobulin protein-altering variants P118L and G67S impact thyroglobulin secretion. Phenome-wide association analysis in the UK Biobank demonstrates the pleiotropic effects of TSH-associated variants and a polygenic score for higher TSH levels is associated with a reduced risk of thyroid cancer in the UK Biobank and three other independent studies. Two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter. Our findings highlight the pleiotropic effects of TSH-associated variants on thyroid function and growth of malignant and benign thyroid tumors.
61 citations
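The two-sample Mendelian randomization step described in the TSH abstract above can be illustrated with the widely used inverse-variance-weighted (IVW) estimator. This is only a minimal sketch: the abstract does not say which estimator was applied, so IVW is an assumption here, and the function name `ivw_mr` and every number below are hypothetical, chosen purely for illustration.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted Mendelian randomization estimate.

    Each position describes one instrumental variant: its effect on the
    exposure (for example TSH), its effect on the outcome (for example
    thyroid cancer), and the standard error of the outcome effect.
    The IVW estimator is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")

    # Weighted regression of outcome effects on exposure effects through the origin.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Hypothetical summary statistics for two variants (not real data).
estimate, standard_error = ivw_mr([0.1, 0.2], [-0.05, -0.10], [0.01, 0.01])
print(estimate, standard_error)
```

A negative estimate in this setup would correspond to the direction reported in the abstract, where a genetically higher exposure level goes with lower outcome risk.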
01 Jan 2020
TL;DR: In this article, a fixed effects meta-analysis of up to 61 studies (up to 346,813 participants) was performed to investigate the association of SNVs with smoking behavior traits.
Abstract: Smoking is a major heritable and modifiable risk factor for many diseases, including cancer, common respiratory disorders and cardiovascular diseases. Fourteen genetic loci have previously been associated with smoking behaviour-related traits. We tested up to 235,116 single nucleotide variants (SNVs) on the exome-array for association with smoking initiation, cigarettes per day, pack-years, and smoking cessation in a fixed effects meta-analysis of up to 61 studies (up to 346,813 participants). In a subset of 112,811 participants, a further one million SNVs were also genotyped and tested for association with the four smoking behaviour traits. SNV-trait associations with P < 5 × 10−8 in either analysis were taken forward for replication in up to 275,596 independent participants from UK Biobank. Lastly, a meta-analysis of the discovery and replication studies was performed. Sixteen SNVs were associated with at least one of the smoking behaviour traits (P < 5 × 10−8) in the discovery samples. Ten novel SNVs, including rs12616219 near TMEM182, were followed-up and five of them (rs462779 in REV3L, rs12780116 in CNNM2, rs1190736 in GPR101, rs11539157 in PJA1, and rs12616219 near TMEM182) replicated at a Bonferroni significance threshold (P < 4.5 × 10−3) with consistent direction of effect. A further 35 SNVs were associated with smoking behaviour traits in the discovery plus replication meta-analysis (up to 622,409 participants) including a rare SNV, rs150493199, in CCDC141 and two low-frequency SNVs in CEP350 and HDGFRP2. Functional follow-up implied that decreased expression of REV3L may lower the probability of smoking initiation. The novel loci will facilitate understanding the genetic aetiology of smoking behaviour and may lead to the identification of potential drug targets for smoking prevention and/or cessation.
54 citations
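The fixed-effects meta-analysis named in the smoking abstract above can be sketched with inverse-variance weighting, a common way to combine per-study effect estimates. The abstract does not specify the weighting scheme, so that choice is an assumption; the function name `fixed_effects_meta` and the example inputs are hypothetical. Only the genome-wide threshold of 5 × 10−8 comes from the abstract.

```python
import math

GENOME_WIDE_P = 5e-8  # discovery threshold quoted in the abstract


def fixed_effects_meta(betas, standard_errors):
    """Combine per-study effect sizes with inverse-variance weights.

    Returns the pooled effect, its standard error, the z-score and a
    two-sided p-value under a normal approximation. Inverse-variance
    weighting is an assumption of this sketch.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect size")

    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical effects of one variant in two studies (not real data).
beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.1, 0.1])
print(beta, se, z, p, p < GENOME_WIDE_P)
```

With equal standard errors the pooled effect is simply the mean of the study effects, which makes the weighting easy to check by hand.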
••
Niamh Mullins1, Andreas J. Forstner2, Andreas J. Forstner3, Kevin S. O’Connell4 +284 more•Institutions (102)
TL;DR: This genome-wide association study (GWAS) of 41,917 BD cases and 371,549 controls identified 64 associated genomic loci, which provides the best-powered BD polygenic scores to date, when applied in both European and diverse ancestry samples.
Abstract: Bipolar disorder (BD) is a heritable mental illness with complex etiology. We performed a genome-wide association study (GWAS) of 41,917 BD cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci. BD risk alleles were enriched in genes in synaptic signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics. Integrating eQTL data implicated 15 genes robustly linked to BD via gene expression, including druggable genes such as HTR6, MCHR1, DCLK3 and FURIN. This GWAS provides the best-powered BD polygenic scores to date, when applied in both European and diverse ancestry samples. Analyses of BD subtypes indicated high but imperfect genetic correlation between BD type I and II and identified additional associated loci. Together, these results advance our understanding of the biological etiology of BD, identify novel therapeutic leads and prioritize genes for functional follow-up studies.
••
TL;DR: A robust statistical method that accurately estimates DNA contamination, is agnostic to the genetic ancestry of the intended or contaminating sample, and integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates.
Abstract: Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual at an early stage of sequence analysis is impractical or even impossible for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations. On the other hand, incorrectly specified allele frequencies may result in substantial bias in estimated contamination rates. For example, we observed that existing methods often fail to identify 10% contaminated samples at a typical 3% contamination exclusion threshold when genetic ancestry is misspecified. Such an incomplete screening of contaminated samples substantially inflates the estimated rate of genotyping errors even in deeply sequenced genomes and exomes. We propose a robust statistical method that accurately estimates DNA contamination and is agnostic to genetic ancestry of the intended or contaminating sample. Our method integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates. Our method can also be used for estimating genetic ancestries, similar to LASER or TRACE, but simultaneously accounting for potential contamination. We demonstrate that our method robustly estimates contamination rates and genetic ancestries across populations and contamination scenarios. We further demonstrate that, in the presence of contamination, genetic ancestry inference can be substantially biased with existing methods that ignore contamination, while our method corrects for such biases.
••
Norwegian University of Science and Technology1, University of Michigan2, Copenhagen University Hospital3, University of Copenhagen4, Los Angeles Biomedical Research Institute5, Wake Forest University6, Broad Institute7, Harvard University8, University of Colorado Boulder9, Brigham and Women's Hospital10, University of Vermont11, Boston University12, Johns Hopkins University13, University of Alabama at Birmingham14, University of Kentucky15, University of Mississippi Medical Center16, University of Virginia17, University of Washington18, Duke University19, Baylor College of Medicine20, University of Texas at Austin21, Oklahoma Medical Research Foundation22, National Health Research Institutes23, Washington University in St. Louis24, University of Pittsburgh25, University of Illinois at Chicago26, Fred Hutchinson Cancer Research Center27, Ohio State University28, University of Arizona29, Innsbruck Medical University30, National Institutes of Health31, University of Sassari32, Regeneron33, Nord-Trøndelag Hospital Trust34
TL;DR: In this article, the authors performed genome-wide analyses of participants in the HUNT Study in Norway to search for protein-altering variants with beneficial impact on quantitative blood traits related to cardiovascular disease, but without detrimental impact on liver function.
Abstract: Pharmaceutical drugs targeting dyslipidemia and cardiovascular disease (CVD) may increase the risk of fatty liver disease and other metabolic disorders. To identify potential novel CVD drug targets without these adverse effects, we perform genome-wide analyses of participants in the HUNT Study in Norway (n = 69,479) to search for protein-altering variants with beneficial impact on quantitative blood traits related to cardiovascular disease, but without detrimental impact on liver function. We identify 76 (11 previously unreported) presumed causal protein-altering variants associated with one or more CVD- or liver-related blood traits. Nine of the variants are predicted to result in loss-of-function of the protein. This includes ZNF529:p.K405X, which is associated with decreased low-density-lipoprotein (LDL) cholesterol (P = 1.3 × 10−8) without being associated with liver enzymes or non-fasting blood glucose. Silencing of ZNF529 in human hepatoma cells results in upregulation of LDL receptor and increased LDL uptake in the cells. This suggests that inhibition of ZNF529 or its gene product should be prioritized as a novel candidate drug target for treating dyslipidemia and associated CVD.
••
TL;DR: Additional epidemiologic and genetic factors contributing to risk prediction are assessed, demonstrating that inclusion of common polygenic variation significantly improved biomarker estimation for two monogenic dyslipidemias.
Abstract: Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier will develop the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we applied clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias displayed effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers averaged below 60% in both studies for all conditions except monogenic diabetes. We assessed additional epidemiologic and genetic factors contributing to risk prediction, demonstrating that inclusion of common polygenic variation significantly improved biomarker estimation for two monogenic dyslipidemias.
••
TL;DR: This work showed that the joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches.
Abstract: There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ~10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. We discuss practical issues and methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ~18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.
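To make the idea of encoding a multi-allelic site for joint modeling more concrete, the sketch below turns VCF-style genotype strings into one allele-count column per alternate allele, so that all alternate alleles at a site could enter a regression together. This is one plausible encoding assumed for illustration, not the scheme implemented in the paper's software; the function name `encode_multiallelic` and the example genotypes are hypothetical.

```python
def encode_multiallelic(genotypes, n_alt):
    """Encode genotype strings as per-alternate-allele counts.

    genotypes: strings such as "0/1" or "1|2", where 0 is the reference
               allele and k >= 1 is the k-th alternate allele.
    n_alt:     number of alternate alleles at the site.

    Returns one row per sample; each row has n_alt counts (0, 1 or 2 for
    diploid calls). Samples with a missing allele (".") yield None.
    """
    rows = []
    for genotype in genotypes:
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles:
            rows.append(None)  # missing call, left for the caller to handle
            continue
        counts = [0] * n_alt
        for allele in alleles:
            index = int(allele)
            if index > n_alt:
                raise ValueError(f"allele {index} exceeds n_alt={n_alt}")
            if index > 0:
                counts[index - 1] += 1  # column k-1 counts copies of alternate allele k
        rows.append(counts)
    return rows


# A tri-allelic site (two alternate alleles) in five hypothetical samples.
for row in encode_multiallelic(["0/0", "0/1", "1/2", "2|2", "./."], n_alt=2):
    print(row)
```

Keeping a separate column per alternate allele avoids collapsing distinct alleles into a single "non-reference" count, which is the kind of mishandling the abstract warns about.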
••
TL;DR: For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power, and for populations that are well‐represented in existing reference panels, array genotyping alone is cost‐effective and well‐powered to detect common‐ and rare‐variant associations.
Abstract: A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.
••
University Medical Center Groningen1, Erasmus University Rotterdam2, Katholieke Universiteit Leuven3, University of Surrey4, King's College London5, University of Toronto6, Karolinska Institutet7, University of Groningen8, University of Copenhagen9, Greifswald University Hospital10, University of Kiel11, Albert Einstein College of Medicine12, Sungkyunkwan University13, University of Tartu14, Weizmann Institute of Science15, Copenhagen University Hospital16, University of Texas Health Science Center at Houston17, University of Alabama at Birmingham18, Stockholm University19, University of Michigan20, VU University Amsterdam21, Mount Sinai Hospital, Toronto22, University of Oxford23, University of Bristol24, University of Amsterdam25, Maastricht University26, University of California, San Diego27, University of Eastern Finland28, National Institutes of Health29, University of California, Los Angeles30, Harvard University31, University of North Carolina at Chapel Hill32, Ewha Womans University33, Erasmus University Medical Center34, Monash University, Clayton campus35
TL;DR: A phenome-wide association study and Mendelian randomization analyses identified enrichment of microbiome trait loci SNPs in the metabolic, nutrition and environment domains and indicated food preferences and diseases as mediators of genetic effects.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed whole-genome genotypes and 16S fecal microbiome data from 18,473 individuals (25 cohorts). Microbial composition showed high variability across cohorts: we detected only 9 out of 410 genera in more than 95% of the samples. A genome-wide association study (GWAS) of host genetic variation in relation to microbial taxa identified 30 loci affecting microbiome taxa at a genome-wide significant (P
••
Washington University in St. Louis1, Yale University2, University of Southern California3, University of Helsinki4, National Institute for Health and Welfare5, University of Eastern Finland6, University of Michigan7, Broad Institute8, Harvard University9, Semel Institute for Neuroscience and Human Behavior10
TL;DR: This study confirms that integrating SVs in trait-mapping studies will expand the knowledge of genetic factors underlying disease risk, and discovered 31 genome-wide significant associations at 15 loci at which SVs have strong phenotypic effects.
Abstract: The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low frequency SVs for association with 116 quantitative traits, and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including two novel loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB gene promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p=1.47×10^-54), and is also associated with increased levels of total cholesterol (p=1.22×10^-28) and 14 additional cholesterol-related traits, and (2) a multiallelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p=4.81×10^-21) and alanine (p=6.14×10^-12) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs), and one linking recurrent HP gene deletion and cholesterol levels (p=6.24×10^-10), which was also found to be strongly associated with increased glycoprotein level (p=3.53×10^-35). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.
••
King's College London1, University of Michigan2, University of North Carolina at Chapel Hill3, Harvard University4, National Institutes of Health5, University of Eastern Finland6, King Abdulaziz University7, University of Helsinki8, National Institute for Health and Welfare9, University of California, Los Angeles10, Minerva Foundation Institute for Medical Research11, Helsinki University Central Hospital12
TL;DR: It is demonstrated that at-risk individuals have lower background ACE2 levels in this highly relevant tissue, and further studies will be required to establish how this may contribute to increased COVID-19 severity.
Abstract: COVID-19 severity has varied widely, with demographic and cardio-metabolic factors increasing risk of severe reactions to SARS-CoV-2 infection, but the underlying mechanisms for this remain uncertain. We investigated phenotypic and genetic factors associated with subcutaneous adipose tissue expression of Angiotensin I Converting Enzyme 2 (ACE2), which has been shown to act as a receptor for SARS-CoV-2 cellular entry. In a meta-analysis of three independent studies including up to 1,471 participants, lower adipose tissue ACE2 expression was associated with adverse cardio-metabolic health indices including type 2 diabetes (T2D) and obesity status, higher serum fasting insulin and BMI, and lower serum HDL levels (P<5.32×10^-4). ACE2 expression levels were also associated with estimated proportions of cell types in adipose tissue; lower ACE2 expression was associated with a lower proportion of microvascular endothelial cells (P=4.25×10^-4) and higher macrophage proportion (P=2.74×10^-5), suggesting a link to inflammation. Despite an estimated heritability of 32%, we did not identify any proximal or distal genetic variants (eQTLs) associated with adipose tissue ACE2 expression. Our results demonstrate that at-risk individuals have lower background ACE2 levels in this highly relevant tissue. Further studies will be required to establish how this may contribute to increased COVID-19 severity.
••
TL;DR: A statistical framework and computational tool are presented to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations, and it is found that incorporating heterogeneous annotations in gene-based association analysis increases power and performance in identifying causal genes.
Abstract: Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.
01 Jan 2020
••
TL;DR: Fine-mapping and experimental validation demonstrated that multiple, distinct association signals at these loci can influence multiple transcripts through multiple molecular mechanisms.
Abstract: Loci identified in genome-wide association studies (GWAS) can include multiple distinct association signals. We sought to identify the molecular basis of multiple association signals for adiponectin, a hormone involved in glucose regulation secreted almost exclusively from adipose tissue, identified in the Metabolic Syndrome in Men (METSIM) study. With GWAS data for 9,262 men, four loci were significantly associated with adiponectin: ADIPOQ, CDH13, IRS1, and PBRM1. We performed stepwise conditional analyses to identify distinct association signals, a subset of which are also nearly independent (lead variant pairwise r2<0.01). Two loci exhibited allelic heterogeneity, ADIPOQ and CDH13. Of seven association signals at the ADIPOQ locus, two signals colocalized with adipose tissue expression quantitative trait loci (eQTLs) for three transcripts: trait-increasing alleles at one signal were associated with increased ADIPOQ and LINC02043, while trait-increasing alleles at the other signal were associated with decreased ADIPOQ-AS1. In reporter assays, adiponectin-increasing alleles at two signals showed corresponding directions of effect on transcriptional activity. Putative mechanisms for the seven ADIPOQ signals include a missense variant (ADIPOQ G90S), a splice variant, a promoter variant, and four enhancer variants. Of two association signals at the CDH13 locus, the first signal consisted of promoter variants, including the lead adipose tissue eQTL variant for CDH13, while a second signal included a distal intron 1 enhancer variant that showed ~2-fold allelic differences in transcriptional reporter activity. Fine-mapping and experimental validation demonstrated that multiple, distinct association signals at these loci can influence multiple transcripts through multiple molecular mechanisms.
••
University of Michigan1, Mayo Clinic2, Cleveland Clinic3, University of Colorado Denver4, University of Texas at Austin5, University of Texas Health Science Center at Houston6, University of California, San Francisco7, Brigham and Women's Hospital8, University of Washington9, Boston University10, Harvard University11, University of Mississippi Medical Center12, Los Angeles Biomedical Research Institute13, University of Alabama at Birmingham14, Tulane University15, Fred Hutchinson Cancer Research Center16, University of Virginia17, Johns Hopkins University School of Medicine18, University of Maryland, Baltimore19, Oklahoma Medical Research Foundation20, University of Mississippi21, Wake Forest University22, Vanderbilt University23, University of Pittsburgh24
TL;DR: This manuscript presents the Robust Unified Test for HWE (RUTH), a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats, and demonstrates different tradeoffs between false positives and statistical power across the methods.
Abstract: Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ2 test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in datasets comprised of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence datasets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently amongst the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth
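To illustrate why population structure alone can break the standard test, here is a small sketch of the textbook χ2 HWE test applied to two subpopulations and to their pooled sample. This is not RUTH and does not use its code; the function names and the genotype counts are hypothetical, chosen so that each subpopulation is exactly in HWE.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Classic 1-df chi-square statistic for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("site is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chisq1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: each subpopulation is exactly in HWE
# (allele A frequency 0.1 and 0.9 respectively, 1,000 people each).
pop1 = (10, 180, 810)
pop2 = (810, 180, 10)
pooled = tuple(a + b for a, b in zip(pop1, pop2))

stat_pop1 = hwe_chisq(*pop1)
stat_pop2 = hwe_chisq(*pop2)
stat_pooled = hwe_chisq(*pooled)
p_pooled = chisq1_pvalue(stat_pooled)
```

Both subpopulations give a statistic of essentially zero, while the pooled sample shows a large heterozygote deficit and a vanishingly small p-value, even though no genotype is wrong. This is the kind of false positive that a structure-aware test is meant to avoid.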
••
TL;DR: Joint calling is compared to the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and their impact on downstream association analysis is assessed.
Abstract: Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single-study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep-coverage (~82×) exome and low-coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta-analysis has similar power to joint analysis in deep-coverage sequence data but can be less powerful in low-coverage sequence data. Given similar data processing and quality control steps, we recommend single-study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequency in deep-coverage data.
••
Washington University in St. Louis1, University of Eastern Finland2, University of Helsinki3, National Institute for Health and Welfare4, University of Southern California5, Semel Institute for Neuroscience and Human Behavior6, Harvard University7, Broad Institute8, University of Michigan9, Yale University10
TL;DR: Measurements of MT-CN in blood-derived DNA may primarily reflect differences in cell-type composition, and these differences may be causally linked to insulin and related traits.
Abstract: Mitochondrial copy number is known to vary among humans and across tissues, and a population-based study of mitochondrial genome copy number (MT-CN) in blood – as estimated from genome sequencing data – observed strong (∼54%) heritability. However, the genetic causes and phenotypic consequences of MT-CN variation in humans are not well-studied. Here, we studied MT-CN variation in blood-derived DNA from 19,184 Finnish individuals with deep cardiometabolic trait measurements using a combination of whole genome (N = 4,163) and exome sequencing (N = 19,034) data as well as imputed array genotypes (N = 17,718). We confirmed that MT-CN in blood is highly heritable (31% by GREML). We identified two loci in the nuclear genome that are significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6×10^−8), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0×10^−8), which has been reported to protect against non-alcoholic fatty liver disease in model organisms. We also found that MT-CN is strongly associated with insulin levels (P = 2.0×10^−21), fat mass (P = 4.5×10^−16), and other related traits. Using a Mendelian randomization framework, we constructed a genetic instrument for MT-CN using penalized regression with adjustment for potentially confounding covariates and found a significant association with insulin levels, which suggests that our MT-CN measurement in blood may be causally related to metabolic syndrome. Finally, we computed our genetic instrument in UK Biobank participants and tested it against a set of cell count and cardiometabolic traits. We found significant associations between MT-CN and both neutrophil and platelet counts (P = 1.8×10^−8 and P = 1.2×10^−3).
While the association between MT-CN and metabolic syndrome traits was replicated in the UK Biobank, adjusting for cell counts largely eliminated these signals, suggesting that MT-CN is actually a proxy measurement for neutrophil and platelet counts in its effect on metabolic syndrome. Taken together, these results suggest that measurements of MT-CN in blood-derived DNA may primarily reflect differences in cell-type composition, and that these differences may be causally linked to insulin and related traits.
Author summary: The number of mitochondria per cell is variable between tissues and between individuals, and prior studies have shown that mitochondrial genome copy number in blood (MT-CN) – as estimated indirectly from sequencing of blood-derived DNA – is a genetically determined trait that varies among humans. We studied genetic data from approximately 19,000 Finnish individuals and showed that MT-CN is significantly associated with insulin and related metabolic traits, providing evidence that MT-CN levels play a role in determining these traits. Consistent with a previous study, we showed that genetics play a significant role in determining blood-derived MT-CN. We also found new evidence linking several regions of the genome to MT-CN, including a gene known to be associated with non-alcoholic fatty liver disease. Finally, we found that in the link between blood-derived MT-CN and metabolic syndrome, MT-CN likely represents the relative quantities of circulating immune cells, providing further evidence for the role of inflammation in metabolic syndrome.
01 Jan 2020
••
TL;DR: It is shown that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control and increasing correlation between the genetic variant (or multiple variants) and covariates.
Abstract: Multiple linear regression is commonly used to test for association between genetic variants and continuous traits and estimate genetic effect sizes. Confounding variables are controlled for by including them as additional covariates. An alternative technique that is increasingly used is to regress out covariates from the raw trait and then perform regression analysis with only the genetic variants included as predictors. In the case of single-variant analysis, this adjusted trait regression (ATR) technique is known to be less powerful than the traditional technique when the genetic variant is correlated with the covariates. We extend previous results for single-variant tests by deriving exact relationships between the single-variant score, Wald, likelihood-ratio, and F test statistics and their ATR analogs. We also derive the asymptotic power of ATR analogs of the multiple-variant score and burden tests. We show that the maximum power loss of the ATR analog of the multiple-variant score test is completely characterized by the canonical correlations between the set of genetic variants and the set of covariates. Further, we show that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control (α) and increasing correlation (or canonical correlations) between the genetic variant (or multiple variants) and covariates. We recommend using ATR only when maximum canonical correlation between variants and covariates is low, as is typically true.
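The effect of regressing covariates out of the trait only can be seen in a small simulation. The sketch below is not the paper's derivation: it uses one covariate, simulated data, and hypothetical helper names, and it obtains the multiple-regression coefficient through the Frisch-Waugh-Lovell identity (residualizing both trait and genotype on the covariate). It shows the standard algebraic result that the ATR effect estimate equals the full-model estimate shrunk by a factor of 1 − r², where r is the genotype-covariate correlation.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def slope(y, x):
    """Slope of the simple linear regression of y on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    b = slope(y, x)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    return slope(y, x) * slope(x, y)


# Simulated data (all parameters are illustrative assumptions).
rng = random.Random(7)
n = 2000
cov = [rng.gauss(0.0, 1.0) for _ in range(n)]
geno = []
for c in cov:
    freq = 0.2 if c < 0 else 0.5  # allele frequency depends on the covariate
    geno.append(sum(1 for _ in range(2) if rng.random() < freq))
trait = [0.5 * g + 1.0 * c + rng.gauss(0.0, 1.0) for g, c in zip(geno, cov)]

trait_res = residualize(trait, cov)
geno_res = residualize(geno, cov)

beta_full = slope(trait_res, geno_res)  # same as coefficient in trait ~ geno + cov
beta_atr = slope(trait_res, geno)       # ATR: only the trait is adjusted
r2 = r_squared(geno, cov)
```

Here `beta_atr` matches `beta_full * (1 - r2)` up to floating-point error, so the attenuation disappears when the variant is uncorrelated with the covariate, which is consistent with the recommendation to use ATR only when that correlation is low.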