
Showing papers by "Michael Boehnke" published in 2020


Journal Article
Cassandra N. Spracklen1, Cassandra N. Spracklen2, Momoko Horikoshi, Young Jin Kim, Kuang Lin3, Fiona Bragg3, Sanghoon Moon, Ken Suzuki, Claudia H. T. Tam4, Yasuharu Tabara5, Soo Heon Kwak6, Fumihiko Takeuchi, Jirong Long7, Victor Jun Yu Lim8, Jin-Fang Chai8, Chien-Hsiun Chen9, Masahiro Nakatochi10, Jie Yao11, Jie Yao12, Hyeok Sun Choi13, Apoorva K Iyengar1, Hannah J Perrin1, Sarah M Brotman1, Martijn van de Bunt3, Anna L. Gloyn, Jennifer E. Below7, Jennifer E. Below14, Michael Boehnke15, Donald W. Bowden16, John C. Chambers, Anubha Mahajan3, Anubha Mahajan17, Mark I. McCarthy, Maggie C.Y. Ng16, Maggie C.Y. Ng7, Lauren E. Petty7, Lauren E. Petty14, Weihua Zhang18, Weihua Zhang19, Andrew P. Morris20, Andrew P. Morris3, Andrew P. Morris21, Linda S. Adair1, Masato Akiyama22, Zheng Bian23, Juliana C.N. Chan, Li-Ching Chang9, Miao-Li Chee, Yii-Der Ida Chen11, Yii-Der Ida Chen12, Yuan-Tsong Chen9, Zhengming Chen3, Lee-Ming Chuang24, Shufa Du1, Penny Gordon-Larsen1, Myron D. Gross25, Xiuqing Guo12, Xiuqing Guo11, Yu Guo23, Sohee Han, Annie-Green Howard1, Wei Huang26, Yi-Jen Hung27, Yi-Jen Hung28, Mi Yeong Hwang, Chii-Min Hwu29, Chii-Min Hwu30, Sahoko Ichihara31, Masato Isono, Hye-Mi Jang, Guozhi Jiang4, Jost B. Jonas32, Yoichiro Kamatani33, Tomohiro Katsuya34, Takahisa Kawaguchi5, Chiea Chuen Khor35, Chiea Chuen Khor36, Katsuhiko Kohara37, Myung-Shik Lee38, Myung-Shik Lee39, Nanette R. Lee40, Liming Li41, Jianjun Liu8, Jianjun Liu35, Andrea O.Y. Luk4, Jun Lv41, Yukinori Okada34, Mark A Pereira25, Charumathi Sabanayagam8, Shi Jinxiu25, Dong Mun Shin, Wing-Yee So4, Atsushi Takahashi, Brian Tomlinson4, Brian Tomlinson42, Fuu Jen Tsai43, Rob M. van Dam8, Yong-Bing Xiang44, Ken Yamamoto45, Toshimasa Yamauchi33, Kyungheon Yoon, Canqing Yu41, Jian-Min Yuan46, Liang Zhang, Wei Zheng7, Michiya Igase37, Yoon Shin Cho13, Jerome I. Rotter12, Jerome I. 
Rotter11, Ya Xing Wang47, Wayne Huey-Herng Sheu27, Wayne Huey-Herng Sheu30, Mitsuhiro Yokota45, Jer-Yuarn Wu9, Ching-Yu Cheng8, Tien Yin Wong8, Xiao-Ou Shu7, Norihiro Kato, Kyong-Soo Park48, Kyong-Soo Park6, Kyong-Soo Park49, E-Shyong Tai8, Fumihiko Matsuda5, Woon-Puay Koh8, Ronald Cw Ma, Shiro Maeda36, Iona Y Millwood3, Ju Young Lee, Takashi Kadowaki33, Robin G. Walters3, Bong-Jo Kim, Karen L. Mohlke1, Xueling Sim8 
11 Jun 2020-Nature
TL;DR: A meta-analysis of genome-wide association study data from 77,418 individuals of East Asian ancestry with type 2 diabetes identifies novel variants associated with increased risk of type 2 diabetes in both East Asian and European populations.
Abstract: Meta-analyses of genome-wide association studies (GWAS) have identified more than 240 loci that are associated with type 2 diabetes (T2D)1,2; however, most of these loci have been identified in analyses of individuals with European ancestry. Here, to examine T2D risk in East Asian individuals, we carried out a meta-analysis of GWAS data from 77,418 individuals with T2D and 356,122 healthy control individuals. In the main analysis, we identified 301 distinct association signals at 183 loci, and across T2D association models with and without consideration of body mass index and sex, we identified 61 loci that are newly implicated in predisposition to T2D. Common variants associated with T2D in both East Asian and European populations exhibited strongly correlated effect sizes. Previously undescribed associations include signals in or near GDAP1, PTF1A, SIX3, ALDH2, a microRNA cluster, and genes that affect the differentiation of muscle and adipose cells3. At another locus, expression quantitative trait loci at two overlapping T2D signals affect two genes, NKX6-3 and ANK1, in different tissues4-6. Association studies in diverse populations identify additional loci and elucidate disease-associated genes, biology, and pathways.

218 citations


Posted Content
Alexander Kurilshikov1, Carolina Medina-Gomez2, Rodrigo Bacigalupe3, Djawad Radjabzadeh2, Jun Wang4, Ayse Demirkan1, Ayse Demirkan5, Caroline I. Le Roy6, Juan Antonio Raygoza Garay7, Juan Antonio Raygoza Garay8, Casey T. Finnicum9, Xingrong Liu10, Daria V. Zhernakova11, Daria V. Zhernakova1, Marc Jan Bonder1, Tue H. Hansen12, Fabian Frost13, Malte C. Rühlemann14, Williams Turpin8, Williams Turpin7, Jee-Young Moon15, Han-Na Kim16, Kreete Lüll17, Elad Barkan18, Shiraz A. Shah19, Myriam Fornage20, Joanna Szopinska-Tokov, Zachary D. Wallen21, Dmitrii Borisevich12, Lars Agréus10, Anna Andreasson22, Corinna Bang14, Larbi Bedrani7, Jordana T. Bell6, Hans Bisgaard19, Michael Boehnke23, Dorret I. Boomsma24, Robert D. Burk15, Annique Claringbould1, Kenneth Croitoru8, Kenneth Croitoru7, Gareth E. Davies24, Cornelia M. van Duijn25, Cornelia M. van Duijn2, Liesbeth Duijts2, Gwen Falony3, Jingyuan Fu1, Adriaan van der Graaf1, Torben Hansen12, Georg Homuth13, David A. Hughes26, Richard G. IJzerman27, Matthew A. Jackson6, Matthew A. Jackson25, Vincent W. V. Jaddoe2, Marie Joossens3, Torben Jørgensen12, Daniel Keszthelyi28, Rob Knight29, Markku Laakso30, Matthias Laudes, Lenore J. Launer31, Wolfgang Lieb14, Aldons J. Lusis32, Ad A.M. Masclee28, Henriette A. Moll2, Zlatan Mujagic28, Qi Qibin15, Daphna Rothschild18, Hocheol Shin16, Søren J. Sørensen12, Claire J. Steves6, Jonathan Thorsen19, Nicholas J. Timpson26, Raul Y. Tito3, Sara Vieira-Silva3, Uwe Völker13, Henry Völzke13, Urmo Võsa1, Kaitlin H Wade26, Susanna Walter33, Kyoko Watanabe24, Stefan Weiss13, Frank Ulrich Weiss13, Omer Weissbrod34, Harm-Jan Westra1, Gonneke Willemsen24, Haydeh Payami21, Daisy Jonkers28, Alejandro Arias Vasquez35, Eco J. C. de Geus24, Katie A. Meyer36, Jakob Stokholm19, Eran Segal18, Elin Org17, Cisca Wijmenga1, Hyung Lae Kim37, Robert C. Kaplan38, Tim D. Spector6, André G. Uitterlinden2, Fernando Rivadeneira2, Andre Franke14, Markus M. 
Lerch13, Lude Franke1, Serena Sanna1, Serena Sanna39, Mauro D'Amato, Oluf Pedersen12, Andrew D. Paterson7, Robert Kraaij2, Jeroen Raes3, Alexandra Zhernakova1 
16 Dec 2020-bioRxiv
TL;DR: A phenome-wide association study and Mendelian randomization identified enrichment of microbiome trait loci in the metabolic, nutrition and environment domains and suggested the microbiome has causal effects in ulcerative colitis and rheumatoid arthritis.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 out of 410 genera were detected in more than 95% of samples. A genome-wide association study (GWAS) of host genetic variation in relation to microbial taxa identified 31 loci affecting microbiome at a genome-wide significant (P

210 citations


Posted Content
Ji Chen1, Ji Chen2, Cassandra N. Spracklen3, Cassandra N. Spracklen4, +475 more; Institutions (145)
25 Jul 2020-bioRxiv
TL;DR: Genomic feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways, increasing understanding of diabetes pathophysiology by use of trans-ancestry studies for improved power and resolution.
Abstract: Glycaemic traits are used to diagnose and monitor type 2 diabetes, and cardiometabolic health. To date, most genetic studies of glycaemic traits have focused on individuals of European ancestry. Here, we aggregated genome-wide association studies in up to 281,416 individuals without diabetes (30% non-European ancestry) with fasting glucose, 2h-glucose post-challenge, glycated haemoglobin, and fasting insulin data. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P

158 citations


Journal Article
11 May 2020-Nature
TL;DR: Sexual dimorphism in genetic vulnerability to schizophrenia, systemic lupus erythematosus and Sjögren’s syndrome is linked to differential protein abundance from alleles of complement component 4, which implicate the complement system as a source of sexual dimorphisms in vulnerability to diverse illnesses.
Abstract: Many common illnesses, for reasons that have not been identified, differentially affect men and women. For instance, the autoimmune diseases systemic lupus erythematosus (SLE) and Sjogren's syndrome affect nine times more women than men1, whereas schizophrenia affects men with greater frequency and severity relative to women2. All three illnesses have their strongest common genetic associations in the major histocompatibility complex (MHC) locus, an association that in SLE and Sjogren's syndrome has long been thought to arise from alleles of the human leukocyte antigen (HLA) genes at that locus3-6. Here we show that variation of the complement component 4 (C4) genes C4A and C4B, which are also at the MHC locus and have been linked to increased risk for schizophrenia7, generates 7-fold variation in risk for SLE and 16-fold variation in risk for Sjogren's syndrome among individuals with common C4 genotypes, with C4A protecting more strongly than C4B in both illnesses. The same alleles that increase risk for schizophrenia greatly reduce risk for SLE and Sjogren's syndrome. In all three illnesses, C4 alleles act more strongly in men than in women: common combinations of C4A and C4B generated 14-fold variation in risk for SLE, 31-fold variation in risk for Sjogren's syndrome, and 1.7-fold variation in schizophrenia risk among men (versus 6-fold, 15-fold and 1.26-fold variation in risk among women, respectively). At a protein level, both C4 and its effector C3 were present at higher levels in cerebrospinal fluid and plasma8,9 in men than in women among adults aged between 20 and 50 years, corresponding to the ages of differential disease vulnerability. Sex differences in complement protein levels may help to explain the more potent effects of C4 alleles in men, women's greater risk of SLE and Sjogren's syndrome and men's greater vulnerability to schizophrenia. 
These results implicate the complement system as a source of sexual dimorphism in vulnerability to diverse illnesses.

130 citations


Journal Article
TL;DR: PheWeb is an easy-to-use open-source web-based tool for visualizing, navigating and sharing GWAS and PheWAS results, used to explore association results for large datasets such as the UK Biobank5 and the Michigan Genomics Initiative, and organizes relationships between traits on the basis of pairwise genetic correlations.
Abstract: To the Editor: Advances in genotyping and sequencing technologies, the growing availability of electronic health records for research use and the emergence of population-scale cohorts are enabling large studies to collect copious amounts of both phenotype and genotype data. Studies can collect thousands of traits measured across hundreds of thousands of individuals, each assessed at millions of genetic variants. These resources enable genome- and phenome-wide association studies (GWAS and PheWAS, respectively) at increasing scales and can generate high-dimensional results that can provide insights into many aspects of human genetics and biology. However, navigating these association results can be challenging and cumbersome. To aid in generating and testing hypotheses for the mechanisms underlying complex traits, these results should be organized in an intuitive, easily navigable manner. The current standards in the field are to use Manhattan1 and LocusZoom2 plots to review single-trait results and to use PheWAS3,4 plots to summarize results across many traits. The ability of investigators to explore their own data by alternating between these two view types is an increasingly common feature of large-scale association analyses. Therefore, we developed PheWeb, an easy-to-use open-source web-based tool for visualizing, navigating and sharing GWAS and PheWAS results. We have used PheWeb to explore association results for large datasets such as the UK Biobank5 (http://pheweb.org/MGI-freeze2/) and the Michigan Genomics Initiative (MGI; http://pheweb.sph.umich.edu/MGI-freeze2). The PheWeb instance populated with the UK Biobank summary statistics displays 28 million genetic markers assessed across 1,403 binary traits for 408,961 white British participants6 (Supplementary Note). 
Others have used PheWeb to explore large sets of association results, such as the Oxford Brain Imaging Genetics Project (http://big.stats.ox.ac.uk) and the new computationally efficient association tool fastGWA (http://fastgwa.info/ukbimp). PheWeb provides automated data processing and an interactive web interface for exploratory analysis. The data-processing pipeline loads and harmonizes association summary statistics (recipes are provided for the output of many common tools; Supplementary Note); organizes relationships between traits on the basis of pairwise genetic correlations; and annotates the variants. The web interface provides intuitive visualizations at three levels of granularity: genome-wide summaries at the trait level, and regional (LocusZoom)2 and phenome-wide summaries at the variant level (Supplementary Note and Supplementary Fig. 1). PheWeb links to relevant public databases (for example, the NHGRI-EBI GWAS Catalog7 and ClinVar8) to provide further information on a particular variant. Association results can be queried by trait, variant or gene. To facilitate collaboration, PheWeb visualizations can be shared through the URLs, and we are exploring the opportunity to enable collaborative annotation on each results page. PheWeb can help make meaningful discoveries. To illustrate this potential, Fig. 1 and Supplementary Figs. 2 and 3 show different views of genetic association signals and key variants for bladder cancer in the UK Biobank association results in PheWeb. From the Manhattan plot (Fig. 1a), the strongest association on chromosome 5 is at rs4975616 (P = 9.9 × 10^−11), which is located near the CLPTM1L gene. The regional view (Supplementary Fig. 4) highlights that rs4975616 and several of its proxies are in the GWAS Catalog and are associated with various cancers (for example, lung cancer, pancreatic cancer and basal cell carcinoma), thus suggesting a broad role of the locus in cancer susceptibility. 
The PheWAS view (Supplementary Fig. 5) further supports the association of the locus with a variety of cancers, including cancers of the skin and lung, and it links to multiple PubMed entries supporting the potential role of rs4975616 in lung and other cancers9. Interestingly, for the other top loci (both on chromosome 8), the regional and PheWAS views (rs2976384 in JRK and PSCA, Fig. 1b,c; rs10094872 near MYC, Supplementary Figs. 6 and 7) distinctly convey that these loci are not associated with skin and lung cancers, but are instead associated with gastric and urinary traits such as duodenal ulcer, urinary tract infection and pancreatic cancer. The quantile–quantile plot (Supplementary Fig. 2) shows that the significant associations for bladder cancer are driven by common (minor allele frequency >2%) variants. To help make connections among traits, PheWeb optionally displays pairwise genetic correlations across traits (Supplementary Fig. 3). For example, bladder cancer shows the strongest genetic correlation with cancer of urinary organs (r = 0.84, P = 7.6 × 10^−7 from cross-trait linkage-disequilibrium-score regression10) and weaker correlations with tobacco-use disorder (r = 0.39, P = 0.006), an observation consistent with the role of smoking as a major risk factor for bladder cancer11. In our view, describing the spectrum of traits at each locus helps to identify loci that influence disease through similar mechanisms and to expose connections among traits, whether expected or unexpected. Clearly, further research is needed to make conclusive statements regarding the roles of these or any loci, including fine-mapping and colocalization approaches as well as refinement of phenotype definitions for the traits. To this end, PheWeb is designed to allow for gradual updating of result sets as new analyses are completed and refined. 
Although we believe that making data explorations and large sets of results broadly accessible and intuitive is extremely valuable, doing so does not obviate the need for further analysis, experimentation and biological follow-up. Thus, PheWeb is as useful as the data and results behind it, but we expect that these results will be much more useful when they are accessible. When interpreting the visualizations, users must consider existing biases in the underlying GWAS data, including but not limited to analyses conducted in restricted (ancestry) sets of individuals and suboptimal phenotype definitions. We welcome user feedback and feature requests through the PheWeb GitHub repository (https://github.com/statgen/pheweb), which helps us enhance and tailor PheWeb to meet the needs of the research community. This repository includes a walk-through demonstration and easy-to-follow instructions for creating a PheWeb for one’s own data. The PheWeb codebase is not exclusive to displaying variant–trait associations, and it can be used to display other types of genome-wide

112 citations


Journal Article
TL;DR: The novel loci will facilitate understanding the genetic aetiology of smoking behaviour and may lead to the identification of potential drug targets for smoking prevention and/or cessation.
Abstract: Smoking is a major heritable and modifiable risk factor for many diseases, including cancer, common respiratory disorders and cardiovascular diseases. Fourteen genetic loci have previously been ass ...

82 citations


Journal Article
TL;DR: The relationship between genetic variants influencing predisposition to type 2 diabetes and related glycemic traits, and human pancreatic islet transcription is explored using data from 420 donors to illustrate the advantages of performing functional and regulatory studies in disease relevant tissues.
Abstract: Most signals detected by genome-wide association studies map to non-coding sequence and their tissue-specific effects influence transcriptional regulation. However, key tissues and cell-types required for functional inference are absent from large-scale resources. Here we explore the relationship between genetic variants influencing predisposition to type 2 diabetes (T2D) and related glycemic traits, and human pancreatic islet transcription using data from 420 donors. We find: (a) 7741 cis-eQTLs in islets with a replication rate across 44 GTEx tissues between 40% and 73%; (b) marked overlap between islet cis-eQTL signals and active regulatory sequences in islets, with reduced eQTL effect size observed in the stretch enhancers most strongly implicated in GWAS signal location; (c) enrichment of islet cis-eQTL signals with T2D risk variants identified in genome-wide association studies; and (d) colocalization between 47 islet cis-eQTLs and variants influencing T2D or glycemic traits, including DGKB and TCF7L2. Our findings illustrate the advantages of performing functional and regulatory studies in disease relevant tissues.

81 citations


Posted Content
Niamh Mullins1, Jooeun Kang2, Adrian I. Campos3, Adrian I. Campos4, +324 more; Institutions (116)
04 Dec 2020-medRxiv
TL;DR: The results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest the existence of a shared genetic etiology between SA and known risk factors that is not mediated by psychiatric disorders.
Abstract: Suicide is a leading cause of death worldwide and non-fatal suicide attempts, which occur far more frequently, are a major source of disability and social and economic burden. Both are known to have a substantial genetic etiology, which is partially shared and partially distinct from that of related psychiatric disorders. We conducted a genome-wide association study (GWAS) of 29,782 suicide attempt (SA) cases and 519,961 controls in the International Suicide Genetics Consortium and conditioned the results on psychiatric disorders using GWAS summary statistics, to investigate their shared and divergent genetic architectures. Two loci reached genome-wide significance for SA: the major histocompatibility complex and an intergenic locus on chromosome 7, which remained associated after conditioning and has previously been implicated in risk-taking, smoking, and insomnia. SA showed strong genetic correlation with psychiatric disorders, particularly major depression, and also with smoking, lower socioeconomic status, pain, lower educational attainment, reproductive traits, risk-taking, sleep disturbances, and poorer overall general health. After conditioning, the genetic correlations between SA and psychiatric disorders decreased, whereas those with non-psychiatric traits remained largely unchanged. Our results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest the existence of a shared genetic etiology between SA and known risk factors that is not mediated by psychiatric disorders.

68 citations


Journal Article
TL;DR: A GWAS and two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter.
Abstract: Thyroid stimulating hormone (TSH) is critical for normal development and metabolism. To better understand the genetic contribution to TSH levels, we conduct a GWAS meta-analysis at 22.4 million genetic markers in up to 119,715 individuals and identify 74 genome-wide significant loci for TSH, of which 28 are previously unreported. Functional experiments show that the thyroglobulin protein-altering variants P118L and G67S impact thyroglobulin secretion. Phenome-wide association analysis in the UK Biobank demonstrates the pleiotropic effects of TSH-associated variants and a polygenic score for higher TSH levels is associated with a reduced risk of thyroid cancer in the UK Biobank and three other independent studies. Two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter. Our findings highlight the pleiotropic effects of TSH-associated variants on thyroid function and growth of malignant and benign thyroid tumors.
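For readers unfamiliar with two-sample Mendelian randomization, the sketch below shows one widely used estimator, the inverse-variance-weighted (IVW) estimate, which combines per-variant effects on the exposure (here, TSH) and on the outcome. The abstract does not say which estimator the authors used, so this is an illustration under that assumption rather than their method; the function name `ivw_mr` and its argument names are hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted Mendelian randomization estimate.

    Each list holds one value per instrumental variant:
      beta_exposure: variant effect on the exposure (e.g. TSH level)
      beta_outcome:  variant effect on the outcome (e.g. log odds of disease)
      se_outcome:    standard error of beta_outcome
    Returns (causal_estimate, standard_error). Uses first-order weights,
    i.e. uncertainty in beta_exposure is ignored (an assumption of this sketch).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one variant is required")

    numerator = 0.0
    denominator = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        inv_var = 1.0 / (sy * sy)
        numerator += bx * by * inv_var      # weighted sum of bx * by
        denominator += bx * bx * inv_var    # weighted sum of bx^2

    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error
```

A negative estimate from `ivw_mr` would correspond to the direction reported above, where higher TSH is associated with lower risk; with real data the result should be checked against pleiotropy-robust estimators before drawing conclusions.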

61 citations


01 Jan 2020
TL;DR: In this article, a fixed effects meta-analysis of up to 61 studies (up to 346,813 participants) was performed to investigate the association of SNVs with smoking behavior traits.
Abstract: Smoking is a major heritable and modifiable risk factor for many diseases, including cancer, common respiratory disorders and cardiovascular diseases. Fourteen genetic loci have previously been associated with smoking behaviour-related traits. We tested up to 235,116 single nucleotide variants (SNVs) on the exome-array for association with smoking initiation, cigarettes per day, pack-years, and smoking cessation in a fixed effects meta-analysis of up to 61 studies (up to 346,813 participants). In a subset of 112,811 participants, a further one million SNVs were also genotyped and tested for association with the four smoking behaviour traits. SNV-trait associations with P < 5 × 10^−8 in either analysis were taken forward for replication in up to 275,596 independent participants from UK Biobank. Lastly, a meta-analysis of the discovery and replication studies was performed. Sixteen SNVs were associated with at least one of the smoking behaviour traits (P < 5 × 10^−8) in the discovery samples. Ten novel SNVs, including rs12616219 near TMEM182, were followed up and five of them (rs462779 in REV3L, rs12780116 in CNNM2, rs1190736 in GPR101, rs11539157 in PJA1, and rs12616219 near TMEM182) replicated at a Bonferroni significance threshold (P < 4.5 × 10^−3) with consistent direction of effect. A further 35 SNVs were associated with smoking behaviour traits in the discovery plus replication meta-analysis (up to 622,409 participants) including a rare SNV, rs150493199, in CCDC141 and two low-frequency SNVs in CEP350 and HDGFRP2. Functional follow-up implied that decreased expression of REV3L may lower the probability of smoking initiation. The novel loci will facilitate understanding the genetic aetiology of smoking behaviour and may lead to the identification of potential drug targets for smoking prevention and/or cessation.
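The fixed effects meta-analysis mentioned in this abstract can be illustrated with the standard inverse-variance-weighted combination of per-study effect estimates for a single variant. The abstract does not state the weighting scheme or software used, so the sketch below assumes inverse-variance weights; the function name `fixed_effects_meta` is hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-study effect estimates for one variant.

    betas:           effect estimate from each study
    standard_errors: standard error of each estimate
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("inputs must be non-empty and of equal length")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se

    # Two-sided p-value under a standard normal null: P(|Z| > |z|).
    two_sided_p = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, two_sided_p
```

In a workflow like the one described, a variant would be carried forward for replication when the returned p-value falls below the genome-wide threshold quoted in the abstract.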

54 citations


Posted Content
18 Sep 2020-medRxiv
TL;DR: This genome-wide association study (GWAS) of 41,917 BD cases and 371,549 controls identified 64 associated genomic loci, which provides the best-powered BD polygenic scores to date, when applied in both European and diverse ancestry samples.
Abstract: Bipolar disorder (BD) is a heritable mental illness with complex etiology. We performed a genome-wide association study (GWAS) of 41,917 BD cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci. BD risk alleles were enriched in genes in synaptic signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics. Integrating eQTL data implicated 15 genes robustly linked to BD via gene expression, including druggable genes such as HTR6, MCHR1, DCLK3 and FURIN. This GWAS provides the best-powered BD polygenic scores to date, when applied in both European and diverse ancestry samples. Analyses of BD subtypes indicated high but imperfect genetic correlation between BD type I and II and identified additional associated loci. Together, these results advance our understanding of the biological etiology of BD, identify novel therapeutic leads and prioritize genes for functional follow-up studies.

Journal Article
TL;DR: A robust statistical method that accurately estimates DNA contamination, is agnostic to the genetic ancestry of the intended or contaminating sample, and integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates.
Abstract: Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual in early stage of sequence analysis is impractical or even impossible for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations. On the other hand, incorrectly specified allele frequencies may result in substantial bias in estimated contamination rates. For example, we observed that existing methods often fail to identify 10% contaminated samples at a typical 3% contamination exclusion threshold when genetic ancestry is misspecified. Such an incomplete screening of contaminated samples substantially inflates the estimated rate of genotyping errors even in deeply sequenced genomes and exomes. We propose a robust statistical method that accurately estimates DNA contamination and is agnostic to genetic ancestry of the intended or contaminating sample. Our method integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates. Our method can also be used for estimating genetic ancestries, similar to LASER or TRACE, but simultaneously accounting for potential contamination. We demonstrate that our method robustly estimates contamination rates and genetic ancestries across populations and contamination scenarios. We further demonstrate that, in the presence of contamination, genetic ancestry inference can be substantially biased with existing methods that ignore contamination, while our method corrects for such biases.
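To make the role of allele frequencies in contamination estimation concrete, here is a deliberately simplified sketch of a read-level mixture likelihood of the kind such estimators build on. It is not the method proposed in the abstract: the allele frequencies are taken as given inputs (the ancestry projection onto principal components is not shown), and the sequencing error rate, the Hardy-Weinberg genotype prior, the grid search and all function names are assumptions made for illustration.

```python
import math


def site_likelihood(alt_reads, depth, alt_freq, alpha, error=0.001):
    """Likelihood of the read counts at one biallelic site.

    alpha is the fraction of reads that come from a contaminating sample.
    Genotypes of both samples are unknown and summed out under
    Hardy-Weinberg proportions at allele frequency alt_freq.
    """
    hwe = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    total = 0.0
    for g1, p1 in enumerate(hwe):        # intended sample: 0, 1 or 2 alt alleles
        for g2, p2 in enumerate(hwe):    # contaminating sample
            # Expected alt-allele fraction among reads from the mixture.
            q = (1 - alpha) * g1 / 2 + alpha * g2 / 2
            # Allow for base-calling errors in either direction.
            p_alt = q * (1 - error) + (1 - q) * error
            binom = (math.comb(depth, alt_reads)
                     * p_alt ** alt_reads
                     * (1 - p_alt) ** (depth - alt_reads))
            total += p1 * p2 * binom
    return total


def estimate_contamination(sites, grid_steps=500, max_alpha=0.5):
    """Grid-search maximum-likelihood estimate of alpha.

    sites: iterable of (alt_reads, depth, alt_freq) tuples.
    """
    sites = list(sites)
    best_alpha, best_loglik = 0.0, -math.inf
    for i in range(grid_steps + 1):
        alpha = max_alpha * i / grid_steps
        loglik = sum(math.log(site_likelihood(a, d, f, alpha))
                     for a, d, f in sites)
        if loglik > best_loglik:
            best_alpha, best_loglik = alpha, loglik
    return best_alpha
```

Because `alt_freq` enters every term of the likelihood, supplying frequencies from the wrong population shifts the estimate, which is the bias the abstract describes.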

Journal Article
Jonas B. Nielsen1, Jonas B. Nielsen2, Oren Rom2, Ida Surakka2, Sarah E. Graham2, Wei Zhou, Tanmoy Roychowdhury2, Lars G. Fritsche2, Lars G. Fritsche1, Sarah A Gagliano Taliun2, Carlo Sidore, Yuhao Liu2, Maiken Elvestad Gabrielsen1, Anne Heidi Skogholt1, Brooke N. Wolford2, William Overton2, Ying Zhao2, Jin Chen2, He Zhang2, Whitney E. Hornsby2, Akua Acheampong2, Austen Grooms2, Amanda Schaefer2, Gregory J.M. Zajac2, Luis Villacorta2, Jifeng Zhang2, Ben Michael Brumpton1, Mari Løset1, Vivek Rai2, Pia R. Lundegaard3, Pia R. Lundegaard4, Morten S. Olesen4, Morten S. Olesen3, Kent D. Taylor5, Nicholette D. Palmer6, Yii Der Chen5, Seung Hoan Choi7, Steven A. Lubitz7, Steven A. Lubitz8, Patrick T. Ellinor7, Patrick T. Ellinor8, Kathleen C. Barnes9, Michelle Daya9, Nicholas Rafaels9, Scott T. Weiss10, Scott T. Weiss8, Jessica Lasky-Su8, Jessica Lasky-Su10, Russell P. Tracy11, Ramachandran S. Vasan12, L. Adrienne Cupples12, Rasika A. Mathias13, Lisa R. Yanek13, Lewis C. Becker13, Patricia A. Peyser2, Lawrence F. Bielak2, Jennifer A. Smith2, Stella Aslibekyan14, Bertha Hidalgo14, Donna K. Arnett15, Marguerite R. Irvin14, James G. Wilson16, Solomon K. Musani16, Adolfo Correa16, Stephen S. Rich17, Xiuqing Guo5, Jerome I. Rotter5, Barbara A. Konkle18, Jill M. Johnsen18, Allison E. Ashley-Koch19, Marilyn J. Telen19, Vivien A. Sheehan20, John Blangero21, Joanne E. Curran21, Juan M. Peralta21, Courtney G. Montgomery22, Wayne Huey-Herng Sheu, Ren-Hua Chung23, Karen Schwander24, Seyed Mehdi Nouraie25, Victor R. Gordeuk26, Yingze Zhang25, Charles Kooperberg27, Alexander P. Reiner18, Alexander P. Reiner27, Rebecca D. Jackson28, Eugene R. Bleecker29, Deborah A. Meyers29, Xingnan Li29, Sayantan Das2, Ketian Yu2, Jonathon LeFaive2, Albert V. Smith2, Thomas W. Blackwell2, Daniel Taliun2, Sebastian Zöllner2, Lukas Forer29, Sebastian Schoenherr30, Christian Fuchsberger2, Anita Pandit2, Matthew Zawistowski2, Sachin Kheterpal2, Chad M. 
Brummett2, Pradeep Natarajan7, Pradeep Natarajan8, David Schlessinger31, Seunggeun Lee2, Hyun Min Kang2, Francesco Cucca32, Oddgeir L. Holmen1, Bjørn Olav Åsvold1, Michael Boehnke2, Sekar Kathiresan7, Sekar Kathiresan8, Gonçalo R. Abecasis33, Gonçalo R. Abecasis2, Y. Eugene Chen2, Cristen J. Willer, Kristian Hveem1, Kristian Hveem34 
TL;DR: In this article, the authors performed genome-wide analyses of participants in the HUNT Study in Norway to search for protein-altering variants with beneficial impact on quantitative blood traits related to cardiovascular disease, but without detrimental impact on liver function.
Abstract: Pharmaceutical drugs targeting dyslipidemia and cardiovascular disease (CVD) may increase the risk of fatty liver disease and other metabolic disorders. To identify potential novel CVD drug targets without these adverse effects, we perform genome-wide analyses of participants in the HUNT Study in Norway (n = 69,479) to search for protein-altering variants with beneficial impact on quantitative blood traits related to cardiovascular disease, but without detrimental impact on liver function. We identify 76 (11 previously unreported) presumed causal protein-altering variants associated with one or more CVD- or liver-related blood traits. Nine of the variants are predicted to result in loss-of-function of the protein. This includes ZNF529:p.K405X, which is associated with decreased low-density-lipoprotein (LDL) cholesterol (P = 1.3 × 10^−8) without being associated with liver enzymes or non-fasting blood glucose. Silencing of ZNF529 in human hepatoma cells results in upregulation of LDL receptor and increased LDL uptake in the cells. This suggests that inhibition of ZNF529 or its gene product should be prioritized as a novel candidate drug target for treating dyslipidemia and associated CVD.

Posted ContentDOI
Julia K. Goodrich1, Moriel Singer-Berk1, Rachel Son1, Abigail Sveden1, and 155 more authors (66 institutions)
24 Sep 2020-medRxiv
TL;DR: Additional epidemiologic and genetic factors contributing to risk prediction are assessed, demonstrating that inclusion of common polygenic variation significantly improved biomarker estimation for two monogenic dyslipidemias.
Abstract: Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier will develop the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we applied clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias displayed effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers averaged below 60% in both studies for all conditions except monogenic diabetes. We assessed additional epidemiologic and genetic factors contributing to risk prediction, demonstrating that inclusion of common polygenic variation significantly improved biomarker estimation for two monogenic dyslipidemias.

Journal ArticleDOI
25 May 2020-Genes
TL;DR: This work showed that the joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches.
Abstract: There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ~10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. We discuss practical issues and methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ~18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.

Journal ArticleDOI
TL;DR: For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power, and for populations that are well‐represented in existing reference panels, array genotyping alone is cost‐effective and well‐powered to detect common‐ and rare‐variant associations.
Abstract: A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.

Posted ContentDOI
Alexander Kurilshikov1, Carolina Medina-Gomez2, Rodrigo Bacigalupe3, Djawad Radjabzadeh2, Jun Wang, Ayse Demirkan4, Caroline I. Le Roy5, Juan Antonio Raygoza Garay6, Casey T. Finnicum, Xingrong Liu7, Daria V. Zhernakova8, Marc Jan Bonder1, Tue H. Hansen9, Fabian Frost10, Malte C. Rühlemann11, Williams Turpin6, Jee-Young Moon12, Han-Na Kim13, Kreete Lüll14, Elad Barkan15, Shiraz A. Shah16, Myriam Fornage17, Joanna Szopinska-Tokov, Zachary D. Wallen18, Dmitrii Borisevich9, Lars Agréus7, Anna Andreasson19, Corinna Bang11, Larbi Bedrani6, Jordana T. Bell5, Hans Bisgaard16, Michael Boehnke20, Dorret I. Boomsma21, Robert D. Burk12, Annique Claringbould1, Kenneth Croitoru22, Gareth E. Davies, Cornelia M. van Duijn23, Liesbeth Duijts2, Gwen Falony3, Jingyuan Fu1, Adriaan van der Graaf1, Torben Hansen9, Georg Homuth10, David A. Hughes24, Richard G. IJzerman25, Matthew A Jackson4, Vincent W. V. Jaddoe2, Marie Joossens3, Torben Joergensen9, Daniel Keszthelyi26, Rob Knight27, Markku Laakso28, Matthias Laudes, Lenore J. Launer29, Wolfgang Lieb11, Aldons J. Lusis30, Ad A.M. Masclee26, Henriëtte A. Moll2, Zlatan Mujagic26, Qi Qibin12, Daphna Rothschild15, Hocheol Shin13, Søren J. Sørensen9, Claire J. Steves5, Jonathan Thorsen16, Nicholas J. Timpson24, Raul Y. Tito3, Sara Vieira-Silva3, Uve Voelker10, Henry Voelzke10, Urmo Võsa1, Kaitlin H Wade24, Susanna Walter, Kyoko Watanabe21, Stefan Weiss, Frank Ulrich Weiss10, Omer Weissbrod31, Harm-Jan Westra1, Gonneke Willemsen21, Haydeh Payami18, Daisy Jonkers26, Alejandro Arias Vasquez, Eco J. C. de Geus21, Katie A. Meyer32, Jakob Stokholm16, Eran Segal15, Elin Org14, Cisca Wijmenga1, Hyung Lae Kim33, Robert C. Kaplan12, Tim D. Spector5, André G. Uitterlinden2, Fernando Rivadeneira34, Andre Franke11, Markus M. Lerch10, Lude Franke1, Serena Sanna, Mauro D'Amato35, Oluf Pedersen9, Andrew D. Paterson6, Robert Kraaij2, Jeroen Raes3, Alexandra Zhernakova1 
28 Jun 2020-bioRxiv
TL;DR: A phenome-wide association study and Mendelian randomization analyses identified enrichment of microbiome trait loci SNPs in the metabolic, nutrition and environment domains and indicated food preferences and diseases as mediators of genetic effects.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed whole-genome genotypes and 16S fecal microbiome data from 18,473 individuals (25 cohorts). Microbial composition showed high variability across cohorts: we detected only 9 out of 410 genera in more than 95% of the samples. A genome-wide association study (GWAS) of host genetic variation in relation to microbial taxa identified 30 loci affecting microbiome taxa at a genome-wide significant (P

Posted ContentDOI
13 Dec 2020-bioRxiv
TL;DR: This study confirms that integrating SVs in trait-mapping studies will expand the knowledge of genetic factors underlying disease risk, and discovered 31 genome-wide significant associations at 15 loci at which SVs have strong phenotypic effects.
Abstract: The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low frequency SVs for association with 116 quantitative traits, and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including two novel loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB gene promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p = 1.47 × 10⁻⁵⁴), and is also associated with increased levels of total cholesterol (p = 1.22 × 10⁻²⁸) and 14 additional cholesterol-related traits, and (2) a multiallelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 4.81 × 10⁻²¹) and alanine (p = 6.14 × 10⁻¹²) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs), and one linking recurrent HP gene deletion and cholesterol levels (p = 6.24 × 10⁻¹⁰), which was also found to be strongly associated with increased glycoprotein level (p = 3.53 × 10⁻³⁵). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.

Posted ContentDOI
14 Aug 2020-medRxiv
TL;DR: It is demonstrated that at-risk individuals have lower background ACE2 levels in this highly relevant tissue, and further studies will be required to establish how this may contribute to increased COVID-19 severity.
Abstract: COVID-19 severity has varied widely, with demographic and cardio-metabolic factors increasing risk of severe reactions to SARS-CoV-2 infection, but the underlying mechanisms for this remain uncertain. We investigated phenotypic and genetic factors associated with subcutaneous adipose tissue expression of Angiotensin I Converting Enzyme 2 (ACE2), which has been shown to act as a receptor for SARS-CoV-2 cellular entry. In a meta-analysis of three independent studies including up to 1,471 participants, lower adipose tissue ACE2 expression was associated with adverse cardio-metabolic health indices including type 2 diabetes (T2D) and obesity status, higher serum fasting insulin and BMI, and lower serum HDL levels (P < 5.32 × 10⁻⁴). ACE2 expression levels were also associated with estimated proportions of cell types in adipose tissue; lower ACE2 expression was associated with a lower proportion of microvascular endothelial cells (P = 4.25 × 10⁻⁴) and higher macrophage proportion (P = 2.74 × 10⁻⁵), suggesting a link to inflammation. Despite an estimated heritability of 32%, we did not identify any proximal or distal genetic variants (eQTLs) associated with adipose tissue ACE2 expression. Our results demonstrate that at-risk individuals have lower background ACE2 levels in this highly relevant tissue. Further studies will be required to establish how this may contribute to increased COVID-19 severity.

Journal ArticleDOI
TL;DR: A statistical framework and computational tool is presented to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations, and it is found that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.
Abstract: Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.


Journal ArticleDOI
TL;DR: Fine-mapping and experimental validation demonstrated that multiple, distinct association signals at these loci can influence multiple transcripts through multiple molecular mechanisms.
Abstract: Loci identified in genome-wide association studies (GWAS) can include multiple distinct association signals. We sought to identify the molecular basis of multiple association signals for adiponectin, a hormone involved in glucose regulation secreted almost exclusively from adipose tissue, identified in the Metabolic Syndrome in Men (METSIM) study. With GWAS data for 9,262 men, four loci were significantly associated with adiponectin: ADIPOQ, CDH13, IRS1, and PBRM1. We performed stepwise conditional analyses to identify distinct association signals, a subset of which are also nearly independent (lead variant pairwise r² < 0.01). Two loci exhibited allelic heterogeneity, ADIPOQ and CDH13. Of seven association signals at the ADIPOQ locus, two signals colocalized with adipose tissue expression quantitative trait loci (eQTLs) for three transcripts: trait-increasing alleles at one signal were associated with increased ADIPOQ and LINC02043, while trait-increasing alleles at the other signal were associated with decreased ADIPOQ-AS1. In reporter assays, adiponectin-increasing alleles at two signals showed corresponding directions of effect on transcriptional activity. Putative mechanisms for the seven ADIPOQ signals include a missense variant (ADIPOQ G90S), a splice variant, a promoter variant, and four enhancer variants. Of two association signals at the CDH13 locus, the first signal consisted of promoter variants, including the lead adipose tissue eQTL variant for CDH13, while a second signal included a distal intron 1 enhancer variant that showed ~2-fold allelic differences in transcriptional reporter activity. Fine-mapping and experimental validation demonstrated that multiple, distinct association signals at these loci can influence multiple transcripts through multiple molecular mechanisms.

Posted ContentDOI
24 Jun 2020-bioRxiv
TL;DR: This manuscript presents the Robust Unified Test for HWE (RUTH), a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats, and demonstrates different tradeoffs between false positives and statistical power across the methods.
Abstract: Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in datasets comprised of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence datasets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently amongst the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth
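
Illustrative sketch (not from the paper, and not the RUTH method): the traditional χ² HWE test that the abstract takes as its baseline can be written in a few lines of Python. The function name `hwe_chisq` and the genotype counts below are hypothetical, chosen only to show how pooling two subpopulations that are each in HWE produces an apparent deviation from HWE without any genotyping error.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Standard 1-df chi-square test for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes at a biallelic site and
    returns (chi-square statistic, p-value). This is the textbook test,
    with no adjustment for population structure or genotype uncertainty.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        # Monomorphic site: the test is not informative.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: two subpopulations, each exactly in HWE,
# with allele A frequencies of 0.1 and 0.9 respectively.
pop1 = (10, 180, 810)
pop2 = (810, 180, 10)
pooled = tuple(a + b for a, b in zip(pop1, pop2))

chi2_pop1, p_pop1 = hwe_chisq(*pop1)
chi2_pop2, p_pop2 = hwe_chisq(*pop2)
chi2_pooled, p_pooled = hwe_chisq(*pooled)
```

Under these made-up counts each subpopulation passes the test on its own, while the pooled sample shows a large deficit of heterozygotes, which is the kind of structure-driven violation the abstract describes.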

Journal ArticleDOI
TL;DR: Joint calling is compared to the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and their impact on downstream association analysis is assessed.
Abstract: Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single-study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep-coverage (~82×) exome and low-coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta-analysis has similar power to joint analysis in deep-coverage sequence data but can be less powerful in low-coverage sequence data. Given similar data processing and quality control steps, we recommend single-study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep-coverage data.

Posted ContentDOI
27 Oct 2020-medRxiv
TL;DR: Measurements of MT-CN in blood-derived DNA may primarily reflect differences in cell-type composition, and these differences may be causally linked to insulin and related traits.
Abstract: Mitochondrial copy number is known to vary among humans and across tissues, and a population-based study of mitochondrial genome copy number (MT-CN) in blood – as estimated from genome sequencing data – observed strong (∼54%) heritability. However, the genetic causes and phenotypic consequences of MT-CN variation in humans are not well-studied. Here, we studied MT-CN variation in blood-derived DNA from 19,184 Finnish individuals with deep cardiometabolic trait measurements using a combination of whole genome (N = 4,163) and exome sequencing (N = 19,034) data as well as imputed array genotypes (N = 17,718). We confirmed that MT-CN in blood is highly heritable (31% by GREML). We identified two loci in the nuclear genome that are significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6 × 10⁻⁸), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0 × 10⁻⁸), which has been reported to protect against non-alcoholic fatty liver disease in model organisms. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10⁻²¹), fat mass (P = 4.5 × 10⁻¹⁶), and other related traits. Using a Mendelian randomization framework, we constructed a genetic instrument for MT-CN using penalized regression with adjustment for potentially confounding covariates and found a significant association with insulin levels, which suggests that our MT-CN measurement in blood may be causally related to metabolic syndrome. Finally, we computed our genetic instrument in UK Biobank participants and tested it against a set of cell count and cardiometabolic traits. We found significant associations between MT-CN and both neutrophil and platelet counts (P = 1.8 × 10⁻⁸ and P = 1.2 × 10⁻³). While the association between MT-CN and metabolic syndrome traits was replicated in the UK Biobank, adjusting for cell counts largely eliminated these signals, suggesting that MT-CN is actually a proxy measurement for neutrophil and platelet counts in its effect on metabolic syndrome. Taken together, these results suggest that measurements of MT-CN in blood-derived DNA may primarily reflect differences in cell-type composition, and that these differences may be causally linked to insulin and related traits.
Author summary: The number of mitochondria per cell is variable between tissues and between individuals, and prior studies have shown that mitochondrial genome copy number in blood (MT-CN) – as estimated indirectly from sequencing of blood-derived DNA – is a genetically determined trait that varies among humans. We studied genetic data from approximately 19,000 Finnish individuals and showed that MT-CN is significantly associated with insulin and related metabolic traits, providing evidence that MT-CN levels play a role in determining these traits. Consistent with a previous study, we showed that genetics play a significant role in determining blood-derived MT-CN. We also found new evidence linking several regions of the genome to MT-CN, including a gene known to be associated with non-alcoholic fatty liver disease. Finally, we found that in the link between blood-derived MT-CN and metabolic syndrome, MT-CN likely represents the relative quantities of circulating immune cells, providing further evidence for the role of inflammation in metabolic syndrome.


Journal ArticleDOI
TL;DR: It is shown that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control and increasing correlation between the genetic variant (or multiple variants) and covariates.
Abstract: Multiple linear regression is commonly used to test for association between genetic variants and continuous traits and estimate genetic effect sizes. Confounding variables are controlled for by including them as additional covariates. An alternative technique that is increasingly used is to regress out covariates from the raw trait and then perform regression analysis with only the genetic variants included as predictors. In the case of single-variant analysis, this adjusted trait regression (ATR) technique is known to be less powerful than the traditional technique when the genetic variant is correlated with the covariates. We extend previous results for single-variant tests by deriving exact relationships between the single-variant score, Wald, likelihood-ratio, and F test statistics and their ATR analogs. We also derive the asymptotic power of ATR analogs of the multiple-variant score and burden tests. We show that the maximum power loss of the ATR analog of the multiple-variant score test is completely characterized by the canonical correlations between the set of genetic variants and the set of covariates. Further, we show that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control (α) and increasing correlation (or canonical correlations) between the genetic variant (or multiple variants) and covariates. We recommend using ATR only when maximum canonical correlation between variants and covariates is low, as is typically true.
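
Illustrative sketch (not the paper's derivation): the two techniques the abstract contrasts can be compared on simulated data. Everything here is an assumption made for illustration: the sample size, effect sizes, the single covariate `c`, and the way the genotype `g` is made to correlate with it. The check at the end relies on a standard least-squares identity: when the covariates include an intercept, the ATR effect estimate equals the multiple-regression estimate shrunk by a factor of 1 − R², where R² comes from regressing the variant on the covariates.

```python
import numpy as np


def ols_coefs(X, y):
    """Least-squares coefficients for y ~ X (X already contains any intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: covariate c, genotype g whose allele frequency depends on c
# (so g and c are correlated), and a trait y affected by both.
c = rng.normal(size=n)
allele_freq = 1.0 / (1.0 + np.exp(-c))
g = rng.binomial(2, allele_freq).astype(float)
y = 0.2 * g + 0.5 * c + rng.normal(size=n)

ones = np.ones(n)
C = np.column_stack([ones, c])  # covariate matrix with intercept

# Traditional approach: multiple regression of y on covariates and g together.
beta_full = ols_coefs(np.column_stack([C, g]), y)[-1]

# ATR: regress covariates out of the trait, then regress the residual on g alone.
y_res = y - C @ ols_coefs(C, y)
beta_atr = ols_coefs(np.column_stack([ones, g]), y_res)[-1]

# R^2 from regressing the genotype on the covariates.
g_res = g - C @ ols_coefs(C, g)
g_centered = g - g.mean()
r2 = 1.0 - (g_res @ g_res) / (g_centered @ g_centered)

# ATR shrinks the effect estimate toward zero by the factor (1 - R^2).
shrinkage_matches = np.isclose(beta_atr, beta_full * (1.0 - r2))
```

When `r2` is near zero the two estimates nearly coincide, which is consistent with the abstract's recommendation to use ATR only when the variant-covariate correlation is low.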