
Showing papers by "Michael Boehnke" published in 2021


Journal ArticleDOI
Daniel Taliun1, Daniel N. Harris2, Michael D. Kessler2, Jedidiah Carlson1, +202 more; Institutions (61)
10 Feb 2021-Nature
TL;DR: The Trans-Omics for Precision Medicine (TOPMed) programme aims to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases.
Abstract: The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%. The goals, resources and design of the NHLBI Trans-Omics for Precision Medicine (TOPMed) programme are described, and analyses of rare variants detected in the first 53,831 samples provide insights into mutational processes and recent human evolutionary history.

801 citations


Journal ArticleDOI
TL;DR: The authors performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci, including genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics.
Abstract: Bipolar disorder is a heritable mental illness with complex etiology. We performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci. Bipolar disorder risk alleles were enriched in genes in synaptic signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics. Integrating expression quantitative trait locus data implicated 15 genes robustly linked to bipolar disorder via gene expression, encoding druggable targets such as HTR6, MCHR1, DCLK3 and FURIN. Analyses of bipolar disorder subtypes indicated high but imperfect genetic correlation between bipolar disorder type I and II and identified additional associated loci. Together, these results advance our understanding of the biological etiology of bipolar disorder, identify novel therapeutic leads and prioritize genes for functional follow-up studies.

378 citations


Journal ArticleDOI
Alexander Kurilshikov1, Carolina Medina-Gomez2, Rodrigo Bacigalupe3, Djawad Radjabzadeh2, Jun Wang3, Jun Wang4, Ayse Demirkan1, Ayse Demirkan5, Caroline I. Le Roy6, Juan Antonio Raygoza Garay7, Casey T. Finnicum8, Xingrong Liu9, Daria V. Zhernakova1, Marc Jan Bonder1, Tue H. Hansen10, Fabian Frost11, Malte C. Rühlemann12, Williams Turpin7, Jee-Young Moon13, Han-Na Kim14, Kreete Lüll15, Elad Barkan16, Shiraz A. Shah17, Myriam Fornage18, Joanna Szopinska-Tokov, Zachary D. Wallen19, Dmitrii Borisevich10, Lars Agréus9, Anna Andreasson20, Corinna Bang12, Larbi Bedrani7, Jordana T. Bell6, Hans Bisgaard17, Michael Boehnke21, Dorret I. Boomsma22, Robert D. Burk13, Annique Claringbould1, Kenneth Croitoru7, Gareth E. Davies8, Gareth E. Davies22, Cornelia M. van Duijn2, Cornelia M. van Duijn23, Liesbeth Duijts2, Gwen Falony3, Jingyuan Fu1, Adriaan van der Graaf1, Torben Hansen10, Georg Homuth11, David A. Hughes24, Richard G. IJzerman25, Matthew A. Jackson23, Matthew A. Jackson6, Vincent W. V. Jaddoe2, Marie Joossens3, Torben Jørgensen10, Daniel Keszthelyi26, Rob Knight27, Markku Laakso28, Matthias Laudes, Lenore J. Launer29, Wolfgang Lieb12, Aldons J. Lusis30, Ad A.M. Masclee26, Henriette A. Moll2, Zlatan Mujagic26, Qi Qibin13, Daphna Rothschild16, Hocheol Shin14, Søren J. Sørensen10, Claire J. Steves6, Jonathan Thorsen17, Nicholas J. Timpson24, Raul Y. Tito3, Sara Vieira-Silva3, Uwe Völker11, Henry Völzke11, Urmo Võsa1, Kaitlin H Wade24, Susanna Walter31, Kyoko Watanabe22, Stefan Weiss11, Frank Ulrich Weiss11, Omer Weissbrod32, Harm-Jan Westra1, Gonneke Willemsen22, Haydeh Payami19, Daisy Jonkers26, Alejandro Arias Vasquez33, Eco J. C. de Geus22, Katie A. Meyer34, Jakob Stokholm17, Eran Segal16, Elin Org15, Cisca Wijmenga1, Hyung Lae Kim35, Robert C. Kaplan36, Tim D. Spector6, André G. Uitterlinden2, Fernando Rivadeneira2, Andre Franke12, Markus M. Lerch11, Lude Franke1, Serena Sanna37, Serena Sanna1, Mauro D'Amato, Oluf Pedersen10, Andrew D. Paterson7, Robert Kraaij2, Jeroen Raes3, Alexandra Zhernakova1
TL;DR: In this article, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts) and found high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples. A genome-wide association study of host genetic variation regarding microbial taxa identified 31 loci affecting the microbiome at a genome-wide significant (P < 5 × 10^-8) threshold. One locus, the lactase (LCT) gene locus, reached study-wide significance (genome-wide association study signal: P = 1.28 × 10^-20), and it showed an age-dependent association with Bifidobacterium abundance. Other associations were suggestive (1.95 × 10^-10 < P < 5 × 10^-8) but enriched for taxa showing high heritability and for genes expressed in the intestine and brain. A phenome-wide association study and Mendelian randomization identified enrichment of microbiome trait loci in the metabolic, nutrition and environment domains and suggested the microbiome might have causal effects in ulcerative colitis and rheumatoid arthritis.

287 citations


Journal ArticleDOI
Ji Chen1, Ji Chen2, Cassandra N. Spracklen3, Cassandra N. Spracklen4, +475 more; Institutions (146)
TL;DR: This paper aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available.
Abstract: Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10^-8), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.

178 citations
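
The meta-analysis and between-ancestry heterogeneity mentioned in the glycemic-traits abstract above can be illustrated with a minimal sketch. It assumes a generic inverse-variance-weighted fixed-effect meta-analysis with Cochran's Q and I²; the abstract does not state which method the authors used, and the function name `ivw_meta` and the example effect sizes are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis (illustrative sketch).

    betas: per-study (or per-ancestry) effect estimates for one variant
    ses:   matching standard errors
    Returns the pooled effect, its standard error, a two-sided normal
    p-value, Cochran's Q, its degrees of freedom, and I^2.
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / se_i^2
    wsum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / wsum
    se = math.sqrt(1.0 / wsum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation attributable to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "df": df, "i2": i2}

# Hypothetical effect estimates for one variant in three ancestry groups
result = ivw_meta([0.10, 0.12, 0.08], [0.02, 0.03, 0.04])
print(result)
```

A large Q relative to its degrees of freedom (or a high I²) would suggest that the effect differs between groups, which is the kind of between-ancestry heterogeneity the abstract refers to.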


Journal ArticleDOI
TL;DR: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results, which can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparisons to the GWAS catalog).
Abstract: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparisons to the GWAS catalog). It can be embedded in web pages to enable data sharing and exploration. Views can be customized and extended to display other data types such as phenome-wide association study (PheWAS) results, chromatin co-accessibility, or eQTL measurements. A new web upload service harmonizes datasets, adds annotations, and makes it easy to explore user-provided result sets. Availability: LocusZoom.js is open-source software under a permissive MIT license. Code and documentation are available at: https://github.com/statgen/locuszoom/. Installable packages for all versions are also distributed via NPM. Additional features are provided as standalone libraries to promote reuse. Use with your own GWAS results at https://my.locuszoom.org/. Supplementary information: Supplementary data are available at Bioinformatics online.

91 citations


Posted ContentDOI
04 Jan 2021-bioRxiv
TL;DR: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results, which can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparisons to the GWAS catalog).
Abstract: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparisons to the GWAS catalog). It can be embedded in web pages to enable data sharing and exploration. Views can be customized and extended to display other data types such as phenome-wide association study (PheWAS) results, chromatin co-accessibility, or eQTL measurements. A new web upload service harmonizes datasets, adds annotations, and makes it easy to explore user-provided result sets. Availability: LocusZoom.js is open-source software under a permissive MIT license. Code and documentation are available at: https://github.com/statgen/locuszoom/. Installable packages are also distributed via NPM. Additional features are provided as standalone libraries to promote reuse. Use with your own GWAS results at https://my.locuszoom.org/. Contact: locuszoom@googlegroups.com

71 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of GWAS meta-analyses using a strict P-value threshold of 5 × 10^-8 to other multiple testing strategies: (1) less stringent P-value thresholds, (2) controlling the FDR with the Benjamini-Hochberg and Benjamini-Yekutieli procedures, and (3) controlling the Bayesian FDR with posterior probabilities.
Abstract: Over the last decade, GWAS meta-analyses have used a strict P-value threshold of 5 × 10^-8 to classify associations as significant. Here, we use our current understanding of frequently studied traits including lipid levels, height, and BMI to revisit this genome-wide significance threshold. We compare the performance of studies using the P = 5 × 10^-8 threshold in terms of true and false positive rate to other multiple testing strategies: (1) less stringent P-value thresholds, (2) controlling the FDR with the Benjamini-Hochberg and Benjamini-Yekutieli procedures, and (3) controlling the Bayesian FDR with posterior probabilities. We applied these procedures to re-analyze results from the Global Lipids and GIANT GWAS meta-analysis consortia and supported them with extensive simulation that mimics the empirical data. We observe in simulated studies with sample sizes ∼20,000 and >120,000 that relaxing the P-value threshold to 5 × 10^-7 increased discovery at the cost of 18% and 8% of additional loci being false positive results, respectively. FDR and Bayesian FDR are well controlled for both sample sizes with a few exceptions that disappear under a less stringent definition of true positives, and the two approaches yield similar results. Our work quantifies the value of using a relaxed P-value threshold in large studies to increase their true positive discovery but also shows the excess false positive rates due to such actions in modest-sized studies. These results may guide investigators considering different thresholds in replication studies and downstream work such as gene-set enrichment or pathway analysis. Finally, we demonstrate the viability of FDR-controlling procedures in GWAS.

27 citations
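
The contrast drawn in the entry above, between a fixed P-value threshold and FDR control, can be sketched as follows. This is a textbook implementation of the Benjamini-Hochberg step-up procedure, not the authors' analysis code; the function names and the example P-values are hypothetical.

```python
def fixed_threshold(pvalues, threshold=5e-8):
    """Flag tests whose P-value is below a fixed significance threshold."""
    return [p < threshold for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Finds the largest rank k such that p_(k) <= (k / m) * alpha and
    rejects the k hypotheses with the smallest P-values.
    Returns a list of booleans in the original input order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank                                # keep the largest passing rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical P-values for ten tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
print(fixed_threshold(pvals, threshold=5e-8))
```

Because the procedure is step-up, a P-value that misses its own rank's cutoff can still be rejected when a larger P-value passes at a higher rank, which is why the loop records the largest passing rank rather than stopping at the first failure.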


Posted ContentDOI
Wei Zhou1, Wei Zhou2, Masahiro Kanai, Kuan-Han H. Wu3, +148 more; Institutions (38)
21 Nov 2021-medRxiv
TL;DR: The Global Biobank Meta-analysis Initiative (GBMI) is a collaborative network of 19 biobanks from 4 continents representing more than 2.1 million consented individuals with genetic data linked to electronic health records.
Abstract: Biobanks are being established across the world to understand the genetic, environmental, and epidemiological basis of human diseases with the goal of better prevention and treatments. Genome-wide association studies (GWAS) have been very successful at mapping genomic loci for a wide range of human diseases and traits, but in general, lack appropriate representation of diverse ancestries - with most biobanks and preceding GWAS studies composed of individuals of European ancestries. Here, we introduce the Global Biobank Meta-analysis Initiative (GBMI) -- a collaborative network of 19 biobanks from 4 continents representing more than 2.1 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWAS generated using harmonized genotypes and phenotypes from member biobanks. GBMI brings together results from GWAS analysis across 6 main ancestry groups: approximately 33,000 of African ancestry either from Africa or from admixed-ancestry diaspora (AFR), 18,000 admixed American (AMR), 31,000 Central and South Asian (CSA), 341,000 East Asian (EAS), 1.4 million European (EUR), and 1,600 Middle Eastern (MID) individuals. In this flagship project, we generated GWASs from across 14 exemplar diseases and endpoints, including both common and less prevalent diseases that were previously understudied. Using the genetic association results, we validate that GWASs conducted in biobanks worldwide can be successfully integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics between biobanks. We demonstrate the value of this collaborative effort to improve GWAS power for diseases, increase representation, benefit understudied diseases, and improve risk prediction while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of the studied traits.

24 citations


Journal ArticleDOI
TL;DR: In this article, a trans-disease meta-analysis was applied to 8,016,731 well-imputed genetic markers from large-scale meta-analyses of psoriasis (11,024 cases and 16,336 controls) and type 2 diabetes (74,124 cases and 824,006 controls), adjusted for body mass index.

18 citations


Journal ArticleDOI
Xiaoming Jia1, Fernando S. Goes2, Adam E. Locke3, Duncan Palmer4, Weiqing Wang5, Sarah Cohen-Woods6, Sarah Cohen-Woods7, Giulio Genovese4, Anne U. Jackson8, Chen Jiang9, Mark N. Kvale1, Niamh Mullins5, Hoang T. Nguyen5, Mehdi Pirooznia, Margarita Rivera6, Margarita Rivera10, Douglas M. Ruderfer11, Ling Shen9, Khanh K. Thai9, Matthew Zawistowski8, Yongwen Zhuang8, Gonçalo R. Abecasis8, Huda Akil12, Sarah E. Bergen13, Margit Burmeister, Sinéad B. Chapman4, Melissa DelaBastide14, Anders Juréus13, Hyun Min Kang8, Pui-Yan Kwok1, Jun Li8, Shawn Levy, Eric T. Monson15, Jennifer L. Moran16, Janet L. Sobell17, Stanley J. Watson12, Virginia L. Willour15, Sebastian Zöllner8, Rolf Adolfsson18, Douglas Blackwood19, Michael Boehnke8, Gerome Breen6, Aiden Corvin20, Nicholas John Craddock21, Arianna DiFlorio21, Christina M. Hultman13, Mikael Landén13, Mikael Landén22, Cathryn M. Lewis6, Steven A. McCarroll16, W. Richard McCombie14, Peter McGuffin6, Andrew M. McIntosh19, Andrew McQuillin23, Derek W. Morris20, Derek W. Morris24, Richard M. Myers, Michael Conlon O'Donovan21, Roel A. Ophoff25, Marco P. Boks, René S. Kahn5, Willem H. Ouwehand26, Michael John Owen21, Carlos N. Pato17, Carlos N. Pato27, Michele T. Pato17, Michele T. Pato27, Danielle Posthuma28, James B. Potash2, Andreas Reif29, Pamela Sklar5, Jordan W. Smoller4, Jordan W. Smoller16, Patrick F. Sullivan30, John B. Vincent31, John B. Vincent32, James T.R. Walters21, Benjamin M. Neale16, Benjamin M. Neale4, Shaun Purcell33, Shaun Purcell4, Neil Risch1, Catherine Schaefer9, Eli A. Stahl5, Peter P. Zandi2, Laura J. Scott8 
TL;DR: In this article, the authors examined the protein-coding (exonic) sequences of 3,987 unrelated individuals with bipolar disorder and 5,322 controls of predominantly European ancestry across four cohorts from the Bipolar Sequencing Consortium (BSC).
Abstract: Bipolar disorder (BD) is a serious mental illness with substantial common variant heritability. However, the role of rare coding variation in BD is not well established. We examined the protein-coding (exonic) sequences of 3,987 unrelated individuals with BD and 5,322 controls of predominantly European ancestry across four cohorts from the Bipolar Sequencing Consortium (BSC). We assessed the burden of rare, protein-altering, single nucleotide variants classified as pathogenic or likely pathogenic (P-LP) both exome-wide and within several groups of genes with phenotypic or biologic plausibility in BD. While we observed an increased burden of rare coding P-LP variants within 165 genes identified as BD GWAS regions in 3,987 BD cases (meta-analysis OR = 1.9, 95% CI = 1.3-2.8, one-sided p = 6.0 × 10^-4), this enrichment did not replicate in an additional 9,929 BD cases and 14,018 controls (OR = 0.9, one-sided p = 0.70). Although BD shares common variant heritability with schizophrenia (SCZ), in the BSC sample we did not observe a significant enrichment of P-LP variants in SCZ GWAS genes, in two classes of neuronal synaptic genes (RBFOX2 and FMRP) associated with SCZ or in loss-of-function intolerant genes. In this study, the largest analysis of exonic variation in BD, individuals with BD do not carry a replicable enrichment of rare P-LP variants across the exome or in any of several groups of genes with biologic plausibility. Moreover, despite a strong shared susceptibility between BD and SCZ through common genetic variation, we do not observe an association between BD risk and rare P-LP coding variants in genes known to modulate risk for SCZ.

13 citations


Journal ArticleDOI
TL;DR: FIVEx, an interactive eQTL/sQTL browser with an intuitive interface tailored to the functional interpretation of associated variants, is developed, which provides important insights for understanding potential tissue-specific regulatory mechanisms underlying trait-associated signals.
Abstract: SUMMARY: Expression quantitative trait loci (eQTLs) characterize the associations between genetic variation and gene expression to provide insights into tissue-specific gene regulation. Interactive visualization of tissue-specific eQTLs or splice QTLs (sQTLs) can facilitate our understanding of functional variants relevant to disease-related traits. However, combining the multi-dimensional nature of eQTLs/sQTLs into a concise and informative visualization is challenging. Existing QTL visualization tools provide useful ways to summarize the unprecedented scale of transcriptomic data but are not necessarily tailored to answer questions about the functional interpretations of trait-associated variants or other variants of interest. We developed FIVEx, an interactive eQTL/sQTL browser with an intuitive interface tailored to the functional interpretation of associated variants. It features the ability to navigate seamlessly between different data views while providing relevant tissue- and locus-specific information to offer users a better understanding of population-scale multi-tissue transcriptomic profiles. Our implementation of the FIVEx browser on the EBI eQTL catalogue, encompassing 16 publicly available RNA-seq studies, provides important insights for understanding potential tissue-specific regulatory mechanisms underlying trait-associated signals. AVAILABILITY AND IMPLEMENTATION: A FIVEx instance visualizing EBI eQTL catalogue data can be found at https://fivex.sph.umich.edu. Its source code is open source under an MIT license at https://github.com/statgen/fivex. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: In this paper, the authors identified two loci significantly associated with MT-CN variation (a common variant at the MYB-HBS1L locus and a burden of rare variants in the TMBIM1 gene) and used a Mendelian randomization framework to show evidence that MT-CN measured in blood is causally related to insulin levels.
Abstract: Mitochondrial genome copy number (MT-CN) varies among humans and across tissues and is highly heritable, but its causes and consequences are not well understood. When measured by bulk DNA sequencing in blood, MT-CN may reflect a combination of the number of mitochondria per cell and cell-type composition. Here, we studied MT-CN variation in blood-derived DNA from 19184 Finnish individuals using a combination of genome (N = 4163) and exome sequencing (N = 19034) data as well as imputed genotypes (N = 17718). We identified two loci significantly associated with MT-CN variation: a common variant at the MYB-HBS1L locus (P = 1.6 × 10^-8), which has previously been associated with numerous hematological parameters; and a burden of rare variants in the TMBIM1 gene (P = 3.0 × 10^-8), which has been reported to protect against non-alcoholic fatty liver disease. We also found that MT-CN is strongly associated with insulin levels (P = 2.0 × 10^-21) and other metabolic syndrome (metS)-related traits. Using a Mendelian randomization framework, we show evidence that MT-CN measured in blood is causally related to insulin levels. We then applied an MT-CN polygenic risk score (PRS) derived from Finnish data to the UK Biobank, where the association between the PRS and metS traits was replicated. Adjusting for cell counts largely eliminated these signals, suggesting that MT-CN affects metS via cell-type composition. These results suggest that measurements of MT-CN in blood-derived DNA partially reflect differences in cell-type composition and that these differences are causally linked to insulin and related traits.

Posted ContentDOI
George Hindy1, George Hindy2, George Hindy3, Peter Dornbos2, +222 more; Institutions (83)
26 Aug 2021-bioRxiv
TL;DR: In this article, gene-based association testing of blood lipid levels with rare (minor allele frequency 170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan was performed.
Abstract: Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency 170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels. Ten of these: ALB, SRSF2, JAK2, CREB3L3, TMEM136, VARS, NR1H3, PLA2G12A, PPARG and STAB1 have not been implicated for lipid levels using rare coding variation in population-based samples. We prioritize 32 genes identified in array-based genome-wide association study (GWAS) loci based on gene-based associations, of which three: EVI5, SH2B3, and PLIN1, had no prior evidence of rare coding variant associations. Most of the associated genes showed evidence of association in multiple ancestries. Also, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes, and for genes closest to GWAS index single nucleotide polymorphisms (SNP). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.

Journal ArticleDOI
Hunna J. Watson1, Hunna J. Watson2, Hunna J. Watson3, Laura M. Thornton1, +208 more; Institutions (90)
20 Sep 2021
TL;DR: Early- and typical-onset AN showed distinct genetic correlation patterns with putative risk factors for AN, which provide evidence consistent with a common variant genetic basis for age at onset and implicate biological pathways regulating menarche and reproduction.
Abstract: Background: Genetics and biology may influence the age at onset of anorexia nervosa (AN). The aims of this study were to determine whether common genetic variation contributes to AN age at onset and to investigate the genetic associations between age at onset of AN and age at menarche. Methods: A secondary analysis of the Psychiatric Genomics Consortium genome-wide association study (GWAS) of AN was performed which included 9,335 cases and 31,981 screened controls, all from European ancestries. We conducted GWASs of age at onset, early-onset AN ( Results: Two loci were genome-wide significant in the typical-onset AN GWAS. Heritability estimates (SNP-h2) were 0.01-0.04 for age at onset, 0.16-0.25 for early-onset AN, and 0.17-0.25 for typical-onset AN. Early- and typical-onset AN showed distinct genetic correlation patterns with putative risk factors for AN. Specifically, early-onset AN was significantly genetically correlated with younger age at menarche, and typical-onset AN was significantly negatively genetically correlated with anthropometric traits. Genetic risk scores for age at onset and early-onset AN estimated from independent GWASs significantly predicted age at onset. Mendelian randomization analysis suggested a causal link between younger age at menarche and early-onset AN. Conclusions: Our results provide evidence consistent with a common variant genetic basis for age at onset and implicate biological pathways regulating menarche and reproduction.

Journal ArticleDOI
TL;DR: In this paper, the authors surveyed FD comorbidities, heritability, and genetic correlations across a wide spectrum of conditions and traits in 10,078 cases and 351,282 non-FD controls of European ancestry.
Abstract: BACKGROUND: Functional dyspepsia (FD) is a common gastrointestinal condition of poorly understood pathophysiology. While symptoms' overlap with other conditions may indicate common pathogenetic mechanisms, genetic predisposition is suspected but has not been adequately investigated. METHODS: Using healthcare, questionnaire, and genetic data from three large population-based biobanks (UK Biobank, EGCUT, and MGI), we surveyed FD comorbidities, heritability, and genetic correlations across a wide spectrum of conditions and traits in 10,078 cases and 351,282 non-FD controls of European ancestry. KEY RESULTS: In UK Biobank, 281 diagnoses were detected at increased prevalence in FD, based on healthcare records. Among these, gastrointestinal conditions (OR = 4.0, p 0.344), mostly overlapping with those also enriched in FD patients. Suggestive (p < 5.0 × 10^-6) association with FD risk was detected for 13 loci, with 2 showing nominal replication (p < 0.05) in an independent cohort of 192 FD patients. CONCLUSIONS & INFERENCES: FD has a weak heritable component that shows commonalities with multiple conditions across a wide spectrum of pathophysiological domains. This new knowledge contributes to a better understanding of FD etiology and may have implications for improving its treatment.

Journal ArticleDOI
TL;DR: GAUSS is a method for gene set association analysis that requires only GWAS summary statistics; for each significantly associated gene set, it identifies the subset of genes that have the maximal evidence of association and can best account for the gene set-phenotype association.
Abstract: Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.

Journal ArticleDOI
17 May 2021-Genetics
TL;DR: The Robust Unified Test for Hardy-Weinberg Equilibrium (RUTH) as discussed by the authors was proposed to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets.
Abstract: Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ2 test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
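As a point of reference for the "traditional" tests the abstract mentions, the following is a minimal sketch of the standard χ2 test for HWE on genotype counts. It is not RUTH, and the function name `hwe_chi2_test` and the toy counts are hypothetical. The example pools two subpopulations that are each in HWE but differ in allele frequency, which illustrates how population structure alone can produce a strong deviation.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Pearson chi-square test for HWE (1 degree of freedom).

    Takes counts of the three genotypes at a biallelic marker and
    returns (chi2_statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0              # monomorphic marker: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Two hypothetical subpopulations, each exactly in HWE.
pop1 = (810, 180, 10)   # allele A frequency 0.9
pop2 = (10, 180, 810)   # allele A frequency 0.1
pooled = tuple(a + b for a, b in zip(pop1, pop2))

for label, counts in (("pop1", pop1), ("pop2", pop2), ("pooled", pooled)):
    chi2, p_value = hwe_chi2_test(*counts)
    print(f"{label}: chi2={chi2:.3f}, p={p_value:.3g}")
```

The pooled sample shows a large heterozygote deficit even though neither subpopulation has any genotyping error, which is the situation a structure-aware test is meant to handle.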

Journal ArticleDOI
TL;DR: The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown; the authors used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals.
Abstract: The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals. We tested the 64,572 common and low-frequency SVs for association with 116 quantitative traits and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including 2 loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB promoter that is greatly enriched in the Finnish population and causes decreased serum albumin level in carriers (p = 1.47 × 10^−54) and is also associated with increased levels of total cholesterol (p = 1.22 × 10^−28) and 14 additional cholesterol-related traits, and (2) a multi-allelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 4.81 × 10^−21) and alanine (p = 6.14 × 10^−12) levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in single nucleotide variants (SNVs) and one linking recurrent HP gene deletion and cholesterol levels (p = 6.24 × 10^−10), which was also found to be strongly associated with increased glycoprotein level (p = 3.53 × 10^−35). Our study confirms that integrating SVs in trait-mapping studies will expand our knowledge of genetic factors underlying disease risk.

Posted ContentDOI
20 Oct 2021-medRxiv
TL;DR: This article explores the impact of rare variants, defined by minor allele frequency (MAF), a question that few studies have examined.
Abstract: Few studies have explored the impact of rare variants (minor allele frequency, MAF

Posted ContentDOI
28 Sep 2021
TL;DR: It is demonstrated that standard Bonferroni and permutation-based methods for multiple testing correction are inadequate for a holistic analysis of biobank data and a single-iteration permutation method is proposed that is computationally feasible and provides false discovery rate (FDR) estimates tailored to individual datasets and variant frequencies.
Abstract: Biobanks housing genetic and phenotypic data for thousands of individuals introduce new opportunities and challenges for genetic association studies. Association testing across many phenotypes increases the multiple-testing burden and correlation between phenotypes makes appropriate multiple-testing correction uncertain. Moreover, analysis including low-frequency variants results in inflated type I error due to the much larger number of tests and the elevated importance of each individual minor allele carrier in those tests. Here we demonstrate that standard Bonferroni and permutation-based methods for multiple testing correction are inadequate for a holistic analysis of biobank data because ideal significance thresholds vary across datasets and minor allele frequencies. We propose a single-iteration permutation method that is computationally feasible and provides false discovery rate (FDR) estimates tailored to individual datasets and variant frequencies. Each dataset’s unique FDR estimates provide customized levels of confidence for association results and enable informed interpretation of genetic association studies across the phenome.
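To make the contrast between a fixed Bonferroni threshold and a dataset-specific FDR estimate concrete, here is a minimal sketch under stated assumptions. It is a generic empirical estimator that compares p-values from the real analysis with p-values from one analysis of permuted phenotypes; it is not necessarily the authors' procedure, and the function names and toy p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def empirical_fdr(observed_p, permuted_p, threshold):
    """Estimate FDR at a p-value threshold from a single permuted analysis.

    The permuted analysis stands in for the null: the number of permuted
    p-values at or below the threshold approximates the number of false
    discoveries expected among the observed hits.
    Assumes both lists come from the same set of tests.
    """
    n_hits = sum(p <= threshold for p in observed_p)
    n_null_hits = sum(p <= threshold for p in permuted_p)
    if n_hits == 0:
        return 0.0  # no discoveries, so no false discoveries
    return min(1.0, n_null_hits / n_hits)


# Toy p-values (hypothetical), one per variant-phenotype test.
observed = [1e-9, 2e-8, 3e-7, 0.01, 0.2, 0.5]
permuted = [4e-7, 0.03, 0.1, 0.4, 0.6, 0.9]

strict = bonferroni_threshold(0.05, 1_000_000)
print("Bonferroni threshold:", strict)
print("FDR at Bonferroni threshold:", empirical_fdr(observed, permuted, strict))
print("FDR at 1e-6:", empirical_fdr(observed, permuted, 1e-6))
```

Because the estimate is computed from the dataset's own permuted results, it can be evaluated separately within minor allele frequency bins, which is the kind of tailoring the abstract describes.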

Journal ArticleDOI
TL;DR: This work proposes a score test under the modified random effects model for gene-/region-based rare variants associations, adapting the kernel regression framework to construct the model and incorporating genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients.
Abstract: Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among different populations, existing gene-/region-based rare variants meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among different populations. In response, we propose a score test under the modified random effects model for gene-/region-based rare variants associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate the asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare variant associations with several traits.