Showing papers in "Nature Genetics in 2021"


Journal ArticleDOI
TL;DR: ArchR is a software suite for single-cell analysis of regulatory chromatin in R (https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data.
Abstract: The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/ ) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells.

406 citations


Journal ArticleDOI
TL;DR: The authors performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci and found significant signal enrichment in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics.
Abstract: Bipolar disorder is a heritable mental illness with complex etiology. We performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci. Bipolar disorder risk alleles were enriched in genes in synaptic signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics. Integrating expression quantitative trait locus data implicated 15 genes robustly linked to bipolar disorder via gene expression, encoding druggable targets such as HTR6, MCHR1, DCLK3 and FURIN. Analyses of bipolar disorder subtypes indicated high but imperfect genetic correlation between bipolar disorder type I and II and identified additional associated loci. Together, these results advance our understanding of the biological etiology of bipolar disorder, identify novel therapeutic leads and prioritize genes for functional follow-up studies.

378 citations


Journal ArticleDOI
Urmo Võsa, Annique Claringbould, Harm-Jan Westra, Marc Jan Bonder, Patrick Deelen, Biao Zeng, Holger Kirsten, Ashis Saha, Roman Kreuzhuber, Seyhan Yazar, Harm Brugge, Roy Oelen, Dylan H. de Vries, Monique G. P. van der Wijst, Silva Kasela, Natalia Pervjakova, Isabel Alves, Marie-Julie Favé, Mawusse Agbessi, Mark W. Christiansen, Rick Jansen, Ilkka Seppälä, Lin Tong, Alexander Teumer, Katharina Schramm, Gibran Hemani, Joost Verlouw, Hanieh Yaghootkar, Reyhan Sönmez Flitman, Andrew A. Brown, Viktorija Kukushkina, Anette Kalnapenkis, Sina Rüeger, Eleonora Porcu, Jaanika Kronberg, Johannes Kettunen, Bernett Lee, Futao Zhang, Ting Qi, Jose Alquicira Hernandez, Wibowo Arindrarto, Frank Beutner, Peter A C 't Hoen, Joyce B. J. van Meurs, Jenny van Dongen, Maarten van Iterson, Morris A. Swertz, Julia Dmitrieva, Mahmoud Elansary, Benjamin P. Fairfax, Michel Georges, Bastiaan T. Heijmans, Alex W. Hewitt, Mika Kähönen, Yungil Kim, Julian C. Knight, Peter Kovacs, Knut Krohn, Shuang Li, Markus Loeffler, Urko M. Marigorta, Hailang Mei, Yukihide Momozawa, Martina Müller-Nurasyid, Matthias Nauck, Michel G. Nivard, Brenda W.J.H. Penninx, Jonathan K. Pritchard, Olli T. Raitakari, Olaf Rötzschke, Eline Slagboom, Coen D.A. Stehouwer, Michael Stumvoll, Patrick F. Sullivan, Joachim Thiery, Anke Tönjes, Jan H. Veldink, Uwe Völker, Robert Warmerdam, Cisca Wijmenga, Morris Swertz, Anand Kumar Andiappan, Grant W. Montgomery, Samuli Ripatti, Markus Perola, Zoltán Kutalik, Emmanouil T. Dermitzakis, Sven Bergmann, Timothy M. Frayling, Holger Prokisch, Habibul Ahsan, Brandon L. Pierce, Terho Lehtimäki, Dorret I. Boomsma, Bruce M. Psaty, Sina A. Gharib, Philip Awadalla, Lili Milani, Willem H. Ouwehand, Kate Downes, Oliver Stegle, Alexis Battle, Peter M. Visscher, Jian Yang, Markus Scholz, Joseph E. Powell, Greg Gibson, Tõnu Esko, Lude Franke
TL;DR: In this article, the authors performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium.
Abstract: Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.

344 citations


Journal ArticleDOI
TL;DR: In this paper, a single-cell and spatially resolved transcriptomics analysis of human breast cancers is presented, in which a single-cell method of intrinsic subtype classification (SCSubtype) reveals recurrent neoplastic cell heterogeneity.
Abstract: Breast cancers are complex cellular ecosystems where heterotypic interactions play central roles in disease progression and response to therapy. However, our knowledge of their cellular composition and organization is limited. Here we present a single-cell and spatially resolved transcriptomics analysis of human breast cancers. We developed a single-cell method of intrinsic subtype classification (SCSubtype) to reveal recurrent neoplastic cell heterogeneity. Immunophenotyping using cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) provides high-resolution immune profiles, including new PD-L1/PD-L2+ macrophage populations associated with clinical outcome. Mesenchymal cells displayed diverse functions and cell-surface protein expression through differentiation within three major lineages. Stromal-immune niches were spatially organized in tumors, offering insights into antitumor immune regulation. Using single-cell signatures, we deconvoluted large breast cancer cohorts to stratify them into nine clusters, termed 'ecotypes', with unique cellular compositions and clinical outcomes. This study provides a comprehensive transcriptional atlas of the cellular architecture of breast cancer.

303 citations


Journal ArticleDOI
TL;DR: In this paper, the authors conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records.
Abstract: Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n_total = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.

291 citations


Journal ArticleDOI
Alexander Kurilshikov, Carolina Medina-Gomez, Rodrigo Bacigalupe, Djawad Radjabzadeh, Jun Wang, Ayse Demirkan, Caroline I. Le Roy, Juan Antonio Raygoza Garay, Casey T. Finnicum, Xingrong Liu, Daria V. Zhernakova, Marc Jan Bonder, Tue H. Hansen, Fabian Frost, Malte C. Rühlemann, Williams Turpin, Jee-Young Moon, Han-Na Kim, Kreete Lüll, Elad Barkan, Shiraz A. Shah, Myriam Fornage, Joanna Szopinska-Tokov, Zachary D. Wallen, Dmitrii Borisevich, Lars Agréus, Anna Andreasson, Corinna Bang, Larbi Bedrani, Jordana T. Bell, Hans Bisgaard, Michael Boehnke, Dorret I. Boomsma, Robert D. Burk, Annique Claringbould, Kenneth Croitoru, Gareth E. Davies, Cornelia M. van Duijn, Liesbeth Duijts, Gwen Falony, Jingyuan Fu, Adriaan van der Graaf, Torben Hansen, Georg Homuth, David A. Hughes, Richard G. IJzerman, Matthew A. Jackson, Vincent W. V. Jaddoe, Marie Joossens, Torben Jørgensen, Daniel Keszthelyi, Rob Knight, Markku Laakso, Matthias Laudes, Lenore J. Launer, Wolfgang Lieb, Aldons J. Lusis, Ad A.M. Masclee, Henriette A. Moll, Zlatan Mujagic, Qi Qibin, Daphna Rothschild, Hocheol Shin, Søren J. Sørensen, Claire J. Steves, Jonathan Thorsen, Nicholas J. Timpson, Raul Y. Tito, Sara Vieira-Silva, Uwe Völker, Henry Völzke, Urmo Võsa, Kaitlin H Wade, Susanna Walter, Kyoko Watanabe, Stefan Weiss, Frank Ulrich Weiss, Omer Weissbrod, Harm-Jan Westra, Gonneke Willemsen, Haydeh Payami, Daisy Jonkers, Alejandro Arias Vasquez, Eco J. C. de Geus, Katie A. Meyer, Jakob Stokholm, Eran Segal, Elin Org, Cisca Wijmenga, Hyung Lae Kim, Robert C. Kaplan, Tim D. Spector, André G. Uitterlinden, Fernando Rivadeneira, Andre Franke, Markus M. Lerch, Lude Franke, Serena Sanna, Mauro D'Amato, Oluf Pedersen, Andrew D. Paterson, Robert Kraaij, Jeroen Raes, Alexandra Zhernakova
TL;DR: In this article, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts) and found high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples. A genome-wide association study of host genetic variation regarding microbial taxa identified 31 loci affecting the microbiome at a genome-wide significant (P < 5 × 10^-8) threshold. One locus, the lactase (LCT) gene locus, reached study-wide significance (genome-wide association study signal: P = 1.28 × 10^-20), and it showed an age-dependent association with Bifidobacterium abundance. Other associations were suggestive (1.95 × 10^-10 < P < 5 × 10^-8) but enriched for taxa showing high heritability and for genes expressed in the intestine and brain. A phenome-wide association study and Mendelian randomization identified enrichment of microbiome trait loci in the metabolic, nutrition and environment domains and suggested the microbiome might have causal effects in ulcerative colitis and rheumatoid arthritis.

287 citations


Journal ArticleDOI
Douglas P Wightman, Iris E. Jansen, Jeanne E. Savage, Alexey A. Shadrin, Shahram Bahrami, Dominic Holland, Arvid Rongve, Sigrid Børte, Bendik S. Winsvold, Ole Kristian Drange, Amy E Martinsen, Anne Heidi Skogholt, Cristen J. Willer, Geir Bråthen, Ingunn Bosnes, Jonas B. Nielsen, Lars G. Fritsche, Laurent F. Thomas, Linda M. Pedersen, Maiken Elvestad Gabrielsen, Marianne Bakke Johnsen, Tore Wergeland Meisingset, Wei Zhou, Petroula Proitsi, Angela Hodges, Richard Dobson, Latha Velayudhan, Karl Heilbron, Adam Auton, Julia M. Sealock, Lea K. Davis, Nancy L. Pedersen, Chandra A. Reynolds, Ida K. Karlsson, Sigurdur H. Magnusson, Hreinn Stefansson, Steinunn Thordardottir, Palmi V. Jonsson, Jon Snaedal, Anna Zettergren, Ingmar Skoog, Silke Kern, Margda Waern, Henrik Zetterberg, Kaj Blennow, Eystein Stordal, Kristian Hveem, John-Anker Zwart, Lavinia Athanasiu, Per Selnes, Ingvild Saltvedt, Sigrid Botne Sando, Ingun Ulstein, Srdjan Djurovic, Tormod Fladby, Dag Aarsland, Geir Selbæk, Stephan Ripke, Kari Stefansson, Ole A. Andreassen, Danielle Posthuma
TL;DR: This paper highlights microglia, immune cells and protein catabolism as relevant to late-onset Alzheimer's disease, while identifying and prioritizing previously unidentified genes of potential interest.
Abstract: Late-onset Alzheimer's disease is a prevalent age-related polygenic disease that accounts for 50-70% of dementia cases. Currently, only a fraction of the genetic variants underlying Alzheimer's disease have been identified. Here we show that increased sample sizes allowed identification of seven previously unidentified genetic loci contributing to Alzheimer's disease. This study highlights microglia, immune cells and protein catabolism as relevant to late-onset Alzheimer's disease, while identifying and prioritizing previously unidentified genes of potential interest. We anticipate that these results can be included in larger meta-analyses of Alzheimer's disease to identify further genetic variants that contribute to Alzheimer's pathology.

269 citations


Journal ArticleDOI
TL;DR: In this article, the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals) was evaluated; the results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
Abstract: Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.

262 citations


Journal ArticleDOI
TL;DR: REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.
Abstract: Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals. REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.

239 citations
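
The REGENIE abstract above says the method is based on ridge regression and needs only local segments of the genotype matrix in memory. The following is a minimal sketch of that general idea, not REGENIE's actual implementation: it fits an independent ridge regression on each block of adjacent variants and returns one predictor per block. The function names (`solve`, `ridge_fit`, `block_predictions`), the fixed penalty `lam`, and the block size are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_fit(x: Matrix, y: List[float], lam: float) -> List[float]:
    """Ridge coefficients: solve (X^T X + lam * I) beta = X^T y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def block_predictions(genotypes: Matrix, y: List[float],
                      block_size: int, lam: float) -> Matrix:
    """Fit ridge on each block of adjacent variants separately.

    Only one block (a local slice of columns) is used at a time.
    Returns one list of per-sample predictions for each block.
    """
    n_variants = len(genotypes[0])
    predictions = []
    for start in range(0, n_variants, block_size):
        block = [row[start:start + block_size] for row in genotypes]
        beta = ridge_fit(block, y, lam)
        predictions.append([sum(b * g for b, g in zip(beta, row))
                            for row in block])
    return predictions
```

Because each block is fitted on its own, the blocks could be processed one at a time or in parallel. How the per-block predictors are then combined and tested is outside the scope of this sketch.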


Journal ArticleDOI
TL;DR: BPNet is a deep learning model that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency transcription factors (TFs).
Abstract: The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.

229 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that CRISPR-Cas9 editing generates structural defects of the nucleus, micronuclei and chromosome bridges, which initiate a mutational process called chromothripsis.
Abstract: Genome editing has therapeutic potential for treating genetic diseases and cancer. However, the currently most practicable approaches rely on the generation of DNA double-strand breaks (DSBs), which can give rise to a poorly characterized spectrum of chromosome structural abnormalities. Here, using model cells and single-cell whole-genome sequencing, as well as by editing at a clinically relevant locus in clinically relevant cells, we show that CRISPR-Cas9 editing generates structural defects of the nucleus, micronuclei and chromosome bridges, which initiate a mutational process called chromothripsis. Chromothripsis is extensive chromosome rearrangement restricted to one or a few chromosomes that can cause human congenital disease and cancer. These results demonstrate that chromothripsis is a previously unappreciated on-target consequence of CRISPR-Cas9-generated DSBs. As genome editing is implemented in the clinic, the potential for extensive chromosomal rearrangements should be considered and monitored.

Journal ArticleDOI
David V. Conti, Burcu F. Darst, Lilit C. Moss, Edward J. Saunders (+251 more; 100 institutions)
TL;DR: This paper conducted a meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants.
Abstract: Prostate cancer is a highly heritable disease with large disparities in incidence rates across ancestry populations. We conducted a multiancestry meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants. The top genetic risk score (GRS) decile was associated with odds ratios that ranged from 5.06 (95% confidence interval (CI), 4.84–5.29) for men of European ancestry to 3.74 (95% CI, 3.36–4.17) for men of African ancestry. Men of African ancestry were estimated to have a mean GRS that was 2.18-times higher (95% CI, 2.14–2.22), and men of East Asian ancestry 0.73-times lower (95% CI, 0.71–0.76), than men of European ancestry. These findings support the role of germline variation contributing to population differences in prostate cancer risk, with the GRS offering an approach for personalized risk prediction.

Journal ArticleDOI
TL;DR: In this paper, the authors performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2.
Abstract: Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.

Journal ArticleDOI
TL;DR: The Polygenic Score (PGS) Catalog is an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications.
Abstract: We present the Polygenic Score (PGS) Catalog ( https://www.PGSCatalog.org ), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.
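
The abstract above describes published scores as consisting of variants, alleles and weights. As a rough illustration of how such a score is commonly applied to one individual, the sketch below sums weight times effect-allele dosage over the variants that are present in both the score and the genotype data. The data layout, the variant identifiers and the function name `polygenic_score` are hypothetical; they are not the PGS Catalog's file format or API.

```python
from typing import Dict, Tuple


def polygenic_score(
    weights: Dict[str, Tuple[str, float]],
    genotypes: Dict[str, Tuple[str, str]],
) -> Tuple[float, int]:
    """Return (score, number of variants used).

    weights:   variant ID -> (effect allele, weight)
    genotypes: variant ID -> the individual's two alleles
    Variants missing from the genotype data are skipped.
    """
    total = 0.0
    used = 0
    for variant_id, (effect_allele, weight) in weights.items():
        alleles = genotypes.get(variant_id)
        if alleles is None:
            continue
        # Dosage: how many copies of the effect allele are carried (0, 1 or 2).
        dosage = sum(1 for allele in alleles if allele == effect_allele)
        total += weight * dosage
        used += 1
    return total, used


# Hypothetical score and genotypes, for illustration only.
example_weights = {"rs1": ("A", 0.5), "rs2": ("G", -0.25), "rs3": ("T", 1.0)}
example_genotypes = {"rs1": ("A", "A"), "rs2": ("G", "C")}
example_score, example_used = polygenic_score(example_weights, example_genotypes)
```

Returning the number of variants actually used makes it easy to see how much of the score was covered by the available genotype data.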

Journal ArticleDOI
TL;DR: The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private-public partnership between the UK Biobank and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants.
Abstract: The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private–public partnership between the UK Biobank (UKB) and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants. Here, we describe the early results from ~200,000 UKB participants and the features of this project that enabled its success. The biopharmaceutical industry has increasingly used human genetics to improve success in drug discovery. Recognizing the need for large-scale human genetics data, as well as the unique value of the data access and contribution terms of the UKB, the UKB-ESC was formed. As a result, exome data from 200,643 UKB enrollees are now available. These data include ~10 million exonic variants—a rich resource of rare coding variation that is particularly valuable for drug discovery. The UKB-ESC precompetitive collaboration has further strengthened academic and industry ties and has provided teams with an opportunity to interact with and learn from the wider research community. The UK Biobank Exome Sequencing Consortium aims to sequence all the exomes of approximately 500,000 UK Biobank participants. This Perspective describes the results from approximately 200,000 exomes and discusses the lessons learned from this UK Biobank–biopharmaceutical company collaboration.

Journal ArticleDOI
Ji Chen, Cassandra N. Spracklen (+475 more; 146 institutions)
TL;DR: This paper aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available.
Abstract: Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10^-8), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
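
The abstract above measures fine-mapping resolution by the number of variants in 99% credible sets. A credible set is usually built by ranking the variants at a locus by posterior probability and adding them until the cumulative probability reaches the target coverage. The sketch below shows that construction under the assumption that per-variant posterior probabilities for a single locus are already available and sum to roughly 1. The function name and the example values are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.99) -> List[str]:
    """Smallest top-ranked set of variants whose posteriors sum to >= coverage."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant_id, probability in ranked:
        chosen.append(variant_id)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for one locus.
example_posteriors = {"rs_a": 0.6, "rs_b": 0.3, "rs_c": 0.095, "rs_d": 0.005}
example_set = credible_set(example_posteriors)
```

A smaller credible set means fewer candidate variants to follow up, which is why a reduction in set size is reported as improved resolution.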

Journal ArticleDOI
TL;DR: In this article, a tree-based data structure encoding the inferred evolutionary history of the SARS-CoV-2 virus was proposed to enable real-time genomic contact tracing, which greatly improves the speed of phylogenetic placement of new samples and data visualization.
Abstract: As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.

Journal ArticleDOI
TL;DR: In this paper, the authors dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments of the endometrium.
Abstract: The endometrium, the mucosal lining of the uterus, undergoes dynamic changes throughout the menstrual cycle in response to ovarian hormones. We have generated dense single-cell and spatial reference maps of the human uterus and three-dimensional endometrial organoid cultures. We dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments. Our benchmark of the endometrial organoids reveals the pathways and cell states regulating differentiation of the secretory and ciliated lineages both in vivo and in vitro. In vitro downregulation of WNT or NOTCH pathways increases the differentiation efficiency along the secretory and ciliated lineages, respectively. We utilize our cellular maps to deconvolute bulk data from endometrial cancers and endometriotic lesions, illuminating the cell types dominating in each of these disorders. These mechanistic insights provide a platform for future development of treatments for common conditions including endometriosis and endometrial carcinoma. Single-cell and spatial transcriptomic profiling of the human endometrium highlights pathways governing the proliferative and secretory phases of the menstrual cycle. Analyses of endometrial organoids show that WNT and NOTCH signaling modulate differentiation into the secretory and ciliated epithelial lineages, respectively.

Journal ArticleDOI
TL;DR: In this paper, a study of 1,051,032 23andMe research participants was conducted to identify genetic and nongenetic associations with testing positive for SARS-CoV-2, respiratory symptoms and hospitalization.
Abstract: COVID-19 presents with a wide range of severity, from asymptomatic in some individuals to fatal in others. Based on a study of 1,051,032 23andMe research participants, we report genetic and nongenetic associations with testing positive for SARS-CoV-2, respiratory symptoms and hospitalization. Using trans-ancestry genome-wide association studies, we identified a strong association between blood type and COVID-19 diagnosis, as well as a gene-rich locus on chromosome 3p21.31 that is more strongly associated with outcome severity. Hospitalization risk factors include advancing age, male sex, obesity, lower socioeconomic status, non-European ancestry and preexisting cardiometabolic conditions. While non-European ancestry was a significant risk factor for hospitalization after adjusting for sociodemographics and preexisting health conditions, we did not find evidence that these two primary genetic associations explain risk differences between populations for severe COVID-19 outcomes.

Journal ArticleDOI
TL;DR: The authors discovered that SARS-CoV-2 requires the lysosomal protein TMEM106B to infect human cell lines and primary lung cells, and single-cell RNA-sequencing of airway cells from patients with COVID-19 demonstrated that TMEM106B expression correlates with SARS-CoV-2 infection.
Abstract: The ongoing COVID-19 pandemic has caused a global economic and health crisis. To identify host factors essential for coronavirus infection, we performed genome-wide functional genetic screens with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and human coronavirus 229E. These screens uncovered virus-specific as well as shared host factors, including TMEM41B and PI3K type 3. We discovered that SARS-CoV-2 requires the lysosomal protein TMEM106B to infect human cell lines and primary lung cells. TMEM106B overexpression enhanced SARS-CoV-2 infection as well as pseudovirus infection, suggesting a role in viral entry. Furthermore, single-cell RNA-sequencing of airway cells from patients with COVID-19 demonstrated that TMEM106B expression correlates with SARS-CoV-2 infection. The present study uncovered a collection of coronavirus host factors that may be exploited to develop drugs against SARS-CoV-2 infection or future zoonotic coronavirus outbreaks.

Journal ArticleDOI
TL;DR: In this paper, the authors present a multi-omic single-nucleus study of 191,890 nuclei in late-stage Alzheimer's disease (AD), accessible through their web portal, profiling chromatin accessibility and gene expression in the same biological samples and uncovering vast cellular heterogeneity.
Abstract: The gene-regulatory landscape of the brain is highly dynamic in health and disease, coordinating a menagerie of biological processes across distinct cell types. Here, we present a multi-omic single-nucleus study of 191,890 nuclei in late-stage Alzheimer’s disease (AD), accessible through our web portal, profiling chromatin accessibility and gene expression in the same biological samples and uncovering vast cellular heterogeneity. We identified cell-type-specific, disease-associated candidate cis-regulatory elements and their candidate target genes, including an oligodendrocyte-associated regulatory module containing links to APOE and CLU. We describe cis-regulatory relationships in specific cell types at a subset of AD risk loci defined by genome-wide association studies, demonstrating the utility of this multi-omic single-nucleus approach. Trajectory analysis of glial populations identified disease-relevant transcription factors, such as SREBF1, and their regulatory targets. Finally, we introduce single-nucleus consensus weighted gene coexpression analysis, a coexpression network analysis strategy robust to sparse single-cell data, and perform a systems-level analysis of the AD transcriptome. An integrative analysis of single-nucleus assay for transposase-accessible chromatin with sequencing and RNA sequencing in normal and Alzheimer’s disease brain tissue identifies cell-type-specific cis-regulatory elements and candidate target genes at disease-associated loci.

Journal ArticleDOI
TL;DR: In this article, a genome-wide association study of 2,780 cases and 47,486 controls identified 12 genome-wide-significant susceptibility loci for HCM, and Mendelian randomization identified diastolic blood pressure as a key modifiable risk factor for sarcomere-negative HCM.
Abstract: Hypertrophic cardiomyopathy (HCM) is a common, serious, genetic heart disorder. Rare pathogenic variants in sarcomere genes cause HCM, but with unexplained phenotypic heterogeneity. Moreover, most patients do not carry such variants. We report a genome-wide association study of 2,780 cases and 47,486 controls that identified 12 genome-wide-significant susceptibility loci for HCM. Single-nucleotide polymorphism heritability indicated a strong polygenic influence, especially for sarcomere-negative HCM (64% of cases; h²g = 0.34 ± 0.02). A genetic risk score showed substantial influence on the odds of HCM in a validation study, halving the odds in the lowest quintile and doubling them in the highest quintile, and also influenced phenotypic severity in sarcomere variant carriers. Mendelian randomization identified diastolic blood pressure (DBP) as a key modifiable risk factor for sarcomere-negative HCM, with a one standard deviation increase in DBP increasing the HCM risk fourfold. Common variants and modifiable risk factors have important roles in HCM that we suggest will be clinically actionable.

Journal ArticleDOI
TL;DR: Open Targets Genetics as discussed by the authors is a community resource that provides systematic fine mapping at human GWAS loci, enabling users to prioritize genes at disease-associated regions and assess their potential as drug targets.
Abstract: Genome-wide association studies (GWASs) have identified many variants associated with complex traits, but identifying the causal gene(s) is a major challenge. In the present study, we present an open resource that provides systematic fine mapping and gene prioritization across 133,441 published human GWAS loci. We integrate genetics (GWAS Catalog and UK Biobank) with transcriptomic, proteomic and epigenomic data, including systematic disease–disease and disease–molecular trait colocalization results across 92 cell types and tissues. We identify 729 loci fine mapped to a single-coding causal variant and colocalized with a single gene. We trained a machine-learning model using the fine-mapped genetics and functional genomics data and 445 gold-standard curated GWAS loci to distinguish causal genes from neighboring genes, outperforming a naive distance-based model. Our prioritized genes were enriched for known approved drug targets (odds ratio = 8.1, 95% confidence interval = 5.7, 11.5). These results are publicly available through a web portal ( http://genetics.opentargets.org ), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets. Open Targets Genetics is a community resource that provides systematic fine mapping at human GWAS loci, enabling users to prioritize genes at disease-associated regions and assess their potential as drug targets.
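
As a quick sanity check on the enrichment figure quoted above (odds ratio = 8.1, 95% confidence interval = 5.7, 11.5), the sketch below assumes the interval is a standard Wald-type interval that is symmetric on the log scale. That is an assumption for illustration only; the abstract does not say how the interval was computed. Under that assumption, the geometric midpoint of the interval should land close to the reported odds ratio, and the interval width gives an implied standard error for log(OR).

```python
import math

# Values quoted in the abstract.
odds_ratio = 8.1
ci_low, ci_high = 5.7, 11.5

# Assumption (not stated in the abstract): a Wald-type 95% interval,
# i.e. log(OR) +/- 1.96 * SE, which is symmetric on the log scale.
z_95 = 1.96

# Midpoint of the interval on the log scale, mapped back to the OR scale.
log_midpoint = (math.log(ci_low) + math.log(ci_high)) / 2
implied_or = math.exp(log_midpoint)

# Standard error of log(OR) implied by the interval width.
implied_se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_95)

print(f"geometric midpoint of CI: {implied_or:.2f}")
print(f"implied SE of log(OR):    {implied_se:.3f}")
```

If the midpoint had been far from 8.1, that would suggest the interval was built some other way (for example, an exact or bootstrap interval) rather than indicating an error in the paper.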

Journal ArticleDOI
J L Min, Gibran Hemani, Eilis Hannon, Koen F. Dekkers and 173 more authors (53 institutions)
TL;DR: In this paper, the results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants were presented, identifying genetic variants associated with DNA methylation at 420,509 DNAm sites in blood.
Abstract: Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15-17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype-phenotype map than previously anticipated.

Journal ArticleDOI
Ruth Chia, Marya S. Sabir, Sara Bandres-Ciga, Sara Saez-Atienzar and 163 more authors (55 institutions)
TL;DR: The authors performed whole-genome sequencing in large cohorts of Lewy body dementia (LBD) cases and neurologically healthy controls to study the genetic architecture of this understudied form of dementia, and to generate a resource for the scientific community.
Abstract: The genetic basis of Lewy body dementia (LBD) is not well understood. Here, we performed whole-genome sequencing in large cohorts of LBD cases and neurologically healthy controls to study the genetic architecture of this understudied form of dementia, and to generate a resource for the scientific community. Genome-wide association analysis identified five independent risk loci, whereas genome-wide gene-aggregation tests implicated mutations in the gene GBA. Genetic risk scores demonstrate that LBD shares risk profiles and pathways with Alzheimer's disease and Parkinson's disease, providing a deeper molecular understanding of the complex genetic architecture of this age-related neurodegenerative condition.

Journal ArticleDOI
Rafik Tadros, Catherine Francis, Xiao Xu, Alexa M.C. Vermeer, Andrew R. Harper, Roy Huurman, Ken Kelu Bisabu, Roddy Walsh, Edgar T. Hoorntje, Wouter P. te Rijdt, Rachel Buchan, Hannah G. van Velzen, Marjon van Slegtenhorst, Jentien M Vermeulen, Joost A. Offerhaus, Wenjia Bai, Antonio de Marvao, Najim Lahrouchi, Leander Beekman, Jacco C. Karper, Jan H. Veldink, Elham Kayvanpour, Antonis Pantazis, A. John Baksi, Nicola Whiffin, Francesco Mazzarotto, Geraldine Sloane, Hideaki Suzuki, Deborah Schneider-Luftman, Paul Elliott, Pascale Richard, Flavie Ader, Eric Villard, Peter Lichtner, Thomas Meitinger, Michael W.T. Tanck, J. Peter van Tintelen, Andrew Thain, David McCarty, Robert A. Hegele, Jason D. Roberts, Julie Amyot, Marie-Pierre Dubé, Julia Cadrin-Tourigny, Geneviève Giraldeau, Philippe L. L’Allier, Patrick Garceau, Jean-Claude Tardif, S. Matthijs Boekholdt, R. Thomas Lumbers, Folkert W. Asselbergs, Paul J.R. Barton, Stuart A. Cook, Sanjay K Prasad, Declan P. O'Regan, Jolanda van der Velden, Karin J. H. Verweij, Mario Talajic, Guillaume Lettre, Yigal M. Pinto, Benjamin Meder, Philippe Charron, Rudolf A. de Boer, Imke Christiaans, Michelle Michels, Arthur A.M. Wilde, Hugh Watkins, Paul M. Matthews, James S. Ware, Connie R. Bezzina
TL;DR: In this paper, the authors conducted genome-wide association studies and multi-trait analyses in hypertrophic (HCM) and dilated (DCM) cardiomyopathies.
Abstract: The heart muscle diseases hypertrophic (HCM) and dilated (DCM) cardiomyopathies are leading causes of sudden death and heart failure in young, otherwise healthy, individuals. We conducted genome-wide association studies and multi-trait analyses in HCM (1,733 cases), DCM (5,521 cases) and nine left ventricular (LV) traits (19,260 UK Biobank participants with structurally normal hearts). We identified 16 loci associated with HCM, 13 with DCM and 23 with LV traits. We show strong genetic correlations between LV traits and cardiomyopathies, with opposing effects in HCM and DCM. Two-sample Mendelian randomization supports a causal association linking increased LV contractility with HCM risk. A polygenic risk score explains a significant portion of phenotypic variability in carriers of HCM-causing rare variants. Our findings thus provide evidence that polygenic risk score may account for variability in Mendelian diseases. More broadly, we provide insights into how genetic pathways may lead to distinct disorders through opposing genetic effects.

Journal ArticleDOI
TL;DR: The eQTL Catalogue as discussed by the authors is a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies, providing summary statistics that can be used to gain insight into complex human traits through downstream analyses such as fine mapping and co-localization.
Abstract: Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue ( https://www.ebi.ac.uk/eqtl ), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.

Journal ArticleDOI
TL;DR: In this paper, a proteome-wide association study (PWAS) of AD was performed, followed by Mendelian randomization and colocalization analysis to identify loci that confer AD risk through their effects on brain protein abundance to provide new insights into AD pathogenesis.
Abstract: Genome-wide association studies (GWAS) have identified many risk loci for Alzheimer's disease (AD), but how these loci confer AD risk is unclear. Here, we aimed to identify loci that confer AD risk through their effects on brain protein abundance to provide new insights into AD pathogenesis. To that end, we integrated AD GWAS results with human brain proteomes to perform a proteome-wide association study (PWAS) of AD, followed by Mendelian randomization and colocalization analysis. We identified 11 genes that are consistent with being causal in AD, acting via their cis-regulated brain protein abundance. Nine replicated in a confirmation PWAS and eight represent new AD risk genes not identified before by AD GWAS. Furthermore, we demonstrated that our results were independent of APOE ε4. Together, our findings provide new insights into AD pathogenesis and promising targets for further mechanistic and therapeutic studies.

Journal ArticleDOI
TL;DR: In this paper, the genome of Weining rye, an elite Chinese rye variety, was sequenced and the assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size with 93.67% assigned to seven chromosomes.
Abstract: Rye is a valuable food and forage crop, an important genetic resource for wheat and triticale improvement and an indispensable material for efficient comparative genomic studies in grasses. Here, we sequenced the genome of Weining rye, an elite Chinese rye variety. The assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size (7.86 Gb), with 93.67% of the contigs (7.25 Gb) assigned to seven chromosomes. Repetitive elements constituted 90.31% of the assembled genome. Compared to previously sequenced Triticeae genomes, Daniela, Sumaya and Sumana retrotransposons showed strong expansion in rye. Further analyses of the Weining assembly shed new light on genome-wide gene duplications and their impact on starch biosynthesis genes, physical organization of complex prolamin loci, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions and loci in rye. This genome sequence promises to accelerate genomic and breeding studies in rye and related cereal crops. A high-quality genome assembly of Weining rye sheds new light on gene duplications and their effects on starch biosynthesis genes, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions.
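
The assembly statistics quoted above can be cross-checked with simple arithmetic. The sketch below uses only the rounded figures given in the abstract (7.86 Gb estimated genome size, 7.74 Gb of assembled contigs, 7.25 Gb anchored to chromosomes); the variable and function names are illustrative, and because the inputs are rounded, the recomputed percentages are only expected to agree with the reported 98.47% and 93.67% to within rounding.

```python
# Rounded figures quoted in the abstract, in gigabases (Gb).
estimated_genome_gb = 7.86
assembled_contigs_gb = 7.74
anchored_contigs_gb = 7.25


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the estimated genome covered by assembled contigs.
assembled_pct = percent(assembled_contigs_gb, estimated_genome_gb)

# Share of the assembled contigs assigned to the seven chromosomes.
anchored_pct = percent(anchored_contigs_gb, assembled_contigs_gb)

print(f"assembled / estimated genome: {assembled_pct:.2f}%")
print(f"anchored / assembled contigs: {anchored_pct:.2f}%")
```

Note that the second percentage is taken relative to the assembled contigs, not the estimated genome size, which matches the abstract's wording ("93.67% of the contigs").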

Journal ArticleDOI
TL;DR: In this article, the authors used an efficient multiplexing strategy to differentiate 215 human induced pluripotent stem cell (iPSC) lines toward a midbrain neural fate, including dopaminergic neurons, and used single-cell RNA sequencing (scRNA-seq) to profile over 1 million cells across three differentiation time points.
Abstract: Studying the function of common genetic variants in primary human tissues and during development is challenging. To address this, we use an efficient multiplexing strategy to differentiate 215 human induced pluripotent stem cell (iPSC) lines toward a midbrain neural fate, including dopaminergic neurons, and use single-cell RNA sequencing (scRNA-seq) to profile over 1 million cells across three differentiation time points. The proportion of neurons produced by each cell line is highly reproducible and is predictable by robust molecular markers expressed in pluripotent cells. Expression quantitative trait loci (eQTL) were characterized at different stages of neuronal development and in response to rotenone-induced oxidative stress. Of these, 1,284 eQTL colocalize with known neurological trait risk loci, and 46% are not found in the Genotype-Tissue Expression (GTEx) catalog. Our study illustrates how coupling scRNA-seq with long-term iPSC differentiation enables mechanistic studies of human trait-associated genetic variants in otherwise inaccessible cell states.