Showing papers in "Nature Genetics in 2021"
••
TL;DR: ArchR is a software suite for single-cell analysis of regulatory chromatin in R (https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data.
Abstract: The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/ ) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells.
406 citations
••
Niamh Mullins1, Andreas J. Forstner2,3,4 +396 more • Institutions (119)
TL;DR: The authors performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci, including genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics.
Abstract: Bipolar disorder is a heritable mental illness with complex etiology. We performed a genome-wide association study of 41,917 bipolar disorder cases and 371,549 controls of European ancestry, which identified 64 associated genomic loci. Bipolar disorder risk alleles were enriched in genes in synaptic signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers, antiepileptics and anesthetics. Integrating expression quantitative trait locus data implicated 15 genes robustly linked to bipolar disorder via gene expression, encoding druggable targets such as HTR6, MCHR1, DCLK3 and FURIN. Analyses of bipolar disorder subtypes indicated high but imperfect genetic correlation between bipolar disorder type I and II and identified additional associated loci. Together, these results advance our understanding of the biological etiology of bipolar disorder, identify novel therapeutic leads and prioritize genes for functional follow-up studies.
378 citations
••
University Medical Center Groningen1, Netherlands Cancer Institute2, European Bioinformatics Institute3, Georgia Institute of Technology4, Leipzig University5, Johns Hopkins University6, University of Cambridge7, NHS Blood and Transplant8, Garvan Institute of Medical Research9, University of Tartu10, Ontario Institute for Cancer Research11, University of Washington12, Public Health Research Institute13, University of Chicago14, Greifswald University Hospital15, Ludwig Maximilian University of Munich16, University of Bristol17, Erasmus University Rotterdam18, University of Westminster19, Royal Devon and Exeter Hospital20, Luleå University of Technology21, Swiss Institute of Bioinformatics22, University of Lausanne23, University of Dundee24, University of Geneva25, Agency for Science, Technology and Research26, University of Queensland27, Leiden University Medical Center28, Radboud University Nijmegen29, University of Liège30, University of Oxford31, Menzies Research Institute32, Icahn School of Medicine at Mount Sinai33, Ikerbasque34, VU University Amsterdam35, Stanford University36, Turku University Hospital37, University of Turku38, Maastricht University39, Karolinska Institutet40, Utrecht University41, University of Helsinki42, National Institutes of Health43, Technische Universität München44, Wellcome Trust Sanger Institute45, German Cancer Research Center46, Westlake University47, University of New South Wales48
TL;DR: In this article, the authors performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium.
Abstract: Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.
344 citations
••
Garvan Institute of Medical Research1, University of New South Wales2, Royal Institute of Technology3, University of North Carolina at Chapel Hill4, Harvard University5, French Institute of Health and Medical Research6, Institut Gustave Roussy7, St. Vincent's Health System8, Royal Prince Alfred Hospital9, Princess Alexandra Hospital10, University of Queensland11, University of Sydney12, Shanghai Jiao Tong University13, National University of Singapore14, Agency for Science, Technology and Research15, St George's Hospital16
TL;DR: In this paper, a single-cell and spatially resolved transcriptomics analysis of human breast cancers is presented, which reveals recurrent neoplastic cell heterogeneity in tumors where heterotypic interactions play central roles in disease progression.
Abstract: Breast cancers are complex cellular ecosystems where heterotypic interactions play central roles in disease progression and response to therapy. However, our knowledge of their cellular composition and organization is limited. Here we present a single-cell and spatially resolved transcriptomics analysis of human breast cancers. We developed a single-cell method of intrinsic subtype classification (SCSubtype) to reveal recurrent neoplastic cell heterogeneity. Immunophenotyping using cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) provides high-resolution immune profiles, including new PD-L1/PD-L2+ macrophage populations associated with clinical outcome. Mesenchymal cells displayed diverse functions and cell-surface protein expression through differentiation within three major lineages. Stromal-immune niches were spatially organized in tumors, offering insights into antitumor immune regulation. Using single-cell signatures, we deconvoluted large breast cancer cohorts to stratify them into nine clusters, termed 'ecotypes', with unique cellular compositions and clinical outcomes. This study provides a comprehensive transcriptional atlas of the cellular architecture of breast cancer.
303 citations
••
Stanford University1, Tohoku University2, Osaka University3, Kyushu University4, Iwate Medical University5, Juntendo University6, Nihon University7, Nippon Medical School8, Japanese Foundation for Cancer Research9, Shiga University of Medical Science10, University of Tokyo11, University of Helsinki12, Broad Institute13, Harvard University14
TL;DR: In this paper, the authors conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records.
Abstract: Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (total n = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics.
291 citations
••
University of Groningen1, Erasmus University Rotterdam2, Katholieke Universiteit Leuven3, Chinese Academy of Sciences4, University of Surrey5, King's College London6, University of Toronto7, Avera Health8, Karolinska Institutet9, University of Copenhagen10, University of Greifswald11, University of Kiel12, Yeshiva University13, Sungkyunkwan University14, University of Tartu15, Weizmann Institute of Science16, Copenhagen University Hospital17, University of Texas Health Science Center at Houston18, University of Alabama at Birmingham19, Stockholm University20, University of Michigan21, VU University Amsterdam22, University of Oxford23, University of Bristol24, University of Amsterdam25, Maastricht University26, University of California, San Diego27, University of Eastern Finland28, National Institutes of Health29, University of California, Los Angeles30, Linköping University31, Harvard University32, Radboud University Nijmegen33, University of North Carolina at Chapel Hill34, Ewha Womans University35, Fred Hutchinson Cancer Research Center36, National Research Council37
TL;DR: In this article, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts) and found high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples.
Abstract: To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples. A genome-wide association study of host genetic variation regarding microbial taxa identified 31 loci affecting the microbiome at a genome-wide significant (P < 5 × 10^−8) threshold. One locus, the lactase (LCT) gene locus, reached study-wide significance (genome-wide association study signal: P = 1.28 × 10^−20), and it showed an age-dependent association with Bifidobacterium abundance. Other associations were suggestive (1.95 × 10^−10 < P < 5 × 10^−8) but enriched for taxa showing high heritability and for genes expressed in the intestine and brain. A phenome-wide association study and Mendelian randomization identified enrichment of microbiome trait loci in the metabolic, nutrition and environment domains and suggested the microbiome might have causal effects in ulcerative colitis and rheumatoid arthritis.
287 citations
••
VU University Amsterdam1, University of Oslo2, Oslo University Hospital3, University of California, San Diego4, University of Bergen5, Norwegian University of Science and Technology6, University of Michigan7, Namsos Hospital8, Statens Serum Institut9, Harvard University10, King's College London11, Vanderbilt University Medical Center12, Karolinska Institutet13, University of California, Riverside14, Jönköping University15, deCODE genetics16, University of Iceland17, University of Gothenburg18, Sahlgrenska University Hospital19, Akershus University Hospital20, Stavanger University Hospital21, Broad Institute22, Charité23, University of Amsterdam24
TL;DR: This paper highlights microglia, immune cells and protein catabolism as relevant to late-onset Alzheimer's disease, while identifying and prioritizing previously unidentified genes of potential interest.
Abstract: Late-onset Alzheimer's disease is a prevalent age-related polygenic disease that accounts for 50-70% of dementia cases. Currently, only a fraction of the genetic variants underlying Alzheimer's disease have been identified. Here we show that increased sample sizes allowed identification of seven previously unidentified genetic loci contributing to Alzheimer's disease. This study highlights microglia, immune cells and protein catabolism as relevant to late-onset Alzheimer's disease, while identifying and prioritizing previously unidentified genes of potential interest. We anticipate that these results can be included in larger meta-analyses of Alzheimer's disease to identify further genetic variants that contribute to Alzheimer's pathology.
269 citations
••
TL;DR: In this article, the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals) was evaluated; the results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
Abstract: Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
262 citations
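As a rough illustration of the Mendelian randomization step mentioned in the abstract above, the sketch below computes a fixed-effect inverse-variance-weighted (IVW) estimate from per-variant summary statistics. This is a generic textbook estimator and not necessarily the one used by the authors; the function name and the input numbers are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-on-outcome effect.

    Each list holds one entry per genetic variant (instrument):
    its effect on the exposure, its effect on the outcome, and the
    standard error of the outcome effect. Illustrative sketch only.
    """
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    std_error = (1.0 / denominator) ** 0.5
    return estimate, std_error


# Hypothetical summary statistics for three instruments.
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.4],
    beta_outcome=[0.05, 0.1, 0.2],
    se_outcome=[0.01, 0.01, 0.01],
)
print(f"IVW estimate = {estimate:.3f} (SE {std_error:.3f})")
```

In this made-up input every outcome effect is exactly half the exposure effect, so the estimate comes out at 0.5. Real analyses would also check instrument validity and heterogeneity, which this sketch omits.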
••
TL;DR: REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.
Abstract: Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals. REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.
239 citations
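The abstract above describes fitting ridge regression while loading only local segments of the genotype matrix. A minimal sketch of that block-wise idea follows, assuming a tiny in-memory genotype matrix and a single shrinkage value; it is not the REGENIE implementation, and the function names `solve`, `ridge_fit` and `block_predictors` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(x, y, lam):
    """Ridge coefficients: (X^T X + lam * I)^-1 X^T y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def block_predictors(genotypes, phenotype, block_size, lam):
    """Fit ridge regression one SNP block at a time.

    Only the columns of the current block are sliced out, mimicking the
    idea of loading local segments rather than the whole matrix.
    Returns one vector of per-individual predictions per block.
    """
    n_snps = len(genotypes[0])
    predictions = []
    for start in range(0, n_snps, block_size):
        block = [row[start:start + block_size] for row in genotypes]
        beta = ridge_fit(block, phenotype, lam)
        predictions.append([sum(b * g for b, g in zip(beta, row))
                            for row in block])
    return predictions


# Hypothetical data: 4 individuals, 2 SNPs coded as allele dosages.
genotypes = [[0, 1], [1, 0], [2, 1], [1, 2]]
phenotype = [1.0, 0.5, 2.0, 2.5]
print(block_predictors(genotypes, phenotype, block_size=1, lam=1.0))
```

In a full pipeline the per-block predictions would be combined in a further regression step, and the shrinkage parameter would be chosen by cross-validation; both are left out here for brevity.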
••
TL;DR: BPNet is a deep learning model that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency transcription factors (TFs).
Abstract: The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.
229 citations
••
TL;DR: In this paper, it was shown that CRISPR-Cas9 editing generates structural defects of the nucleus, micronuclei and chromosome bridges, which initiate a mutational process called chromothripsis.
Abstract: Genome editing has therapeutic potential for treating genetic diseases and cancer. However, the currently most practicable approaches rely on the generation of DNA double-strand breaks (DSBs), which can give rise to a poorly characterized spectrum of chromosome structural abnormalities. Here, using model cells and single-cell whole-genome sequencing, as well as by editing at a clinically relevant locus in clinically relevant cells, we show that CRISPR-Cas9 editing generates structural defects of the nucleus, micronuclei and chromosome bridges, which initiate a mutational process called chromothripsis. Chromothripsis is extensive chromosome rearrangement restricted to one or a few chromosomes that can cause human congenital disease and cancer. These results demonstrate that chromothripsis is a previously unappreciated on-target consequence of CRISPR-Cas9-generated DSBs. As genome editing is implemented in the clinic, the potential for extensive chromosomal rearrangements should be considered and monitored.
••
TL;DR: This paper conducted a meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants.
Abstract: Prostate cancer is a highly heritable disease with large disparities in incidence rates across ancestry populations. We conducted a multiancestry meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants. The top genetic risk score (GRS) decile was associated with odds ratios that ranged from 5.06 (95% confidence interval (CI), 4.84–5.29) for men of European ancestry to 3.74 (95% CI, 3.36–4.17) for men of African ancestry. Men of African ancestry were estimated to have a mean GRS that was 2.18-times higher (95% CI, 2.14–2.22), and men of East Asian ancestry 0.73-times lower (95% CI, 0.71–0.76), than men of European ancestry. These findings support the role of germline variation contributing to population differences in prostate cancer risk, with the GRS offering an approach for personalized risk prediction.
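For readers unfamiliar with genetic risk scores, the sketch below shows the usual weighted-sum form and a simple way to pick out the top decile of a cohort. It assumes per-variant weights (for example, log odds ratios) and risk-allele dosages of 0, 1 or 2; the variant identifiers, weights and function names are hypothetical and not taken from the study.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages: dict mapping variant ID -> risk-allele count (0, 1 or 2)
    weights: dict mapping variant ID -> per-allele weight
    Variants missing from `dosages` are treated as dosage 0 (an assumption).
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def top_decile(scores):
    """Return the IDs of the individuals whose scores rank in the top 10%."""
    k = max(1, len(scores) // 10)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical weights and a hypothetical ten-person cohort.
weights = {"var_a": 0.2, "var_b": 0.1}
cohort = {f"person_{i}": {"var_a": i % 3, "var_b": i % 2} for i in range(10)}
scores = {pid: genetic_risk_score(d, weights) for pid, d in cohort.items()}
print(top_decile(scores))
```

Comparing outcomes in the top decile against a reference group is what yields odds ratios of the kind quoted in the abstract; that comparison is outside the scope of this sketch.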
••
TL;DR: In this paper, the authors performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2.
Abstract: Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.
••
TL;DR: The Polygenic Score (PGS) Catalog is an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications.
Abstract: We present the Polygenic Score (PGS) Catalog (https://www.PGSCatalog.org), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.
••
TL;DR: The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private–public partnership between the UK Biobank and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants.
Abstract: The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private–public partnership between the UK Biobank (UKB) and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants. Here, we describe the early results from ~200,000 UKB participants and the features of this project that enabled its success. The biopharmaceutical industry has increasingly used human genetics to improve success in drug discovery. Recognizing the need for large-scale human genetics data, as well as the unique value of the data access and contribution terms of the UKB, the UKB-ESC was formed. As a result, exome data from 200,643 UKB enrollees are now available. These data include ~10 million exonic variants—a rich resource of rare coding variation that is particularly valuable for drug discovery. The UKB-ESC precompetitive collaboration has further strengthened academic and industry ties and has provided teams with an opportunity to interact with and learn from the wider research community. The UK Biobank Exome Sequencing Consortium aims to sequence all the exomes of approximately 500,000 UK Biobank participants. This Perspective describes the results from approximately 200,000 exomes and discusses the lessons learned from this UK Biobank–biopharmaceutical company collaboration.
••
TL;DR: This paper aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available.
Abstract: Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10^−8), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
••
TL;DR: In this article, a tree-based data structure encoding the inferred evolutionary history of the SARS-CoV-2 virus was proposed to enable real-time genomic contact tracing, which greatly improves the speed of phylogenetic placement of new samples and data visualization.
Abstract: As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.
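To make the idea of placing a new sample onto a tree that encodes evolutionary history more concrete, here is a minimal sketch. It assumes each node stores the mutations acquired on the branch leading to it, and it places a sample at the node whose accumulated mutation set differs least from the sample's. This is a simplified illustration (it ignores back-mutations and ties), not the authors' data structure or algorithm; the class, function and mutation labels are hypothetical.

```python
class Node:
    """Tree node annotated with the mutations on its incoming branch."""

    def __init__(self, name, branch_mutations=(), children=()):
        self.name = name
        self.branch_mutations = frozenset(branch_mutations)
        self.children = list(children)


def place_sample(root, sample_mutations):
    """Return (node name, distance) for the closest node in the tree.

    Distance is the number of mutations present in exactly one of the
    node's accumulated genotype and the sample (symmetric difference).
    """
    sample = frozenset(sample_mutations)
    best_name, best_distance = None, None
    stack = [(root, frozenset())]
    while stack:
        node, inherited = stack.pop()
        # Genotype at this node: everything inherited plus its own branch.
        genotype = inherited | node.branch_mutations
        distance = len(genotype ^ sample)
        if best_distance is None or distance < best_distance:
            best_name, best_distance = node.name, distance
        for child in node.children:
            stack.append((child, genotype))
    return best_name, best_distance


# Hypothetical tree: root -> A -> A1, and root -> B.
tree = Node("root", (), [
    Node("A", {"m1"}, [Node("A1", {"m2"})]),
    Node("B", {"m3"}),
])
print(place_sample(tree, {"m1", "m2", "m4"}))
```

Because each node is visited once and only set operations are needed, placement cost scales with tree size rather than requiring the tree to be rebuilt, which is the property that makes this style of annotation attractive for large phylogenies.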
••
TL;DR: In this paper, the authors dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments of the endometrium.
Abstract: The endometrium, the mucosal lining of the uterus, undergoes dynamic changes throughout the menstrual cycle in response to ovarian hormones. We have generated dense single-cell and spatial reference maps of the human uterus and three-dimensional endometrial organoid cultures. We dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments. Our benchmark of the endometrial organoids reveals the pathways and cell states regulating differentiation of the secretory and ciliated lineages both in vivo and in vitro. In vitro downregulation of WNT or NOTCH pathways increases the differentiation efficiency along the secretory and ciliated lineages, respectively. We utilize our cellular maps to deconvolute bulk data from endometrial cancers and endometriotic lesions, illuminating the cell types dominating in each of these disorders. These mechanistic insights provide a platform for future development of treatments for common conditions including endometriosis and endometrial carcinoma. Single-cell and spatial transcriptomic profiling of the human endometrium highlights pathways governing the proliferative and secretory phases of the menstrual cycle. Analyses of endometrial organoids show that WNT and NOTCH signaling modulate differentiation into the secretory and ciliated epithelial lineages, respectively.
••
TL;DR: In this paper, a study of 1,051,032 23andMe research participants was conducted to identify genetic and nongenetic associations with testing positive for SARS-CoV-2, respiratory symptoms and hospitalization.
Abstract: COVID-19 presents with a wide range of severity, from asymptomatic in some individuals to fatal in others. Based on a study of 1,051,032 23andMe research participants, we report genetic and nongenetic associations with testing positive for SARS-CoV-2, respiratory symptoms and hospitalization. Using trans-ancestry genome-wide association studies, we identified a strong association between blood type and COVID-19 diagnosis, as well as a gene-rich locus on chromosome 3p21.31 that is more strongly associated with outcome severity. Hospitalization risk factors include advancing age, male sex, obesity, lower socioeconomic status, non-European ancestry and preexisting cardiometabolic conditions. While non-European ancestry was a significant risk factor for hospitalization after adjusting for sociodemographics and preexisting health conditions, we did not find evidence that these two primary genetic associations explain risk differences between populations for severe COVID-19 outcomes.
••
TL;DR: SARS-CoV-2 was found to require the lysosomal protein TMEM106B to infect human cell lines and primary lung cells, and single-cell RNA-sequencing of airway cells from patients with COVID-19 demonstrated that TMEM106B expression correlates with SARS-CoV-2 infection.
Abstract: The ongoing COVID-19 pandemic has caused a global economic and health crisis. To identify host factors essential for coronavirus infection, we performed genome-wide functional genetic screens with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and human coronavirus 229E. These screens uncovered virus-specific as well as shared host factors, including TMEM41B and PI3K type 3. We discovered that SARS-CoV-2 requires the lysosomal protein TMEM106B to infect human cell lines and primary lung cells. TMEM106B overexpression enhanced SARS-CoV-2 infection as well as pseudovirus infection, suggesting a role in viral entry. Furthermore, single-cell RNA-sequencing of airway cells from patients with COVID-19 demonstrated that TMEM106B expression correlates with SARS-CoV-2 infection. The present study uncovered a collection of coronavirus host factors that may be exploited to develop drugs against SARS-CoV-2 infection or future zoonotic coronavirus outbreaks.
••
TL;DR: In this paper, the authors present a multi-omic single-nucleus study of 191,890 nuclei in late-stage Alzheimer's disease (AD), accessible through their web portal, that profiles chromatin accessibility and gene expression in the same biological samples and uncovers vast cellular heterogeneity.
Abstract: The gene-regulatory landscape of the brain is highly dynamic in health and disease, coordinating a menagerie of biological processes across distinct cell types. Here, we present a multi-omic single-nucleus study of 191,890 nuclei in late-stage Alzheimer’s disease (AD), accessible through our web portal, profiling chromatin accessibility and gene expression in the same biological samples and uncovering vast cellular heterogeneity. We identified cell-type-specific, disease-associated candidate cis-regulatory elements and their candidate target genes, including an oligodendrocyte-associated regulatory module containing links to APOE and CLU. We describe cis-regulatory relationships in specific cell types at a subset of AD risk loci defined by genome-wide association studies, demonstrating the utility of this multi-omic single-nucleus approach. Trajectory analysis of glial populations identified disease-relevant transcription factors, such as SREBF1, and their regulatory targets. Finally, we introduce single-nucleus consensus weighted gene coexpression analysis, a coexpression network analysis strategy robust to sparse single-cell data, and perform a systems-level analysis of the AD transcriptome. An integrative analysis of single-nucleus assay for transposase-accessible chromatin with sequencing and RNA sequencing in normal and Alzheimer’s disease brain tissue identifies cell-type-specific cis-regulatory elements and candidate target genes at disease-associated loci.
••
TL;DR: In this article, a genome-wide association study of 2,780 cases and 47,486 controls identified 12 genome-wide-significant susceptibility loci for HCM, and Mendelian randomization identified diastolic blood pressure as a key modifiable risk factor for sarcomere-negative HCM.
Abstract: Hypertrophic cardiomyopathy (HCM) is a common, serious, genetic heart disorder. Rare pathogenic variants in sarcomere genes cause HCM, but with unexplained phenotypic heterogeneity. Moreover, most patients do not carry such variants. We report a genome-wide association study of 2,780 cases and 47,486 controls that identified 12 genome-wide-significant susceptibility loci for HCM. Single-nucleotide polymorphism heritability indicated a strong polygenic influence, especially for sarcomere-negative HCM (64% of cases; h2g = 0.34 ± 0.02). A genetic risk score showed substantial influence on the odds of HCM in a validation study, halving the odds in the lowest quintile and doubling them in the highest quintile, and also influenced phenotypic severity in sarcomere variant carriers. Mendelian randomization identified diastolic blood pressure (DBP) as a key modifiable risk factor for sarcomere-negative HCM, with a one standard deviation increase in DBP increasing the HCM risk fourfold. Common variants and modifiable risk factors have important roles in HCM that we suggest will be clinically actionable.
••
TL;DR: Open Targets Genetics as discussed by the authors is a community resource that provides systematic fine mapping at human GWAS loci, enabling users to prioritize genes at disease-associated regions and assess their potential as drug targets.
Abstract: Genome-wide association studies (GWASs) have identified many variants associated with complex traits, but identifying the causal gene(s) is a major challenge. In the present study, we present an open resource that provides systematic fine mapping and gene prioritization across 133,441 published human GWAS loci. We integrate genetics (GWAS Catalog and UK Biobank) with transcriptomic, proteomic and epigenomic data, including systematic disease–disease and disease–molecular trait colocalization results across 92 cell types and tissues. We identify 729 loci fine mapped to a single-coding causal variant and colocalized with a single gene. We trained a machine-learning model using the fine-mapped genetics and functional genomics data and 445 gold-standard curated GWAS loci to distinguish causal genes from neighboring genes, outperforming a naive distance-based model. Our prioritized genes were enriched for known approved drug targets (odds ratio = 8.1, 95% confidence interval = 5.7, 11.5). These results are publicly available through a web portal ( http://genetics.opentargets.org ), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets. Open Targets Genetics is a community resource that provides systematic fine mapping at human GWAS loci, enabling users to prioritize genes at disease-associated regions and assess their potential as drug targets.
••
TL;DR: In this paper, the results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants were presented, identifying genetic variants associated with DNA methylation at 420,509 DNAm sites in blood.
Abstract: Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15-17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype-phenotype map than previously anticipated.
••
TL;DR: This article performed whole-genome sequencing in large cohorts of Lewy body dementia (LBD) cases and neurologically healthy controls to study the genetic architecture of this understudied form of dementia, and to generate a resource for the scientific community.
Abstract: The genetic basis of Lewy body dementia (LBD) is not well understood. Here, we performed whole-genome sequencing in large cohorts of LBD cases and neurologically healthy controls to study the genetic architecture of this understudied form of dementia, and to generate a resource for the scientific community. Genome-wide association analysis identified five independent risk loci, whereas genome-wide gene-aggregation tests implicated mutations in the gene GBA. Genetic risk scores demonstrate that LBD shares risk profiles and pathways with Alzheimer's disease and Parkinson's disease, providing a deeper molecular understanding of the complex genetic architecture of this age-related neurodegenerative condition.
••
University of Amsterdam1, Montreal Heart Institute2, National Health Service3, National Institutes of Health4, Imperial College London5, University of Oxford6, Erasmus University Medical Center7, University Medical Center Groningen8, Utrecht University9, Heidelberg University10, Tohoku University11, Francis Crick Institute12, University of Paris13, Technische Universität München14, University of Western Ontario15, University College London16
TL;DR: In this paper, the authors conducted genome-wide association studies and multi-trait analyses in hypertrophic (HCM) and dilated (DCM) cardiomyopathies.
Abstract: The heart muscle diseases hypertrophic (HCM) and dilated (DCM) cardiomyopathies are leading causes of sudden death and heart failure in young, otherwise healthy, individuals. We conducted genome-wide association studies and multi-trait analyses in HCM (1,733 cases), DCM (5,521 cases) and nine left ventricular (LV) traits (19,260 UK Biobank participants with structurally normal hearts). We identified 16 loci associated with HCM, 13 with DCM and 23 with LV traits. We show strong genetic correlations between LV traits and cardiomyopathies, with opposing effects in HCM and DCM. Two-sample Mendelian randomization supports a causal association linking increased LV contractility with HCM risk. A polygenic risk score explains a significant portion of phenotypic variability in carriers of HCM-causing rare variants. Our findings thus provide evidence that polygenic risk score may account for variability in Mendelian diseases. More broadly, we provide insights into how genetic pathways may lead to distinct disorders through opposing genetic effects.
••
TL;DR: The eQTL Catalogue, as discussed by the authors, is a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies, with summary statistics that can be used to gain insight into complex human traits through downstream analyses such as fine mapping and co-localization.
Abstract: Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue ( https://www.ebi.ac.uk/eqtl ), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.
••
TL;DR: In this paper, a proteome-wide association study (PWAS) of AD was performed, followed by Mendelian randomization and colocalization analysis to identify loci that confer AD risk through their effects on brain protein abundance to provide new insights into AD pathogenesis.
Abstract: Genome-wide association studies (GWAS) have identified many risk loci for Alzheimer's disease (AD)1,2, but how these loci confer AD risk is unclear. Here, we aimed to identify loci that confer AD risk through their effects on brain protein abundance to provide new insights into AD pathogenesis. To that end, we integrated AD GWAS results with human brain proteomes to perform a proteome-wide association study (PWAS) of AD, followed by Mendelian randomization and colocalization analysis. We identified 11 genes that are consistent with being causal in AD, acting via their cis-regulated brain protein abundance. Nine replicated in a confirmation PWAS and eight represent new AD risk genes not identified before by AD GWAS. Furthermore, we demonstrated that our results were independent of APOE ε4. Together, our findings provide new insights into AD pathogenesis and promising targets for further mechanistic and therapeutic studies.
••
TL;DR: In this paper, the genome of Weining rye, an elite Chinese rye variety, was sequenced and the assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size with 93.67% assigned to seven chromosomes.
Abstract: Rye is a valuable food and forage crop, an important genetic resource for wheat and triticale improvement and an indispensable material for efficient comparative genomic studies in grasses. Here, we sequenced the genome of Weining rye, an elite Chinese rye variety. The assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size (7.86 Gb), with 93.67% of the contigs (7.25 Gb) assigned to seven chromosomes. Repetitive elements constituted 90.31% of the assembled genome. Compared to previously sequenced Triticeae genomes, Daniela, Sumaya and Sumana retrotransposons showed strong expansion in rye. Further analyses of the Weining assembly shed new light on genome-wide gene duplications and their impact on starch biosynthesis genes, physical organization of complex prolamin loci, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions and loci in rye. This genome sequence promises to accelerate genomic and breeding studies in rye and related cereal crops. A high-quality genome assembly of Weining rye sheds new light on gene duplications and their effects on starch biosynthesis genes, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions.
••
TL;DR: In this article, the authors used an efficient multiplexing strategy to differentiate 215 human induced pluripotent stem cell (iPSC) lines toward a midbrain neural fate, including dopaminergic neurons, and used single-cell RNA sequencing (scRNA-seq) to profile over 1 million cells across three differentiation time points.
Abstract: Studying the function of common genetic variants in primary human tissues and during development is challenging. To address this, we use an efficient multiplexing strategy to differentiate 215 human induced pluripotent stem cell (iPSC) lines toward a midbrain neural fate, including dopaminergic neurons, and use single-cell RNA sequencing (scRNA-seq) to profile over 1 million cells across three differentiation time points. The proportion of neurons produced by each cell line is highly reproducible and is predictable by robust molecular markers expressed in pluripotent cells. Expression quantitative trait loci (eQTL) were characterized at different stages of neuronal development and in response to rotenone-induced oxidative stress. Of these, 1,284 eQTL colocalize with known neurological trait risk loci, and 46% are not found in the Genotype-Tissue Expression (GTEx) catalog. Our study illustrates how coupling scRNA-seq with long-term iPSC differentiation enables mechanistic studies of human trait-associated genetic variants in otherwise inaccessible cell states.