Showing papers by "Wellcome Trust Sanger Institute published in 2021"
••
TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines and are freely available on GitHub under the permissive MIT licence, free for both noncommercial and commercial use.
Abstract: Background:
SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods.
Findings:
The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines.
Conclusion:
Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.
2,448 citations
••
TL;DR: A review of the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure is presented in this article.
Abstract: Although most mutations in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome are expected to be either deleterious and swiftly purged or relatively neutral, a small proportion will affect functional properties and may alter infectivity, disease severity or interactions with host immunity. The emergence of SARS-CoV-2 in late 2019 was followed by a period of relative evolutionary stasis lasting about 11 months. Since late 2020, however, SARS-CoV-2 evolution has been characterized by the emergence of sets of mutations, in the context of ‘variants of concern’, that impact virus characteristics, including transmissibility and antigenicity, probably in response to the changing immune profile of the human population. There is emerging evidence of reduced neutralization of some SARS-CoV-2 variants by postvaccination serum; however, a greater understanding of correlates of protection is required to evaluate how this may impact vaccine effectiveness. Nonetheless, manufacturers are preparing platforms for a possible update of vaccine sequences, and it is crucial that surveillance of genetic and antigenic changes in the global virus population is done alongside experiments to elucidate the phenotypic impacts of mutations. In this Review, we summarize the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets. The evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been characterized by the emergence of mutations and so-called variants of concern that impact virus characteristics, including transmissibility and antigenicity. 
In this Review, members of the COVID-19 Genomics UK (COG-UK) Consortium and colleagues summarize mutations of the SARS-CoV-2 spike protein, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets.
2,047 citations
••
TL;DR: In this paper, the authors show that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S gene target failures (SGTF) in community-based diagnostic PCR testing.
Abstract: The SARS-CoV-2 lineage B.1.1.7, designated variant of concern (VOC) 202012/01 by Public Health England [1], was first identified in the UK in late summer to early autumn 2020 [2]. Whole-genome SARS-CoV-2 sequence data collected from community-based diagnostic testing for COVID-19 show an extremely rapid expansion of the B.1.1.7 lineage during autumn 2020, suggesting that it has a selective advantage. Here we show that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S gene target failures (SGTF) in community-based diagnostic PCR testing. Analysis of trends in SGTF and non-SGTF case numbers in local areas across England shows that B.1.1.7 has higher transmissibility than non-VOC lineages, even if it has a different latent period or generation time. The SGTF data indicate a transient shift in the age composition of reported cases, with cases of B.1.1.7 including a larger share of under 20-year-olds than non-VOC cases. We estimated time-varying reproduction numbers for B.1.1.7 and co-circulating lineages using SGTF and genomic data. The best-supported models did not indicate a substantial difference in VOC transmissibility among different age groups, but all analyses agreed that B.1.1.7 has a substantial transmission advantage over other lineages, with a 50% to 100% higher reproduction number.
827 citations
••
National Institutes of Health1, University of Cambridge2, Wellcome Trust Sanger Institute3, Rockefeller University4, University of California, Davis5, Leibniz Association6, Seoul National University7, University of Southern California8, European Bioinformatics Institute9, Dresden University of Technology10, Max Planck Society11, Radboud University Nijmegen12, University of St Andrews13, University of Massachusetts Amherst14, University of Adelaide15, University of Missouri16, East Carolina University17, University of Queensland18, Clemson University19, University of Otago20, University of Arizona21, Natural History Museum22, Bangor University23, University of Konstanz24, Harvard University25, Northeastern University26, National Museum of Natural History27, University of Antwerp28, University of Graz29, University of Florida30, University of Basel31, University of California, Santa Cruz32, Zoological Society of San Diego33, Pacific Biosciences34, Pompeu Fabra University35, University of Maryland, College Park36, Harbin Institute of Technology37, University of Chicago38, Oregon Health & Science University39, Monash University Malaysia Campus40, Qatar Airways41, University of Milan42, Goethe University Frankfurt43, Pennsylvania State University44, University of Los Andes45, University of Copenhagen46, Norwegian University of Science and Technology47, Agency for Science, Technology and Research48, Royal Ontario Museum49, Smithsonian Institution50, Howard Hughes Medical Institute51, Walter Reed Army Institute of Research52, University of East Anglia53, University College Dublin54, University of Illinois at Urbana–Champaign55, La Trobe University56, University of California, San Diego57, Nova Southeastern University58
TL;DR: The Vertebrate Genomes Project (VGP) is an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
647 citations
••
TL;DR: The SARS-CoV-2 lineage B.1.1.7, now designated Variant of Concern 202012/01 (VOC) by Public Health England, originated in the UK in late summer to early autumn 2020.
Abstract: The SARS-CoV-2 lineage B.1.1.7, now designated Variant of Concern 202012/01 (VOC) by Public Health England, originated in the UK in late Summer to early Autumn 2020. We examine epidemiological evidence for this VOC having a transmission advantage from several perspectives. First, whole genome sequence data collected from community-based diagnostic testing provides an indication of changing prevalence of different genetic variants through time. Phylodynamic modelling additionally indicates that genetic diversity of this lineage has changed in a manner consistent with exponential growth. Second, we find that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S-gene target failures (SGTF) in community-based diagnostic PCR testing. Third, we examine growth trends in SGTF and non-SGTF case numbers at local area level across England, and show that the VOC has higher transmissibility than non-VOC lineages, even if the VOC has a different latent period or generation time. Available SGTF data indicate a shift in the age composition of reported cases, with a larger share of under 20 year olds among reported VOC than non-VOC cases. Fourth, we assess the association of VOC frequency with independent estimates of the overall SARS-CoV-2 reproduction number through time. Finally, we fit a semi-mechanistic model directly to local VOC and non-VOC case incidence to estimate the reproduction numbers over time for each. There is a consensus among all analyses that the VOC has a substantial transmission advantage, with the estimated difference in reproduction numbers between VOC and non-VOC ranging between 0.4 and 0.7, and the ratio of reproduction numbers varying between 1.4 and 1.8. We note that these estimates of transmission advantage apply to a period where high levels of social distancing were in place in England; extrapolation to other transmission contexts therefore requires caution.
547 citations
••
University of Oxford1, Wellcome Trust Sanger Institute2, University of Cambridge3, Public Health England4, Liverpool School of Tropical Medicine5, University of Sheffield6, Newcastle upon Tyne Hospitals NHS Foundation Trust7, Newcastle University8, University Hospital Southampton NHS Foundation Trust9, University of Southampton10, University Hospitals Bristol NHS Foundation Trust11, St George's, University of London12, Guy's and St Thomas' NHS Foundation Trust13, University College London14, University Hospitals Birmingham NHS Foundation Trust15, University of Glasgow16, North Bristol NHS Trust17, University College Hospital18, University of Hull19, Northwest University (China)20, Glasgow Dental Hospital and School21, Western General Hospital22, Nottingham University Hospitals NHS Trust23, University of Nottingham24, AstraZeneca25, Aneurin Bevan University Health Board26, Cardiff University27
TL;DR: A post-hoc analysis of the efficacy of the adenoviral vector vaccine ChAdOx1 nCoV-19 (AZD1222) against the B.1.1.7 variant, which emerged as the dominant cause of COVID-19 disease in the UK from November 2020.
521 citations
••
TL;DR: The Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes, is presented, providing comprehensive resources for microbiome researchers.
Abstract: Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
485 citations
••
TL;DR: In this article, the authors generated and analyzed two single-cell RNA sequencing datasets of the human minor salivary glands and gingiva (9 samples, 13,824 cells), identifying 50 cell clusters.
Abstract: Despite signs of infection (including taste loss, dry mouth and mucosal lesions such as ulcerations, enanthema and macules), the involvement of the oral cavity in coronavirus disease 2019 (COVID-19) is poorly understood. To address this, we generated and analyzed two single-cell RNA sequencing datasets of the human minor salivary glands and gingiva (9 samples, 13,824 cells), identifying 50 cell clusters. Using integrated cell normalization and annotation, we classified 34 unique cell subpopulations between glands and gingiva. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral entry factors such as ACE2 and TMPRSS members were broadly enriched in epithelial cells of the glands and oral mucosae. Using orthogonal RNA and protein expression assessments, we confirmed SARS-CoV-2 infection in the glands and mucosae. Saliva from SARS-CoV-2-infected individuals harbored epithelial cells exhibiting ACE2 and TMPRSS expression and sustained SARS-CoV-2 infection. Acellular and cellular salivary fractions from asymptomatic individuals were found to transmit SARS-CoV-2 ex vivo. Matched nasopharyngeal and saliva samples displayed distinct viral shedding dynamics, and salivary viral burden correlated with COVID-19 symptoms, including taste loss. Upon recovery, this asymptomatic cohort exhibited sustained salivary IgG antibodies against SARS-CoV-2. Collectively, these data show that the oral cavity is an important site for SARS-CoV-2 infection and implicate saliva as a potential route of SARS-CoV-2 transmission.
417 citations
••
TL;DR: In this paper, the authors describe their tried and tested approach for assembly curation using gEVAL, the genome evaluation browser, and give recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
Abstract: Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
373 citations
••
TL;DR: In this article, STM2457, a first-in-class catalytic inhibitor of METTL3, is identified and characterized, and a crystal structure of STM2457 in complex with METTL3–METTL14 is presented.
Abstract: N6-methyladenosine (m6A) is an abundant internal RNA modification [1,2] that is catalysed predominantly by the METTL3–METTL14 methyltransferase complex [3,4]. The m6A methyltransferase METTL3 has been linked to the initiation and maintenance of acute myeloid leukaemia (AML), but the potential of therapeutic applications targeting this enzyme remains unknown [5–7]. Here we present the identification and characterization of STM2457, a highly potent and selective first-in-class catalytic inhibitor of METTL3, and a crystal structure of STM2457 in complex with METTL3–METTL14. Treatment of tumours with STM2457 leads to reduced AML growth and an increase in differentiation and apoptosis. These cellular effects are accompanied by selective reduction of m6A levels on known leukaemogenic mRNAs and a decrease in their expression consistent with a translational defect. We demonstrate that pharmacological inhibition of METTL3 in vivo leads to impaired engraftment and prolonged survival in various mouse models of AML, specifically targeting key stem cell subpopulations of AML. Collectively, these results reveal the inhibition of METTL3 as a potential therapeutic strategy against AML, and provide proof of concept that the targeting of RNA-modifying enzymes represents a promising avenue for anticancer therapy. Treatment with a specific inhibitor of the N6-methyladenosine methyltransferase METTL3 leads to reduced growth of cancer cells, indicating the potential of approaches targeting RNA-modifying enzymes for anticancer therapy.
362 citations
••
University Medical Center Groningen1, European Bioinformatics Institute2, Netherlands Cancer Institute3, Georgia Institute of Technology4, Leipzig University5, Johns Hopkins University6, NHS Blood and Transplant7, University of Cambridge8, Garvan Institute of Medical Research9, University of Tartu10, Ontario Institute for Cancer Research11, University of Washington12, Public Health Research Institute13, University of Chicago14, Greifswald University Hospital15, Ludwig Maximilian University of Munich16, University of Bristol17, Erasmus University Rotterdam18, Luleå University of Technology19, Royal Devon and Exeter Hospital20, University of Westminster21, University of Lausanne22, Swiss Institute of Bioinformatics23, University of Geneva24, University of Dundee25, Agency for Science, Technology and Research26, University of Queensland27, Leiden University Medical Center28, Radboud University Nijmegen29, University of Liège30, University of Oxford31, Menzies Research Institute32, Icahn School of Medicine at Mount Sinai33, Ikerbasque34, VU University Amsterdam35, Stanford University36, University of Turku37, Turku University Hospital38, Maastricht University39, Karolinska Institutet40, Utrecht University41, University of Helsinki42, National Institutes of Health43, Technische Universität München44, Wellcome Trust Sanger Institute45, German Cancer Research Center46, Westlake University47, University of New South Wales48
TL;DR: In this article, the authors performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium.
Abstract: Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.
••
Newcastle University1, University of Cambridge2, European Bioinformatics Institute3, Wellcome Trust Sanger Institute4, University College London5, Ludwig Maximilian University of Munich6, Newcastle upon Tyne Hospitals NHS Foundation Trust7, University College London Hospitals NHS Foundation Trust8, Royal Free Hospital9, UCL Institute of Child Health10, Harvard University11
TL;DR: In this article, the authors performed single-cell transcriptome, surface proteome and T and B lymphocyte antigen receptor analyses of over 780,000 peripheral blood mononuclear cells from a cross-sectional cohort of 130 patients with varying severities of COVID-19.
Abstract: Analysis of human blood immune cells provides insights into the coordinated response to viral infections such as severe acute respiratory syndrome coronavirus 2, which causes coronavirus disease 2019 (COVID-19). We performed single-cell transcriptome, surface proteome and T and B lymphocyte antigen receptor analyses of over 780,000 peripheral blood mononuclear cells from a cross-sectional cohort of 130 patients with varying severities of COVID-19. We identified expansion of nonclassical monocytes expressing complement transcripts (CD16+C1QA/B/C+) that sequester platelets and were predicted to replenish the alveolar macrophage pool in COVID-19. Early, uncommitted CD34+ hematopoietic stem/progenitor cells were primed toward megakaryopoiesis, accompanied by expanded megakaryocyte-committed progenitors and increased platelet activation. Clonally expanded CD8+ T cells and an increased ratio of CD8+ effector T cells to effector memory T cells characterized severe disease, while circulating follicular helper T cells accompanied mild disease. We observed a relative loss of IgA2 in symptomatic disease despite an overall expansion of plasmablasts and plasma cells. Our study highlights the coordinated immune response that contributes to COVID-19 pathogenesis and reveals discrete cellular components that can be targeted for therapy.
••
26 Aug 2021
TL;DR: This Primer provides an introduction to genome-wide association studies (GWAS), techniques for deriving functional inferences from the results and applications of GWAS in understanding disease risk and trait architecture, and discusses important ethical considerations when considering GWAS populations and data.
Abstract: Genome-wide association studies (GWAS) test hundreds of thousands of genetic variants across many genomes to find those statistically associated with a specific trait or disease. This methodology has generated a myriad of robust associations for a range of traits and diseases, and the number of associated variants is expected to grow steadily as GWAS sample sizes increase. GWAS results have a range of applications, such as gaining insight into a phenotype’s underlying biology, estimating its heritability, calculating genetic correlations, making clinical risk predictions, informing drug development programmes and inferring potential causal relationships between risk factors and health outcomes. In this Primer, we provide the reader with an introduction to GWAS, explaining their statistical basis and how they are conducted, describe state-of-the-art approaches and discuss limitations and challenges, concluding with an overview of the current and future applications for GWAS results. Uffelmann et al. describe the key considerations and best practices for conducting genome-wide association studies (GWAS), techniques for deriving functional inferences from the results and applications of GWAS in understanding disease risk and trait architecture. The Primer also provides information on the best practices for data sharing and discusses important ethical considerations when considering GWAS populations and data.
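As a rough illustration of the per-variant testing idea summarized in this abstract, the sketch below runs a simple allelic chi-square test (1 degree of freedom) on a 2x2 table of allele counts and keeps variants below the conventional genome-wide significance threshold of 5e-8. It is not taken from the Primer: the function names, the variant identifiers and the allele counts are hypothetical, and real GWAS pipelines typically use regression models with covariates rather than this bare test.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (chi2, p) for a 2x2 allele-count table with 1 degree of freedom."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    # A table with an empty row or column carries no association signal.
    if 0 in (row_case, row_ctrl, col_alt, col_ref):
        return 0.0, 1.0
    chi2 = (n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
            / (row_case * row_ctrl * col_alt * col_ref))
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def genome_wide_hits(variants, threshold=5e-8):
    """Return the names of variants whose p-value falls below the threshold."""
    hits = []
    for name, counts in variants.items():
        _, p = allelic_chi_square(*counts)
        if p < threshold:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
    toy_variants = {
        "rsA": (400, 600, 200, 800),
        "rsB": (300, 700, 200, 800),
        "rsC": (50, 50, 50, 50),
    }
    for name, counts in toy_variants.items():
        chi2, p = allelic_chi_square(*counts)
        print(name, round(chi2, 2), p)
    print("genome-wide significant:", genome_wide_hits(toy_variants))
```

The strict threshold reflects the large number of variants tested at once; a variant with a nominally small p-value can still fall short of it.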
••
TL;DR: The Gut Phage Database is a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mining a dataset of 28,060 globally distributed human gut metagenomes and 2,898 reference genomes of cultured gut bacteria.
••
TL;DR: To aid the prioritisation of targets and inform on the potential impact of modulating a given target, the Open Targets Platform has added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety.
Abstract: The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.
••
TL;DR: Open Targets Genetics offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue.
Abstract: Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.
••
Broad Institute1, Harvard University2, Duke University3, Massachusetts Institute of Technology4, University of California, San Diego5, Icahn School of Medicine at Mount Sinai6, Brigham and Women's Hospital7, Yale University8, Ragon Institute of MGH, MIT and Harvard9, Royal Institute of Technology10, University of Bonn11, Centre national de la recherche scientifique12, Wellcome Trust Sanger Institute13, Karolinska Institutet14, Translational Genomics Research Institute15, Boston University16, Hannover Medical School17, European Bioinformatics Institute18, Boston Medical Center19, Technische Universität München20, University of Cambridge21, Stanford University22
TL;DR: In this paper, cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues was assessed.
Abstract: Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial-macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.
••
TL;DR: The authors conducted a multiancestry meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants.
Abstract: Prostate cancer is a highly heritable disease with large disparities in incidence rates across ancestry populations. We conducted a multiancestry meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants. The top genetic risk score (GRS) decile was associated with odds ratios that ranged from 5.06 (95% confidence interval (CI), 4.84–5.29) for men of European ancestry to 3.74 (95% CI, 3.36–4.17) for men of African ancestry. Men of African ancestry were estimated to have a mean GRS that was 2.18-times higher (95% CI, 2.14–2.22), and men of East Asian ancestry 0.73-times lower (95% CI, 0.71–0.76), than men of European ancestry. These findings support the role of germline variation contributing to population differences in prostate cancer risk, with the GRS offering an approach for personalized risk prediction.
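To make the "odds ratio (95% CI)" notation in this abstract concrete, the sketch below computes an odds ratio and a Wald-type 95% confidence interval from a 2x2 table of counts. It is a generic textbook calculation and not the authors' method (estimates of this kind usually come from regression models); the function name and the counts in the usage example are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Return (odds_ratio, ci_low, ci_high) using a Wald interval on the log scale."""
    counts = (exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


if __name__ == "__main__":
    # Hypothetical counts: top-decile vs reference group, cases vs controls.
    print(odds_ratio_with_ci(20, 80, 10, 90))
```

An interval that excludes 1 indicates an association at roughly the 5% level; larger samples narrow the interval, which is why the intervals quoted in the abstract are tight.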
••
Francis Crick Institute1, Wellcome Trust Sanger Institute2, University of Oxford3, Broad Institute4, University of Toronto5, University of Texas MD Anderson Cancer Center6, University of Cambridge7, Katholieke Universiteit Leuven8, Heidelberg University9, German Cancer Research Center10, Simon Fraser University11, Vancouver Prostate Centre12, NorthShore University HealthSystem13, Oregon Health & Science University14, University of Melbourne15, Walter and Eliza Hall Institute of Medical Research16, Cornell University17, University of California, Santa Cruz18, University College London19, University of California, Los Angeles20, Ontario Institute for Cancer Research21, Harvard University22, University of Chicago23, University of Cologne24, University of Helsinki25, University of Glasgow26, European Bioinformatics Institute27, University of Manchester28
TL;DR: In this article, the authors extensively characterize intra-tumor heterogeneity (ITH) across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations.
••
TL;DR: In this paper, the authors performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2.
Abstract: Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.
••
TL;DR: The Polygenic Score (PGS) Catalog is an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications.
Abstract: We present the Polygenic Score (PGS) Catalog (https://www.PGSCatalog.org), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.
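The abstract notes that each published score consists of variants, alleles and weights. As a minimal sketch of how such a score is typically applied, a polygenic score can be computed as the weighted sum of effect-allele counts. The variant IDs, weights and genotypes below are hypothetical illustrations, not entries from the PGS Catalog, and the function name `polygenic_score` is an assumption of this sketch rather than part of any Catalog tooling.

```python
from typing import Dict, Tuple

# Hypothetical scoring data: variant ID -> (effect allele, weight).
weights: Dict[str, Tuple[str, float]] = {
    "rs0001": ("A", 0.12),
    "rs0002": ("G", -0.05),
    "rs0003": ("T", 0.30),
}

# Hypothetical diploid genotype: variant ID -> the two observed alleles.
genotype: Dict[str, Tuple[str, str]] = {
    "rs0001": ("A", "A"),
    "rs0002": ("G", "C"),
    "rs0003": ("C", "C"),
}


def polygenic_score(
    weights: Dict[str, Tuple[str, float]],
    genotype: Dict[str, Tuple[str, str]],
) -> float:
    """Sum weight * effect-allele dosage over all scored variants."""
    score = 0.0
    for variant, (effect_allele, weight) in weights.items():
        alleles = genotype.get(variant)
        if alleles is None:
            # Variant not genotyped: skipped here (a simplifying assumption).
            continue
        # Dosage is the number of copies of the effect allele (0, 1 or 2).
        dosage = sum(1 for allele in alleles if allele == effect_allele)
        score += weight * dosage
    return score


if __name__ == "__main__":
    print(polygenic_score(weights, genotype))
```

Skipping missing variants is only one possible convention; real pipelines may instead impute dosages or report how many scored variants were matched.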
••
TL;DR: In this paper, the authors profiled the transcriptomes of more than 500,000 single cells from developing human fetal skin, healthy adult skin, and adult skin with atopic dermatitis and psoriasis, and compared cell states across development, homeostasis, and disease.
Abstract: The skin confers biophysical and immunological protection through a complex cellular network established early in embryonic development. We profiled the transcriptomes of more than 500,000 single cells from developing human fetal skin, healthy adult skin, and adult skin with atopic dermatitis and psoriasis. We leveraged these datasets to compare cell states across development, homeostasis, and disease. Our analysis revealed an enrichment of innate immune cells in skin during the first trimester and clonal expansion of disease-associated lymphocytes in atopic dermatitis and psoriasis. We uncovered and validated in situ a reemergence of prenatal vascular endothelial cell and macrophage cellular programs in atopic dermatitis and psoriasis lesional skin. These data illustrate the dynamism of cutaneous immunity and provide opportunities for targeting pathological developmental programs in inflammatory skin diseases.
••
TL;DR: In this paper, the authors used whole-genome sequencing of clonal cell isolates that developed chemotherapeutic resistance to show that chromothripsis is a major driver of circular extrachromosomal DNA (ecDNA) amplification through mechanisms that depend on poly(ADP-ribose) polymerases (PARP) and the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs).
Abstract: Focal chromosomal amplification contributes to the initiation of cancer by mediating overexpression of oncogenes, and to the development of cancer therapy resistance by increasing the expression of genes whose action diminishes the efficacy of anti-cancer drugs. Here we used whole-genome sequencing of clonal cell isolates that developed chemotherapeutic resistance to show that chromothripsis is a major driver of circular extrachromosomal DNA (ecDNA) amplification (also known as double minutes) through mechanisms that depend on poly(ADP-ribose) polymerases (PARP) and the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs). Longitudinal analyses revealed that a further increase in drug tolerance is achieved by structural evolution of ecDNAs through additional rounds of chromothripsis. In situ Hi-C sequencing showed that ecDNAs preferentially tether near chromosome ends, where they re-integrate when DNA damage is present. Intrachromosomal amplifications that formed initially under low-level drug selection underwent continuing breakage–fusion–bridge cycles, generating amplicons more than 100 megabases in length that became trapped within interphase bridges and then shattered, thereby producing micronuclei whose encapsulated ecDNAs are substrates for chromothripsis. We identified similar genome rearrangement profiles linked to localized gene amplification in human cancers with acquired drug resistance or oncogene amplifications. We propose that chromothripsis is a primary mechanism that accelerates genomic DNA rearrangement and amplification into ecDNA and enables rapid acquisition of tolerance to altered growth conditions. Chromothripsis, a process during which chromosomes are ‘shattered’, drives the evolution of gene amplification and subsequent drug resistance in cancer cells.
••
TL;DR: NanoSeq is a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules from cell populations, enabling the study of somatic mutations in any tissue independently of clonality.
Abstract: Somatic mutations drive the development of cancer and may contribute to ageing and other diseases. Despite their importance, the difficulty of detecting mutations that are only present in single cells or small clones has limited our knowledge of somatic mutagenesis to a minority of tissues. Here, to overcome these limitations, we developed nanorate sequencing (NanoSeq), a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules from cell populations. This rate is two orders of magnitude lower than typical somatic mutation loads, enabling the study of somatic mutations in any tissue independently of clonality. We used this single-molecule sensitivity to study somatic mutations in non-dividing cells across several tissues, comparing stem cells to differentiated cells and studying mutagenesis in the absence of cell division. Differentiated cells in blood and colon displayed remarkably similar mutation loads and signatures to their corresponding stem cells, despite mature blood cells having undergone considerably more divisions. We then characterized the mutational landscape of post-mitotic neurons and polyclonal smooth muscle, confirming that neurons accumulate somatic mutations at a constant rate throughout life without cell division, with similar rates to mitotically active tissues. Together, our results suggest that mutational processes that are independent of cell division are important contributors to somatic mutagenesis. We anticipate that the ability to reliably detect mutations in single DNA molecules could transform our understanding of somatic mutagenesis and enable non-invasive studies on large-scale cohorts. NanoSeq is used to detect mutations in single DNA molecules and analyses show that mutational processes that are independent of cell division are important contributors to somatic mutagenesis.
••
TL;DR: This paper aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available.
Abstract: Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10⁻⁸), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
••
TL;DR: In this article, the authors compared the incidence of VAP and secondary infections using a combination of microbial culture and a TaqMan multi-pathogen array, and determined the lung microbiome composition using 16S rRNA analysis in a subset of samples.
Abstract: Pandemic COVID-19 caused by the coronavirus SARS-CoV-2 has a high incidence of patients with severe acute respiratory syndrome (SARS). Many of these patients require admission to an intensive care unit (ICU) for invasive ventilation and are at significant risk of developing a secondary, ventilator-associated pneumonia (VAP). We aimed to study the incidence of VAP and the bacterial lung microbiome composition of ventilated COVID-19 and non-COVID-19 patients.
In this retrospective observational study, we compared the incidence of VAP and secondary infections using a combination of microbial culture and a TaqMan multi-pathogen array. In addition, we determined the lung microbiome composition using 16S rRNA analysis in a subset of samples. The study involved 81 COVID-19 and 144 non-COVID-19 patients receiving invasive ventilation in a single university teaching hospital between March 15th 2020 and August 30th 2020. COVID-19 patients were significantly more likely to develop VAP than patients without COVID (Cox proportional hazard ratio 2.01, 95% CI 1.14–3.54, p = 0.0015), with an incidence density of 28/1000 ventilator days versus 13/1000 for patients without COVID (p = 0.009). Although the distribution of organisms causing VAP was similar between the two groups, and the pulmonary microbiome was similar, we identified 3 cases of invasive aspergillosis amongst the patients with COVID-19 but none in the non-COVID-19 cohort. Herpesviridae activation was also numerically more frequent amongst patients with COVID-19. COVID-19 is associated with an increased risk of VAP, which is not fully explained by the prolonged duration of ventilation. The pulmonary dysbiosis caused by COVID-19 and the causative organisms of secondary pneumonia observed are similar to those seen in critically ill patients ventilated for other reasons.
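The abstract reports VAP incidence density per 1000 ventilator days. As a rough illustration of that measure only, the sketch below computes events per 1000 days at risk. The function name `incidence_density` and the input counts are hypothetical and are not taken from the study, which does not report raw event or ventilator-day totals in this abstract.

```python
def incidence_density(events: int, ventilator_days: float, per: float = 1000.0) -> float:
    """Return the number of events per `per` ventilator days at risk."""
    if ventilator_days <= 0:
        raise ValueError("ventilator_days must be positive")
    return events * per / ventilator_days


if __name__ == "__main__":
    # Hypothetical inputs for illustration: 14 VAP episodes over 500 ventilator days.
    print(incidence_density(14, 500))
```

Using days at risk as the denominator, rather than the number of patients, is what lets groups with different durations of ventilation be compared.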
••
Wellcome Trust Sanger Institute1, Laboratory of Molecular Biology2, Queen Mary University of London3, University of Cambridge4, Newcastle University5, University College London6, Cambridge University Hospitals NHS Foundation Trust7, European Bioinformatics Institute8, University of Oxford9, John Radcliffe Hospital10, Garvan Institute of Medical Research11
TL;DR: In this paper, the authors use single-cell RNA sequencing and antigen receptor analysis of almost half a million cells, from up to 5 anatomical regions in the developing gut and up to 11 distinct anatomical regions in the healthy paediatric and adult human gut, to comprehensively map intestinal cell lineages.
Abstract: The cellular landscape of the human intestinal tract is dynamic throughout life, developing in utero and changing in response to functional requirements and environmental exposures. Here, to comprehensively map cell lineages, we use single-cell RNA sequencing and antigen receptor analysis of almost half a million cells from up to 5 anatomical regions in the developing and up to 11 distinct anatomical regions in the healthy paediatric and adult human gut. This reveals the existence of transcriptionally distinct BEST4 epithelial cells throughout the human intestinal tract. Furthermore, we implicate IgG sensing as a function of intestinal tuft cells. We describe neural cell populations in the developing enteric nervous system, and predict cell-type-specific expression of genes associated with Hirschsprung’s disease. Finally, using a systems approach, we identify key cell players that drive the formation of secondary lymphoid tissue in early human development. We show that these programs are adopted in inflammatory bowel disease to recruit and retain immune cells at the site of inflammation. This catalogue of intestinal cells will provide new insights into cellular programs in development, homeostasis and disease. Cells from embryonic, fetal, paediatric and adult human intestinal tissue are analysed at different locations along the intestinal tract to construct a single-cell atlas of the developing and adult human intestinal tract, encompassing all cell lineages.
••
TL;DR: In this article, a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription are reviewed, as the number of single-cell experiments with multiple data modalities increases.
Abstract: The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term ‘data integration’ has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods. As the number of single-cell experiments with multiple data modalities increases, Argelaguet and colleagues review the concepts and challenges of data integration.
••
TL;DR: In this paper, the authors dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments of the endometrium.
Abstract: The endometrium, the mucosal lining of the uterus, undergoes dynamic changes throughout the menstrual cycle in response to ovarian hormones. We have generated dense single-cell and spatial reference maps of the human uterus and three-dimensional endometrial organoid cultures. We dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments. Our benchmark of the endometrial organoids reveals the pathways and cell states regulating differentiation of the secretory and ciliated lineages both in vivo and in vitro. In vitro downregulation of WNT or NOTCH pathways increases the differentiation efficiency along the secretory and ciliated lineages, respectively. We utilize our cellular maps to deconvolute bulk data from endometrial cancers and endometriotic lesions, illuminating the cell types dominating in each of these disorders. These mechanistic insights provide a platform for future development of treatments for common conditions including endometriosis and endometrial carcinoma. Single-cell and spatial transcriptomic profiling of the human endometrium highlights pathways governing the proliferative and secretory phases of the menstrual cycle. Analyses of endometrial organoids show that WNT and NOTCH signaling modulate differentiation into the secretory and ciliated epithelial lineages, respectively.
••
TL;DR: This tutorial provides a hands-on guide for experimentalists interested in analyzing their data as well as an overview for bioinformaticians seeking to develop new computational methods.
Abstract: Single-cell RNA sequencing (scRNA-seq) is a popular and powerful technology that allows you to profile the whole transcriptome of a large number of individual cells. However, the analysis of the large volumes of data generated from these experiments requires specialized statistical and computational methods. Here we present an overview of the computational workflow involved in processing scRNA-seq data. We discuss some of the most common tasks and the tools available for addressing central biological questions. In this article and our companion website (https://scrnaseq-course.cog.sanger.ac.uk/website/index.html), we provide guidelines regarding best practices for performing computational analyses. This tutorial provides a hands-on guide for experimentalists interested in analyzing their data as well as an overview for bioinformaticians seeking to develop new computational methods.