
Showing papers by the Wellcome Trust Sanger Institute published in 2021


Journal Article
TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines and are freely available on GitHub under the permissive MIT licence, free for both noncommercial and commercial use.
Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations
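
The abstract lists statistics among the tasks these tools perform. As a rough illustration of how per-read statistics come from the SAM FLAG field, here is a minimal Python sketch that tallies records by FLAG bits, loosely in the spirit of `samtools flagstat`. It is not SAMtools code: the function `flag_summary` and the example reads are hypothetical, and the real `samtools flagstat` reports more categories (for example, QC-passed versus QC-failed reads).

```python
# Illustrative sketch only: a tiny flagstat-style summary over SAM text records.
# This is NOT how SAMtools is implemented; it only shows how FLAG bits are read.
from collections import Counter

# A few standard FLAG bits from the SAM format specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def flag_summary(sam_lines):
    """Count total, mapped, paired and duplicate records in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError("SAM records have 11 mandatory fields")
        flag = int(fields[1])  # FLAG is the second mandatory field
        counts["total"] += 1
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            counts["secondary_or_supplementary"] += 1
        if not flag & FLAG_UNMAPPED:
            counts["mapped"] += 1
        if flag & FLAG_PAIRED:
            counts["paired"] += 1
        if flag & FLAG_DUPLICATE:
            counts["duplicates"] += 1
    return dict(counts)
```

Categories that never occur are simply absent from the returned dictionary, which keeps the sketch short.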


Journal Article
TL;DR: A review of the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure is presented in this article.
Abstract: Although most mutations in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome are expected to be either deleterious and swiftly purged or relatively neutral, a small proportion will affect functional properties and may alter infectivity, disease severity or interactions with host immunity. The emergence of SARS-CoV-2 in late 2019 was followed by a period of relative evolutionary stasis lasting about 11 months. Since late 2020, however, SARS-CoV-2 evolution has been characterized by the emergence of sets of mutations, in the context of ‘variants of concern’, that impact virus characteristics, including transmissibility and antigenicity, probably in response to the changing immune profile of the human population. There is emerging evidence of reduced neutralization of some SARS-CoV-2 variants by postvaccination serum; however, a greater understanding of correlates of protection is required to evaluate how this may impact vaccine effectiveness. Nonetheless, manufacturers are preparing platforms for a possible update of vaccine sequences, and it is crucial that surveillance of genetic and antigenic changes in the global virus population is done alongside experiments to elucidate the phenotypic impacts of mutations. In this Review, we summarize the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets. The evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been characterized by the emergence of mutations and so-called variants of concern that impact virus characteristics, including transmissibility and antigenicity. 
In this Review, members of the COVID-19 Genomics UK (COG-UK) Consortium and colleagues summarize mutations of the SARS-CoV-2 spike protein, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets.

2,047 citations


Journal Article
25 Mar 2021-Nature
TL;DR: In this paper, the authors show that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S gene target failures (SGTF) in community-based diagnostic PCR testing.
Abstract: The SARS-CoV-2 lineage B.1.1.7, designated variant of concern (VOC) 202012/01 by Public Health England [1], was first identified in the UK in late summer to early autumn 2020 [2]. Whole-genome SARS-CoV-2 sequence data collected from community-based diagnostic testing for COVID-19 show an extremely rapid expansion of the B.1.1.7 lineage during autumn 2020, suggesting that it has a selective advantage. Here we show that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S gene target failures (SGTF) in community-based diagnostic PCR testing. Analysis of trends in SGTF and non-SGTF case numbers in local areas across England shows that B.1.1.7 has higher transmissibility than non-VOC lineages, even if it has a different latent period or generation time. The SGTF data indicate a transient shift in the age composition of reported cases, with cases of B.1.1.7 including a larger share of under 20-year-olds than non-VOC cases. We estimated time-varying reproduction numbers for B.1.1.7 and co-circulating lineages using SGTF and genomic data. The best-supported models did not indicate a substantial difference in VOC transmissibility among different age groups, but all analyses agreed that B.1.1.7 has a substantial transmission advantage over other lineages, with a 50% to 100% higher reproduction number.

827 citations
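
The link between a rising VOC share and a reproduction-number advantage can be illustrated with a common approximation, which is not necessarily the method used in this paper. If the VOC share p(t) (for example, the SGTF fraction of positive tests) follows logistic growth, the slope of logit p(t) against time estimates the daily growth-rate advantage s. Under the simplifying assumption of a fixed generation time T, R = exp(rT), so the ratio of reproduction numbers is about exp(sT). The function names, the generation time and the data are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def growth_advantage(days, voc_share):
    """Least-squares slope of logit(VOC share) against time, per day."""
    y = [logit(p) for p in voc_share]
    n = len(days)
    mx = sum(days) / n
    my = sum(y) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(days, y))
    sxx = sum((x - mx) ** 2 for x in days)
    return sxy / sxx


def reproduction_ratio(advantage_per_day, generation_time_days):
    """R_VOC / R_other, assuming R = exp(r * T) with a fixed generation time T."""
    return math.exp(advantage_per_day * generation_time_days)
```

For example, a growth advantage of 0.1 per day combined with a hypothetical generation time of 5.5 days gives a ratio of exp(0.55), about 1.73. Real generation intervals are distributions rather than fixed values, which changes the conversion.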


Journal Article
Arang Rhie1, Shane A. McCarthy2, Shane A. McCarthy3, Olivier Fedrigo4, Joana Damas5, Giulio Formenti4, Sergey Koren1, Marcela Uliano-Silva6, William Chow3, Arkarachai Fungtammasan, J. H. Kim7, Chul Hee Lee7, Byung June Ko7, Mark Chaisson8, Gregory Gedman4, Lindsey J. Cantin4, Françoise Thibaud-Nissen1, Leanne Haggerty9, Iliana Bista2, Iliana Bista3, Michelle Smith3, Bettina Haase4, Jacquelyn Mountcastle4, Sylke Winkler10, Sylke Winkler11, Sadye Paez4, Jason T. Howard, Sonja C. Vernes11, Sonja C. Vernes12, Sonja C. Vernes13, Tanya M. Lama14, Frank Grützner15, Wesley C. Warren16, Christopher N. Balakrishnan17, Dave W Burt18, Jimin George19, Matthew T. Biegler4, David Iorns, Andrew Digby, Daryl Eason, Bruce C. Robertson20, Taylor Edwards21, Mark Wilkinson22, George F. Turner23, Axel Meyer24, Andreas F. Kautt24, Andreas F. Kautt25, Paolo Franchini24, H. William Detrich26, Hannes Svardal27, Hannes Svardal28, Maximilian Wagner29, Gavin J. P. Naylor30, Martin Pippel11, Milan Malinsky31, Milan Malinsky3, Mark Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout32, Marlys L. Houck33, Ann C Misuraca33, Sarah B. Kingan34, Richard Hall34, Zev N. Kronenberg34, Ivan Sović34, Christopher Dunn34, Zemin Ning3, Alex Hastie, Joyce V. Lee, Siddarth Selvaraj, Richard E. Green32, Nicholas H. Putnam, Ivo Gut35, Jay Ghurye36, Erik Garrison32, Ying Sims3, Joanna Collins3, Sarah Pelan3, James Torrance3, Alan Tracey3, Jonathan Wood3, Robel E. Dagnew8, Dengfeng Guan37, Dengfeng Guan2, Sarah E. London38, David F. Clayton19, Claudio V. Mello39, Samantha R. Friedrich39, Peter V. Lovell39, Ekaterina Osipova11, Farooq O. Al-Ajli40, Farooq O. Al-Ajli41, Simona Secomandi42, Heebal Kim7, Constantina Theofanopoulou4, Michael Hiller43, Yang Zhou, Robert S. Harris44, Kateryna D. Makova44, Paul Medvedev44, Jinna Hoffman1, Patrick Masterson1, Karen Clark1, Fergal J. Martin9, Kevin L. Howe9, Paul Flicek9, Brian P. Walenz1, Woori Kwak, Hiram Clawson32, Mark Diekhans32, Luis R Nassar32, Benedict Paten32, Robert H. S. Kraus11, Robert H. S. Kraus24, Andrew J. Crawford45, M. Thomas P. Gilbert46, M. Thomas P. Gilbert47, Guojie Zhang, Byrappa Venkatesh48, Robert W. Murphy49, Klaus-Peter Koepfli50, Beth Shapiro32, Beth Shapiro51, Warren E. Johnson50, Warren E. Johnson52, Federica Di Palma53, Tomas Marques-Bonet, Emma C. Teeling54, Tandy Warnow55, Jennifer A. Marshall Graves56, Oliver A. Ryder33, Oliver A. Ryder57, David Haussler32, Stephen J. O'Brien58, Jonas Korlach34, Harris A. Lewin5, Kerstin Howe3, Eugene W. Myers11, Eugene W. Myers10, Richard Durbin3, Richard Durbin2, Adam M. Phillippy1, Erich D. Jarvis4, Erich D. Jarvis51
National Institutes of Health1, University of Cambridge2, Wellcome Trust Sanger Institute3, Rockefeller University4, University of California, Davis5, Leibniz Association6, Seoul National University7, University of Southern California8, European Bioinformatics Institute9, Dresden University of Technology10, Max Planck Society11, Radboud University Nijmegen12, University of St Andrews13, University of Massachusetts Amherst14, University of Adelaide15, University of Missouri16, East Carolina University17, University of Queensland18, Clemson University19, University of Otago20, University of Arizona21, Natural History Museum22, Bangor University23, University of Konstanz24, Harvard University25, Northeastern University26, National Museum of Natural History27, University of Antwerp28, University of Graz29, University of Florida30, University of Basel31, University of California, Santa Cruz32, Zoological Society of San Diego33, Pacific Biosciences34, Pompeu Fabra University35, University of Maryland, College Park36, Harbin Institute of Technology37, University of Chicago38, Oregon Health & Science University39, Monash University Malaysia Campus40, Qatar Airways41, University of Milan42, Goethe University Frankfurt43, Pennsylvania State University44, University of Los Andes45, University of Copenhagen46, Norwegian University of Science and Technology47, Agency for Science, Technology and Research48, Royal Ontario Museum49, Smithsonian Institution50, Howard Hughes Medical Institute51, Walter Reed Army Institute of Research52, University of East Anglia53, University College Dublin54, University of Illinois at Urbana–Champaign55, La Trobe University56, University of California, San Diego57, Nova Southeastern University58
28 Apr 2021-Nature
TL;DR: The Vertebrate Genomes Project (VGP) is an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

647 citations
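
The abstract discusses assembly quality without naming a specific metric. As an illustration only, the sketch below computes N50, a standard contiguity statistic that the abstract does not mention: the length L such that contigs or scaffolds of length L or longer together contain at least half of the assembled bases. The function name and example lengths are hypothetical.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until half is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

Comparing 2 * running with the total avoids rounding issues when the total length is odd.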


Posted Content
04 Jan 2021-medRxiv
TL;DR: The SARS-CoV-2 lineage B.1.1.7, now designated Variant of Concern 202012/01 (VOC) by Public Health England, originated in the UK in late summer to early autumn 2020.
Abstract: The SARS-CoV-2 lineage B.1.1.7, now designated Variant of Concern 202012/01 (VOC) by Public Health England, originated in the UK in late summer to early autumn 2020. We examine epidemiological evidence for this VOC having a transmission advantage from several perspectives. First, whole-genome sequence data collected from community-based diagnostic testing provide an indication of changing prevalence of different genetic variants through time. Phylodynamic modelling additionally indicates that genetic diversity of this lineage has changed in a manner consistent with exponential growth. Second, we find that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S-gene target failures (SGTF) in community-based diagnostic PCR testing. Third, we examine growth trends in SGTF and non-SGTF case numbers at local area level across England, and show that the VOC has higher transmissibility than non-VOC lineages, even if the VOC has a different latent period or generation time. Available SGTF data indicate a shift in the age composition of reported cases, with a larger share of under 20-year-olds among reported VOC than non-VOC cases. Fourth, we assess the association of VOC frequency with independent estimates of the overall SARS-CoV-2 reproduction number through time. Finally, we fit a semi-mechanistic model directly to local VOC and non-VOC case incidence to estimate the reproduction numbers over time for each. There is a consensus among all analyses that the VOC has a substantial transmission advantage, with the estimated difference in reproduction numbers between VOC and non-VOC ranging from 0.4 to 0.7, and the ratio of reproduction numbers varying between 1.4 and 1.8. We note that these estimates of transmission advantage apply to a period when high levels of social distancing were in place in England; extrapolation to other transmission contexts therefore requires caution.

547 citations
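
The abstract reports the transmission advantage both as a difference (0.4 to 0.7) and as a ratio (1.4 to 1.8) of reproduction numbers. The two summaries are linked by difference = R_other × (ratio − 1), so the same ratio implies a larger additive difference when the baseline R is higher. This short Python sketch makes that arithmetic explicit; the function name and the reproduction numbers used are hypothetical.

```python
def advantage_summaries(r_voc, r_other):
    """Return (additive difference, multiplicative ratio) of two reproduction numbers."""
    if r_other <= 0:
        raise ValueError("reproduction numbers must be positive")
    return r_voc - r_other, r_voc / r_other


# The same 1.5x ratio implies different additive differences at different baselines.
for r_other in (0.8, 1.0, 1.2):
    diff, ratio = advantage_summaries(1.5 * r_other, r_other)
    print(f"R_other={r_other:.1f}  difference={diff:.2f}  ratio={ratio:.2f}")
```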


Journal Article
TL;DR: A post-hoc analysis of the efficacy of the adenoviral vector vaccine ChAdOx1 nCoV-19 (AZD1222) against B.1.1.7, the lineage that emerged as the dominant cause of COVID-19 disease in the UK from November 2020.

521 citations


Journal Article
TL;DR: The Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes, is presented, providing comprehensive resources for microbiome researchers.
Abstract: Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.

485 citations
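
Grouping 204,938 nonredundant genomes into 4,644 species implies a species-level clustering step. The sketch below assumes greedy centroid clustering over a precomputed average nucleotide identity (ANI) table at a 95% threshold, a commonly used species-level cut-off. Neither the threshold nor the greedy strategy is stated in the abstract, and the function name, genome IDs and ANI values are hypothetical.

```python
def greedy_species_clusters(genomes, ani, threshold=0.95):
    """Greedy centroid clustering of genomes by pairwise ANI.

    genomes: genome IDs, assumed ordered best-quality first (the first genome
             seen in a cluster becomes its representative).
    ani: dict mapping frozenset({a, b}) -> ANI in [0, 1] (hypothetical input).
    """
    clusters = []  # list of (representative, members)
    for g in genomes:
        for rep, members in clusters:
            # Join the first existing cluster whose representative is close enough.
            if ani.get(frozenset((g, rep)), 0.0) >= threshold:
                members.append(g)
                break
        else:
            # No representative within the threshold: start a new species cluster.
            clusters.append((g, [g]))
    return clusters
```

Ordering genomes by quality first means each species is represented by its best genome, which is one reason greedy schemes are popular for dereplication.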


Journal Article
TL;DR: In this article, the authors generated and analyzed two single-cell RNA sequencing datasets of the human minor salivary glands and gingiva (9 samples, 13,824 cells), identifying 50 cell clusters.
Abstract: Despite signs of infection (including taste loss, dry mouth and mucosal lesions such as ulcerations, enanthema and macules), the involvement of the oral cavity in coronavirus disease 2019 (COVID-19) is poorly understood. To address this, we generated and analyzed two single-cell RNA sequencing datasets of the human minor salivary glands and gingiva (9 samples, 13,824 cells), identifying 50 cell clusters. Using integrated cell normalization and annotation, we classified 34 unique cell subpopulations between glands and gingiva. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral entry factors such as ACE2 and TMPRSS members were broadly enriched in epithelial cells of the glands and oral mucosae. Using orthogonal RNA and protein expression assessments, we confirmed SARS-CoV-2 infection in the glands and mucosae. Saliva from SARS-CoV-2-infected individuals harbored epithelial cells exhibiting ACE2 and TMPRSS expression and sustained SARS-CoV-2 infection. Acellular and cellular salivary fractions from asymptomatic individuals were found to transmit SARS-CoV-2 ex vivo. Matched nasopharyngeal and saliva samples displayed distinct viral shedding dynamics, and salivary viral burden correlated with COVID-19 symptoms, including taste loss. Upon recovery, this asymptomatic cohort exhibited sustained salivary IgG antibodies against SARS-CoV-2. Collectively, these data show that the oral cavity is an important site for SARS-CoV-2 infection and implicate saliva as a potential route of SARS-CoV-2 transmission.

417 citations


Journal Article
TL;DR: This paper describes a tried and tested approach for genome curation using gEVAL, the genome evaluation browser, together with recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
Abstract: Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved, goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation are actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.

373 citations


Journal Article
26 Apr 2021-Nature
TL;DR: STM2457, a first-in-class catalytic inhibitor of METTL3, was identified and characterized, and a crystal structure of STM2457 in complex with METTL3–METTL14 was presented.
Abstract: N6-methyladenosine (m6A) is an abundant internal RNA modification [1,2] that is catalysed predominantly by the METTL3–METTL14 methyltransferase complex [3,4]. The m6A methyltransferase METTL3 has been linked to the initiation and maintenance of acute myeloid leukaemia (AML), but the potential of therapeutic applications targeting this enzyme remains unknown [5–7]. Here we present the identification and characterization of STM2457, a highly potent and selective first-in-class catalytic inhibitor of METTL3, and a crystal structure of STM2457 in complex with METTL3–METTL14. Treatment of tumours with STM2457 leads to reduced AML growth and an increase in differentiation and apoptosis. These cellular effects are accompanied by selective reduction of m6A levels on known leukaemogenic mRNAs and a decrease in their expression consistent with a translational defect. We demonstrate that pharmacological inhibition of METTL3 in vivo leads to impaired engraftment and prolonged survival in various mouse models of AML, specifically targeting key stem cell subpopulations of AML. Collectively, these results reveal the inhibition of METTL3 as a potential therapeutic strategy against AML, and provide proof of concept that the targeting of RNA-modifying enzymes represents a promising avenue for anticancer therapy. Treatment with a specific inhibitor of the N6-methyladenosine methyltransferase METTL3 leads to reduced growth of cancer cells, indicating the potential of approaches targeting RNA-modifying enzymes for anticancer therapy.

362 citations


Journal Article
Urmo Võsa1, Annique Claringbould2, Annique Claringbould3, Harm-Jan Westra1, Marc Jan Bonder1, Patrick Deelen, Biao Zeng4, Holger Kirsten5, Ashis Saha6, Roman Kreuzhuber7, Roman Kreuzhuber2, Roman Kreuzhuber8, Seyhan Yazar9, Harm Brugge1, Roy Oelen1, Dylan H. de Vries1, Monique G. P. van der Wijst1, Silva Kasela10, Natalia Pervjakova10, Isabel Alves11, Marie-Julie Favé11, Mawusse Agbessi11, Mark W. Christiansen12, Rick Jansen13, Ilkka Seppälä, Lin Tong14, Alexander Teumer15, Katharina Schramm16, Gibran Hemani17, Joost Verlouw18, Hanieh Yaghootkar19, Hanieh Yaghootkar20, Hanieh Yaghootkar21, Reyhan Sönmez Flitman22, Reyhan Sönmez Flitman23, Andrew A. Brown24, Andrew A. Brown25, Viktorija Kukushkina10, Anette Kalnapenkis10, Sina Rüeger22, Eleonora Porcu22, Jaanika Kronberg10, Johannes Kettunen, Bernett Lee26, Futao Zhang27, Ting Qi27, Jose Alquicira Hernandez9, Wibowo Arindrarto28, Frank Beutner5, Peter A C 't Hoen29, Joyce B. J. van Meurs18, Jenny van Dongen13, Maarten van Iterson28, Morris A. Swertz, Julia Dmitrieva30, Mahmoud Elansary30, Benjamin P. Fairfax31, Michel Georges30, Bastiaan T. Heijmans28, Alex W. Hewitt32, Mika Kähönen, Yungil Kim6, Yungil Kim33, Julian C. Knight31, Peter Kovacs5, Knut Krohn5, Shuang Li1, Markus Loeffler5, Urko M. Marigorta4, Urko M. Marigorta34, Hailang Mei28, Yukihide Momozawa30, Martina Müller-Nurasyid16, Matthias Nauck15, Michel G. Nivard35, Brenda W.J.H. Penninx13, Jonathan K. Pritchard36, Olli T. Raitakari37, Olli T. Raitakari38, Olaf Rötzschke26, Eline Slagboom28, Coen D.A. Stehouwer39, Michael Stumvoll5, Patrick F. Sullivan40, Joachim Thiery5, Anke Tönjes5, Jan H. Veldink41, Uwe Völker15, Robert Warmerdam1, Cisca Wijmenga1, Morris Swertz, Anand Kumar Andiappan26, Grant W. Montgomery27, Samuli Ripatti42, Markus Perola43, Zoltán Kutalik22, Emmanouil T. Dermitzakis24, Emmanouil T. Dermitzakis23, Sven Bergmann23, Sven Bergmann22, Timothy M. Frayling20, Holger Prokisch44, Habibul Ahsan14, Brandon L. Pierce14, Terho Lehtimäki, Dorret I. Boomsma13, Bruce M. Psaty12, Sina A. Gharib12, Philip Awadalla11, Lili Milani10, Willem H. Ouwehand45, Willem H. Ouwehand7, Willem H. Ouwehand8, Kate Downes8, Kate Downes7, Oliver Stegle2, Oliver Stegle46, Alexis Battle6, Peter M. Visscher27, Jian Yang47, Jian Yang27, Markus Scholz5, Joseph E. Powell9, Joseph E. Powell48, Greg Gibson4, Tõnu Esko10, Lude Franke1
TL;DR: In this article, the authors performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium.
Abstract: Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.
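
The abstract does not spell out the association statistic. The sketch below assumes a common multi-cohort eQTL design: within each cohort, compute the Spearman correlation between genotype dosage and expression, convert it to an approximate Z-score with Fisher's transform (treating the variance as 1/(n − 3), an approximation), then combine cohorts with a sample-size-weighted Z-score meta-analysis. All function names and data are hypothetical, and this should not be read as the eQTLGen pipeline.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def cohort_z(dosage, expression):
    """Approximate Z-score for one cohort (Fisher transform, variance ~ 1/(n - 3))."""
    n = len(dosage)
    rho = spearman(dosage, expression)
    return math.atanh(rho) * math.sqrt(n - 3), n


def weighted_z(cohort_results):
    """Combine (z, n) pairs with weights sqrt(n): sum(w*z) / sqrt(sum(w**2))."""
    numerator = sum(math.sqrt(n) * z for z, n in cohort_results)
    denominator = math.sqrt(sum(n for _, n in cohort_results))
    return numerator / denominator
```

Rank-based statistics make each cohort's test robust to outlying expression values. Weighting by sqrt(n) lets larger cohorts contribute more without requiring the cohorts to share an expression scale.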

Journal Article
TL;DR: In this article, the authors performed single-cell transcriptome, surface proteome and T and B lymphocyte antigen receptor analyses of over 780,000 peripheral blood mononuclear cells from a cross-sectional cohort of 130 patients with varying severities of COVID-19.
Abstract: Analysis of human blood immune cells provides insights into the coordinated response to viral infections such as severe acute respiratory syndrome coronavirus 2, which causes coronavirus disease 2019 (COVID-19). We performed single-cell transcriptome, surface proteome and T and B lymphocyte antigen receptor analyses of over 780,000 peripheral blood mononuclear cells from a cross-sectional cohort of 130 patients with varying severities of COVID-19. We identified expansion of nonclassical monocytes expressing complement transcripts (CD16+C1QA/B/C+) that sequester platelets and were predicted to replenish the alveolar macrophage pool in COVID-19. Early, uncommitted CD34+ hematopoietic stem/progenitor cells were primed toward megakaryopoiesis, accompanied by expanded megakaryocyte-committed progenitors and increased platelet activation. Clonally expanded CD8+ T cells and an increased ratio of CD8+ effector T cells to effector memory T cells characterized severe disease, while circulating follicular helper T cells accompanied mild disease. We observed a relative loss of IgA2 in symptomatic disease despite an overall expansion of plasmablasts and plasma cells. Our study highlights the coordinated immune response that contributes to COVID-19 pathogenesis and reveals discrete cellular components that can be targeted for therapy.

Journal Article
26 Aug 2021
TL;DR: This Primer provides an introduction to genome-wide association studies (GWAS), techniques for deriving functional inferences from the results and applications of GWAS in understanding disease risk and trait architecture, and discusses important ethical considerations regarding GWAS populations and data.
Abstract: Genome-wide association studies (GWAS) test hundreds of thousands of genetic variants across many genomes to find those statistically associated with a specific trait or disease. This methodology has generated a myriad of robust associations for a range of traits and diseases, and the number of associated variants is expected to grow steadily as GWAS sample sizes increase. GWAS results have a range of applications, such as gaining insight into a phenotype’s underlying biology, estimating its heritability, calculating genetic correlations, making clinical risk predictions, informing drug development programmes and inferring potential causal relationships between risk factors and health outcomes. In this Primer, we provide the reader with an introduction to GWAS, explaining their statistical basis and how they are conducted, describe state-of-the-art approaches and discuss limitations and challenges, concluding with an overview of the current and future applications for GWAS results. Uffelmann et al. describe the key considerations and best practices for conducting genome-wide association studies (GWAS), techniques for deriving functional inferences from the results and applications of GWAS in understanding disease risk and trait architecture. The Primer also provides information on the best practices for data sharing and discusses important ethical considerations when considering GWAS populations and data.
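
At its simplest, the per-variant test behind a GWAS of a quantitative trait is a regression of the trait on allele dosage (0, 1 or 2 copies). The Python sketch below fits ordinary least squares for one variant and compares the p-value with 5 × 10⁻⁸, the conventional genome-wide significance threshold (a widely used convention not stated in the abstract). The p-value uses a normal approximation that is reasonable only for large samples. Real GWAS software also adjusts for covariates such as ancestry principal components. The function name and data are hypothetical.

```python
import math

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional threshold, not stated in the abstract


def association_test(dosages, phenotype):
    """Regress a quantitative trait on allele dosage for one variant.

    Returns (beta, standard error, two-sided p-value). The p-value uses a normal
    approximation to the t distribution, appropriate only for large samples.
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx                      # per-allele effect size
    alpha = my - beta * mx                # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p
```

Repeating this test for hundreds of thousands of variants is what drives the very small significance threshold.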

Journal Article
18 Feb 2021-Cell
TL;DR: The Gut Phage Database is a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mining a dataset of 28,060 globally distributed human gut metagenomes and 2,898 reference genomes of cultured gut bacteria.

Journal Article
TL;DR: To aid target prioritisation and inform on the potential impact of modulating a given target, the Open Targets Platform adds evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety.
Abstract: The Open Targets Platform (https://www.targetvalidation.org/) provides users with a queryable knowledgebase and user interface to aid systematic target identification and prioritisation for drug discovery based upon underlying evidence. It is publicly available and the underlying code is open source. Since our last update two years ago, we have had 10 releases to maintain and continuously improve evidence for target-disease relationships from 20 different data sources. In addition, we have integrated new evidence from key datasets, including prioritised targets identified from genome-wide CRISPR knockout screens in 300 cancer models (Project Score), and GWAS/UK BioBank statistical genetic analysis evidence from the Open Targets Genetics Portal. We have evolved our evidence scoring framework to improve target identification. To aid the prioritisation of targets and inform on the potential impact of modulating a given target, we have added evaluation of post-marketing adverse drug reactions and new curated information on target tractability and safety. We have also developed the user interface and backend technologies to improve performance and usability. In this article, we describe the latest enhancements to the Platform, to address the fundamental challenge that developing effective and safe drugs is difficult and expensive.
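
The abstract mentions an evolving evidence scoring framework but gives no formula. The sketch below assumes a harmonic-sum aggregation, which is our understanding of how the Open Targets documentation describes combining evidence scores: sort the scores in descending order, divide the i-th score by i², sum, and normalise by the maximum attainable value. The 1/i² weighting, the 100-term cap and the normalisation are assumptions here, and the function names are hypothetical.

```python
def harmonic_sum(scores, max_terms=None):
    """Harmonic-sum aggregation: the i-th largest score is divided by i**2."""
    ordered = sorted(scores, reverse=True)
    if max_terms is not None:
        ordered = ordered[:max_terms]
    return sum(s / (i ** 2) for i, s in enumerate(ordered, start=1))


def normalised_harmonic_sum(scores, max_terms=100):
    """Scale by the harmonic sum of max_terms perfect (1.0) scores, giving a value in [0, 1]."""
    best = harmonic_sum([1.0] * max_terms)
    return harmonic_sum(scores, max_terms) / best
```

The quadratic down-weighting means one strong piece of evidence dominates, and many weak, redundant pieces of evidence cannot inflate a target's score without limit.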

Journal Article
TL;DR: Open Targets Genetics offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue.
Abstract: Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.
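
The portal reports credible sets for GWAS signals. One common way to build one, assuming a single causal variant per locus, is to normalise per-variant Bayes factors into posterior probabilities. Variants are then added in decreasing order of posterior until the cumulative probability reaches the target coverage (95% here). This Python sketch is not the portal's pipeline, and the variant IDs and Bayes factors are made up.

```python
def posterior_probabilities(bayes_factors):
    """Per-variant posterior probabilities, assuming exactly one causal variant in the locus."""
    total = sum(bayes_factors.values())
    return {v: bf / total for v, bf in bayes_factors.items()}


def credible_set(posteriors, coverage=0.95):
    """Variants taken in order of decreasing posterior until the coverage is reached."""
    chosen, cumulative = [], 0.0
    for variant, pp in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(variant)
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```

A small credible set points to a few candidate causal variants, while a large one signals that fine-mapping could not separate variants in linkage disequilibrium.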

Journal Article
TL;DR: The authors assessed cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues.
Abstract: Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial-macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.
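
As a toy illustration of measuring coexpression per cell type, the sketch below counts a cell as positive when every listed gene has a non-zero count and reports the positive fraction for each cell type. The positivity rule, cell-type labels and counts are hypothetical assumptions, not the study's pipeline, which would typically also account for normalisation and sequencing depth.

```python
from collections import defaultdict


def coexpression_fraction(cells, genes=("ACE2", "TMPRSS2")):
    """Fraction of cells per cell type with non-zero counts for every gene in `genes`.

    cells: iterable of (cell_type, {gene: count}) pairs; missing genes count as zero.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cell_type, counts in cells:
        totals[cell_type] += 1
        if all(counts.get(g, 0) > 0 for g in genes):
            positives[cell_type] += 1
    return {ct: positives[ct] / totals[ct] for ct in totals}
```

Because single-cell data are sparse, a simple non-zero rule can undercount coexpressing cells; that is one reason real analyses model expression more carefully.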

Journal Article
David V. Conti, Burcu F. Darst, Lilit C. Moss, Edward J. Saunders and 251 more authors (100 institutions)
TL;DR: This paper conducted a meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants.
Abstract: Prostate cancer is a highly heritable disease with large disparities in incidence rates across ancestry populations. We conducted a multiancestry meta-analysis of prostate cancer genome-wide association studies (107,247 cases and 127,006 controls) and identified 86 new genetic risk variants independently associated with prostate cancer risk, bringing the total to 269 known risk variants. The top genetic risk score (GRS) decile was associated with odds ratios that ranged from 5.06 (95% confidence interval (CI), 4.84–5.29) for men of European ancestry to 3.74 (95% CI, 3.36–4.17) for men of African ancestry. Men of African ancestry were estimated to have a mean GRS that was 2.18-times higher (95% CI, 2.14–2.22), and men of East Asian ancestry 0.73-times lower (95% CI, 0.71–0.76), than men of European ancestry. These findings support the role of germline variation contributing to population differences in prostate cancer risk, with the GRS offering an approach for personalized risk prediction.

Journal Article
Stefan C. Dentro, Ignaty Leshchiner, Kerstin Haase, Maxime Tarabichi, Jeff Wintersinger, Amit G. Deshwar, Kaixian Yu, Yulia Rubanova, Geoff Macintyre, Jonas Demeulemeester, Ignacio Vázquez-García, Kortine Kleinheinz, Dimitri Livitz, Salem Malikic, Nilgun Donmez, Subhajit Sengupta, Pavana Anur, Clemency Jolly, Marek Cmero, Daniel Rosebrock, Steven E. Schumacher, Yu Fan, Matthew Fittall, Ruben M. Drews, Xiaotong Yao, Thomas B.K. Watkins, Juhee Lee, Matthias Schlesner, Hongtu Zhu, David J. Adams, Nicholas McGranahan, Charles Swanton, Gad Getz, Paul C. Boutros, Marcin Imielinski, Rameen Beroukhim, S. Cenk Sahinalp, Yuan Ji, Martin Peifer, Inigo Martincorena, Florian Markowetz, Ville Mustonen, Ke Yuan, Moritz Gerstung, Paul T. Spellman, Wenyi Wang, Quaid Morris, David C. Wedge, Peter Van Loo, Santiago Gonzalez, David D.L. Bowtell, Peter J. Campbell, Shaolong Cao, Elizabeth L. Christie, Yupeng Cun, Kevin J. Dawson, Roland Eils, Dale W. Garsed, Gavin Ha, Lara Jerman, Henry Lee-Six, Thomas J. Mitchell, Layla Oesper, Myron Peto, Benjamin J. Raphael, Adriana Salcedo, Ruian Shi, Seung Jun Shin, Lincoln Stein, Oliver Spiro, Shankar Vembu, David A. Wheeler, Tsun-Po Yang
15 Apr 2021-Cell
TL;DR: In this article, the authors extensively characterize intra-tumor heterogeneity (ITH) across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations.

Journal Article
TL;DR: In this paper, the authors performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2.
Abstract: Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.

Journal Article
TL;DR: The Polygenic Score (PGS) Catalog is an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications.
Abstract: We present the Polygenic Score (PGS) Catalog ( https://www.PGSCatalog.org ), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.
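Because a published score is essentially a list of variants, effect alleles and weights, applying it to one individual usually reduces to a weighted sum of effect-allele counts. The sketch below illustrates that arithmetic under stated assumptions: the variant IDs, alleles and weights are hypothetical, genotypes are given as hard calls, and strand alignment, allele-frequency checks and imputation of missing variants (all needed in practice) are deliberately left out. It is not the Catalog's own tooling.

```python
# Minimal polygenic score sketch: sum of weight x effect-allele dosage.
# All variant IDs, alleles and weights below are hypothetical illustrations.
from typing import Dict, List, Tuple

# (variant_id, effect_allele, weight) triples, as a published score might list them.
Score = List[Tuple[str, str, float]]


def polygenic_score(score: Score, genotypes: Dict[str, Tuple[str, str]]) -> float:
    """Return sum(weight * number of effect alleles) over the score's variants.

    genotypes maps variant_id -> the two alleles carried by one individual.
    Variants absent from genotypes are skipped (no imputation is attempted).
    """
    total = 0.0
    for variant_id, effect_allele, weight in score:
        alleles = genotypes.get(variant_id)
        if alleles is None:
            continue  # variant not genotyped for this individual
        dosage = sum(1 for a in alleles if a == effect_allele)  # 0, 1 or 2
        total += weight * dosage
    return total


example_score: Score = [
    ("rsA", "T", 0.10),   # hypothetical variant and weight
    ("rsB", "G", -0.05),  # negative weight: effect allele lowers the score
    ("rsC", "A", 0.20),
]
person = {"rsA": ("T", "T"), "rsB": ("A", "G"), "rsC": ("C", "C")}
print(round(polygenic_score(example_score, person), 4))
```

Here the individual carries two copies of the rsA effect allele, one of rsB and none of rsC, so the score is 2 × 0.10 + 1 × (−0.05) = 0.15.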

Journal Article
22 Jan 2021-Science
TL;DR: In this paper, the transcriptomes of more than 500,000 single cells from developing human fetal skin, healthy adult skin, and adult skin with atopic dermatitis and psoriasis were compared across development, homeostasis, and disease.
Abstract: The skin confers biophysical and immunological protection through a complex cellular network established early in embryonic development. We profiled the transcriptomes of more than 500,000 single cells from developing human fetal skin, healthy adult skin, and adult skin with atopic dermatitis and psoriasis. We leveraged these datasets to compare cell states across development, homeostasis, and disease. Our analysis revealed an enrichment of innate immune cells in skin during the first trimester and clonal expansion of disease-associated lymphocytes in atopic dermatitis and psoriasis. We uncovered and validated in situ a reemergence of prenatal vascular endothelial cell and macrophage cellular programs in atopic dermatitis and psoriasis lesional skin. These data illustrate the dynamism of cutaneous immunity and provide opportunities for targeting pathological developmental programs in inflammatory skin diseases.

Journal Article
04 Mar 2021-Nature
TL;DR: In this paper, the authors used whole-genome sequencing of clonal cell isolates that developed chemotherapeutic resistance to show that chromothripsis is a major driver of circular extrachromosomal DNA (ecDNA) amplification through mechanisms that depend on poly(ADP-ribose) polymerases (PARP) and the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs).
Abstract: Focal chromosomal amplification contributes to the initiation of cancer by mediating overexpression of oncogenes[1–3], and to the development of cancer therapy resistance by increasing the expression of genes whose action diminishes the efficacy of anti-cancer drugs. Here we used whole-genome sequencing of clonal cell isolates that developed chemotherapeutic resistance to show that chromothripsis is a major driver of circular extrachromosomal DNA (ecDNA) amplification (also known as double minutes) through mechanisms that depend on poly(ADP-ribose) polymerases (PARP) and the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs). Longitudinal analyses revealed that a further increase in drug tolerance is achieved by structural evolution of ecDNAs through additional rounds of chromothripsis. In situ Hi-C sequencing showed that ecDNAs preferentially tether near chromosome ends, where they re-integrate when DNA damage is present. Intrachromosomal amplifications that formed initially under low-level drug selection underwent continuing breakage–fusion–bridge cycles, generating amplicons more than 100 megabases in length that became trapped within interphase bridges and then shattered, thereby producing micronuclei whose encapsulated ecDNAs are substrates for chromothripsis. We identified similar genome rearrangement profiles linked to localized gene amplification in human cancers with acquired drug resistance or oncogene amplifications. We propose that chromothripsis is a primary mechanism that accelerates genomic DNA rearrangement and amplification into ecDNA and enables rapid acquisition of tolerance to altered growth conditions. Chromothripsis—a process during which chromosomes are ‘shattered’—drives the evolution of gene amplification and subsequent drug resistance in cancer cells.

Journal Article
28 Apr 2021-Nature
TL;DR: NanoSeq is a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules from cell populations, enabling the study of somatic mutations in any tissue independently of clonality.
Abstract: Somatic mutations drive the development of cancer and may contribute to ageing and other diseases[1,2]. Despite their importance, the difficulty of detecting mutations that are only present in single cells or small clones has limited our knowledge of somatic mutagenesis to a minority of tissues. Here, to overcome these limitations, we developed nanorate sequencing (NanoSeq), a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules from cell populations. This rate is two orders of magnitude lower than typical somatic mutation loads, enabling the study of somatic mutations in any tissue independently of clonality. We used this single-molecule sensitivity to study somatic mutations in non-dividing cells across several tissues, comparing stem cells to differentiated cells and studying mutagenesis in the absence of cell division. Differentiated cells in blood and colon displayed remarkably similar mutation loads and signatures to their corresponding stem cells, despite mature blood cells having undergone considerably more divisions. We then characterized the mutational landscape of post-mitotic neurons and polyclonal smooth muscle, confirming that neurons accumulate somatic mutations at a constant rate throughout life without cell division, with similar rates to mitotically active tissues. Together, our results suggest that mutational processes that are independent of cell division are important contributors to somatic mutagenesis. We anticipate that the ability to reliably detect mutations in single DNA molecules could transform our understanding of somatic mutagenesis and enable non-invasive studies on large-scale cohorts. NanoSeq is used to detect mutations in single DNA molecules and analyses show that mutational processes that are independent of cell division are important contributors to somatic mutagenesis.
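Duplex sequencing gets its low error rate from reading each original DNA molecule on both complementary strands and counting a change only when the two strands agree: a sequencing or amplification error, or damage confined to one strand, is unlikely to be mirrored on the other. The sketch below shows only that generic agreement rule for a single position. It is not the NanoSeq calling pipeline, the thresholds `min_reads` and `min_fraction` are hypothetical, and bottom-strand reads are assumed to be already reported in top-strand orientation.

```python
# Generic duplex-consensus rule at one position (not the NanoSeq pipeline).
# Thresholds are hypothetical illustration values.
from collections import Counter
from typing import List, Optional


def strand_consensus(bases: List[str], min_reads: int = 2,
                     min_fraction: float = 0.9) -> Optional[str]:
    """Majority base among reads from one strand, or None if support is weak."""
    if len(bases) < min_reads:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None


def duplex_call(top_reads: List[str], bottom_reads: List[str]) -> Optional[str]:
    """Return a base only when both strand consensuses exist and agree."""
    top = strand_consensus(top_reads)
    bottom = strand_consensus(bottom_reads)
    if top is None or bottom is None or top != bottom:
        return None  # no call: missing strand, weak support, or disagreement
    return top


print(duplex_call(["T", "T", "T"], ["T", "T"]))  # both strands agree
print(duplex_call(["T", "T", "T"], ["C", "C"]))  # strands disagree: no call
```

Requiring agreement between two independently read strands is what lets single-strand artefacts be discarded rather than reported as mutations.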

Journal Article
Ji Chen, Cassandra N. Spracklen and 475 more authors (146 institutions)
TL;DR: This paper aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available.
Abstract: Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10^-8), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
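A 99% credible set is the smallest group of variants at a locus whose posterior probabilities of being causal add up to at least 0.99, so a smaller set means finer resolution; that is why a reduction in credible-set size is a natural measure of fine-mapping gain. The sketch below builds such a set from posterior probabilities, assuming a single causal variant per locus (so the probabilities sum to about 1). The variant names and probabilities are hypothetical, and the sketch does not reproduce the fine-mapping method used in the study.

```python
# Build an alpha-level credible set from per-variant posterior probabilities.
# Assumes one causal variant per locus; example probabilities are hypothetical.
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.99) -> List[str]:
    """Smallest set of variants, taken in descending posterior order,
    whose probabilities sum to at least `coverage`."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


example_pips = {"v1": 0.6, "v2": 0.3, "v3": 0.085, "v4": 0.015}
print(credible_set(example_pips, 0.95))  # three variants reach 0.985
print(credible_set(example_pips, 0.99))  # all four are needed
```

A sharper posterior (for example, one where a single variant carries 0.99 of the probability) would shrink the 99% set to one variant, which is the kind of improvement that added ancestral diversity can provide.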

Journal Article
TL;DR: In this article, the authors compared the incidence of VAP and secondary infections in ventilated COVID-19 and non-COVID-19 patients using a combination of microbial culture and a TaqMan multi-pathogen array, and determined the lung microbiome composition using 16S rRNA analysis in a subset of samples.
Abstract: Pandemic COVID-19, caused by the coronavirus SARS-CoV-2, has a high incidence of patients with severe acute respiratory syndrome (SARS). Many of these patients require admission to an intensive care unit (ICU) for invasive ventilation and are at significant risk of developing a secondary, ventilator-associated pneumonia (VAP). The aim was to study the incidence of VAP and the bacterial lung microbiome composition of ventilated COVID-19 and non-COVID-19 patients. In this retrospective observational study, we compared the incidence of VAP and secondary infections using a combination of microbial culture and a TaqMan multi-pathogen array. In addition, we determined the lung microbiome composition using 16S rRNA analysis in a subset of samples. The study involved 81 COVID-19 and 144 non-COVID-19 patients receiving invasive ventilation in a single university teaching hospital between March 15th 2020 and August 30th 2020. COVID-19 patients were significantly more likely to develop VAP than patients without COVID-19 (Cox proportional hazard ratio 2.01, 95% CI 1.14–3.54, p = 0.0015), with an incidence density of 28/1000 ventilator days versus 13/1000 for patients without COVID-19 (p = 0.009). Although the distribution of organisms causing VAP and the pulmonary microbiome were similar between the two groups, we identified 3 cases of invasive aspergillosis amongst the patients with COVID-19 but none in the non-COVID-19 cohort. Herpesviridae activation was also numerically more frequent amongst patients with COVID-19. COVID-19 is associated with an increased risk of VAP, which is not fully explained by the prolonged duration of ventilation. The pulmonary dysbiosis caused by COVID-19 and the causative organisms of secondary pneumonia observed are similar to those seen in critically ill patients ventilated for other reasons.
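Incidence density divides the number of VAP episodes by the total time at risk (ventilator days) rather than by the number of patients, which matters here because COVID-19 patients tended to be ventilated for longer. The sketch below shows the calculation with hypothetical counts chosen only to reproduce rates of the same form as those reported; they are not the study's raw data.

```python
# Incidence density: events per `per` units of exposure time (here ventilator days).
# The event counts and ventilator days in the examples are hypothetical.


def incidence_density(events: int, ventilator_days: float, per: float = 1000.0) -> float:
    """Return the number of events per `per` ventilator days."""
    if ventilator_days <= 0:
        raise ValueError("ventilator_days must be positive")
    return events * per / ventilator_days


print(incidence_density(14, 500))   # 14 episodes over 500 ventilator days
print(incidence_density(13, 1000))  # 13 episodes over 1000 ventilator days
```

Normalising by exposure time means a higher rate in one group cannot be attributed simply to its patients spending more days on a ventilator, which is consistent with the abstract's point that the excess risk is not fully explained by longer ventilation.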

Journal Article
09 Sep 2021-Nature
TL;DR: The cellular landscape of the human intestinal tract is dynamic throughout life, developing in utero and changing in response to functional requirements and environmental exposures; the authors map it using single-cell RNA sequencing and antigen receptor analysis of almost half a million cells from up to 5 anatomical regions of the developing gut and up to 11 distinct anatomical regions of the healthy human gut.
Abstract: The cellular landscape of the human intestinal tract is dynamic throughout life, developing in utero and changing in response to functional requirements and environmental exposures. Here, to comprehensively map cell lineages, we use single-cell RNA sequencing and antigen receptor analysis of almost half a million cells from up to 5 anatomical regions in the developing and up to 11 distinct anatomical regions in the healthy paediatric and adult human gut. This reveals the existence of transcriptionally distinct BEST4 epithelial cells throughout the human intestinal tract. Furthermore, we implicate IgG sensing as a function of intestinal tuft cells. We describe neural cell populations in the developing enteric nervous system, and predict cell-type-specific expression of genes associated with Hirschsprung’s disease. Finally, using a systems approach, we identify key cell players that drive the formation of secondary lymphoid tissue in early human development. We show that these programs are adopted in inflammatory bowel disease to recruit and retain immune cells at the site of inflammation. This catalogue of intestinal cells will provide new insights into cellular programs in development, homeostasis and disease. Cells from embryonic, fetal, paediatric and adult human intestinal tissue are analysed at different locations along the intestinal tract to construct a single-cell atlas of the developing and adult human intestinal tract, encompassing all cell lineages.

Journal Article
TL;DR: As the number of single-cell experiments with multiple data modalities increases, this article reviews the concepts and challenges of data integration, a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription.
Abstract: The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term ‘data integration’ has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods. As the number of single-cell experiments with multiple data modalities increases, Argelaguet and colleagues review the concepts and challenges of data integration.

Journal Article
TL;DR: In this paper, the authors dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments of the endometrium.
Abstract: The endometrium, the mucosal lining of the uterus, undergoes dynamic changes throughout the menstrual cycle in response to ovarian hormones. We have generated dense single-cell and spatial reference maps of the human uterus and three-dimensional endometrial organoid cultures. We dissect the signaling pathways that determine cell fate of the epithelial lineages in the lumenal and glandular microenvironments. Our benchmark of the endometrial organoids reveals the pathways and cell states regulating differentiation of the secretory and ciliated lineages both in vivo and in vitro. In vitro downregulation of WNT or NOTCH pathways increases the differentiation efficiency along the secretory and ciliated lineages, respectively. We utilize our cellular maps to deconvolute bulk data from endometrial cancers and endometriotic lesions, illuminating the cell types dominating in each of these disorders. These mechanistic insights provide a platform for future development of treatments for common conditions including endometriosis and endometrial carcinoma. Single-cell and spatial transcriptomic profiling of the human endometrium highlights pathways governing the proliferative and secretory phases of the menstrual cycle. Analyses of endometrial organoids show that WNT and NOTCH signaling modulate differentiation into the secretory and ciliated epithelial lineages, respectively.

Journal Article
TL;DR: This tutorial provides a hands-on guide for experimentalists interested in analyzing their data as well as an overview for bioinformaticians seeking to develop new computational methods.
Abstract: Single-cell RNA sequencing (scRNA-seq) is a popular and powerful technology that allows you to profile the whole transcriptome of a large number of individual cells. However, the analysis of the large volumes of data generated from these experiments requires specialized statistical and computational methods. Here we present an overview of the computational workflow involved in processing scRNA-seq data. We discuss some of the most common tasks and the tools available for addressing central biological questions. In this article and our companion website ( https://scrnaseq-course.cog.sanger.ac.uk/website/index.html ), we provide guidelines regarding best practices for performing computational analyses. This tutorial provides a hands-on guide for experimentalists interested in analyzing their data as well as an overview for bioinformaticians seeking to develop new computational methods.
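One early step in most scRNA-seq workflows is normalising for differences in sequencing depth between cells before comparing expression. The sketch below shows a common choice, library-size normalisation followed by a log(1 + x) transform, on a tiny hypothetical cell-by-gene count matrix. The scale factor of 10,000 is an assumed, commonly used default; the tutorial's own recommended pipeline and tools may differ.

```python
# Library-size normalisation + log1p for a small cell-by-gene count matrix.
# The scale factor and example counts are hypothetical illustration values.
import math
from typing import List


def log_normalize(counts: List[List[int]], scale: float = 1e4) -> List[List[float]]:
    """Per cell: divide counts by the cell's total, multiply by `scale`,
    then apply natural log1p. Cells with zero total counts stay all-zero."""
    normalized: List[List[float]] = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


# Two cells x three genes; the second cell was sequenced more deeply.
example_counts = [[1, 3, 0], [10, 10, 0]]
for row in log_normalize(example_counts):
    print([round(x, 3) for x in row])
```

Dividing by each cell's total puts deeply and shallowly sequenced cells on a common scale, and the log transform compresses the large dynamic range of counts before downstream steps such as feature selection and dimensionality reduction.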