Showing papers by "Wellcome Trust Sanger Institute" published in 2020

Open Access

Journal Article•DOI•

The mutational constraint spectrum quantified from variation in 141,456 humans

Konrad J. Karczewski¹, Laurent C. Francioli¹, Grace Tiao¹, Beryl B. Cummings¹, Jessica Alföldi¹, Qingbo Wang¹, Ryan L. Collins¹, Kristen M. Laricchia¹, Andrea Ganna¹, Daniel P. Birnbaum¹, Laura D. Gauthier¹, Harrison Brand¹, Matthew Solomonson¹, Nicholas A. Watts¹, Daniel R. Rhodes², Moriel Singer-Berk¹, Eleina M. England¹, Eleanor G. Seaby¹, Jack A. Kosmicki¹, Raymond K. Walters¹, Katherine Tashman¹, Yossi Farjoun¹, Eric Banks¹, Timothy Poterba¹, Arcturus Wang¹, Cotton Seed¹, Nicola Whiffin¹, Jessica X. Chong³, Kaitlin E. Samocha⁴, Emma Pierce-Hoffman¹, Zachary Zappala¹, Anne H. O’Donnell-Luria¹, Eric Vallabh Minikel¹, Ben Weisburd¹, Monkol Lek⁵, James S. Ware¹, Christopher Vittal⁶, Irina M. Armean¹, Louis Bergelson¹, Kristian Cibulskis¹, Kristen M. Connolly¹, Miguel Covarrubias¹, Stacey Donnelly¹, Steven Ferriera¹, Stacey Gabriel¹, Jeff Gentry¹, Namrata Gupta¹, Thibault Jeandet¹, Diane Kaplan¹, Christopher Llanwarne¹, Ruchi Munshi¹, Sam Novod¹, Nikelle Petrillo¹, David Roazen¹, Valentin Ruano-Rubio¹, Andrea Saltzman¹, Molly Schleicher¹, Jose Soto¹, Kathleen Tibbetts¹, Charlotte Tolonen¹, Gordon Wade¹, Michael E. Talkowski¹, Benjamin M. Neale¹, Mark J. Daly¹, Daniel G. MacArthur¹ +61 more•Institutions (6)

Broad Institute¹, Queen Mary University of London², University of Washington³, Wellcome Trust Sanger Institute⁴, Yale University⁵, Harvard University⁶

27 May 2020-Nature

TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Journal Article•DOI•

SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes.

Waradon Sungnak¹, Ni Huang¹, Christophe Bécavin², Marijn Berg³, Rachel Queen⁴, Monika Litvinukova⁵, Monika Litvinukova¹, Carlos Talavera-López¹, Henrike Maatz⁵, Daniel Reichart⁶, Fotios Sampaziotis⁷, Kaylee B Worlock⁸, Masahiro Yoshida⁸, Josephine Barnes⁸ +10 more•Institutions (8)

Wellcome Trust Sanger Institute¹, Centre national de la recherche scientifique², University of Groningen³, Newcastle University⁴, Max Delbrück Center for Molecular Medicine⁵, Harvard University⁶, University of Cambridge⁷, University College London⁸

23 Apr 2020-Nature Medicine

TL;DR: This paper surveys the expression of viral entry-associated genes in single-cell RNA-sequencing data from multiple tissues of healthy human donors and co-detects these transcripts in specific respiratory, corneal and intestinal epithelial cells, potentially explaining the high efficiency of SARS-CoV-2 transmission.

Abstract: We investigated SARS-CoV-2 potential tropism by surveying expression of viral entry-associated genes in single-cell RNA-sequencing data from multiple tissues from healthy human donors. We co-detected these transcripts in specific respiratory, corneal and intestinal epithelial cells, potentially explaining the high efficiency of SARS-CoV-2 transmission. These genes are co-expressed in nasal epithelial cells with genes involved in innate immunity, highlighting the cells' potential role in initial viral infection, spread and clearance. The study offers a useful resource for further lines of inquiry with valuable clinical samples from COVID-19 patients and we provide our data in a comprehensive, open and user-friendly fashion at www.covid19cellatlas.org.

2,024 citations

Journal Article•DOI•

SARS-CoV-2 Entry Genes Are Most Highly Expressed in Nasal Goblet and Ciliated Cells within Human Airways

Waradon Sungnak¹, Ni Huang¹, Christophe Bécavin², Marijn Berg³•Institutions (3)

Wellcome Trust Sanger Institute¹, Centre national de la recherche scientifique², University of Groningen³

13 Mar 2020-arXiv: Cell Behavior

TL;DR: Analysis of the compendium of data points to a particularly relevant role for nasal goblet and ciliated cells as early viral targets and potential reservoirs of SARS-CoV-2 infection and underscores the importance of the availability of the Human Cell Atlas as a reference dataset.

Abstract: The SARS-CoV-2 coronavirus, the etiologic agent responsible for COVID-19 coronavirus disease, is a global threat. To better understand viral tropism, we assessed the RNA expression of the coronavirus receptor, ACE2, as well as the viral S protein priming protease TMPRSS2 thought to govern viral entry in single-cell RNA-sequencing (scRNA-seq) datasets from healthy individuals generated by the Human Cell Atlas consortium. We found that ACE2, as well as the protease TMPRSS2, are differentially expressed in respiratory and gut epithelial cells. In-depth analysis of epithelial cells in the respiratory tree reveals that nasal epithelial cells, specifically goblet/secretory cells and ciliated cells, display the highest ACE2 expression of all the epithelial cells analyzed. The skewed expression of viral receptors/entry-associated proteins towards the upper airway may be correlated with enhanced transmissivity. Finally, we showed that many of the top genes associated with ACE2 airway epithelial expression are innate immune-associated, antiviral genes, highly enriched in the nasal epithelial cells. This association with immune pathways might have clinical implications for the course of infection and viral pathology, and highlights the specific significance of nasal epithelia in viral infection. Our findings underscore the importance of the availability of the Human Cell Atlas as a reference dataset. In this instance, analysis of the compendium of data points to a particularly relevant role for nasal goblet and ciliated cells as early viral targets and potential reservoirs of SARS-CoV-2 infection. This, in turn, serves as a biological framework for dissecting viral transmission and developing clinical strategies for prevention and therapy.

1,602 citations

Journal Article•DOI•

Pan-cancer analysis of whole genomes

Peter J. Campbell¹, Gad Getz², Jan O. Korbel³, Joshua M. Stuart⁴ +1329 more•Institutions (238)

06 Feb 2020-Nature

TL;DR: The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the generation of the integrative analyses of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium studies.

Abstract: Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale1,2,3. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter4; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation5,6; analyses timings and patterns of tumour evolution7; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity8,9; and evaluates a range of more-specialized features of cancer genomes8,10,11,12,13,14,15,16,17,18.

1,600 citations

Journal Article•DOI•

The Repertoire of Mutational Signatures in Human Cancer

Ludmil B. Alexandrov¹, Jaegil Kim², Nicholas J. Haradhvala³, Nicholas J. Haradhvala², Mi Ni Huang⁴, Alvin Wei Tian Ng⁴, Yang Wu⁴, Arnoud Boot⁴, Kyle R. Covington⁵, Dmitry A. Gordenin⁶, Erik N. Bergstrom¹, S M Ashiqul Islam¹, Nuria Lopez-Bigas⁷, Nuria Lopez-Bigas⁸, Leszek J. Klimczak⁶, John R. McPherson⁴, Sandro Morganella⁹, Radhakrishnan Sabarinathan⁷, Radhakrishnan Sabarinathan¹⁰, David A. Wheeler⁵, Ville Mustonen¹¹, Gad Getz, Steven G. Rozen⁴, Michael R. Stratton⁹ +20 more•Institutions (11)

University of California, San Diego¹, Broad Institute², Harvard University³, National University of Singapore⁴, Baylor College of Medicine⁵, National Institutes of Health⁶, Pompeu Fabra University⁷, Catalan Institution for Research and Advanced Studies⁸, Wellcome Trust Sanger Institute⁹, National Centre for Biological Sciences¹⁰, University of Helsinki¹¹

05 Feb 2020-Nature

TL;DR: The characterization of 4,645 whole-genome and 19,184 exome sequences, covering most types of cancer, identifies 81 single-base substitution, doublet-base substitution and small-insertion-and-deletion mutational signatures, providing a systematic overview of the mutational processes that contribute to cancer development.

Abstract: Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature1. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium2 of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses3–15, enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated—but distinct—DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.
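
Illustrative sketch (not from the paper): the step of estimating each signature's contribution to one genome's mutational catalogue can be pictured as a non-negative least-squares fit. The toy example below assumes the signatures are already known and uses a standard multiplicative update to keep the exposures non-negative. The four mutation channels, the two signature profiles and the function name `fit_exposures` are hypothetical and chosen only for illustration. This is not the consortium's actual software.

```python
def fit_exposures(catalogue, signatures, n_iter=500):
    """Estimate non-negative exposures e so that
    sum_k e[k] * signatures[k][i] approximates catalogue[i].

    catalogue  -- mutation counts per channel for one genome
    signatures -- list of signature profiles (each sums to 1 over channels)
    """
    n_sig = len(signatures)
    n_chan = len(catalogue)
    # Start from an even split of the total mutation count (assumption).
    exposures = [sum(catalogue) / n_sig] * n_sig
    for _ in range(n_iter):
        # Catalogue reconstructed from the current exposures.
        recon = [
            sum(exposures[k] * signatures[k][i] for k in range(n_sig))
            for i in range(n_chan)
        ]
        # Multiplicative update: scale each exposure by the ratio of how well
        # its signature matches the data versus the current reconstruction.
        for k in range(n_sig):
            num = sum(signatures[k][i] * catalogue[i] for i in range(n_chan))
            den = sum(signatures[k][i] * recon[i] for i in range(n_chan))
            if den > 0:
                exposures[k] *= num / den
    return exposures


# Hypothetical signatures over four toy channels (real profiles use many more).
TOY_SIGNATURES = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
# Catalogue built as 300 mutations from the first signature plus 100 from the second.
TOY_CATALOGUE = [220, 40, 50, 90]

if __name__ == "__main__":
    print([round(e, 3) for e in fit_exposures(TOY_CATALOGUE, TOY_SIGNATURES)])
```

Because the toy catalogue is an exact mixture of the two profiles, the fit should recover exposures close to 300 and 100. Real catalogues are noisy, and discovering the signatures themselves is a separate and harder problem.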

1,521 citations

Journal Article•DOI•

CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes.

Mirjana Efremova¹, Miquel Vento-Tormo, Sarah A. Teichmann², Sarah A. Teichmann¹, Roser Vento-Tormo¹ +1 more•Institutions (2)

Wellcome Trust Sanger Institute¹, University of Cambridge²

26 Feb 2020-Nature Protocols

TL;DR: The structure and content of CellPhoneDB is outlined, procedures for inferring cell–cell communication networks from single-cell RNA sequencing data are provided and a practical step-by-step guide to help implement the protocol is presented.

Abstract: Cell–cell communication mediated by ligand–receptor complexes is critical to coordinating diverse biological processes, such as development, differentiation and inflammation. To investigate how the context-dependent crosstalk of different cell types enables physiological processes to proceed, we developed CellPhoneDB, a novel repository of ligands, receptors and their interactions. In contrast to other repositories, our database takes into account the subunit architecture of both ligands and receptors, representing heteromeric complexes accurately. We integrated our resource with a statistical framework that predicts enriched cellular interactions between two cell types from single-cell transcriptomics data. Here, we outline the structure and content of our repository, provide procedures for inferring cell–cell communication networks from single-cell RNA sequencing data and present a practical step-by-step guide to help implement the protocol. CellPhoneDB v.2.0 is an updated version of our resource that incorporates additional functionalities to enable users to introduce new interacting molecules and reduces the time and resources needed to interrogate large datasets. CellPhoneDB v.2.0 is publicly available, both as code and as a user-friendly web interface; it can be used by both experts and researchers with little experience in computational genomics. In our protocol, we demonstrate how to evaluate meaningful biological interactions with CellPhoneDB v.2.0 using published datasets. This protocol typically takes ~2 h to complete, from installation to statistical analysis and visualization, for a dataset of ~10 GB, 10,000 cells and 19 cell types, and using five threads. CellPhoneDB combines an interactive database and a statistical framework for the exploration of ligand–receptor interactions inferred from single-cell transcriptomics measurements.
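
Illustrative sketch (not the CellPhoneDB code or API): one way to picture a statistical test for an enriched ligand-receptor interaction between two cell types is a label-permutation test. Two rules in the example are assumptions not stated in the abstract: a multi-subunit complex is scored by its least-expressed subunit, and the null distribution comes from shuffling cell-type labels. The gene names `LIG`, `REC1`, `REC2`, the cluster names and all function names are hypothetical.

```python
import random


def cluster_mean(expr, labels, cluster, gene):
    """Mean expression of one gene over the cells assigned to a cluster."""
    values = [expr[cell][gene] for cell in expr if labels[cell] == cluster]
    return sum(values) / len(values) if values else 0.0


def complex_mean(expr, labels, cluster, subunits):
    """Score a (possibly heteromeric) complex by its least-expressed subunit."""
    return min(cluster_mean(expr, labels, cluster, g) for g in subunits)


def interaction_score(expr, labels, sender, receiver, ligand, receptor):
    """Average of ligand and receptor means, or 0 if either side is absent."""
    lig = complex_mean(expr, labels, sender, ligand)
    rec = complex_mean(expr, labels, receiver, receptor)
    return 0.0 if lig == 0 or rec == 0 else (lig + rec) / 2


def permutation_p_value(expr, labels, sender, receiver, ligand, receptor,
                        n_perm=1000, seed=0):
    """Return (observed score, fraction of shuffled scores >= observed)."""
    rng = random.Random(seed)
    observed = interaction_score(expr, labels, sender, receiver, ligand, receptor)
    cells = list(labels)
    shuffled = [labels[c] for c in cells]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between cells and cell types
        permuted = dict(zip(cells, shuffled))
        score = interaction_score(expr, permuted, sender, receiver, ligand, receptor)
        if score >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: cluster A expresses the ligand, cluster B the two receptor subunits.
expr = {}
labels = {}
for i in range(6):
    expr[f"a{i}"] = {"LIG": 5, "REC1": 0, "REC2": 0}
    labels[f"a{i}"] = "A"
    expr[f"b{i}"] = {"LIG": 0, "REC1": 4, "REC2": 2}
    labels[f"b{i}"] = "B"

if __name__ == "__main__":
    print(permutation_p_value(expr, labels, "A", "B", ["LIG"], ["REC1", "REC2"]))
```

In the toy data the receptor complex is limited by `REC2`, so the observed score is the average of 5 and 2. Random relabelling rarely reproduces that clean separation, so the empirical p-value should be small.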

1,392 citations

Journal Article•DOI•

Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism

F. Kyle Satterstrom¹, F. Kyle Satterstrom², Jack A. Kosmicki, Jiebiao Wang³ +198 more•Institutions (53)

06 Feb 2020-Cell

TL;DR: The largest exome sequencing study of autism spectrum disorder (ASD) to date, using an enhanced analytical framework to integrate de novo and case-control rare variation, identifies 102 risk genes at a false discovery rate of 0.1 or less, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.

1,169 citations

Journal Article•DOI•

Identifying and removing haplotypic duplication in primary genome assemblies.

Dengfeng Guan¹, Dengfeng Guan², Shane A. McCarthy¹, Jonathan Wood³, Kerstin Howe³, Yadong Wang², Richard Durbin³, Richard Durbin¹ +4 more•Institutions (3)

University of Cambridge¹, Harbin Institute of Technology², Wellcome Trust Sanger Institute³

01 May 2020-Bioinformatics

TL;DR: A novel tool, purge_dups, is presented that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps; it can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly.

Abstract: Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. Results: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines. Availability and implementation: The source code is written in C and is available at https://github.com/dfguan/purge_dups. Supplementary information: Supplementary data are available at Bioinformatics online.

728 citations

Journal Article•DOI•

Cells of the adult human heart.

Monika Litviňuková¹, Monika Litviňuková², Carlos Talavera-López³, Carlos Talavera-López¹, Henrike Maatz², Daniel Reichart⁴, Daniel Reichart⁵, Catherine L. Worth², Eric L. Lindberg², Masatoshi Kanda², Masatoshi Kanda⁶, Krzysztof Polanski¹, Matthias Heinig⁷, Michael Lee⁸, Emily R. Nadelmann⁴, Kenny Roberts¹, Liz Tuck¹, Eirini S. Fasouli¹, Daniel M. DeLaughter⁴, Barbara McDonough⁹, Barbara McDonough¹⁰, Barbara McDonough⁴, Hiroko Wakimoto⁴, Joshua M. Gorham⁴, Sara Samari⁸, Krishnaa T. Mahbubani¹¹, Kourosh Saeb-Parsy¹¹, Giannino Patone², Joseph J. Boyle⁸, Hongbo Zhang¹, Hongbo Zhang¹², Hao Zhang¹³, Anissa Viveiros¹³, Gavin Y. Oudit¹³, Omer Ali Bayraktar¹, Jonathan G. Seidman⁴, Christine E. Seidman¹⁰, Christine E. Seidman⁴, Christine E. Seidman⁹, Michela Noseda⁸, Michela Noseda¹⁴, Norbert Hubner, Sarah A. Teichmann¹, Sarah A. Teichmann¹¹ +40 more•Institutions (14)

Wellcome Trust Sanger Institute¹, Max Delbrück Center for Molecular Medicine², European Bioinformatics Institute³, Harvard University⁴, University of Hamburg⁵, Sapporo Medical University⁶, Technische Universität München⁷, National Institutes of Health⁸, Brigham and Women's Hospital⁹, Howard Hughes Medical Institute¹⁰, University of Cambridge¹¹, Sun Yat-sen University¹², University of Alberta¹³, British Heart Foundation¹⁴

24 Sep 2020-Nature

TL;DR: State-of-the-art analyses of large-scale single-cell and single-nucleus transcriptomes are used to construct a cellular atlas of the human heart that provides a valuable reference for future research into cardiac physiology and disease.

Abstract: Cardiovascular disease is the leading cause of death worldwide. Advanced insights into disease mechanisms and therapeutic strategies require a deeper understanding of the molecular processes involved in the healthy heart. Knowledge of the full repertoire of cardiac cells and their gene expression profiles is a fundamental first step in this endeavour. Here, using state-of-the-art analyses of large-scale single-cell and single-nucleus transcriptomes, we characterize six anatomical adult heart regions. Our results highlight the cellular heterogeneity of cardiomyocytes, pericytes and fibroblasts, and reveal distinct atrial and ventricular subsets of cells with diverse developmental origins and specialized properties. We define the complexity of the cardiac vasculature and its changes along the arterio-venous axis. In the immune compartment, we identify cardiac-resident macrophages with inflammatory and protective transcriptional signatures. Furthermore, analyses of cell-to-cell interactions highlight different networks of macrophages, fibroblasts and cardiomyocytes between atria and ventricles that are distinct from those of skeletal muscle. Our human cardiac cell atlas improves our understanding of the human heart and provides a valuable reference for future studies.

703 citations

Journal Article•DOI•

Eleven grand challenges in single-cell data science

David Lähnemann¹, David Lähnemann², Johannes Köster³, Johannes Köster¹, Ewa Szczurek⁴, Davis J. McCarthy⁵, Davis J. McCarthy⁶, Stephanie C. Hicks⁷, Mark D. Robinson⁸, Catalina A. Vallejos⁹, Catalina A. Vallejos¹⁰, Kieran R Campbell¹¹, Kieran R Campbell¹², Niko Beerenwinkel⁸, Niko Beerenwinkel¹³, Ahmed Mahfouz¹⁴, Ahmed Mahfouz¹⁵, Luca Pinello³, Luca Pinello¹⁶, Pavel Skums¹⁷, Alexandros Stamatakis¹⁸, Alexandros Stamatakis¹⁹, Camille Stephan Otto Attolini, Samuel Aparicio¹¹, Samuel Aparicio¹², Jasmijn A. Baaijens²⁰, Marleen Balvert²¹, Marleen Balvert²⁰, Buys de Barbanson²¹, Antonio Cappuccio²², Giacomo Corleone²³, Bas E. Dutilh²¹, Bas E. Dutilh²⁴, Maria Florescu²¹, Victor Guryev²⁵, Rens Holmer²⁶, Katharina Jahn⁸, Katharina Jahn¹³, Thamar Jessurun Lobo²⁵, Emma M. Keizer²⁶, Indu Khatri¹⁵, Szymon M. Kielbasa¹⁵, Jan O. Korbel, Alexey M. Kozlov¹⁸, Tzu Hao Kuo, Boudewijn P. F. Lelieveldt¹⁵, Boudewijn P. F. Lelieveldt¹⁴, Ion I. Mandoiu²⁷, John C. Marioni²⁸, John C. Marioni²⁹, John C. Marioni³⁰, Tobias Marschall³¹, Tobias Marschall³², Felix Mölder¹, Amir Niknejad³³, Lukasz Raczkowski⁴, Marcel J. T. Reinders¹⁴, Marcel J. T. Reinders¹⁵, Jeroen de Ridder²¹, Antoine-Emmanuel Saliba, Antonios Somarakis¹⁵, Oliver Stegle³⁰, Oliver Stegle³⁴, Fabian J. Theis, Huan Yang³⁵, Alexander Zelikovsky³⁶, Alexander Zelikovsky¹⁷, Alice C. McHardy, Benjamin J. Raphael³⁷, Sohrab P. Shah³⁸, Alexander Schönhuth²⁰, Alexander Schönhuth²¹ +68 more•Institutions (38)

University of Duisburg-Essen¹, University of Düsseldorf², Harvard University³, University of Warsaw⁴, St. Vincent's Institute of Medical Research⁵, University of Melbourne⁶, Johns Hopkins University⁷, Swiss Institute of Bioinformatics⁸, The Turing Institute⁹, Western General Hospital¹⁰, BC Cancer Agency¹¹, University of British Columbia¹², ETH Zurich¹³, Delft University of Technology¹⁴, Leiden University Medical Center¹⁵, Broad Institute¹⁶, Georgia State University¹⁷, Heidelberg Institute for Theoretical Studies¹⁸, Karlsruhe Institute of Technology¹⁹, Centrum Wiskunde & Informatica²⁰, Utrecht University²¹, University of Amsterdam²², Imperial College London²³, Radboud University Nijmegen²⁴, University Medical Center Groningen²⁵, Wageningen University and Research Centre²⁶, University of Connecticut²⁷, Wellcome Trust Sanger Institute²⁸, University of Cambridge²⁹, European Bioinformatics Institute³⁰, Max Planck Society³¹, Saarland University³², Zuse Institute Berlin³³, German Cancer Research Center³⁴, Leiden University³⁵, I.M. Sechenov First Moscow State Medical University³⁶, Princeton University³⁷, Memorial Sloan Kettering Cancer Center³⁸

07 Feb 2020-Genome Biology

TL;DR: This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years in single-cell data science.

Abstract: The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

677 citations

Posted Content•DOI•

Towards complete and error-free genome assemblies of all vertebrate species

Arang Rhie¹, Shane A. McCarthy², Olivier Fedrigo³, Joana Damas⁴, Giulio Formenti³, Sergey Koren¹, Marcela Uliano-Silva², William Chow², Arkarachai Fungtammasan, Gregory Gedman³, Lindsey J. Cantin³, Françoise Thibaud-Nissen¹, Leanne Haggerty⁵, Chul Hee Lee⁶, Byung June Ko⁶, J. H. Kim⁶, Iliana Bista², Michelle Smith², Bettina Haase³, Jacquelyn Mountcastle³, Sylke Winkler⁷, Sadye Paez³, Jason T. Howard⁸, Sonja C. Vernes⁷, Tanya M. Lama⁹, Frank Grützner¹⁰, Wesley C. Warren¹¹, Christopher N. Balakrishnan¹², Dave W Burt¹³, Jimin George¹⁴, Matthew T. Biegler³, David Iorns¹⁵, Andrew Digby, Daryl Eason, Taylor Edwards¹⁶, Mark Wilkinson¹⁷, George F. Turner¹⁸, Axel Meyer¹⁹, Andreas F. Kautt¹⁹, Paolo Franchini¹⁹, H. William Detrich²⁰, Hannes Svardal²¹, Maximilian Wagner²², Gavin J. P. Naylor²³, Martin Pippel⁷, Milan Malinsky², Mark Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout²⁴, Marlys L. Houck, Ann C Misuraca, Sarah B. Kingan²⁵, Richard Hall²⁵, Zev N. Kronenberg²⁵, Jonas Korlach²⁵, Ivan Sović²⁵, Christopher Dunn²⁵, Zemin Ning², Alex Hastie, Joyce V. Lee, Siddarth Selvaraj, Richard E. Green²⁴, Nicholas H. Putnam, Jay Ghurye²⁶, Erik Garrison²⁴, Ying Sims², Joanna Collins², Sarah Pelan², James Torrance², Alan Tracey², Jonathan Wood², Dengfeng Guan²⁷, Sarah E. London²⁸, David F. Clayton¹⁴, Claudio V. Mello²⁹, Samantha R. Friedrich²⁹, Peter V. Lovell²⁹, Ekaterina Osipova⁷, Farooq O. Al-Ajli³⁰, Simona Secomandi³¹, Heebal Kim⁶, Constantina Theofanopoulou³, Yang Zhou³², Robert S. Harris³³, Kateryna D. Makova³³, Paul Medvedev³³, Jinna Hoffman¹, Patrick Masterson¹, Karen Clark¹, Fergal J. Martin⁵, Kevin L. Howe⁵, Paul Flicek⁵, Brian P. Walenz¹, Woori Kwak, Hiram Clawson²⁴, Mark Diekhans²⁴, Luis R Nassar²⁴, Benedict Paten²⁴, Robert H. S. Kraus¹⁹, Harris A. Lewin⁴, Andrew J. Crawford³⁴, M. Thomas P. Gilbert³², Guojie Zhang³², Byrappa Venkatesh³⁵, Robert W. Murphy³⁶, Klaus-Peter Koepfli³⁷, Beth Shapiro²⁴, Warren E. Johnson³⁷, Federica Di Palma³⁸, Tomas Marques-Bonet³⁹, Emma C. Teeling⁴⁰, Tandy Warnow⁴¹, Jennifer A. Marshall Graves⁴², Oliver A. Ryder⁴³, David Haussler²⁴, Stephen J. O'Brien⁴⁴, Kerstin Howe², Eugene W. Myers⁴⁵, Richard Durbin², Adam M. Phillippy¹, Erich D. Jarvis³ +118 more•Institutions (45)

National Institutes of Health¹, Wellcome Trust Sanger Institute², Rockefeller University³, University of California, Davis⁴, European Bioinformatics Institute⁵, Seoul National University⁶, Max Planck Society⁷, Durham University⁸, University of Massachusetts Amherst⁹, University of Adelaide¹⁰, University of Missouri¹¹, East Carolina University¹², University of Queensland¹³, Queen Mary University of London¹⁴, Wellington Management Company¹⁵, University of Arizona¹⁶, Natural History Museum¹⁷, Bangor University¹⁸, University of Konstanz¹⁹, Northeastern University²⁰, Naturalis²¹, University of Graz²², Florida Museum of Natural History²³, University of California, Santa Cruz²⁴, Pacific Biosciences²⁵, University of Maryland, College Park²⁶, Harbin Institute of Technology²⁷, University of Chicago²⁸, Oregon Health & Science University²⁹, Monash University Malaysia Campus³⁰, University of Milan³¹, University of Copenhagen³², Pennsylvania State University³³, University of Los Andes³⁴, Agency for Science, Technology and Research³⁵, Royal Ontario Museum³⁶, Smithsonian Conservation Biology Institute³⁷, University of East Anglia³⁸, Pompeu Fabra University³⁹, University College Dublin⁴⁰, University of Illinois at Urbana–Champaign⁴¹, La Trobe University⁴², University of California, San Diego⁴³, UPRRP College of Natural Sciences⁴⁴, Dresden University of Technology⁴⁵

23 May 2020-bioRxiv

TL;DR: The authors have embarked on the Vertebrate Genomes Project, an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.

Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.

Journal Article•DOI•

The evolutionary history of 2,658 cancers

Moritz Gerstung¹, Moritz Gerstung², Clemency Jolly³, Ignaty Leshchiner⁴, Stefan C. Dentro⁵, Stefan C. Dentro³, Stefan C. Dentro¹, Santiago Gonzalez², Daniel Rosebrock⁴, Thomas J. Mitchell⁶, Thomas J. Mitchell¹, Yulia Rubanova⁷, Pavana Anur⁸, Kaixian Yu⁹, Maxime Tarabichi³, Maxime Tarabichi¹, Amit G. Deshwar⁷, Jeff Wintersinger⁷, Kortine Kleinheinz¹⁰, Kortine Kleinheinz¹¹, Ignacio Vázquez-García⁶, Ignacio Vázquez-García¹, Kerstin Haase³, Lara Jerman¹², Lara Jerman², Subhajit Sengupta¹³, Geoff Macintyre⁶, Salem Malikic¹⁴, Salem Malikic¹⁵, Nilgun Donmez¹⁴, Nilgun Donmez¹⁵, Dimitri Livitz⁴, Marek Cmero¹⁶, Marek Cmero¹⁷, Jonas Demeulemeester³, Jonas Demeulemeester¹⁸, Steven E. Schumacher⁴, Yu Fan⁹, Xiaotong Yao¹⁹, Juhee Lee²⁰, Matthias Schlesner¹⁰, Paul C. Boutros²¹, Paul C. Boutros²², Paul C. Boutros⁷, David D.L. Bowtell²³, Hongtu Zhu⁹, Gad Getz, Marcin Imielinski¹⁹, Rameen Beroukhim²⁴, Rameen Beroukhim⁴, S. Cenk Sahinalp¹⁴, S. Cenk Sahinalp²⁵, Yuan Ji¹³, Yuan Ji²⁶, Martin Peifer²⁷, Florian Markowetz⁶, Ville Mustonen²⁸, Ke Yuan⁶, Ke Yuan²⁹, Wenyi Wang⁹, Quaid Morris⁷, Paul T. Spellman⁸, David C. Wedge⁵, Peter Van Loo¹⁸, Peter Van Loo³ +61 more•Institutions (29)

Wellcome Trust Sanger Institute¹, European Bioinformatics Institute², Francis Crick Institute³, Broad Institute⁴, University of Oxford⁵, University of Cambridge⁶, University of Toronto⁷, Oregon Health & Science University⁸, University of Texas MD Anderson Cancer Center⁹, German Cancer Research Center¹⁰, Heidelberg University¹¹, University of Ljubljana¹², NorthShore University HealthSystem¹³, Vancouver Prostate Centre¹⁴, Simon Fraser University¹⁵, Walter and Eliza Hall Institute of Medical Research¹⁶, University of Melbourne¹⁷, Katholieke Universiteit Leuven¹⁸, Cornell University¹⁹, University of California, Santa Cruz²⁰, Ontario Institute for Cancer Research²¹, University of California, Los Angeles²², Peter MacCallum Cancer Centre²³, Harvard University²⁴, Indiana University²⁵, University of Chicago²⁶, University of Cologne²⁷, University of Helsinki²⁸, University of Glasgow²⁹

06 Feb 2020-Nature

TL;DR: Whole-genome sequencing data for 2,778 cancer samples from 2,658 unique donors is used to reconstruct the evolutionary history of cancer, revealing that driver mutations can precede diagnosis by several years to decades.

Abstract: Cancer develops through a process of somatic evolution [1,2]. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes [3]. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [4], we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.

Journal Article•DOI•

Telomere-to-telomere assembly of a complete human X chromosome

Karen H. Miga¹, Sergey Koren², Arang Rhie², Mitchell R. Vollger³, Ariel Gershman⁴, Andrey Bzikadze⁵, Shelise Brooks², Edmund Howe⁶, David Porubsky³, Glennis A. Logsdon³, Valerie A. Schneider², Tamara A. Potapova⁶, Jonathan Wood⁷, William Chow⁷, Joel Armstrong¹, Jeanne Fredrickson³, Evgenia Pak², Kristof Tigyi¹, Milinn Kremitzki⁸, Christopher Markovic⁸, Valerie Maduro², Amalia Dutra², Gerard G. Bouffard², Alexander M. Chang², Nancy F. Hansen², Amy B. Wilfert³, Françoise Thibaud-Nissen², Anthony D. Schmitt, Jon Matthew Belton, Siddarth Selvaraj, Megan Y. Dennis⁹, Daniela C. Soto⁹, Ruta Sahasrabudhe⁹, Gulhan Kaya⁹, Josh Quick¹⁰, Nicholas J. Loman¹⁰, Nadine Holmes¹¹, Matthew Loose¹¹, Urvashi Surti¹², Rosa Ana Risques³, Tina A. Graves Lindsay⁸, Robert S. Fulton⁸, Ira M. Hall⁸, Benedict Paten¹, Kerstin Howe⁷, Winston Timp⁴, Alice Young², James C. Mullikin², Pavel A. Pevzner⁵, Jennifer L. Gerton⁶, Beth A. Sullivan¹³, Evan E. Eichler³, Adam M. Phillippy² - Show less +49 more•Institutions (13)

University of California, Santa Cruz¹, National Institutes of Health², University of Washington³, Johns Hopkins University⁴, University of California, San Diego⁵, Stowers Institute for Medical Research⁶, Wellcome Trust Sanger Institute⁷, Washington University in St. Louis⁸, University of California, Davis⁹, University of Birmingham¹⁰, University of Nottingham¹¹, University of Pittsburgh¹², Duke University¹³

03 Sep 2020-Nature

TL;DR: High-coverage, ultra-long-read nanopore sequencing is used to create a new human genome assembly that improves on the coverage and accuracy of the current reference (GRCh38) and includes the gap-free, telomere-to-telomere sequence of the X chromosome.

Abstract: After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.

Journal Article•DOI•

Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms.

David E. Gordon, Joseph Hiatt, Mehdi Bouhaddou, Veronica V. Rezelj¹ +200 more•Institutions (16)

04 Dec 2020-Science

TL;DR: The authors identified shared biology and host-directed drug targets to prioritize therapeutics with potential for rapid deployment against current and future coronavirus outbreaks, and found that individuals with genotypes corresponding to higher soluble IL17RA levels in plasma are at decreased risk of COVID-19 hospitalization.

Abstract: The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a grave threat to public health and the global economy. SARS-CoV-2 is closely related to the more lethal but less transmissible coronaviruses SARS-CoV-1 and Middle East respiratory syndrome coronavirus (MERS-CoV). Here, we have carried out comparative viral-human protein-protein interaction and viral protein localization analyses for all three viruses. Subsequent functional genetic screening identified host factors that functionally impinge on coronavirus proliferation, including Tom70, a mitochondrial chaperone protein that interacts with both SARS-CoV-1 and SARS-CoV-2 ORF9b, an interaction we structurally characterized using cryo-electron microscopy. Combining genetically validated host factors with both COVID-19 patient genetic data and medical billing records identified molecular mechanisms and potential drug treatments that merit further molecular and clinical study.

Journal Article•DOI•

Patterns of somatic structural variation in human cancer genomes

Yang Li¹, Nicola D. Roberts¹, Jeremiah Wala², Jeremiah Wala³, Ofer Shapira³, Ofer Shapira², Steven E. Schumacher³, Steven E. Schumacher², Kiran Kumar³, Kiran Kumar², Ekta Khurana⁴, Sebastian M. Waszak, Jan O. Korbel, James E. Haber⁵, Marcin Imielinski, Joachim Weischenfeldt⁶, Rameen Beroukhim², Rameen Beroukhim³, Peter J. Campbell⁷, Peter J. Campbell¹ - Show less +16 more•Institutions (7)

Wellcome Trust Sanger Institute¹, Harvard University², Broad Institute³, Cornell University⁴, Brandeis University⁵, University of Copenhagen⁶, University of Cambridge⁷

05 Feb 2020-Nature

TL;DR: Whole-genome sequencing data from more than 2,500 cancers of 38 tumour types reveal 16 signatures that can be used to classify somatic structural variants, highlighting the diversity of genomic rearrangements in cancer.

Abstract: A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes [1-7]. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types [8]. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions, as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2-7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.

Journal Article•DOI•

BlobToolKit - Interactive Quality Assessment of Genome Assemblies.

Richard Challis¹, Richard Challis², Edward Richards³, Jeena Ragan³, Guy Cochrane³, Mark Blaxter², Mark Blaxter¹ - Show less +3 more•Institutions (3)

University of Edinburgh¹, Wellcome Trust Sanger Institute², European Bioinformatics Institute³

01 Apr 2020-G3: Genes, Genomes, Genetics

TL;DR: BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies, is presented. Its views are embedded in assembly records at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the browser-based Viewer.

Abstract: Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Data Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.

Journal Article•DOI•

Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission.

Lucy Rivett¹, Lucy Rivett², Sushmita Sridhar¹, Sushmita Sridhar³, Dominic Sparkes¹, Dominic Sparkes², Matthew Routledge², Matthew Routledge¹, Nick K Jones, Sally Forrest¹, Jamie Young¹, Joana Pereira-Dias¹, William L Hamilton², William L Hamilton¹, Mark Ferris⁴, M. Estée Török¹, Luke W. Meredith¹, Martin D. Curran², Stewart Fuller, Afzal N. Chaudhry⁴, Ashley Shaw, Richard J. Samworth¹, John Bradley⁵, John Bradley¹, Gordon Dougan¹, Kenneth G. C. Smith¹, Paul J. Lehner¹, Nicholas J Matheson, Giles Wright⁴, Ian Goodfellow¹, Stephen Baker¹, Michael P. Weekes¹ - Show less +28 more•Institutions (5)

University of Cambridge¹, Public Health England², Wellcome Trust Sanger Institute³, Cambridge University Hospitals NHS Foundation Trust⁴, National Institute for Health Research⁵

11 May 2020-eLife

TL;DR: The utility of comprehensive SARS-CoV-2 screening of HCWs with minimal or no symptoms is demonstrated; this approach will be critical for protecting patients and hospital staff.

Abstract: Significant differences exist in the availability of healthcare worker (HCW) SARS-CoV-2 testing between countries, and existing programmes focus on screening symptomatic rather than asymptomatic staff. Over a 3-week period (April 2020), 1032 asymptomatic HCWs were screened for SARS-CoV-2 in a large UK teaching hospital. Symptomatic staff and symptomatic household contacts were additionally tested. Real-time RT-PCR was used to detect viral RNA from a throat+nose self-swab. 3% of HCWs in the asymptomatic screening group tested positive for SARS-CoV-2. 17/30 (57%) were truly asymptomatic/pauci-symptomatic. 12/30 (40%) had experienced symptoms compatible with coronavirus disease 2019 (COVID-19) >7 days prior to testing, most self-isolating, returning well. Clusters of HCW infection were discovered on two independent wards. Viral genome sequencing showed that the majority of HCWs had the dominant lineage B.1. Our data demonstrate the utility of comprehensive screening of HCWs with minimal or no symptoms. This approach will be critical for protecting patients and hospital staff.

Journal Article•DOI•

Insights into human genetic variation and population history from 929 diverse genomes.

Anders Bergström¹, Anders Bergström², Shane A. McCarthy³, Shane A. McCarthy², Ruoyun Hui³, Ruoyun Hui⁴, Mohamed A. Almarri², Qasim Ayub⁵, Qasim Ayub², Petr Danecek², Yuan Chen², Sabine Felkel², Sabine Felkel⁶, Pille Hallast², Pille Hallast⁷, Jack Kamm², Jack Kamm³, Hélène Blanché, Jean-François Deleuze, Howard M. Cann, Swapan Mallick⁸, Swapan Mallick⁹, David Reich⁸, David Reich⁹, Manjinder S. Sandhu³, Manjinder S. Sandhu², Pontus Skoglund¹, Aylwyn Scally³, Yali Xue², Richard Durbin², Richard Durbin³, Chris Tyler-Smith² - Show less +28 more•Institutions (9)

Francis Crick Institute¹, Wellcome Trust Sanger Institute², University of Cambridge³, McDonald Institute for Archaeological Research⁴, Monash University Malaysia Campus⁵, University of Veterinary Medicine Vienna⁶, University of Tartu⁷, Broad Institute⁸, Harvard University⁹

20 Mar 2020-Science

TL;DR: The authors' study adds data about African, Oceanian, and Amerindian populations and indicates that diversity tends to result from differences at the single-nucleotide level rather than copy number variation.

Abstract: Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.

Journal Article•DOI•

Identification of region-specific astrocyte subtypes at single cell resolution

Mykhailo Y. Batiuk¹, Araks Martirosyan¹, Jérôme Wahis¹, Filip De Vin¹, Catherine Marneffe¹, Carola Kusserow¹, Jordan Koeppen¹, João Filipe Viana², João Filipe Oliveira², João Filipe Oliveira³, Thierry Voet¹, Thierry Voet⁴, Chris P. Ponting⁵, Chris P. Ponting⁶, Chris P. Ponting⁴, T. Grant Belgard⁵, Matthew Holt⁷, Matthew Holt¹ - Show less +14 more•Institutions (7)

Katholieke Universiteit Leuven¹, University of Minho², Polytechnic Institute of Cávado and Ave³, Wellcome Trust Sanger Institute⁴, University of Oxford⁵, University of Edinburgh⁶, Allen Institute for Brain Science⁷

05 Mar 2020-Nature Communications

TL;DR: Using single cell transcriptome sequencing, the authors identify multiple astrocyte subtypes in the adult mouse CNS, which map to distinct spatial locations and show correlations to cell morphology and physiology.

Abstract: Astrocytes, a major cell type found throughout the central nervous system, have general roles in the modulation of synapse formation and synaptic transmission, blood-brain barrier formation, and regulation of blood flow, as well as metabolic support of other brain resident cells. Crucially, emerging evidence shows specific adaptations and astrocyte-encoded functions in regions, such as the spinal cord and cerebellum. To investigate the true extent of astrocyte molecular diversity across forebrain regions, we used single-cell RNA sequencing. Our analysis identifies five transcriptomically distinct astrocyte subtypes in adult mouse cortex and hippocampus. Validation of our data in situ reveals distinct spatial positioning of defined subtypes, reflecting the distribution of morphologically and physiologically distinct astrocyte populations. Our findings are evidence for specialized astrocyte subtypes between and within brain regions. The data are available through an online database (https://holt-sc.glialab.org/), providing a resource on which to base explorations of local astrocyte diversity and function in the brain.

Journal Article•DOI•

SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data

Matthew D. Young¹, Sam Behjati², Sam Behjati³, Sam Behjati¹•Institutions (3)

Wellcome Trust Sanger Institute¹, Cambridge University Hospitals NHS Foundation Trust², University of Cambridge³

26 Dec 2020-GigaScience

TL;DR: SoupX, a tool for removing ambient RNA contamination from droplet-based single-cell RNA sequencing experiments, has broad applicability, and its application can improve the biological utility of existing and future datasets.

Abstract: Background: Droplet-based single-cell RNA sequence analyses assume that all acquired RNAs are endogenous to cells. However, any cell-free RNAs contained within the input solution are also captured by these assays. This sequencing of cell-free RNA constitutes a background contamination that confounds the biological interpretation of single-cell transcriptomic data. Results: We demonstrate that contamination from this "soup" of cell-free RNAs is ubiquitous, with experiment-specific variations in composition and magnitude. We present a method, SoupX, for quantifying the extent of the contamination and estimating "background-corrected" cell expression profiles that seamlessly integrate with existing downstream analysis tools. Applying this method to several datasets using multiple droplet sequencing technologies, we demonstrate that its application improves biological interpretation of otherwise misleading data, as well as improving quality control metrics. Conclusions: We present SoupX, a tool for removing ambient RNA contamination from droplet-based single-cell RNA sequencing experiments. This tool has broad applicability, and its application can improve the biological utility of existing and future datasets.

Journal Article•DOI•

A brief history of human disease genetics.

Melina Claussnitzer¹, Melina Claussnitzer², Melina Claussnitzer³, Judy H. Cho⁴, Rory Collins⁵, Nancy J. Cox⁶, Emmanouil T. Dermitzakis⁷, Matthew E. Hurles⁸, Sekar Kathiresan⁹, Sekar Kathiresan¹, Eimear E. Kenny⁴, Cecilia M. Lindgren⁵, Cecilia M. Lindgren¹, Daniel G. MacArthur⁹, Daniel G. MacArthur¹, Kathryn N. North¹⁰, Sharon E. Plon¹¹, Sharon E. Plon¹², Heidi L. Rehm, Neil Risch¹³, Charles N. Rotimi¹⁴, Jay Shendure¹⁵, Jay Shendure¹⁶, Nicole Soranzo¹⁷, Nicole Soranzo⁸, Mark I. McCarthy - Show less +22 more•Institutions (17)

Broad Institute¹, University of Hohenheim², Beth Israel Deaconess Medical Center³, Icahn School of Medicine at Mount Sinai⁴, University of Oxford⁵, Vanderbilt University Medical Center⁶, University of Geneva⁷, Wellcome Trust Sanger Institute⁸, Harvard University⁹, University of Melbourne¹⁰, Baylor College of Medicine¹¹, Boston Children's Hospital¹², University of California, San Francisco¹³, National Institutes of Health¹⁴, University of Washington¹⁵, Howard Hughes Medical Institute¹⁶, University of Cambridge¹⁷

08 Jan 2020-Nature

TL;DR: Progress is described in the study of human genetics, in which rapid advances in technology, foundational genomic resources and analytical tools have contributed to the understanding of the mechanisms responsible for many rare and common diseases and to preventative and therapeutic strategies for many of these conditions.

Abstract: A primary goal of human genetics is to identify DNA sequence variants that influence biomedical traits, particularly those related to the onset and progression of human disease. Over the past 25 years, progress in realizing this objective has been transformed by advances in technology, foundational genomic resources and analytical tools, and by access to vast amounts of genotype and phenotype data. Genetic discoveries have substantially improved our understanding of the mechanisms responsible for many rare and common diseases and driven development of novel preventative and therapeutic strategies. Medical innovation will increasingly focus on delivering care tailored to individual patterns of genetic predisposition.

Journal Article•DOI•

Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study.

Luke W. Meredith¹, William L Hamilton¹, Ben Warne¹, Charlotte J. Houldcroft¹, Myra Hosmillo¹, Aminu S Jahun¹, Martin D. Curran², Surendra Parmar², Laura G Caller¹, Laura G Caller³, Sarah L Caddy¹, Fahad A Khokhar¹, Anna Yakovleva¹, Grant Hall¹, Theresa Feltwell¹, Sally Forrest¹, Sushmita Sridhar¹, Sushmita Sridhar⁴, Michael P. Weekes¹, Stephen Baker¹, Nicholas M. Brown², Elinor Moore¹, Ashley Popay², Iain Roddick², Mark Reacher², Theodore Gouliouris², Theodore Gouliouris¹, Sharon J. Peacock¹, Sharon J. Peacock², Gordon Dougan¹, M. Estée Török¹, Ian Goodfellow¹ - Show less +28 more•Institutions (4)

University of Cambridge¹, Public Health England², Francis Crick Institute³, Wellcome Trust Sanger Institute⁴

01 Nov 2020-Lancet Infectious Diseases

TL;DR: Real-time genomic surveillance of SARS-CoV-2 in a UK hospital was established and showed the benefit of combined genomic and epidemiological analysis for the investigation of health-care associated COVID-19 cases.

Abstract: Background: The burden and influence of health-care associated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections are unknown. We aimed to examine the use of rapid SARS-CoV-2 sequencing combined with detailed epidemiological analysis to investigate health-care associated SARS-CoV-2 infections and inform infection control measures. Methods: In this prospective surveillance study, we set up rapid SARS-CoV-2 nanopore sequencing from PCR-positive diagnostic samples collected from our hospital (Cambridge, UK) and a random selection from hospitals in the East of England, enabling sample-to-sequence in less than 24 h. We established a weekly review and reporting system with integration of genomic and epidemiological data to investigate suspected health-care associated COVID-19 cases. Findings: Between March 13 and April 24, 2020, we collected clinical data and samples from 5613 patients with COVID-19 from across the East of England. We sequenced 1000 samples producing 747 high-quality genomes. We combined epidemiological and genomic analysis of the 299 patients from our hospital and identified 35 clusters of identical viruses involving 159 patients. 92 (58%) of 159 patients had strong epidemiological links and 32 (20%) patients had plausible epidemiological links. These results were fed back to clinical, infection control, and hospital management teams, leading to infection-control interventions and informing patient safety reporting. Interpretation: We established real-time genomic surveillance of SARS-CoV-2 in a UK hospital and showed the benefit of combined genomic and epidemiological analysis for the investigation of health-care associated COVID-19. This approach enabled us to detect cryptic transmission events and identify opportunities to target infection-control interventions to further reduce health-care associated infections. Our findings have important implications for national public health policy as they enable rapid tracking and investigation of infections in hospital and community settings. Funding: COVID-19 Genomics UK (supported by UK Research and Innovation, the National Institute of Health Research, the Wellcome Sanger Institute), the Wellcome Trust, the Academy of Medical Sciences and the Health Foundation, and the National Institute for Health Research Cambridge Biomedical Research Centre.

Journal Article•DOI•

A cell atlas of human thymic development defines T cell repertoire formation.

Jong-Eun Park¹, Rachel A. Botting², Cecilia Domínguez Conde¹, Dorin-Mirel Popescu², Marieke Lavaert³, Daniel J Kunz⁴, Daniel J Kunz⁵, Daniel J Kunz¹, Issac Goh², Emily Stephenson², Roberta Ragazzini⁶, Roberta Ragazzini⁷, Elizabeth Tuck¹, Anna Wilbrey-Clark¹, Kenny Roberts¹, Veronika R. Kedlian¹, John R. Ferdinand⁸, Xiaoling He⁴, Simone Webb², Daniel Maunder², Niels Vandamme³, Krishnaa T. Mahbubani⁴, Krzysztof Polanski¹, Lira Mamanova¹, Liam Bolt¹, David Crossland², David Crossland⁹, Fabrizio De Rita⁹, Andrew Fuller², Andrew Filby², Gary Reynolds², David Dixon², Kourosh Saeb-Parsy⁴, Steven Lisgo², Deborah J. Henderson², Roser Vento-Tormo¹, Omer Ali Bayraktar¹, Roger A. Barker⁴, Kerstin B. Meyer¹, Yvan Saeys³, Paola Bonfanti⁶, Paola Bonfanti⁷, Sam Behjati¹, Sam Behjati⁴, Menna R. Clatworthy¹⁰, Menna R. Clatworthy¹, Menna R. Clatworthy⁸, Tom Taghon³, Muzlifah Haniffa², Muzlifah Haniffa¹, Sarah A. Teichmann¹, Sarah A. Teichmann⁴ - Show less +48 more•Institutions (10)

Wellcome Trust Sanger Institute¹, Newcastle University², Ghent University³, University of Cambridge⁴, Wellcome Trust/Cancer Research UK Gurdon Institute⁵, Francis Crick Institute⁶, University College London⁷, Laboratory of Molecular Biology⁸, Freeman Hospital⁹, Cambridge University Hospitals NHS Foundation Trust¹⁰

21 Feb 2020-Science

TL;DR: The authors' single-cell transcriptome profile of the thymus across the human lifetime and across species provides a high-resolution census of T cell development within the native tissue microenvironment, and identifies novel subpopulations of human thymic fibroblasts and epithelial cells and locates them in situ.

Abstract: The thymus provides a nurturing environment for the differentiation and selection of T cells, a process orchestrated by their interaction with multiple thymic cell types. We used single-cell RNA sequencing to create a cell census of the human thymus across the life span and to reconstruct T cell differentiation trajectories and T cell receptor (TCR) recombination kinetics. Using this approach, we identified and located in situ CD8αα+ T cell populations, thymic fibroblast subtypes, and activated dendritic cell states. In addition, we reveal a bias in TCR recombination and selection, which is attributed to genomic position and the kinetics of lineage commitment. Taken together, our data provide a comprehensive atlas of the human thymus across the life span with new insights into human T cell development.

Posted Content•DOI•

Significantly improving the quality of genome assemblies through curation

Kerstin Howe¹, William Chow¹, Joanna Collins¹, Sarah Pelan¹, Damon-Lee Pointon¹, Ying Sims¹, James Torrance¹, Alan Tracey¹, Jonathan Wood¹ - Show less +5 more•Institutions (1)

Wellcome Trust Sanger Institute¹

13 Aug 2020-bioRxiv

TL;DR: This work describes a tried and tested approach to assembly curation using gEVAL, the genome evaluation browser, and gives recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.

Abstract: Background: Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved, goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Results: Whilst working towards improved data sets and fully automated pipelines, assembly evaluation and curation is actively employed to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. Conclusions: We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.

Journal Article•DOI•

Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis

Yu Fu¹, Alexander W. Jung¹, Ramón Viñas Torné¹, Ramón Viñas Torné², Santiago Gonzalez¹, Harald Vöhringer¹, Artem Shmatko¹, Artem Shmatko³, Lucy R. Yates⁴, Mercedes Jimenez-Linan, Luiza Moore⁴, Moritz Gerstung¹ - Show less +8 more•Institutions (4)

European Bioinformatics Institute¹, University of Cambridge², Moscow State University³, Wellcome Trust Sanger Institute⁴

27 Jul 2020

TL;DR: Deep transfer learning is used to quantify histopathological patterns across 17,355 hematoxylin and eosin-stained histopathology slide images from 28 cancer types and correlate these with matched genomic, transcriptomic and survival data, showing the remarkable potential of computer vision in characterizing the molecular basis of tumor histopathology.

Abstract: We use deep transfer learning to quantify histopathological patterns across 17,355 hematoxylin and eosin-stained histopathology slide images from 28 cancer types and correlate these with matched genomic, transcriptomic and survival data. This approach accurately classifies cancer types and provides spatially resolved tumor and normal tissue distinction. Automatically learned computational histopathological features correlate with a large range of recurrent genetic aberrations across cancer types. This includes whole-genome duplications, which display universal features across cancer types, individual chromosomal aneuploidies, focal amplifications and deletions, as well as driver gene mutations. There are widespread associations between bulk gene expression levels and histopathology, which reflect tumor composition and enable the localization of transcriptomically defined tumor-infiltrating lymphocytes. Computational histopathology augments prognosis based on histopathological subtyping and grading, and highlights prognostically relevant areas such as necrosis or lymphocytic aggregates. These findings show the remarkable potential of computer vision in characterizing the molecular basis of tumor histopathology. Two papers by Kather and colleagues and Gerstung and colleagues develop workflows to predict a wide range of molecular alterations from pan-cancer digital pathology slides.

Journal Article•DOI•

scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation

Elo Madissoon¹, Elo Madissoon², Anna Wilbrey-Clark², Ricardo J. Miragaia², Kourosh Saeb-Parsy³, Krishnaa T. Mahbubani³, Nikitas Georgakopoulos³, Philippa Harding², Krzysztof Polanski², Ni Huang², Karol Nowicki-Osuch³, Rebecca C. Fitzgerald³, Kevin W. Loudon, John R. Ferdinand, Menna R. Clatworthy, A. Tsingene², S. van Dongen², Monika Dabrowska², Minal Patel², Michael J. T. Stubbington², Sarah A. Teichmann², Oliver Stegle¹, Kerstin B. Meyer²

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute², University of Cambridge³

01 Dec 2020-Genome Biology

TL;DR: In this article, the effect of cold storage on fresh healthy spleen, esophagus, and lung from ≥ 5 donors over 72 h was assessed, and robust protocols for tissue preservation for up to 24 h prior to scRNA-seq analysis were presented.

Abstract: The Human Cell Atlas is a large international collaborative effort to map all cell types of the human body. Single-cell RNA sequencing can generate high-quality data for the delivery of such an atlas. However, delays between fresh sample collection and processing may lead to poor data and difficulties in experimental design. This study assesses the effect of cold storage on fresh healthy spleen, esophagus, and lung from ≥ 5 donors over 72 h. We collect 240,000 high-quality single-cell transcriptomes with detailed cell type annotations and whole genome sequences of donors, enabling future eQTL studies. Our data provide a valuable resource for the study of these 3 organs and will allow cross-organ comparison of cell types. We see little effect of cold ischemic time on cell yield, total number of reads per cell, and other quality control metrics in any of the tissues within the first 24 h. However, we observe a decrease in the proportions of lung T cells at 72 h, a higher percentage of mitochondrial reads, and increased contamination by background ambient RNA reads in the 72-h samples in the spleen, which is cell type specific. In conclusion, we present robust protocols for tissue preservation for up to 24 h prior to scRNA-seq analysis. This greatly facilitates the logistics of sample collection for Human Cell Atlas or clinical studies since it increases the time frames for sample processing.

Journal Article•DOI•

MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data

Ricard Argelaguet¹, Damien Arnol¹, Danila Bredikhin¹, Yonatan Deloro¹, Britta Velten¹, Britta Velten², John C. Marioni³, John C. Marioni¹, John C. Marioni⁴, Oliver Stegle², Oliver Stegle¹

European Bioinformatics Institute¹, German Cancer Research Center², Wellcome Trust Sanger Institute³, University of Cambridge⁴

11 May 2020-Genome Biology

TL;DR: This work presents Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of single-cell multi-modal data that reconstructs a low-dimensional representation of the data using computationally efficient variational inference and supports flexible sparsity constraints.

Abstract: Technological advances have enabled the profiling of multiple molecular layers at single-cell resolution, assaying cells from multiple samples or conditions. Consequently, there is a growing need for computational strategies to analyze data from complex experimental designs that include multiple data modalities and multiple groups of samples. We present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of single-cell multi-modal data. MOFA+ reconstructs a low-dimensional representation of the data using computationally efficient variational inference and supports flexible sparsity constraints, allowing variation to be modelled jointly across multiple sample groups and data modalities.

Journal Article•DOI•

Transcription phenotypes of pancreatic cancer are driven by genomic events during tumor evolution

Michelle Chan-Seng-Yue¹, Michelle Chan-Seng-Yue², J. Kim², J. Kim¹, Gavin W. Wilson¹, Karen Ng¹, Eugenia Flores Figueroa¹, Grainne M. O'Kane², Grainne M. O'Kane³, Ashton A. Connor³, Robert E. Denroche², Robert C. Grant³, Jessica McLeod¹, Julie M. Wilson², Gun Ho Jang², Amy Zhang², Anna Dodd³, Sheng-Ben Liang³, Ayelet Borgida⁴, Dianne Chadwick³, Sangeetha N Kalimuthu³, Ilinca Lungu², John M. S. Bartlett², Paul M. Krzyzanowski², Vandana Sandhu¹, Hervé Tiriac⁵, Hervé Tiriac⁶, Hervé Tiriac⁷, Fieke E. M. Froeling⁸, Fieke E. M. Froeling⁶, Fieke E. M. Froeling⁷, Joanna M. Karasinska, James T. Topham, Daniel J. Renouf, David F. Schaeffer⁹, Steven J.M. Jones¹⁰, Marco A. Marra¹⁰, Janessa Laskin, Runjan Chetty³, Lincoln Stein¹¹, Lincoln Stein², George Zogopoulos¹², George Zogopoulos¹³, Benjamin Haibe-Kains, Peter J. Campbell¹⁴, Peter J. Campbell¹⁵, David A. Tuveson⁷, David A. Tuveson⁶, Jennifer J. Knox², Jennifer J. Knox³, Sandra Fischer³, Sandra Fischer¹¹, Steven Gallinger, Faiyaz Notta¹, Faiyaz Notta¹¹, Faiyaz Notta²

Princess Margaret Cancer Centre¹, Ontario Institute for Cancer Research², University Health Network³, Lunenfeld-Tanenbaum Research Institute⁴, University of California, San Diego⁵, Cold Spring Harbor Laboratory⁶, Lustgarten Foundation⁷, Imperial College London⁸, Vancouver General Hospital⁹, University of British Columbia¹⁰, University of Toronto¹¹, McGill University Health Centre¹², McGill University¹³, University of Cambridge¹⁴, Wellcome Trust Sanger Institute¹⁵

13 Jan 2020-Nature Genetics

TL;DR: Whole-genome sequencing, transcriptome sequencing and single-cell analysis of primary and metastatic pancreatic adenocarcinoma identify molecular subtypes and intratumor heterogeneity, and support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype.

Abstract: Pancreatic adenocarcinoma presents as a spectrum of a highly aggressive disease in patients. The basis of this disease heterogeneity has proved difficult to resolve due to poor tumor cellularity and extensive genomic instability. To address this, a dataset of whole genomes and transcriptomes was generated from purified epithelium of primary and metastatic tumors. Transcriptome analysis demonstrated that molecular subtypes are a product of a gene expression continuum driven by a mixture of intratumoral subpopulations, which was confirmed by single-cell analysis. Integrated whole-genome analysis uncovered that molecular subtypes are linked to specific copy number aberrations in genes such as mutant KRAS and GATA6. By mapping tumor genetic histories, tetraploidization emerged as a key mutational process behind these events. Taken together, these data support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype, and that disease heterogeneity is due to ongoing genomic instability during progression.

Journal Article•DOI•

Evidence for 28 genetic disorders discovered by combining healthcare and research data

Joanna Kaplanis¹, Kaitlin E. Samocha¹, Laurens Wiel², Z Zhang³, Kevin J. Arvai³, Ruth Y. Eberhardt¹, Giuseppe Gallone¹, Stefan H. Lelieveld², Hilary C. Martin¹, Jeremy F. McRae¹, Patrick J. Short¹, Rebecca I. Torene³, E. de Boer², Petr Danecek¹, Eugene J. Gardner¹, Ni Huang¹, Jenny Lord⁴, Jenny Lord¹, Inigo Martincorena¹, Rolph Pfundt², M. R. F. Reijnders², M. R. F. Reijnders⁵, A Yeung, Helger G. Yntema², S Borras², C Clark², J Dean³, Z Miedzybrodzka⁶, A Ross¹, A Ross⁷, S Tennant⁸, T Dabir⁸, D Donnelly¹, M Humphreys², A Magee², V McConnell³, Shane McKee, Susan E. McNerlan, P J Morrison, Gillian Rea, Fiona Stewart, Trevor Cole, Nicola S. Cooper, L Cooper-Charles, Helen Cox, L Islam, Joseph P. Jarvis, Rebecca Keelagher, D Lim, Dominic J. McMullan, Jenny Morton, S Naik, M O’Driscoll, K R Ong, Deborah Osio, Nicola K. Ragge, S Turton, Julie Vogt, Denise Williams, S. Bodek, Alan Donaldson, A. Hills, K Low, Ruth Newbury-Ecob, A M Norman, E. Roberts, Ingrid Scurr, Sarah F. Smithson, Madeleine J. Tooley, S Abbs, Ruth Armstrong, C Dunn, Simon Holden, Soo-Mi Park, Joan Paterson, Lucy Raymond, E Reid, R Sandford, Ingrid Simonic, Marc Tischkowitz, G Woods, Lisa Bradley, J Comerford, Angie Green, Sally Ann Lynch, S McQuaid, B Mullaney, Jonathan Berg, David Goudie, E Mavrak, J McLean, C McWilliam, E Reavey, T Azam, E Cleary, Andrew Jackson, Wayne Lam, AK Lampe, David Moore, Mary E. M. Porteous, Emma L. Baple, Julia Baptista, C Brewer

Wellcome Trust Sanger Institute¹, Radboud University Nijmegen², GeneDx³, University of Southampton⁴, Maastricht University Medical Centre⁵, Royal Devon and Exeter Hospital⁶, Cambridge University Hospitals NHS Foundation Trust⁷, Western General Hospital⁸

14 Oct 2020-Nature

TL;DR: To identify novel genes associated with developmental disorders (DD), healthcare and research exome sequences from 31,058 DD parent-offspring trios are integrated, and a simulation-based statistical test is developed to identify gene-specific enrichment of de novo mutations (DNMs).

Abstract: De novo mutations in protein-coding genes are a well-established cause of developmental disorders [1]. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations [1,2]. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.

Journal Article•DOI•

The polygenic and monogenic basis of blood traits and diseases

Dragana Vuckovic¹, Erik L. Bao², Parsa Akbari¹, Caleb A. Lareau², Abdou Mousas³, Tao Jiang¹, Ming-Huei Chen, Laura M. Raffield⁴, Manuel Tardaguila⁵, Jennifer E. Huffman⁶, Scott C. Ritchie¹, Karyn Megy¹, Hannes Ponstingl⁵, Christopher J. Penkett¹, Patrick K. Albers⁵, Emilie M. Wigdor⁵, Saori Sakaue⁷, Arden Moscati⁸, Regina Manansala⁹, Ken Sin Lo³, Huijun Qian⁴, Masato Akiyama¹⁰, Traci M. Bartz¹¹, Yoav Ben-Shlomo¹², Andrew D Beswick¹², Jette Bork-Jensen¹³, Erwin P. Bottinger⁸, Jennifer A. Brody¹¹, Frank J. A. van Rooij¹⁴, Kumaraswamy Naidu Chitrala¹⁵, Peter W.F. Wilson¹⁶, Hélène Choquet¹⁷, John Danesh, Emanuele Di Angelantonio, Niki Dimou¹⁸, Jingzhong Ding¹⁹, Paul Elliott²⁰, Tõnu Esko²¹, Michele K. Evans¹⁵, Stephan B. Felix²², James S. Floyd¹¹, Linda Broer¹⁴, Niels Grarup¹³, Michael H. Guo²³, Qi Guo²⁴, Andreas Greinacher²², Jeffrey Haessler²⁵, Torben Hansen¹³, J. M. M. Howson¹, Wei Huang²⁶, Eric Jorgenson¹⁷, Tim Kacprowski²⁷, Mika Kähönen²⁸, Yoichiro Kamatani²⁹, Masahiro Kanai², Savita Karthikeyan²⁴, Fotios Koskeridis³⁰, Leslie A. Lange³¹, Terho Lehtimäki, Allan Linneberg¹³, Yongmei Liu³², Leo-Pekka Lyytikäinen, Ani Manichaikul³³, Koichi Matsuda²⁹, Karen L. Mohlke⁴, Nina Mononen, Yoshinori Murakami²⁹, Girish N. Nadkarni⁸, Kjell Nikus²⁸, Nathan Pankratz³⁴, Oluf Pedersen¹³, Michael Preuss⁸, Bruce M. Psaty¹¹, Olli T. Raitakari³⁵, Stephen S. Rich³³, Benjamin Rodriguez, Jonathan D. Rosen⁴, Jerome I. Rotter³⁶, Petra Schubert⁶, Cassandra N. Spracklen⁴, Praveen Surendran⁵, Hua Tang³⁷, Jean-Claude Tardif³, Mohsen Ghanbari³⁸, Uwe Völker²², Henry Völzke²², Nicholas A. Watkins³⁹, Stefan Weiss²², VA Million Veteran Program⁵, Na Cai⁵, Kousik Kundu⁵, Stephen B. Watt⁵, Klaudia Walter⁵, Alan B. Zonderman¹⁵, Kelly Cho⁴⁰, Yun Li⁴, Ruth J. F. Loos⁸, Julian C. Knight⁴¹, Michel Georges⁴², Oliver Stegle⁴³, Evangelos Evangelou²⁰, Yukinori Okada⁷, David J. Roberts⁴⁴, Michael Inouye, Andrew D. Johnson, Paul L. Auer⁹, William J. Astle¹, Alexander P. Reiner¹¹, Adam S. Butterworth, Willem H. Ouwehand¹, Guillaume Lettre³, Vijay G. Sankaran², Vijay G. Sankaran²¹, Nicole Soranzo

National Institute for Health Research¹, Harvard University², Montreal Heart Institute³, University of North Carolina at Chapel Hill⁴, Wellcome Trust Sanger Institute⁵, VA Boston Healthcare System⁶, Osaka University⁷, Icahn School of Medicine at Mount Sinai⁸, University of Wisconsin–Milwaukee⁹, Kyushu University¹⁰, University of Washington¹¹, University of Bristol¹², University of Copenhagen¹³, Erasmus University Medical Center¹⁴, National Institutes of Health¹⁵, Veterans Health Administration¹⁶, Kaiser Permanente¹⁷, International Agency for Research on Cancer¹⁸, Wake Forest University¹⁹, Imperial College London²⁰, Broad Institute²¹, Greifswald University Hospital²², University of Pennsylvania²³, British Heart Foundation²⁴, Fred Hutchinson Cancer Research Center²⁵, Chinese National Human Genome Center²⁶, Technische Universität München²⁷, University of Tampere²⁸, University of Tokyo²⁹, University of Ioannina³⁰, University of Colorado Denver³¹, Duke University³², University of Virginia³³, University of Minnesota³⁴, Turku University Hospital³⁵, Los Angeles Biomedical Research Institute³⁶, Stanford University³⁷, Mashhad University of Medical Sciences³⁸, NHS Blood and Transplant³⁹, Brigham and Women's Hospital⁴⁰, University of Oxford⁴¹, University of Liège⁴², European Bioinformatics Institute⁴³, John Radcliffe Hospital⁴⁴

03 Sep 2020-Cell

TL;DR: The results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
