
Showing papers by "Wellcome Trust Sanger Institute" published in 2018


Journal ArticleDOI
Naomi R. Wray1, Stephan Ripke2, Stephan Ripke3, Stephan Ripke4, +259 more, Institutions (79)
TL;DR: A genome-wide association meta-analysis of individuals with clinically assessed or self-reported depression identifies 44 independent and significant loci and finds important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia.
Abstract: Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association meta-analysis based in 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal, whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.

1,898 citations


Journal ArticleDOI
TL;DR: This work presents a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space and demonstrates the superiority of this approach compared with existing methods by using both simulated and real scRNA-seq data sets.
Abstract: Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.

1,423 citations
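
The pair-detection idea in the abstract above can be illustrated with a minimal Python sketch. It covers only the search for mutual nearest neighbours between two batches, not the correction step, and it is not the authors' implementation: the Euclidean distance, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, reference, k):
    """Indices of the k cells in `reference` that are closest to `query`."""
    order = sorted(range(len(reference)), key=lambda j: euclidean(query, reference[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    each appear among the other's k nearest neighbours in the other batch."""
    nn_a_to_b = [k_nearest(cell, batch_b, k) for cell in batch_a]
    nn_b_to_a = [k_nearest(cell, batch_a, k) for cell in batch_b]
    return [
        (i, j)
        for i, neighbours in enumerate(nn_a_to_b)
        for j in sorted(neighbours)
        if i in nn_b_to_a[j]
    ]


# Toy data (hypothetical): batch_b holds one population absent from batch_a.
batch_a = [[0.0, 0.0], [5.0, 5.0]]
batch_b = [[0.5, 0.5], [5.5, 5.5], [20.0, 20.0]]
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)
```

In this toy case the third cell of `batch_b` forms no pair, which mirrors the abstract's point that only a subset of the population needs to be shared between batches.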


Journal ArticleDOI
22 Jun 2018-Science
TL;DR: It is demonstrated that, in the general population, the personality trait neuroticism is significantly correlated with almost every psychiatric disorder and migraine, and it is shown that both psychiatric and neurological disorders have robust correlations with cognitive and personality measures.
Abstract: Disorders of the brain can exhibit considerable epidemiological comorbidity and often share symptoms, provoking debate about their etiologic overlap. We quantified the genetic sharing of 25 brain disorders from genome-wide association studies of 265,218 patients and 784,643 control participants and assessed their relationship to 17 phenotypes from 1,191,588 individuals. Psychiatric disorders share common variant risk, whereas neurological disorders appear more distinct from one another and from the psychiatric disorders. We also identified significant sharing between disorders and a number of brain phenotypes, including cognitive measures. Further, we conducted simulations to explore how statistical power, diagnostic misclassification, and phenotypic heterogeneity affect genetic correlations. These results highlight the importance of common genetic variation as a risk factor for brain disorders and the value of heritability-based methods in understanding their etiology.

1,357 citations


Journal ArticleDOI
14 Nov 2018-Nature
TL;DR: A single-cell atlas of the maternal–fetal interface reveals the cellular organization of the decidua and placenta, and the interactions that are critical for placentation and reproductive success, and develops a repository of ligand–receptor complexes and a statistical tool to predict the cell–cell communication via these molecular interactions.
Abstract: During early human pregnancy the uterine mucosa transforms into the decidua, into which the fetal placenta implants and where placental trophoblast cells intermingle and communicate with maternal cells. Trophoblast-decidual interactions underlie common diseases of pregnancy, including pre-eclampsia and stillbirth. Here we profile the transcriptomes of about 70,000 single cells from first-trimester placentas with matched maternal blood and decidual cells. The cellular composition of human decidua reveals subsets of perivascular and stromal cells that are located in distinct decidual layers. There are three major subsets of decidual natural killer cells that have distinctive immunomodulatory and chemokine profiles. We develop a repository of ligand-receptor complexes and a statistical tool to predict the cell-type specificity of cell-cell communication via these molecular interactions. Our data identify many regulatory interactions that prevent harmful innate or adaptive immune responses in this environment. Our single-cell atlas of the maternal-fetal interface reveals the cellular organization of the decidua and placenta, and the interactions that are critical for placentation and reproductive success.

1,315 citations
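
One way a statistical tool could test the cell-type specificity of a ligand-receptor interaction is a label-permutation test, sketched below in Python. This is an assumption about the general approach, not a description of the authors' tool: the scoring rule, the function names, and the expression values are hypothetical.

```python
import random


def pair_score(ligand, receptor, labels, sender, receiver):
    """Average of the mean ligand expression in `sender` cells and the
    mean receptor expression in `receiver` cells (an assumed score)."""
    lig = [x for x, lab in zip(ligand, labels) if lab == sender]
    rec = [x for x, lab in zip(receptor, labels) if lab == receiver]
    return (sum(lig) / len(lig) + sum(rec) / len(rec)) / 2


def permutation_test(ligand, receptor, labels, sender, receiver, n_perm=1000, seed=0):
    """Shuffle cell-type labels to build a null distribution for the score."""
    rng = random.Random(seed)
    observed = pair_score(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # label counts are preserved, so no group is empty
        if pair_score(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy expression values (hypothetical) for eight cells of two types.
labels = ["trophoblast"] * 4 + ["dNK"] * 4
ligand = [5.0, 6.0, 5.0, 6.0, 0.0, 0.0, 1.0, 0.0]
receptor = [0.0, 1.0, 0.0, 0.0, 7.0, 6.0, 7.0, 6.0]
observed, p_value = permutation_test(ligand, receptor, labels, "trophoblast", "dNK")
print(observed, p_value)
```

A small p-value here means the ligand and receptor are expressed together in this pair of cell types more strongly than expected when labels are assigned at random.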


Journal ArticleDOI
TL;DR: A simple and robust plate-based single-cell ATAC-seq method is developed that works in fresh and cryopreserved cells, identifies distinct immune cell types, and reveals cell type-specific regulatory regions and related transcription factors.
Abstract: The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is widely used to identify regulatory regions throughout the genome. However, very few studies have been performed at the single cell level (scATAC-seq) due to technical challenges. Here we developed a simple and robust plate-based scATAC-seq method, combining upfront bulk Tn5 tagging with single-nuclei sorting. We demonstrate that our method works robustly across various systems, including fresh and cryopreserved cells from primary tissues. By profiling over 3000 splenocytes, we identify distinct immune cell types and reveal cell type-specific regulatory regions and related transcription factors. ATAC-seq is widely used to identify regulatory regions in the genome. Here the authors develop a simple and robust plate-based single-cell ATAC-seq method that works in fresh and cryopreserved cells.

1,260 citations


Journal ArticleDOI
TL;DR: It is shown that DNA breaks introduced by single-guide RNA/Cas9 frequently resolved into deletions extending over many kilobases, and the observed genomic damage in mitotically active cells caused by CRISPR–Cas9 editing may have pathogenic consequences.
Abstract: CRISPR-Cas9 is poised to become the gene editing tool of choice in clinical contexts. Thus far, exploration of Cas9-induced genetic alterations has been limited to the immediate vicinity of the target site and distal off-target sequences, leading to the conclusion that CRISPR-Cas9 was reasonably specific. Here we report significant on-target mutagenesis, such as large deletions and more complex genomic rearrangements at the targeted sites in mouse embryonic stem cells, mouse hematopoietic progenitors and a human differentiated cell line. Using long-read sequencing and long-range PCR genotyping, we show that DNA breaks introduced by single-guide RNA/Cas9 frequently resolved into deletions extending over many kilobases. Furthermore, lesions distal to the cut site and crossover events were identified. The observed genomic damage in mitotically active cells caused by CRISPR-Cas9 editing may have pathogenic consequences.

1,232 citations


Journal ArticleDOI
Anubha Mahajan1, Daniel Taliun2, Matthias Thurner1, Neil R. Robertson1, Jason M. Torres1, N. William Rayner3, N. William Rayner1, Anthony Payne1, Valgerdur Steinthorsdottir4, Robert A. Scott5, Niels Grarup6, James P. Cook7, Ellen M. Schmidt2, Matthias Wuttke8, Chloé Sarnowski9, Reedik Mägi10, Jana Nano11, Christian Gieger, Stella Trompet12, Cécile Lecoeur13, Michael Preuss14, Bram P. Prins3, Xiuqing Guo15, Lawrence F. Bielak2, Jennifer E. Below16, Donald W. Bowden17, John C. Chambers, Young-Jin Kim, Maggie C.Y. Ng17, Lauren E. Petty16, Xueling Sim18, Weihua Zhang19, Weihua Zhang20, Amanda J. Bennett1, Jette Bork-Jensen6, Chad M. Brummett2, Mickaël Canouil13, Kai-Uwe Eckardt21, Krista Fischer10, Sharon L.R. Kardia2, Florian Kronenberg22, Kristi Läll10, Ching-Ti Liu9, Adam E. Locke23, Jian'an Luan5, Ioanna Ntalla24, Vibe Nylander1, Sebastian Schönherr22, Claudia Schurmann14, Loic Yengo13, Erwin P. Bottinger14, Ivan Brandslund25, Cramer Christensen, George Dedoussis26, Jose C. Florez, Ian Ford27, Oscar H. Franco11, Timothy M. Frayling28, Vilmantas Giedraitis29, Sophie Hackinger3, Andrew T. Hattersley28, Christian Herder30, M. Arfan Ikram11, Martin Ingelsson29, Marit E. Jørgensen25, Marit E. Jørgensen31, Torben Jørgensen6, Torben Jørgensen32, Jennifer Kriebel, Johanna Kuusisto33, Symen Ligthart11, Cecilia M. Lindgren34, Cecilia M. Lindgren1, Allan Linneberg35, Allan Linneberg6, Valeriya Lyssenko36, Valeriya Lyssenko37, Vasiliki Mamakou26, Thomas Meitinger38, Karen L. Mohlke39, Andrew D. Morris40, Andrew D. Morris41, Girish N. Nadkarni14, James S. Pankow42, Annette Peters, Naveed Sattar43, Alena Stančáková33, Konstantin Strauch44, Kent D. Taylor15, Barbara Thorand, Gudmar Thorleifsson4, Unnur Thorsteinsdottir45, Unnur Thorsteinsdottir4, Jaakko Tuomilehto, Daniel R. Witte46, Josée Dupuis9, Patricia A. Peyser2, Eleftheria Zeggini3, Ruth J. F. Loos14, Philippe Froguel13, Philippe Froguel19, Erik Ingelsson47, Erik Ingelsson48, Lars Lind29, Leif Groop37, Leif Groop49, Markku Laakso33, Francis S. Collins50, J. Wouter Jukema12, Colin N. A. Palmer51, Harald Grallert, Andres Metspalu10, Abbas Dehghan19, Abbas Dehghan11, Anna Köttgen8, Gonçalo R. Abecasis2, James B. Meigs52, Jerome I. Rotter15, Jonathan Marchini1, Oluf Pedersen6, Torben Hansen25, Torben Hansen6, Claudia Langenberg5, Nicholas J. Wareham5, Kari Stefansson4, Kari Stefansson45, Anna L. Gloyn1, Andrew P. Morris10, Andrew P. Morris1, Andrew P. Morris7, Michael Boehnke2, Mark I. McCarthy1
TL;DR: Combining 32 genome-wide association studies with high-density imputation provides a comprehensive view of the genetic contribution to type 2 diabetes in individuals of European ancestry with respect to locus discovery, causal-variant resolution, and mechanistic insight.
Abstract: We expanded GWAS discovery for type 2 diabetes (T2D) by combining data from 898,130 European-descent individuals (9% cases), after imputation to high-density reference panels. With these data, we (i) extend the inventory of T2D-risk variants (243 loci, 135 newly implicated in T2D predisposition, comprising 403 distinct association signals); (ii) enrich discovery of lower-frequency risk alleles (80 index variants with minor allele frequency <5%, 14 with estimated allelic odds ratio >2); (iii) substantially improve fine-mapping of causal variants (at 51 signals, one variant accounted for >80% posterior probability of association (PPA)); (iv) extend fine-mapping through integration of tissue-specific epigenomic information (islet regulatory annotations extend the number of variants with PPA >80% to 73); (v) highlight validated therapeutic targets (18 genes with associations attributable to coding variants); and (vi) demonstrate enhanced potential for clinical translation (genome-wide chip heritability explains 18% of T2D risk; individuals in the extremes of a T2D polygenic risk score differ more than ninefold in prevalence).

1,136 citations
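
The fine-mapping quantity mentioned in the abstract above, the posterior probability of association (PPA), can be illustrated with a short Python sketch. It assumes a single causal variant per signal and equal prior probability for every variant, under which each variant's PPA is its Bayes factor divided by the sum over the signal. The variant identifiers and Bayes factors are hypothetical and are not taken from the study.

```python
def posterior_probabilities(bayes_factors):
    """PPA per variant, assuming one causal variant and equal priors."""
    total = sum(bayes_factors.values())
    return {variant: bf / total for variant, bf in bayes_factors.items()}


def credible_set(ppa, coverage):
    """Smallest set of top-ranked variants whose summed PPA reaches `coverage`."""
    chosen, cumulative = [], 0.0
    for variant, p in sorted(ppa.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(variant)
        cumulative += p
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical Bayes factors for three variants at one association signal.
bayes_factors = {
    "rs_hypothetical_1": 850.0,
    "rs_hypothetical_2": 100.0,
    "rs_hypothetical_3": 50.0,
}
ppa = posterior_probabilities(bayes_factors)
top_set = credible_set(ppa, coverage=0.9)
print(ppa, top_set)
```

In this toy signal the first variant carries more than 80% of the posterior probability, which is the kind of single-variant resolution the abstract counts at 51 signals.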


Journal ArticleDOI
18 Apr 2018-Nature
TL;DR: A large panel of cell surface markers in skin and mammary primary tumours is screened, and multiple tumour subpopulations associated with different EMT stages are identified: from epithelial to completely mesenchymal states, passing through intermediate hybrid states.
Abstract: In cancer, the epithelial-to-mesenchymal transition (EMT) is associated with tumour stemness, metastasis and resistance to therapy. It has recently been proposed that, rather than being a binary process, EMT occurs through distinct intermediate states. However, there is no direct in vivo evidence for this idea. Here we screen a large panel of cell surface markers in skin and mammary primary tumours, and identify the existence of multiple tumour subpopulations associated with different EMT stages: from epithelial to completely mesenchymal states, passing through intermediate hybrid states. Although all EMT subpopulations presented similar tumour-propagating cell capacity, they displayed differences in cellular plasticity, invasiveness and metastatic potential. Their transcriptional and epigenetic landscapes identify the underlying gene regulatory networks, transcription factors and signalling pathways that control these different EMT transition states. Finally, these tumour subpopulations are localized in different niches that differentially regulate EMT transition states.

981 citations


Journal ArticleDOI
06 Jun 2018-Nature
TL;DR: The genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study is characterized; protein quantitative trait loci are shown to overlap with gene expression quantitative trait loci and with disease-associated loci, and evidence is found that protein biomarkers have causal roles in disease.
Abstract: Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.

961 citations


Journal ArticleDOI
TL;DR: The 2018 Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (CGC) is discussed, an expert-curated description of human cancer genes, which has recently been expanded to include functional descriptions of how each gene contributes to cancer.
Abstract: The Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (CGC) is an expert-curated description of the genes driving human cancer that is used as a standard in cancer genetics across basic research, medical reporting and pharmaceutical development. After a major expansion and complete re-evaluation, the 2018 CGC describes in detail the effect of 719 cancer-driving genes. The recent expansion includes functional and mechanistic descriptions of how each gene contributes to disease generation in terms of the key cancer hallmarks and the impact of mutations on gene and protein function. These functional characteristics depict the extraordinary complexity of cancer biology and suggest multiple cancer-related functions for many genes, which are often highly tissue-dependent or tumour stage-dependent. The 2018 CGC encompasses a second tier, describing an expanding list of genes (currently 145) from more recent cancer studies that show supportive but less detailed indications of a role in cancer.

895 citations


Journal ArticleDOI
23 Nov 2018-Science
TL;DR: Targeted gene sequencing of normal esophageal epithelium from nine human donors found strong positive selection of clones carrying mutations in 14 cancer genes, with tens to hundreds of clones per square centimeter in middle-aged and elderly donors.
Abstract: The extent to which cells in normal tissues accumulate mutations throughout life is poorly understood. Some mutant cells expand into clones that can be detected by genome sequencing. We mapped mutant clones in normal esophageal epithelium from nine donors (age range, 20 to 75 years). Somatic mutations accumulated with age and were caused mainly by intrinsic mutational processes. We found strong positive selection of clones carrying mutations in 14 cancer genes, with tens to hundreds of clones per square centimeter. In middle-aged and elderly donors, clones with cancer-associated mutations covered much of the epithelium, with NOTCH1 and TP53 mutations affecting 12 to 80% and 2 to 37% of cells, respectively. Unexpectedly, the prevalence of NOTCH1 mutations in normal esophagus was several times higher than in esophageal cancers. These findings have implications for our understanding of cancer and aging.

Journal ArticleDOI
TL;DR: In this article, the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry was conducted.
Abstract: High blood pressure is a highly heritable and modifiable risk factor for cardiovascular disease. We report the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry. We identify 535 novel blood pressure loci that not only offer new biological insights into blood pressure regulation but also highlight shared genetic architecture between blood pressure and lifestyle exposures. Our findings identify new biological pathways for blood pressure regulation with potential for improved cardiovascular disease prevention in the future.

Journal ArticleDOI
TL;DR: In this paper, the authors highlight the key technological developments that have enabled the growth in the data obtained from single-cell RNA-seq experiments, and the advantages of using large numbers of cells.
Abstract: Measurement of the transcriptomes of single cells has been feasible for only a few years, but it has become an extremely popular assay. While many types of analysis can be carried out and various questions can be answered by single-cell RNA-seq, a central focus is the ability to survey the diversity of cell types in a sample. Unbiased and reproducible cataloging of gene expression patterns in distinct cell types requires large numbers of cells. Technological developments and protocol improvements have fueled consistent and exponential increases in the number of cells that can be studied in single-cell RNA-seq analyses. In this Perspective, we highlight the key technological developments that have enabled this growth in the data obtained from single-cell RNA-seq experiments.

Journal ArticleDOI
04 May 2018-Science
TL;DR: Saturation-scale mutagenesis allows prioritization of intervention targets in the genome of the most important cause of malaria, and confirms the proteasome-degradation pathway is a high-value druggable target.
Abstract: INTRODUCTION: Malaria remains a devastating global parasitic disease, with the majority of malaria deaths caused by the highly virulent Plasmodium falciparum. The extreme AT-bias of the P. falciparum genome has hampered genetic studies through targeted approaches such as homologous recombination or CRISPR-Cas9, and only a few hundred P. falciparum mutants have been experimentally generated in the past decades. In this study, we have used high-throughput piggyBac transposon insertional mutagenesis and quantitative insertion site sequencing (QIseq) to reach saturation-level mutagenesis of this parasite. RATIONALE: Our study exploits the AT-richness of the P. falciparum genome, which provides numerous piggyBac transposon insertion targets within both gene coding and noncoding flanking sequences, to generate more than 38,000 P. falciparum mutants. At this level of mutagenesis, we could distinguish essential genes as nonmutable and dispensable genes as mutable. Subsequently, we identified 2680 genes essential for in vitro asexual blood-stage growth. RESULTS: We calculated mutagenesis index scores (MISs) and mutagenesis fitness scores (MFSs) in order to functionally define the relative fitness cost of disruption for 5399 genes. A competitive growth phenotype screen confirmed that MIS and MFS were predictive of the fitness cost for in vitro asexual growth. Genes predicted to be essential included genes implicated in drug resistance (such as the “K13” Kelch propeller, mdr, and dhfr-ts) as well as targets considered to be high value for drug development, such as pkg and cdpk5. The screen revealed essential genes that are specific to human Plasmodium parasites but absent from rodent-infective species, such as lipid metabolic genes that may be crucial to transmission commitment in human infections. MIS and MFS profiling provides a clear ranking of the relative essentiality of gene ontology (GO) functions in P. falciparum. GO pathways associated with translation, RNA metabolism, and cell cycle control are more essential, whereas genes associated with protein phosphorylation, virulence factors, and transcription are more likely to be dispensable. Last, we confirm that the proteasome-degradation pathway is a high-value druggable target on the basis of its high ratio of essential to dispensable genes, and by functionally confirming its link to the mode of action of artemisinin, the current front-line antimalarial. CONCLUSION: Saturation-scale mutagenesis allows prioritization of intervention targets in the genome of the most important cause of malaria. The identification of more than 2680 essential genes, including ~1000 Plasmodium-conserved essential genes, will be valuable for antimalarial therapeutic research.

Journal ArticleDOI
Sagi Abelson1, Grace Collord2, Grace Collord3, Stanley W.K. Ng4, Omer Weissbrod5, Netta Mendelson Cohen5, Elisabeth Niemeyer5, Noam Barda, Philip C. Zuzarte6, Lawrence E. Heisler6, Yogi Sundaravadanam6, Robert Luben2, Shabina Hayat2, Ting Ting Wang1, Ting Ting Wang4, Zhen Zhao1, Iulia Cirlan1, Trevor J. Pugh1, Trevor J. Pugh4, Trevor J. Pugh6, David Soave6, Karen Ng6, Calli Latimer3, Claire Hardy3, Keiran Raine3, David T. Jones3, Diana Hoult2, Abigail Britten2, John Douglas Mcpherson6, Mattias Johansson7, Faridah Mbabaali6, Jenna Eagles6, Jessica Miller6, Danielle Pasternack6, Lee Timms6, Paul M. Krzyzanowski6, Philip Awadalla6, Rui Costa8, Eran Segal5, Scott V. Bratman1, Scott V. Bratman4, Scott V. Bratman6, Philip A. Beer3, Sam Behjati2, Sam Behjati3, Inigo Martincorena3, Jean C.Y. Wang9, Jean C.Y. Wang1, Jean C.Y. Wang4, Kristian M. Bowles10, Kristian M. Bowles11, J. Ramón Quirós, Anna Karakatsani12, Carlo La Vecchia13, Antonia Trichopoulou, Elena Salamanca-Fernández14, José María Huerta, Aurelio Barricarte, Ruth C. Travis15, Rosario Tumino, Giovanna Masala16, Heiner Boeing, Salvatore Panico17, Rudolf Kaaks18, Alwin Krämer18, Sabina Sieri, Elio Riboli19, Paolo Vineis19, Matthieu Foll7, James McKay7, Silvia Polidoro, Núria Sala, Kay-Tee Khaw2, Roel Vermeulen20, Peter J. Campbell2, Peter J. Campbell3, Elli Papaemmanuil21, Elli Papaemmanuil3, Mark D. Minden, Amos Tanay5, Ran D. Balicer, Nicholas J. Wareham2, Moritz Gerstung3, Moritz Gerstung8, John E. Dick4, John E. Dick1, Paul Brennan7, George S. Vassiliou2, George S. Vassiliou3, Liran I. Shlush5, Liran I. Shlush1 
09 Jul 2018-Nature
TL;DR: Deep sequencing is used to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH, providing proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation.
Abstract: The incidence of acute myeloid leukaemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 65. Most cases arise without any detectable early symptoms and patients usually present with the acute complications of bone marrow failure [1]. The onset of such de novo AML cases is typically preceded by the accumulation of somatic mutations in preleukaemic haematopoietic stem and progenitor cells (HSPCs) that undergo clonal expansion [2,3]. However, recurrent AML mutations also accumulate in HSPCs during ageing of healthy individuals who do not develop AML, a phenomenon referred to as age-related clonal haematopoiesis (ARCH) [4–8]. Here we use deep sequencing to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH. We analysed peripheral blood cells from 95 individuals that were obtained on average 6.3 years before AML diagnosis (pre-AML group), together with 414 unselected age- and gender-matched individuals (control group). Pre-AML cases were distinct from controls and had more mutations per sample, higher variant allele frequencies, indicating greater clonal expansion, and showed enrichment of mutations in specific genes. Genetic parameters were used to derive a model that accurately predicted AML-free survival; this model was validated in an independent cohort of 29 pre-AML cases and 262 controls. Because AML is rare, we also developed an AML predictive model using a large electronic health record database that identified individuals at greater risk. Collectively our findings provide proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation. This could in future enable earlier detection and monitoring, and may help to inform intervention.

Journal ArticleDOI
TL;DR: A perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth’s eukaryotic biodiversity over a period of 10 years, is presented.
Abstract: Increasing our understanding of Earth’s biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet’s organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth’s eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project’s goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.

Journal ArticleDOI
TL;DR: Multi‐Omics Factor Analysis (MOFA) infers a set of (hidden) factors that capture biological and technical sources of variability, disentangling axes of heterogeneity that are shared across multiple modalities from those specific to individual data modalities.
Abstract: Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.

Journal ArticleDOI
TL;DR: This work presents scmap, a method for projecting cells from an scRNA-seq data set onto cell types or individual cells from other experiments, as well as a guide for comparing data across experiments.
Abstract: Single-cell RNA-seq (scRNA-seq) allows researchers to define cell types on the basis of unsupervised clustering of the transcriptome. However, differences in experimental methods and computational analyses make it challenging to compare data across experiments. Here we present scmap (http://bioconductor.org/packages/scmap; web version at http://www.sanger.ac.uk/science/tools/scmap), a method for projecting cells from an scRNA-seq data set onto cell types or individual cells from other experiments.
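
The idea of projecting a cell onto reference cell types can be sketched in Python as a nearest-centroid lookup. This is a hedged illustration only and does not use or reproduce the scmap package: the cosine similarity measure, the 0.7 threshold, the function names, and the toy expression profiles are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two expression profiles."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def centroids(reference):
    """Mean expression profile per cell type; `reference` maps label -> profiles."""
    return {
        label: [sum(column) / len(cells) for column in zip(*cells)]
        for label, cells in reference.items()
    }


def project(cell, cents, threshold=0.7):
    """Assign the most similar reference cell type, or 'unassigned' when
    even the best match falls below the (assumed) similarity threshold."""
    best_label, best_sim = max(
        ((label, cosine(cell, centroid)) for label, centroid in cents.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_sim >= threshold else "unassigned"


# Toy reference and query profiles over three hypothetical genes.
reference = {
    "T cell": [[9.0, 1.0, 0.0], [8.0, 2.0, 0.0]],
    "B cell": [[0.0, 1.0, 9.0], [1.0, 0.0, 8.0]],
}
cents = centroids(reference)
queries = [[7.0, 1.0, 0.5], [0.5, 0.5, 7.0], [0.0, 9.0, 0.0]]
labels = [project(cell, cents) for cell in queries]
print(labels)
```

Leaving poorly matching cells unassigned, as the third query is here, avoids forcing a label onto a cell type that the reference experiment never sampled.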

Journal ArticleDOI
07 Mar 2018-mBio
TL;DR: The first large-scale emergence and spread of a novel extensively drug-resistant S. Typhi clone in Sindh, Pakistan is reported, highlighting the evolving threat of antibiotic resistance in S. Typhi and the value of antibiotic susceptibility testing and whole-genome sequencing in understanding emerging infectious diseases.
Abstract: Antibiotic resistance is a major problem in Salmonella enterica serovar Typhi, the causative agent of typhoid. Multidrug-resistant (MDR) isolates are prevalent in parts of Asia and Africa and are often associated with the dominant H58 haplotype. Reduced susceptibility to fluoroquinolones is also widespread, and sporadic cases of resistance to third-generation cephalosporins or azithromycin have also been reported. Here, we report the first large-scale emergence and spread of a novel S. Typhi clone harboring resistance to three first-line drugs (chloramphenicol, ampicillin, and trimethoprim-sulfamethoxazole) as well as fluoroquinolones and third-generation cephalosporins in Sindh, Pakistan, which we classify as extensively drug resistant (XDR). Over 300 XDR typhoid cases have emerged in Sindh, Pakistan, since November 2016. Additionally, a single case of travel-associated XDR typhoid has recently been identified in the United Kingdom. Whole-genome sequencing of over 80 of the XDR isolates revealed remarkable genetic clonality and sequence conservation, identified a large number of resistance determinants, and showed that these isolates were of haplotype H58. The XDR S. Typhi clone encodes a chromosomally located resistance region and harbors a plasmid encoding additional resistance elements, including the blaCTX-M-15 extended-spectrum β-lactamase, and carrying the qnrS fluoroquinolone resistance gene. This antibiotic resistance-associated IncY plasmid exhibited high sequence identity to plasmids found in other enteric bacteria isolated from widely distributed geographic locations. This study highlights three concerning problems: the receding antibiotic arsenal for typhoid treatment, the ability of S. Typhi to transform from MDR to XDR in a single step by acquisition of a plasmid, and the ability of XDR clones to spread globally. IMPORTANCE: Typhoid fever is a severe disease caused by the Gram-negative bacterium Salmonella enterica serovar Typhi. Antibiotic-resistant S. Typhi strains have become increasingly common. Here, we report the first large-scale emergence and spread of a novel extensively drug-resistant (XDR) S. Typhi clone in Sindh, Pakistan. The XDR S. Typhi is resistant to the majority of drugs available for the treatment of typhoid fever. This study highlights the evolving threat of antibiotic resistance in S. Typhi and the value of antibiotic susceptibility testing and whole-genome sequencing in understanding emerging infectious diseases. We genetically characterized the XDR S. Typhi to investigate the phylogenetic relationship between these isolates and a global collection of S. Typhi isolates and to identify multiple genes linked to antibiotic resistance. This S. Typhi clone harbored a promiscuous antibiotic resistance plasmid previously identified in other enteric bacteria. The increasing antibiotic resistance in S. Typhi observed here adds urgency to the need for typhoid prevention measures.

Journal ArticleDOI
10 Aug 2018-Science
TL;DR: It is determined that Wilms tumor, a pediatric kidney cancer, originates from aberrant fetal cells, whereas adult kidney cancers are likely derived from a specific subtype of proximal convoluted tubular cell.
Abstract: Messenger RNA encodes cellular function and phenotype. In the context of human cancer, it defines the identities of malignant cells and the diversity of tumor tissue. We studied 72,501 single-cell transcriptomes of human renal tumors and normal tissue from fetal, pediatric, and adult kidneys. We matched childhood Wilms tumor with specific fetal cell types, thus providing evidence for the hypothesis that Wilms tumor cells are aberrant fetal cells. In adult renal cell carcinoma, we identified a canonical cancer transcriptome that matched a little-known subtype of proximal convoluted tubular cell. Analyses of the tumor composition defined cancer-associated normal cells and delineated a complex vascular endothelial growth factor (VEGF) signaling circuit. Our findings reveal the precise cellular identities and compositions of human kidney tumors.

Journal ArticleDOI
TL;DR: This work reports the first single-cell method for parallel chromatin accessibility, DNA methylation and transcriptome profiling and validates scNMT-seq by applying it to differentiating mouse embryonic stem cells, finding links between all three molecular layers and revealing dynamic coupling between epigenomic layers during differentiation.
Abstract: Parallel single-cell sequencing protocols represent powerful methods for investigating regulatory relationships, including epigenome-transcriptome interactions. Here, we report a single-cell method for parallel chromatin accessibility, DNA methylation and transcriptome profiling. scNMT-seq (single-cell nucleosome, methylation and transcription sequencing) uses a GpC methyltransferase to label open chromatin followed by bisulfite and RNA sequencing. We validate scNMT-seq by applying it to differentiating mouse embryonic stem cells, finding links between all three molecular layers and revealing dynamic coupling between epigenomic layers during differentiation.

Journal ArticleDOI
TL;DR: Recent advances in understanding the molecular determinants of influenza virus immune escape, sources of evolutionary selection pressure, population dynamics of influenza viruses and prospects for better influenza virus control are discussed.
Abstract: Despite decades of surveillance and pharmaceutical and non-pharmaceutical interventions, seasonal influenza viruses continue to cause epidemics around the world each year. The key process underlying these recurrent epidemics is the evolution of the viruses to escape the immunity that is induced by prior infection or vaccination. Although we are beginning to understand the processes that underlie the evolutionary dynamics of seasonal influenza viruses, the timing and nature of emergence of new virus strains remain mostly unpredictable. In this Review, we discuss recent advances in understanding the molecular determinants of influenza virus immune escape, sources of evolutionary selection pressure, population dynamics of influenza viruses and prospects for better influenza virus control.

Journal ArticleDOI
19 Apr 2018-Cell
TL;DR: The insights reconcile the variable clinical behavior of ccRCC and suggest evolutionary potential as a biomarker for both intervention and surveillance and identify genetic diversity and chromosomal complexity as determinants of patient outcome.

Journal ArticleDOI
TL;DR: vg is a toolkit of computational methods for creating, manipulating, and using variation graphs as references at the scale of the human genome; it provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference.
Abstract: Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.
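
The idea of a bidirected sequence graph can be illustrated with a small sketch. The following Python example is not vg's API or data model; the class `VariationGraph`, its methods, and the toy sequences are hypothetical names chosen for illustration. It assumes a path is a list of `(node_id, is_reverse)` steps and that an edge may be walked in either direction.

```python
from dataclasses import dataclass, field

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip(step):
    """Flip the orientation of a (node_id, is_reverse) step."""
    node_id, is_reverse = step
    return (node_id, not is_reverse)


@dataclass
class VariationGraph:
    """Toy bidirected sequence graph (hypothetical, for illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> DNA sequence
    edges: set = field(default_factory=set)    # pairs of oriented steps

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_edge(self, from_step, to_step):
        self.edges.add((from_step, to_step))

    def has_edge(self, a, b):
        # Walking a -> b is the same edge as walking flip(b) -> flip(a).
        return (a, b) in self.edges or (flip(b), flip(a)) in self.edges

    def spell(self, path):
        """Concatenate node sequences along a path, honoring orientation."""
        for a, b in zip(path, path[1:]):
            if not self.has_edge(a, b):
                raise ValueError(f"no edge between {a} and {b}")
        return "".join(
            reverse_complement(self.nodes[n]) if rev else self.nodes[n]
            for n, rev in path
        )


# A single-base substitution forms a "bubble": nodes 2 and 3 are alternatives.
graph = VariationGraph()
for node_id, seq in [(1, "GATT"), (2, "A"), (3, "C"), (4, "ACA")]:
    graph.add_node(node_id, seq)
for a, b in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge((a, False), (b, False))

ref_path = [(1, False), (2, False), (4, False)]
alt_path = [(1, False), (3, False), (4, False)]
rev_path = [flip(step) for step in reversed(ref_path)]  # reverse strand

print(graph.spell(ref_path))  # GATTAACA
print(graph.spell(alt_path))  # GATTCACA
print(graph.spell(rev_path))  # TGTTAATC
```

Both alleles share nodes 1 and 4, so the graph stores the common flanking sequence once. The reverse-strand walk reuses the same edges, which is the property that lets a bidirected graph represent inversions.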

Journal ArticleDOI
TL;DR: The association of genetically predicted Lp(a) concentration with CHD risk appeared to be independent of changes in LDL-C level owing to genetic variants that mimic the relationship of statins, PCSK9 inhibitors, and ezetimibe with CHD risk.
Abstract: Importance: Human genetic studies have indicated that plasma lipoprotein(a) (Lp[a]) is causally associated with the risk of coronary heart disease (CHD), but randomized trials of several therapies that reduce Lp(a) levels by 25% to 35% have not provided any evidence that lowering Lp(a) level reduces CHD risk. Objective: To estimate the magnitude of the change in plasma Lp(a) levels needed to have the same evidence of an association with CHD risk as a 38.67-mg/dL (ie, 1-mmol/L) change in low-density lipoprotein cholesterol (LDL-C) level, a change that has been shown to produce a clinically meaningful reduction in the risk of CHD. Design, Setting, and Participants: A mendelian randomization analysis was conducted using individual participant data from 5 studies and with external validation using summarized data from 48 studies. Population-based prospective cohort and case-control studies featured 20 793 individuals with CHD and 27 540 controls with individual participant data, whereas summarized data included 62 240 patients with CHD and 127 299 controls. Data were analyzed from November 2016 to March 2018. Exposures: Genetic LPA score and plasma Lp(a) mass concentration. Main Outcomes and Measures: Coronary heart disease. Results: Of the included study participants, 53% were men, all were of white European ancestry, and the mean age was 57.5 years. The association of genetically predicted Lp(a) with CHD risk was linearly proportional to the absolute change in Lp(a) concentration. A 10-mg/dL lower genetically predicted Lp(a) concentration was associated with a 5.8% lower CHD risk (odds ratio [OR], 0.942; 95% CI, 0.933-0.951; P = 3 × 10⁻³⁷), whereas a 10-mg/dL lower genetically predicted LDL-C level estimated using an LDL-C genetic score was associated with a 14.5% lower CHD risk (OR, 0.855; 95% CI, 0.818-0.893; P = 2 × 10⁻¹²).
Thus, a 101.5-mg/dL change (95% CI, 71.0-137.0) in Lp(a) concentration had the same association with CHD risk as a 38.67-mg/dL change in LDL-C level. The association of genetically predicted Lp(a) concentration with CHD risk appeared to be independent of changes in LDL-C level owing to genetic variants that mimic the relationship of statins, PCSK9 inhibitors, and ezetimibe with CHD risk. Conclusions and Relevance: The clinical benefit of lowering Lp(a) is likely to be proportional to the absolute reduction in Lp(a) concentration. Large absolute reductions in Lp(a) of approximately 100 mg/dL may be required to produce a clinically meaningful reduction in the risk of CHD similar in magnitude to what can be achieved by lowering LDL-C level by 38.67 mg/dL (ie, 1 mmol/L).
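
The equivalence between the Lp(a) and LDL-C changes can be roughly checked from the two odds ratios quoted in the abstract. The sketch below is not the authors' analysis. It assumes that each association is linear on the log-odds scale per mg/dL, and the function and variable names are illustrative.

```python
import math

# Odds ratios per 10 mg/dL lower genetically predicted level (from the abstract).
OR_LPA_PER_10 = 0.942
OR_LDL_PER_10 = 0.855
LDL_REFERENCE_CHANGE = 38.67  # mg/dL, i.e. 1 mmol/L


def equivalent_lpa_change(or_lpa=OR_LPA_PER_10,
                          or_ldl=OR_LDL_PER_10,
                          ldl_change=LDL_REFERENCE_CHANGE):
    """Lp(a) change (mg/dL) giving the same log odds ratio as `ldl_change`.

    Assumes a log-linear dose-response for both exposures (an assumption
    made for this sketch, not a statement of the study's model).
    """
    log_or_lpa_per_mg = math.log(or_lpa) / 10.0
    log_or_ldl_per_mg = math.log(or_ldl) / 10.0
    target_log_or = log_or_ldl_per_mg * ldl_change
    return target_log_or / log_or_lpa_per_mg


lpa_change = equivalent_lpa_change()
print(f"Equivalent Lp(a) change: {lpa_change:.1f} mg/dL")
```

Under these assumptions the result comes out at roughly 101 mg/dL, in line with the reported 101.5 mg/dL. The small gap is expected because the inputs are rounded odds ratios. The confidence interval in the abstract cannot be reproduced from the point estimates alone.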

Journal ArticleDOI
TL;DR: Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner.
Abstract: Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesizing and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner. Availability and implementation: Phandango is a web application freely available for use at www.phandango.net and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at https://github.com/jameshadfield/phandango.

Journal ArticleDOI
TL;DR: A comprehensive integrated genomic study of 74 MPMs provided a deeper understanding of histology-independent determinants of aggressive behavior, defined a novel genomic subtype with TP53 and SETDB1 mutations and extensive loss of heterozygosity, and discovered strong expression of the immune-checkpoint gene VISTA in epithelioid MPM.
Abstract: Malignant pleural mesothelioma (MPM) is a highly lethal cancer of the lining of the chest cavity. To expand our understanding of MPM, we conducted a comprehensive integrated genomic study, including the most detailed analysis of BAP1 alterations to date. We identified histology-independent molecular prognostic subsets, and defined a novel genomic subtype with TP53 and SETDB1 mutations and extensive loss of heterozygosity. We also report strong expression of the immune checkpoint gene VISTA in epithelioid MPM, strikingly higher than in other solid cancers, with implications for the immune response to MPM and for its immunotherapy. Our findings highlight new avenues for further investigation of MPM biology and novel therapeutic options.

Journal ArticleDOI
05 Sep 2018-Nature
TL;DR: Analysis of blood from a healthy human shows that haematopoietic stem cells increase rapidly in numbers through early life, reaching a stable plateau by adolescence, and contribute to myeloid and B lymphocyte populations throughout life.
Abstract: Haematopoietic stem cells drive blood production, but their population size and lifetime dynamics have not been quantified directly in humans. Here we identified 129,582 spontaneous, genome-wide somatic mutations in 140 single-cell-derived haematopoietic stem and progenitor colonies from a healthy 59-year-old man and applied population-genetics approaches to reconstruct clonal dynamics. Cell divisions from early embryogenesis were evident in the phylogenetic tree; all blood cells were derived from a common ancestor that preceded gastrulation. The size of the stem cell population grew steadily in early life, reaching a stable plateau by adolescence. We estimate the numbers of haematopoietic stem cells that are actively making white blood cells at any one time to be in the range of 50,000-200,000. We observed adult haematopoietic stem cell clones that generate multilineage outputs, including granulocytes and B lymphocytes. Harnessing naturally occurring mutations to report the clonal architecture of an organ enables the high-resolution reconstruction of somatic cell dynamics in humans.

Journal ArticleDOI
19 Apr 2018-Cell
TL;DR: Analysis of whole genomes from 95 biopsies across 33 patients with clear cell renal cell carcinoma suggests that the number of cells with 3p loss capable of initiating sporadic tumors is no more than a few hundred.

Journal ArticleDOI
TL;DR: This work describes SpatialDE, a statistical test to identify genes with spatial patterns of expression variation from multiplexed imaging or spatial RNA-sequencing data, which also implements 'automatic expression histology', a spatial gene-clustering approach that enables expression-based tissue histology.
Abstract: Technological advances have made it possible to measure spatially resolved gene expression at high throughput. However, methods to analyze these data are not established. Here we describe SpatialDE, a statistical test to identify genes with spatial patterns of expression variation from multiplexed imaging or spatial RNA-sequencing data. SpatialDE also implements 'automatic expression histology', a spatial gene-clustering approach that enables expression-based tissue histology.