Showing papers on "Gene published in 2020"


Journal Article
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations


Journal Article
TL;DR: scVelo reconstructs transient cell states and differentiation pathways from single-cell RNA-sequencing data, infers gene-specific rates of transcription, splicing and degradation, recovers each cell's position in the underlying differentiation processes and detects putative driver genes.
Abstract: RNA velocity has opened up new ways of studying cellular differentiation in single-cell RNA-sequencing data. It describes the rate of gene expression change for an individual gene at a given time point based on the ratio of its spliced and unspliced messenger RNA (mRNA). However, errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. Here we present scVelo, a method that overcomes these limitations by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to systems with transient cell states, which are common in development and in response to perturbations. We apply scVelo to disentangling subpopulation kinetics in neurogenesis and pancreatic endocrinogenesis. We infer gene-specific rates of transcription, splicing and degradation, recover each cell's position in the underlying differentiation processes and detect putative driver genes. scVelo will facilitate the study of lineage decisions and gene regulation.

1,041 citations


Journal Article
TL;DR: The findings suggest that the virus is evolving and European, North American and Asian strains might coexist, each of them characterized by a different mutation pattern.
Abstract: SARS-CoV-2 is an RNA coronavirus responsible for the pandemic of the Severe Acute Respiratory Syndrome (COVID-19). RNA viruses are characterized by a high mutation rate, up to a million times higher than that of their hosts. Virus mutagenic capability depends upon several factors, including the fidelity of viral enzymes that replicate nucleic acids, such as SARS-CoV-2 RNA-dependent RNA polymerase (RdRp). Mutation rate drives viral evolution and genome variability, thereby enabling viruses to escape host immunity and to develop drug resistance. We analyzed 220 genomic sequences from the GISAID database derived from patients infected by SARS-CoV-2 worldwide from December 2019 to mid-March 2020. The SARS-CoV-2 reference genome was obtained from the GenBank database. Genome alignment was performed using Clustal Omega. Mann–Whitney and Fisher exact tests were used to assess statistical significance. We characterized 8 novel recurrent mutations of SARS-CoV-2, located at positions 1397, 2891, 14408, 17746, 17857, 18060, 23403 and 28881. Mutations in 2891, 3036, 14408, 23403 and 28881 positions are predominantly observed in Europe, whereas those located at positions 17746, 17857 and 18060 are exclusively present in North America. We noticed for the first time a silent mutation in the RdRp gene in England (UK) on February 9th, 2020, while a different mutation in RdRp changing its amino acid composition emerged on February 20th, 2020 in Italy (Lombardy). Viruses with RdRp mutation have a median of 3 point mutations [range: 2–5], otherwise they have a median of 1 mutation [range: 0–3] (p value < 0.001). These findings suggest that the virus is evolving and European, North American and Asian strains might coexist, each of them characterized by a different mutation pattern. The contribution of the mutated RdRp to this phenomenon needs to be investigated. To date, several drugs targeting RdRp enzymes are being employed for SARS-CoV-2 infection treatment. Some of them have a predicted binding moiety in a SARS-CoV-2 RdRp hydrophobic cleft, which is adjacent to the 14408 mutation we identified. Consequently, it is important to study and characterize SARS-CoV-2 RdRp mutation in order to assess possible drug-resistance viral phenotypes. It is also important to recognize whether the presence of some mutations might correlate with different SARS-CoV-2 mortality rates.

842 citations


Journal Article
18 Nov 2020-Nature
TL;DR: Droplet- and plate-based single-cell RNA sequencing applied to ~75,000 human cells across all lung tissue compartments and circulating blood, combined with a multi-pronged cell annotation approach, allowed the authors to define the gene expression profiles and anatomical locations of 58 cell populations in the human lung.
Abstract: Although single-cell RNA sequencing studies have begun to provide compendia of cell expression profiles [1–9], it has been difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here, using droplet- and plate-based single-cell RNA sequencing of approximately 75,000 human cells across all lung tissue compartments and circulating blood, combined with a multi-pronged cell annotation approach, we create an extensive cell atlas of the human lung. We define the gene expression profiles and anatomical locations of 58 cell populations in the human lung, including 41 out of 45 previously known cell types and 14 previously unknown ones. This comprehensive molecular atlas identifies the biochemical functions of lung cells and the transcription factors and markers for making and monitoring them; defines the cell targets of circulating hormones and predicts local signalling interactions and immune cell homing; and identifies cell types that are directly affected by lung disease genes and respiratory viruses. By comparing human and mouse data, we identified 17 molecular cell types that have been gained or lost during lung evolution and others with substantially altered expression profiles, revealing extensive plasticity of cell types and cell-type-specific gene expression during organ evolution including expression switches between cell types. This atlas provides the molecular foundation for investigating how lung cell identities, functions and interactions are achieved in development and tissue engineering and altered in disease and evolution. Expression profiling on 75,000 single cells creates a comprehensive cell atlas of the human lung that includes 41 out of 45 previously known cell types and 14 new ones.

795 citations


Journal Article
TL;DR: NicheNet is presented, a method that predicts ligand–target links between interacting cells by combining their expression data with prior knowledge on signaling and gene regulatory networks, and can infer active ligands and their gene regulatory effects on interacting cells.
Abstract: Computational methods that model how gene expression of a cell is influenced by interacting cells are lacking. We present NicheNet (https://github.com/saeyslab/nichenetr), a method that predicts ligand-target links between interacting cells by combining their expression data with prior knowledge on signaling and gene regulatory networks. We applied NicheNet to tumor and immune cell microenvironment data and demonstrate that NicheNet can infer active ligands and their gene regulatory effects on interacting cells.

681 citations


Posted Content
26 Jan 2020-bioRxiv
TL;DR: A biological background for the epidemic investigation of the 2019-nCov infection disease is provided, and the result indicates that the ACE2 virus receptor expression is concentrated in a small population of type II alveolar cells (AT2).
Abstract: A novel coronavirus (2019-nCov) was identified in Wuhan, Hubei Province, China in December of 2019. This new coronavirus has resulted in thousands of cases of lethal disease in China, with additional patients being identified in a rapidly growing number internationally. 2019-nCov was reported to share the same receptor, Angiotensin-converting enzyme 2 (ACE2), with SARS-Cov. Here, based on the public database and the state-of-the-art single-cell RNA-Seq technique, we analyzed the ACE2 RNA expression profile in the normal human lungs. The result indicates that the ACE2 virus receptor expression is concentrated in a small population of type II alveolar cells (AT2). Surprisingly, we found that this population of ACE2-expressing AT2 also highly expressed many other genes that positively regulate viral reproduction and transmission. A comparison between eight individual samples demonstrated that the Asian male one has an extremely large number of ACE2-expressing cells in the lung. This study provides a biological background for the epidemic investigation of the 2019-nCov infection disease, and could be informative for future anti-ACE2 therapeutic strategy development.

631 citations


Journal Article
Yu Zhao, Zixian Zhao, Yujia Wang, Yueqing Zhou, Yu Ma, Wei Zuo
TL;DR: The recently developed single-cell RNA-sequencing technology enables us to study the ACE2 expression in each cell type and provides quantitative information at single-cell resolution, and shows that in the normal human lung, ACE2 is mainly expressed by type II alveolar (AT2) and type I alveolar (AT1) epithelial cells.
Abstract: A novel coronavirus SARS-CoV-2 was identified in Wuhan, Hubei Province, China in December of 2019. According to a WHO report, this new coronavirus has resulted in 76,392 confirmed infections and 2,348 deaths in China by 22 February, 2020, with additional patients being identified in a rapidly growing number internationally. SARS-CoV-2 was reported to share the same receptor, Angiotensin-converting enzyme 2 (ACE2), with SARS-CoV. Here, based on the public database and the state-of-the-art single-cell RNA-Seq technique, we analyzed the ACE2 RNA expression profile in the normal human lungs. The result indicates that the ACE2 virus receptor expression is concentrated in a small population of type II alveolar cells (AT2). Surprisingly, we found that this population of ACE2-expressing AT2 also highly expressed many other genes that positively regulate viral entry, reproduction and transmission. This study provides a biological background for the epidemic investigation of COVID-19, and could be informative for future anti-ACE2 therapeutic strategy development.

610 citations


Journal Article
TL;DR: This Review discusses the identification of G4s and evidence for their formation in cells using chemical biology, imaging and genomic technologies, and discusses the connection between G4 formation and synthetic lethality in cancer cells, and recent progress towards considering G4s as therapeutic targets in human diseases.
Abstract: DNA and RNA can adopt various secondary structures. Four-stranded G-quadruplex (G4) structures form through self-recognition of guanines into stacked tetrads, and considerable biophysical and structural evidence exists for G4 formation in vitro. Computational studies and sequencing methods have revealed the prevalence of G4 sequence motifs at gene regulatory regions in various genomes, including in humans. Experiments using chemical, molecular and cell biology methods have demonstrated that G4s exist in chromatin DNA and in RNA, and have linked G4 formation with key biological processes ranging from transcription and translation to genome instability and cancer. In this Review, we first discuss the identification of G4s and evidence for their formation in cells using chemical biology, imaging and genomic technologies. We then discuss possible functions of DNA G4s and their interacting proteins, particularly in transcription, telomere biology and genome instability. Roles of RNA G4s in RNA biology, especially in translation, are also discussed. Furthermore, we consider the emerging relationships of G4s with chromatin and with RNA modifications. Finally, we discuss the connection between G4 formation and synthetic lethality in cancer cells, and recent progress towards considering G4s as therapeutic targets in human diseases.

543 citations


Journal Article
29 Jul 2020-Nature
TL;DR: The spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation and RNA localization are described.
Abstract: Many proteins regulate the expression of genes by binding to specific regions encoded in the genome [1]. Here we introduce a new data set of RNA elements in the human genome that are recognized by RNA-binding proteins (RBPs), generated as part of the Encyclopedia of DNA Elements (ENCODE) project phase III. This class of regulatory elements functions only when transcribed into RNA, as they serve as the binding sites for RBPs that control post-transcriptional processes such as splicing, cleavage and polyadenylation, and the editing, localization, stability and translation of mRNAs. We describe the mapping and characterization of RNA elements recognized by a large collection of human RBPs in K562 and HepG2 cells. Integrative analyses using five assays identify RBP binding sites on RNA and chromatin in vivo, the in vitro binding preferences of RBPs, the function of RBP binding sites and the subcellular localization of RBPs, producing 1,223 replicated data sets for 356 RBPs. We describe the spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation and RNA localization. These data expand the catalogue of functional elements encoded in the human genome by the addition of a large set of elements that function at the RNA level by interacting with RBPs.

542 citations


Journal Article
TL;DR: The sensitivity of the two viruses to three established inhibitors of coronavirus replication is very similar, but SARS-CoV-2 infection was substantially more sensitive to pre-treatment of cells with pegylated interferon alpha.
Abstract: The sudden emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) at the end of 2019 from the Chinese province of Hubei and its subsequent pandemic spread highlight the importance of understanding the full molecular details of coronavirus infection and pathogenesis. Here, we compared a variety of replication features of SARS-CoV-2 and SARS-CoV and analysed the cytopathology caused by the two closely related viruses in the commonly used Vero E6 cell line. Compared to SARS-CoV, SARS-CoV-2 generated higher levels of intracellular viral RNA, but strikingly about 50-fold less infectious viral progeny was recovered from the culture medium. Immunofluorescence microscopy of SARS-CoV-2-infected cells established extensive cross-reactivity of antisera previously raised against a variety of non-structural proteins, membrane and nucleocapsid protein of SARS-CoV. Electron microscopy revealed that the ultrastructural changes induced by the two SARS viruses are very similar and occur within comparable time frames after infection. Furthermore, we determined that the sensitivity of the two viruses to three established inhibitors of coronavirus replication (remdesivir, alisporivir and chloroquine) is very similar, but that SARS-CoV-2 infection was substantially more sensitive to pre-treatment of cells with pegylated interferon alpha. An important difference between the two viruses is the fact that, upon passaging in Vero E6 cells, SARS-CoV-2 apparently is under strong selection pressure to acquire adaptive mutations in its spike protein gene. These mutations change or delete a putative furin-like cleavage site in the region connecting the S1 and S2 domains and result in a very prominent phenotypic change in plaque assays.

445 citations


Journal Article
TL;DR: Using a genotype-driven approach, this disorder is identified that connects seemingly unrelated adult-onset inflammatory syndromes and is named the VEXAS (vacuoles, E1 enzyme, X-linked, autoinflammatory, somatic) syndrome.
Abstract: Background Adult-onset inflammatory syndromes often manifest with overlapping clinical features. Variants in ubiquitin-related genes, previously implicated in autoinflammatory disease, may...

Posted Content
TL;DR: This paper discusses the philosophy behind ITensor, a system for programming tensor network calculations with an interface modeled on tensor diagram notation, and shows examples of each part of the interface, including Index objects, the ITensor product operator, tensor factorizations, tensor storage types, algorithms for matrix product state (MPS) and matrix product operator (MPO) tensor networks, and the NDTensors library.
Abstract: ITensor is a system for programming tensor network calculations with an interface modeled on tensor diagram notation, which allows users to focus on the connectivity of a tensor network without manually bookkeeping tensor indices. The ITensor interface rules out common programming errors and enables rapid prototyping of tensor network algorithms. After discussing the philosophy behind the ITensor approach, we show examples of each part of the interface including Index objects, the ITensor product operator, tensor factorizations, tensor storage types, algorithms for matrix product state (MPS) and matrix product operator (MPO) tensor networks, quantum number conserving block-sparse tensors, and the NDTensors library. We also review publications that have used ITensor for quantum many-body physics and for other areas where tensor networks are increasingly applied. To conclude we discuss promising features and optimizations to be added in the future.

Journal Article
TL;DR: Phylogenetic analyses of virus hallmark genes combined with analyses of gene-sharing networks show that replication modules of five BCs evolved from a common ancestor that encoded an RNA-directed RNA polymerase or a reverse transcriptase, and propose a comprehensive hierarchical taxonomy of viruses.
Abstract: Viruses and mobile genetic elements are molecular parasites or symbionts that coevolve with nearly all forms of cellular life. The route of virus replication and protein expression is determined by the viral genome type. Comparison of these routes led to the classification of viruses into seven "Baltimore classes" (BCs) that define the major features of virus reproduction. However, recent phylogenomic studies identified multiple evolutionary connections among viruses within each of the BCs as well as between different classes. Due to the modular organization of virus genomes, these relationships defy simple representation as lines of descent but rather form complex networks. Phylogenetic analyses of virus hallmark genes combined with analyses of gene-sharing networks show that replication modules of five BCs (three classes of RNA viruses and two classes of reverse-transcribing viruses) evolved from a common ancestor that encoded an RNA-directed RNA polymerase or a reverse transcriptase. Bona fide viruses evolved from this ancestor on multiple, independent occasions via the recruitment of distinct cellular proteins as capsid subunits and other structural components of virions. The single-stranded DNA (ssDNA) viruses are a polyphyletic class, with different groups evolving by recombination between rolling-circle-replicating plasmids, which contributed the replication protein, and positive-sense RNA viruses, which contributed the capsid protein. The double-stranded DNA (dsDNA) viruses are distributed among several large monophyletic groups and arose via the combination of distinct structural modules with equally diverse replication modules. Phylogenomic analyses reveal the finer structure of evolutionary connections among RNA viruses and reverse-transcribing viruses, ssDNA viruses, and large subsets of dsDNA viruses. Taken together, these analyses allow us to outline the global organization of the virus world. Here, we describe the key aspects of this organization and propose a comprehensive hierarchical taxonomy of viruses.

Journal Article
TL;DR: Antisense oligonucleotides offer promise to modulate cancer-relevant alternative splicing decisions, with proof of concept for this type of therapy demonstrated by Nusinersen, a first-in-class treatment for patients with spinal muscular atrophy.
Abstract: Removal of introns from messenger RNA precursors (pre-mRNA splicing) is an essential step for the expression of most eukaryotic genes. Alternative splicing enables the regulated generation of multiple mRNA and protein products from a single gene. Cancer cells have general as well as cancer type-specific and subtype-specific alterations in the splicing process that can have prognostic value and contribute to every hallmark of cancer progression, including cancer immune responses. These splicing alterations are often linked to the occurrence of cancer driver mutations in genes encoding either core components or regulators of the splicing machinery. Of therapeutic relevance, the transcriptomic landscape of cancer cells makes them particularly vulnerable to pharmacological inhibition of splicing. Small-molecule splicing modulators are currently in clinical trials and, in addition to splice site-switching antisense oligonucleotides, offer the promise of novel and personalized approaches to cancer treatment.

Journal Article
TL;DR: These findings not only help to explain the poor interferon response in COVID-19 patients, but also describe the emergence of natural SARS-CoV-2 quasispecies with an extended ORF3b gene that may potentially affect COVID-19 pathogenesis.

Posted Content
13 Sep 2020-medRxiv
TL;DR: This work identifies biological processes of pathophysiological relevance to schizophrenia, shows convergence of common and rare variant associations in schizophrenia and neurodevelopmental disorders, and provides a rich resource of priority genes and variants to advance mechanistic studies.
Abstract: Schizophrenia is a psychiatric disorder whose pathophysiology is largely unknown. It has a heritability of 60-80%, much of which is attributable to common risk alleles, suggesting genome-wide association studies can inform our understanding of aetiology. Here, in 69,369 people with schizophrenia and 236,642 controls, we report common variant associations at 270 distinct loci. Using fine-mapping and functional genomic data, we prioritise 19 genes based on protein-coding or UTR variation, and 130 genes in total as likely to explain these associations. Fine-mapped candidates were enriched for genes associated with rare disruptive coding variants in people with schizophrenia, including the glutamate receptor subunit GRIN2A and transcription factor SP4, and were also enriched for genes implicated by such variants in autism and developmental disorder. Associations were concentrated in genes expressed in CNS neurons, both excitatory and inhibitory, but not other tissues or cell types, and implicated fundamental processes related to neuronal function, particularly synaptic organisation, differentiation and transmission. We identify biological processes of pathophysiological relevance to schizophrenia, show convergence of common and rare variant associations in schizophrenia and neurodevelopmental disorders, and provide a rich resource of priority genes and variants to advance mechanistic studies.

Journal Article
TL;DR: Smart-seq3 is introduced, which combines full-length transcriptome coverage with a 5′ unique molecular identifier RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell.
Abstract: Large-scale sequencing of RNA from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states [1]. However, current short-read single-cell RNA-sequencing methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells [2,3]. Here we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5' unique molecular identifier RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Of the counted and reconstructed molecules, 60% could be directly assigned to allelic origin and 30-50% to specific isoforms, and we identified substantial differences in isoform usage in different mouse strains and human cell types. Smart-seq3 greatly increased sensitivity compared to Smart-seq2, typically detecting thousands more transcripts per cell. We expect that Smart-seq3 will enable large-scale characterization of cell types and states across tissues and organisms.

Journal Article
TL;DR: Panaroo is introduced, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies.
Abstract: Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here, we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. Panaroo is available at https://github.com/gtonkinhill/panaroo .

Journal Article
11 Sep 2020-Science
TL;DR: A catalog of sex differences in gene expression and its genetic regulation across 44 human tissue sources surveyed by the GTEx project (v8 data release) is generated by analyzing 16,245 RNA-sequencing samples and genotypes of 838 adult individuals.
Abstract: Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

Journal Article
TL;DR: The population structure of live microglia purified from human cerebral cortex samples obtained at autopsy and during neurosurgical procedures is investigated, and it is found that some subsets are enriched for disease-related genes and RNA signatures.
Abstract: The extent of microglial heterogeneity in humans remains a central yet poorly explored question in light of the development of therapies targeting this cell type. Here, we investigate the population structure of live microglia purified from human cerebral cortex samples obtained at autopsy and during neurosurgical procedures. Using single cell RNA sequencing, we find that some subsets are enriched for disease-related genes and RNA signatures. We confirm the presence of four of these microglial subpopulations histologically and illustrate the utility of our data by characterizing further microglial cluster 7, enriched for genes depleted in the cortex of individuals with Alzheimer's disease (AD). Histologically, these cluster 7 microglia are reduced in frequency in AD tissue, and we validate this observation in an independent set of single nucleus data. Thus, our live human microglia identify a range of subtypes, and we prioritize one of these as being altered in AD.

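Journal Article
TL;DR: This review summarizes how the challenge of identifying the genes, cell types and physiological contexts affected by GWAS loci has been addressed over the last decade, focusing on the integration of GWAS results with functional genomics datasets, and highlights future research avenues such as integrating GWAS results with single-cell sequencing read-outs, designing functionally informed polygenic risk scores (PRS), and validating disease-associated genes using genetic engineering.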
Abstract: Genome-wide association studies (GWAS) have successfully mapped thousands of loci associated with complex traits. These associations could reveal the molecular mechanisms altered in common complex diseases and result in the identification of novel drug targets. However, GWAS have also left a number of outstanding questions. In particular, the majority of disease-associated loci lie in non-coding regions of the genome and, even though they are thought to play a role in gene expression regulation, it is unclear which genes they regulate and in which cell types or physiological contexts this regulation occurs. This has hindered the translation of GWAS findings into clinical interventions. In this review we summarize how these challenges have been addressed over the last decade, with a particular focus on the integration of GWAS results with functional genomics datasets. Firstly, we investigate how the tissues and cell types involved in diseases can be identified using methods that test for enrichment of GWAS variants in genomic annotations. Secondly, we explore how to find the genes regulated by GWAS loci using methods that test for colocalization of GWAS signals with molecular phenotypes such as quantitative trait loci (QTLs). Finally, we highlight potential future research avenues such as integrating GWAS results with single-cell sequencing read-outs, designing functionally informed polygenic risk scores (PRS), and validating disease associated genes using genetic engineering. These tools will be crucial to identify new drug targets for common complex diseases.

Journal Article
06 Feb 2020-Nature
TL;DR: The most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), is presented in this article.
Abstract: Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.

Journal ArticleDOI
TL;DR: An overview of the latest findings in IDH-mutated human malignancies is provided, with a focus on glioma, discussing unique biological signatures and proceedings in translational research.
Abstract: Isocitrate dehydrogenase (IDH) enzymes catalyse the oxidative decarboxylation of isocitrate and therefore play key roles in the Krebs cycle and cellular homoeostasis. Major advances in cancer genetics over the past decade have revealed that the genes encoding IDHs are frequently mutated in a variety of human malignancies, including gliomas, acute myeloid leukaemia, cholangiocarcinoma, chondrosarcoma and thyroid carcinoma. A series of seminal studies further elucidated the biological impact of the IDH mutation and uncovered the potential role of IDH mutants in oncogenesis. Notably, the neomorphic activity of the IDH mutants establishes distinctive patterns in cancer metabolism, epigenetic shift and therapy resistance. Novel molecular targeting approaches have been developed to improve the efficacy of therapeutics against IDH-mutated cancers. Here we provide an overview of the latest findings in IDH-mutated human malignancies, with a focus on glioma, discussing unique biological signatures and proceedings in translational research.

Journal ArticleDOI
TL;DR: In primary human T cells, ABE8s achieve 98–99% target modification, which is maintained when multiplexed across three loci, and in human CD34+ cells, ABE8 can recreate a natural allele at the promoter of the γ-globin genes HBG1 and HBG2 with up to 60% efficiency.
Abstract: The foundational adenine base editors (for example, ABE7.10) enable programmable A•T to G•C point mutations but editing efficiencies can be low at challenging loci in primary human cells. Here we further evolve ABE7.10 using a library of adenosine deaminase variants to create ABE8s. At NGG protospacer adjacent motif (PAM) sites, ABE8s result in ~1.5× higher editing at protospacer positions A5-A7 and ~3.2× higher editing at positions A3-A4 and A8-A10 compared with ABE7.10. Non-NGG PAM variants have a ~4.2-fold overall higher on-target editing efficiency than ABE7.10. In human CD34+ cells, ABE8 can recreate a natural allele at the promoter of the γ-globin genes HBG1 and HBG2 with up to 60% efficiency, causing persistence of fetal hemoglobin. In primary human T cells, ABE8s achieve 98-99% target modification, which is maintained when multiplexed across three loci. Delivered as messenger RNA, ABE8s induce no significant levels of single guide RNA (sgRNA)-independent off-target adenine deamination in genomic DNA and very low levels of adenine deamination in cellular mRNA.


Journal ArticleDOI
TL;DR: An integrated algorithm, ImmLnc, is introduced that can help prioritise immune-related lncRNAs in cancer immunotherapy research and serve as a valuable resource for understanding lncRNA function and to advance identification of immunotherapy targets.
Abstract: Long noncoding RNAs (lncRNAs) are emerging as critical regulators of gene expression and they play fundamental roles in immune regulation. Here we introduce an integrated algorithm, ImmLnc, for identifying lncRNA regulators of immune-related pathways. We comprehensively chart the landscape of lncRNA regulation in the immunome across 33 cancer types and show that cancers with similar tissue origin are likely to share lncRNA immune regulators. Moreover, the immune-related lncRNAs are likely to show expression perturbation in cancer and are significantly correlated with immune cell infiltration. ImmLnc can help prioritize cancer-related lncRNAs and further identify three molecular subtypes (proliferative, intermediate, and immunological) of non-small cell lung cancer. These subtypes are characterized by differences in mutation burden, immune cell infiltration, expression of immunomodulatory genes, response to chemotherapy, and prognosis. In summary, the ImmLnc pipeline and the resulting data serve as a valuable resource for understanding lncRNA function and to advance identification of immunotherapy targets. In cancer, long noncoding RNAs (lncRNAs) can regulate immune-related pathways. Here, the authors present ImmLnc, an algorithm that can help prioritise immune-related lncRNAs in cancer immunotherapy research.

Journal ArticleDOI
TL;DR: This analysis presents the most definitive mutational landscape of mitochondrial genomes and identifies several hypermutated cases, frequent somatic nuclear transfers of mtDNA and high variability of mtDNA copy number in many cancers.
Abstract: Mitochondria are essential cellular organelles that play critical roles in cancer. Here, as part of the International Cancer Genome Consortium/The Cancer Genome Atlas Pan-Cancer Analysis of Whole Genomes Consortium, which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we performed a multidimensional, integrated characterization of mitochondrial genomes and related RNA sequencing data. Our analysis presents the most definitive mutational landscape of mitochondrial genomes and identifies several hypermutated cases. Truncating mutations are markedly enriched in kidney, colorectal and thyroid cancers, suggesting oncogenic effects with the activation of signaling pathways. We find frequent somatic nuclear transfers of mitochondrial DNA, some of which disrupt therapeutic target genes. Mitochondrial copy number varies greatly within and across cancers and correlates with clinical variables. Co-expression analysis highlights the function of mitochondrial genes in oxidative phosphorylation, DNA repair and the cell cycle, and shows their connections with clinically actionable genes. Our study lays a foundation for translating mitochondrial biology into clinical applications.

Journal ArticleDOI
TL;DR: METTL3 stabilizes HK2 and SLC2A1 (GLUT1) expression in CRC through an m6A-IGF2BP2/3-dependent mechanism, and targeting METTL3 and its pathway offers alternative rational therapeutic targets in CRC patients with high glucose metabolism.
Abstract: Epigenetic alterations are involved in various aspects of colorectal carcinogenesis. N6-methyladenosine (m6A) modifications of RNAs are emerging as a new layer of epigenetic regulation. As the most abundant chemical modification of eukaryotic mRNA, m6A is essential for the regulation of mRNA stability, splicing, and translation. Alterations of m6A regulatory genes play important roles in the pathogenesis of a variety of human diseases. However, whether this mRNA modification participates in the glucose metabolism of colorectal cancer (CRC) remains uncharacterized. Transcriptome-sequencing and liquid chromatography-tandem mass spectrometry (LC-MS) were performed to evaluate the correlation between m6A modifications and glucose metabolism in CRC. Mass spectrometric metabolomics analysis and in vitro and in vivo experiments were conducted to investigate the effects of METTL3 on CRC glycolysis and tumorigenesis. RNA MeRIP-sequencing, immunoprecipitation and RNA stability assays were used to explore the molecular mechanism of METTL3 in CRC. A strong correlation between METTL3 and 18F-FDG uptake was observed in CRC patients from Xuzhou Central Hospital. METTL3-induced CRC tumorigenesis depends on cell glycolysis in multiple CRC models. Mechanistically, METTL3 directly interacted with the 5′/3′ UTR regions of HK2 and the 3′ UTR region of SLC2A1 (GLUT1), then further stabilized these two genes and activated the glycolysis pathway. m6A-mediated HK2 and SLC2A1 (GLUT1) stabilization relied on the m6A reader IGF2BP2 or IGF2BP2/3, respectively. METTL3 is a functional and clinical oncogene in CRC. METTL3 stabilizes HK2 and SLC2A1 (GLUT1) expression in CRC through an m6A-IGF2BP2/3-dependent mechanism. Targeting METTL3 and its pathway offers alternative rational therapeutic targets in CRC patients with high glucose metabolism.

Journal ArticleDOI
TL;DR: The results suggest that the ISG-type induction of dACE2 in IFN-high conditions created by treatments, an inflammatory tumor microenvironment or viral co-infections is unlikely to increase the cellular entry of SARS-CoV-2 and promote infection.
Abstract: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19, utilizes angiotensin-converting enzyme 2 (ACE2) for entry into target cells. ACE2 has been proposed as an interferon-stimulated gene (ISG). Thus, interferon-induced variability in ACE2 expression levels could be important for susceptibility to COVID-19 or its outcomes. Here, we report the discovery of a novel, transcriptionally independent truncated isoform of ACE2, which we designate as deltaACE2 (dACE2). We demonstrate that dACE2, but not ACE2, is an ISG. In The Cancer Genome Atlas, the expression of dACE2 was enriched in squamous tumors of the respiratory, gastrointestinal and urogenital tracts. In vitro, dACE2, which lacks 356 amino-terminal amino acids, was non-functional in binding the SARS-CoV-2 spike protein and as a carboxypeptidase. Our results suggest that the ISG-type induction of dACE2 in IFN-high conditions created by treatments, an inflammatory tumor microenvironment or viral co-infections is unlikely to increase the cellular entry of SARS-CoV-2 and promote infection.

Journal ArticleDOI
29 Jul 2020-Nature
TL;DR: A high-density DNase I cleavage map from 243 human cell and tissue types provides a genome-wide, nucleotide-resolution map of human transcription factor footprints, and shows that the enrichment of genetic variants associated with diseases or phenotypic traits in regulatory regions is almost entirely attributable to variants within footprints.
Abstract: Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse phenotypic traits1, but it remains challenging to distinguish variants that affect regulatory function2. Genomic DNase I footprinting enables the quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin3–6. However, only a small fraction of such sites have been precisely resolved on the human genome sequence6. Here, to enable comprehensive mapping of transcription factor footprints, we produced high-density DNase I cleavage maps from 243 human cell and tissue types and states and integrated these data to delineate about 4.5 million compact genomic elements that encode transcription factor occupancy at nucleotide resolution. We map the fine-scale structure within about 1.6 million DNase I-hypersensitive sites and show that the overwhelming majority are populated by well-spaced sites of single transcription factor–DNA interaction. Cell-context-dependent cis-regulation is chiefly executed by wholesale modulation of accessibility at regulatory DNA rather than by differential transcription factor occupancy within accessible elements. We also show that the enrichment of genetic variants associated with diseases or phenotypic traits in regulatory regions1,7 is almost entirely attributable to variants within footprints, and that functional variants that affect transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles. Unexpectedly, we find increased density of human genetic variation within transcription factor footprints, revealing an unappreciated driver of cis-regulatory evolution. Our results provide a framework for both global and nucleotide-precision analyses of gene regulatory mechanisms and functional genetic variation. 
A high-density DNase I cleavage map from 243 human cell and tissue types provides a genome-wide, nucleotide-resolution map of human transcription factor footprints.