Showing papers on "Gene" published in 2019


Journal Article
TL;DR: The phylogenetic analysis complemented with synteny analyses suggests that Bmp2, -4 and -16 are remnants of a gene quartet that originated during the two rounds of whole-genome duplication (2R-WGD) early in vertebrate evolution.
Abstract: The vertebrate gene repertoire is characterized by “cryptic” genes whose identification has been hampered by their absence from the genomes of well-studied species. One example is the Bmp16 gene, a paralog of the developmental key genes Bmp2 and -4. We focus on the Bmp2/4/16 group of genes to study the evolutionary dynamics following gen(om)e duplications with special emphasis on the poorly studied Bmp16 gene. We reveal the presence of Bmp16 in chondrichthyans in addition to previously reported teleost fishes and reptiles. Using comprehensive, vertebrate-wide gene sampling, our phylogenetic analysis complemented with synteny analyses suggests that Bmp2, -4 and -16 are remnants of a gene quartet that originated during the two rounds of whole-genome duplication (2R-WGD) early in vertebrate evolution. We confirm that Bmp16 genes were lost independently in at least three lineages (mammals, archelosaurs and amphibians) and report that they have elevated rates of sequence evolution. This finding agrees with their more “flexible” deployment during development; while Bmp16 has limited embryonic expression domains in the cloudy catshark, it is broadly expressed in the green anole lizard. Our study illustrates the dynamics of gene family evolution by integrating insights from sequence diversification, gene repertoire changes, and shuffling of expression domains.

1,376 citations


Posted Content
Konrad J. Karczewski1,2, Laurent C. Francioli1,2, Grace Tiao1,2, Beryl B. Cummings1,2, Jessica Alföldi1,2, Qingbo Wang1,2, Ryan L. Collins1,2, Kristen M. Laricchia1,2, Andrea Ganna1,2,3, Daniel P. Birnbaum1, Laura D. Gauthier1, Harrison Brand1,2, Matthew Solomonson1,2, Nicholas A. Watts1,2, Daniel R. Rhodes4, Moriel Singer-Berk1, Eleanor G. Seaby1,2, Jack A. Kosmicki1,2, Raymond K. Walters1,2, Katherine Tashman1,2, Yossi Farjoun1, Eric Banks1, Timothy Poterba1,2, Arcturus Wang1,2, Cotton Seed1,2, Nicola Whiffin1,5, Jessica X. Chong6, Kaitlin E. Samocha7, Emma Pierce-Hoffman1, Zachary Zappala1,8, Anne H. O’Donnell-Luria1,2,9, Eric Vallabh Minikel1, Ben Weisburd1, Monkol Lek1,10, James S. Ware1,5, Christopher Vittal1,2, Irina M. Armean1,2,11, Louis Bergelson1, Kristian Cibulskis1, Kristen M. Connolly1, Miguel Covarrubias1, Stacey Donnelly1, Steven Ferriera1, Stacey Gabriel1, Jeff Gentry1, Namrata Gupta1, Thibault Jeandet1, Diane Kaplan1, Christopher Llanwarne1, Ruchi Munshi1, Sam Novod1, Nikelle Petrillo1, David Roazen1, Valentin Ruano-Rubio1, Andrea Saltzman1, Molly Schleicher1, Jose Soto1, Kathleen Tibbetts1, Charlotte Tolonen1, Gordon Wade1, Michael E. Talkowski1,2, Benjamin M. Neale1,2, Mark J. Daly1, Daniel G. MacArthur1,2
30 Jan 2019-bioRxiv
TL;DR: Using an improved model of human mutation rates, human protein-coding genes are classified along a spectrum representing tolerance to inactivation; the classification is validated using data from model organisms and engineered human cells and shown to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations


Journal Article
TL;DR: The mechanisms and functions of DNA methylation and demethylation in both mice and humans at CpG-rich promoters, gene bodies and transposable elements are discussed and the dynamic erasure and re-establishment in embryonic, germline and somatic cell development is highlighted.
Abstract: DNA methylation is of paramount importance for mammalian embryonic development. DNA methylation has numerous functions: it is implicated in the repression of transposons and genes, but is also associated with actively transcribed gene bodies and, in some cases, with gene activation per se. In recent years, sensitive technologies have been developed that allow the interrogation of DNA methylation patterns from a small number of cells. The use of these technologies has greatly improved our knowledge of DNA methylation dynamics and heterogeneity in embryos and in specific tissues. Combined with genetic analyses, it is increasingly apparent that regulation of DNA methylation erasure and (re-)establishment varies considerably between different developmental stages. In this Review, we discuss the mechanisms and functions of DNA methylation and demethylation in both mice and humans at CpG-rich promoters, gene bodies and transposable elements. We highlight the dynamic erasure and re-establishment of DNA methylation in embryonic, germline and somatic cell development. Finally, we provide insights into DNA methylation gained from studying genetic diseases. DNA methylation is essential for mammalian embryogenesis owing to its repression of transposons and genes, but it is also associated with gene activation. The recent use of sensitive technologies has revealed that DNA methylation dynamics vary considerably between embryonic, germline and somatic cell development, with implications for genetic diseases and cancer.

1,039 citations


Posted Content
29 Oct 2019-bioRxiv
TL;DR: scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis, and is anticipated to greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification.
Abstract: The introduction of RNA velocity in single cells has opened up new ways of studying cellular differentiation. The originally proposed framework obtains velocities as the deviation of the observed ratio of spliced and unspliced mRNA from an inferred steady state. Errors in velocity estimates arise if the central assumptions of a common splicing rate and the observation of the full splicing dynamics with steady-state mRNA levels are violated. With scVelo (https://scvelo.org), we address these restrictions by solving the full transcriptional dynamics of splicing kinetics using a likelihood-based dynamical model. This generalizes RNA velocity to a wide variety of systems comprising transient cell states, which are common in development and in response to perturbations. We infer gene-specific rates of transcription, splicing and degradation, and recover the latent time of the underlying cellular processes. This latent time represents the cell’s internal clock and is based only on its transcriptional dynamics. Moreover, scVelo allows us to identify regimes of regulatory changes such as stages of cell fate commitment and, therein, systematically detects putative driver genes. We demonstrate that scVelo enables disentangling heterogeneous subpopulation kinetics with unprecedented resolution in hippocampal dentate gyrus neurogenesis and pancreatic endocrinogenesis. We anticipate that scVelo will greatly facilitate the study of lineage decisions, gene regulation, and pathway activity identification.

712 citations
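
As a rough illustration of the steady-state formulation that the scVelo abstract above attributes to the originally proposed framework, the sketch below estimates one gene's velocity as the deviation of its unspliced counts from a fitted steady-state ratio. Everything here is an assumption made for illustration: the function name `steady_state_velocity`, the toy counts, and the choice to fit the ratio by least squares through the origin over all cells. It is not scVelo's likelihood-based dynamical model, and published implementations may fit the ratio differently.

```python
def steady_state_velocity(unspliced, spliced):
    """Return (gamma, velocity) for a single gene across cells.

    gamma is the least-squares slope (through the origin) of unspliced
    on spliced counts, used here as the inferred steady-state ratio.
    velocity[i] = unspliced[i] - gamma * spliced[i] is each cell's
    deviation from that steady state.
    """
    if len(unspliced) != len(spliced):
        raise ValueError("unspliced and spliced must have the same length")
    ss = sum(s * s for s in spliced)
    us = sum(u * s for u, s in zip(unspliced, spliced))
    gamma = us / ss if ss > 0 else 0.0
    velocity = [u - gamma * s for u, s in zip(unspliced, spliced)]
    return gamma, velocity


# Toy counts for one gene in four cells (hypothetical values).
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [0.5, 1.0, 1.5, 3.0]
gamma, velocity = steady_state_velocity(unspliced, spliced)

# Positive velocity: more unspliced mRNA than the steady state predicts
# (consistent with induction); negative velocity suggests repression.
for cell, v in enumerate(velocity):
    print(f"cell {cell}: velocity = {v:+.3f}")
```

With a single shared ratio, any gene whose cells are not near steady state gets a biased estimate, which is the limitation the abstract says the dynamical model addresses.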


Journal Article
03 Apr 2019-Nature
TL;DR: Transcriptional adaptation, a genetic compensation process by which organisms respond to mutations by upregulating related genes, is triggered by mRNA decay and involves a sequence-dependent mechanism.
Abstract: Genetic robustness, or the ability of an organism to maintain fitness in the presence of harmful mutations, can be achieved via protein feedback loops. Previous work has suggested that organisms may also respond to mutations by transcriptional adaptation, a process by which related gene(s) are upregulated independently of protein feedback loops. However, the prevalence of transcriptional adaptation and its underlying molecular mechanisms are unknown. Here, by analysing several models of transcriptional adaptation in zebrafish and mouse, we uncover a requirement for mutant mRNA degradation. Alleles that fail to transcribe the mutated gene do not exhibit transcriptional adaptation, and these alleles give rise to more severe phenotypes than alleles displaying mutant mRNA decay. Transcriptome analysis in alleles displaying mutant mRNA decay reveals the upregulation of a substantial proportion of the genes that exhibit sequence similarity with the mutated gene's mRNA, suggesting a sequence-dependent mechanism. These findings have implications for our understanding of disease-causing mutations, and will help in the design of mutant alleles with minimal transcriptional adaptation-derived compensation.

679 citations


Journal Article
TL;DR: This Review discusses how the interaction of p53 with DNA and chromatin affects gene expression, and how p53 post-translational modifications, its temporal expression dynamics and its interactions with chromatin regulators and transcription factors influence cell fate.
Abstract: The tumour suppressor p53 has a central role in the response to cellular stress. Activated p53 transcriptionally regulates hundreds of genes that are involved in multiple biological processes, including in DNA damage repair, cell cycle arrest, apoptosis and senescence. In the context of DNA damage, p53 is thought to be a decision-making transcription factor that selectively activates genes as part of specific gene expression programmes to determine cellular outcomes. In this Review, we discuss the multiple molecular mechanisms of p53 regulation and how they modulate the induction of apoptosis or cell cycle arrest following DNA damage. Specifically, we discuss how the interaction of p53 with DNA and chromatin affects gene expression, and how p53 post-translational modifications, its temporal expression dynamics and its interactions with chromatin regulators and transcription factors influence cell fate. These multiple layers of regulation enable p53 to execute cellular responses that are appropriate for specific cellular states and environmental conditions.

611 citations


Journal Article
TL;DR: This review focuses on the antiviral activities of the IFN/ISG system, including general paradigms of ISG function, supported by specific examples in the literature, as well as methodologies to identify and characterize ISG function.
Abstract: In the absence of an intact interferon (IFN) response, mammals may be susceptible to lethal viral infection. IFNs are secreted cytokines that activate a signal transduction cascade leading to the induction of hundreds of interferon-stimulated genes (ISGs). Remarkably, approximately 10% of the genes in the human genome have the potential to be regulated by IFNs. What do all of these genes do? It is a complex question without a simple answer. From decades of research, we know that many of the protein products encoded by these ISGs work alone or in concert to achieve one or more cellular outcomes, including antiviral defense, antiproliferative activities, and stimulation of adaptive immunity. The focus of this review is the antiviral activities of the IFN/ISG system. This includes general paradigms of ISG function, supported by specific examples in the literature, as well as methodologies to identify and characterize ISG function.

502 citations


Journal Article
TL;DR: This work characterized 29 immune cell types within the peripheral blood mononuclear cell (PBMC) fraction of healthy donors using RNA-seq (RNA sequencing) and flow cytometry to identify sets of genes that are specific, are co-expressed, and have housekeeping roles across the 29 cell types.

497 citations


Journal Article
TL;DR: OMIM.org provides interactive access to the Mendelian Inheritance in Man knowledge repository, including genomic coordinate searches of the gene map, views of genetic heterogeneity of phenotypes in Phenotypic Series, and side-by-side comparisons of clinical synopses.
Abstract: For over 50 years Mendelian Inheritance in Man has chronicled the collective knowledge of the field of medical genetics. It initially cataloged the known X-linked, autosomal recessive and autosomal dominant inherited disorders, but grew to be the primary repository of curated information on both genes and genetic phenotypes and the relationships between them. Each phenotype and gene is given a separate entry assigned a stable, unique identifier. The entries contain structured summaries of new and important information based on expert review of the biomedical literature. OMIM.org provides interactive access to the knowledge repository, including genomic coordinate searches of the gene map, views of genetic heterogeneity of phenotypes in Phenotypic Series, and side-by-side comparisons of clinical synopses. OMIM.org also supports computational queries via a robust API. All entries have extensive targeted links to other genomic resources and additional references. Updates to OMIM can be found on the update list or followed through the MIMmatch service. Updated user guides and tutorials are available on the website. As of September 2018, OMIM had over 24,600 entries, and the OMIM Morbid Map Scorecard had 6,259 molecularized phenotypes connected to 3,961 genes.

487 citations


Journal Article
01 Nov 2019-Science
TL;DR: The results highlight that endophytic root microbiomes harbor a wealth of as yet unknown functional traits that, in concert, can protect the plant inside out.
Abstract: Microorganisms living inside plants can promote plant growth and health, but their genomic and functional diversity remain largely elusive. Here, metagenomics and network inference show that fungal infection of plant roots enriched for Chitinophagaceae and Flavobacteriaceae in the root endosphere and for chitinase genes and various unknown biosynthetic gene clusters encoding the production of nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs). After strain-level genome reconstruction, a consortium of Chitinophaga and Flavobacterium was designed that consistently suppressed fungal root disease. Site-directed mutagenesis then revealed that a previously unidentified NRPS-PKS gene cluster from Flavobacterium was essential for disease suppression by the endophytic consortium. Our results highlight that endophytic root microbiomes harbor a wealth of as yet unknown functional traits that, in concert, can protect the plant inside out.

482 citations


Journal Article
TL;DR: A large-scale RNA sequencing study is performed to experimentally identify genes that are downregulated by 25 miRNAs and an improved computational model for genome-wide miRNA target prediction is developed and validated.
Abstract: We perform a large-scale RNA sequencing study to experimentally identify genes that are downregulated by 25 miRNAs. This RNA-seq dataset is combined with public miRNA target binding data to systematically identify miRNA targeting features that are characteristic of both miRNA binding and target downregulation. By integrating these common features in a machine learning framework, we develop and validate an improved computational model for genome-wide miRNA target prediction. All prediction data can be accessed at miRDB (http://mirdb.org).

Journal Article
TL;DR: The Cistrome DB has a new Toolkit module with several features that allow users to better utilize the large-scale ChIP-seq, DNase-seq and ATAC-seq data, and the new tools will greatly benefit the biomedical research community.
Abstract: The Cistrome Data Browser (DB) is a resource of human and mouse cis-regulatory information derived from ChIP-seq, DNase-seq and ATAC-seq chromatin profiling assays, which map the genome-wide locations of transcription factor binding sites, histone post-translational modifications and regions of chromatin accessible to endonuclease activity. Currently, the Cistrome DB contains approximately 47,000 human and mouse samples with about 24,000 newly collected datasets compared to the previous release two years ago. Furthermore, the Cistrome DB has a new Toolkit module with several features that allow users to better utilize the large-scale ChIP-seq, DNase-seq, and ATAC-seq data. First, users can query the factors which are likely to regulate a specific gene of interest. Second, the Cistrome DB Toolkit facilitates searches for factor binding, histone modifications, and chromatin accessibility in any given genomic interval shorter than 2Mb. Third, the Toolkit can determine the most similar ChIP-seq, DNase-seq, and ATAC-seq samples in terms of genomic interval overlaps with user-provided genomic interval sets. The Cistrome DB is a user-friendly, up-to-date, and well maintained resource, and the new tools will greatly benefit the biomedical research community. The database is freely available at http://cistrome.org/db, and the Toolkit is at http://dbtoolkit.cistrome.org.
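
The third Toolkit function described above, finding the samples most similar to a user-provided interval set, can be pictured with a small sketch. The abstract does not say how the Toolkit scores similarity, so the scoring rule below (the fraction of query intervals that overlap at least one interval in a sample) is an assumption, as are the function names `count_overlapping` and `rank_samples` and the toy sample names and coordinates.

```python
def count_overlapping(query, reference):
    """Count query intervals overlapping at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in query:
        candidates = by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end for r_start, r_end in candidates):
            hits += 1
    return hits


def rank_samples(query, samples):
    """Rank samples by the fraction of query intervals they overlap."""
    if not query:
        raise ValueError("query interval set is empty")
    scores = {
        name: count_overlapping(query, intervals) / len(query)
        for name, intervals in samples.items()
    }
    # Highest overlap fraction first; ties broken by sample name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical query peaks and two hypothetical samples.
query = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
samples = {
    "sample_a": [("chr1", 150, 160), ("chr2", 70, 90)],
    "sample_b": [("chr1", 590, 700)],
}
ranking = rank_samples(query, samples)
print(ranking)
```

The linear scan per query interval keeps the sketch short; a sorted index or interval tree would be the usual choice at the scale of tens of thousands of samples.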

Journal Article
TL;DR: A comprehensive landscape of different modes of gene duplication across the plant kingdom is identified by comparing 141 genomes, which provides a solid foundation for further investigation of the dynamic evolution of duplicate genes.
Abstract: The sharp increase of plant genome and transcriptome data provides valuable resources to investigate evolutionary consequences of gene duplication in a range of taxa, and unravel common principles underlying duplicate gene retention. We survey 141 sequenced plant genomes to elucidate consequences of gene and genome duplication, processes central to the evolution of biodiversity. We develop a pipeline named DupGen_finder to identify different modes of gene duplication in plants. Genes derived from whole-genome, tandem, proximal, transposed, or dispersed duplication differ in abundance, selection pressure, expression divergence, and gene conversion rate among genomes. The number of WGD-derived duplicate genes decreases exponentially with increasing age of duplication events; transposed duplication- and dispersed duplication-derived genes declined in parallel. In contrast, the frequency of tandem and proximal duplications showed no significant decrease over time, providing a continuous supply of variants available for adaptation to continuously changing environments. Moreover, tandem and proximal duplicates experienced stronger selective pressure than genes formed by other modes and evolved toward biased functional roles involved in plant self-defense. The rate of gene conversion among WGD-derived gene pairs declined over time, peaking shortly after polyploidization. To provide a platform for accessing duplicated gene pairs in different plants, we constructed the Plant Duplicate Gene Database. We identify a comprehensive landscape of different modes of gene duplication across the plant kingdom by comparing 141 genomes, which provides a solid foundation for further investigation of the dynamic evolution of duplicate genes.

Journal Article
F. Kyle Satterstrom1, Jack A. Kosmicki1, Jiebiao Wang2, Michael S. Breen3, and 150 more authors (45 institutions)
TL;DR: Using an enhanced Bayesian framework to integrate de novo and case-control rare variation, 102 risk genes are identified at a false discovery rate of ≤ 0.1, consistent with multiple paths to an excitatory/inhibitory imbalance underlying ASD.
Abstract: We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n=35,584 total samples, 11,986 with ASD). Using an enhanced Bayesian framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate ≤ 0.1. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained for severe neurodevelopmental delay, while 53 show higher frequencies in individuals ascertained for ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most of the risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In human cortex single-cell gene expression data, expression of risk genes is enriched in both excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory/inhibitory imbalance underlying ASD.

Journal Article
TL;DR: This study revealed that METTL3, acting as an oncogene, maintained SOX2 expression through an m6A-IGF2BP2-dependent mechanism in CRC cells, and indicated a potential biomarker panel for prognostic prediction in CRC.
Abstract: Colorectal carcinoma (CRC) is one of the most common malignant tumors, and its main cause of death is tumor metastasis. RNA N6-methyladenosine (m6A) is an emerging regulatory mechanism for gene expression and methyltransferase-like 3 (METTL3) participates in tumor progression in several cancer types. However, its role in CRC remains unexplored. Western blot, quantitative real-time PCR (RT-qPCR) and immunohistochemistry (IHC) were used to detect METTL3 expression in cell lines and patient tissues. Methylated RNA immunoprecipitation sequencing (MeRIP-seq) and transcriptomic RNA sequencing (RNA-seq) were used to screen the target genes of METTL3. The biological functions of METTL3 were investigated in vitro and in vivo. RNA pull-down and RNA immunoprecipitation assays were conducted to explore the specific binding of target genes. RNA stability assay was used to detect the half-lives of the downstream genes of METTL3. Using TCGA database, higher METTL3 expression was found in CRC metastatic tissues and was associated with a poor prognosis. MeRIP-seq revealed that SRY (sex determining region Y)-box 2 (SOX2) was the downstream gene of METTL3. METTL3 knockdown in CRC cells drastically inhibited cell self-renewal, stem cell frequency and migration in vitro and suppressed CRC tumorigenesis and metastasis in both cell-based models and PDX models. Mechanistically, methylated SOX2 transcripts, specifically the coding sequence (CDS) regions, were subsequently recognized by the specific m6A “reader”, insulin-like growth factor 2 mRNA binding protein 2 (IGF2BP2), to prevent SOX2 mRNA degradation. Further, SOX2 expression positively correlated with METTL3 and IGF2BP2 in CRC tissues. The combined IHC panel, including “writer”, “reader”, and “target”, exhibited a better prognostic value for CRC patients than any of these components individually. Overall, our study revealed that METTL3, acting as an oncogene, maintained SOX2 expression through an m6A-IGF2BP2-dependent mechanism in CRC cells, and indicated a potential biomarker panel for prognostic prediction in CRC.

Journal Article
29 Nov 2019-Science
TL;DR: Maps of noncoding regulatory regions for major cell types of the human brain revise and expand the list of genes likely to be influenced by noncoding variants in AD and suggest the probable cell types in which they function.
Abstract: Noncoding genetic variation is a major driver of phenotypic diversity, but functional interpretation is challenging. To better understand common genetic variation associated with brain diseases, we defined noncoding regulatory regions for major cell types of the human brain. Whereas psychiatric disorders were primarily associated with variants in transcriptional enhancers and promoters in neurons, sporadic Alzheimer's disease (AD) variants were largely confined to microglia enhancers. Interactome maps connecting disease-risk variants in cell-type-specific enhancers to promoters revealed an extended microglia gene network in AD. Deletion of a microglia-specific enhancer harboring AD-risk variants ablated BIN1 expression in microglia, but not in neurons or astrocytes. These findings revise and expand the list of genes likely to be influenced by noncoding variants in AD and suggest the probable cell types in which they function.

Journal Article
TL;DR: The ability to perform spatially resolved, genome-wide RNA profiling with high detection efficiency and accuracy by MERFISH could help address a wide array of questions ranging from the regulation of gene expression in cells to the development of cell fate and organization in tissues.
Abstract: The expression profiles and spatial distributions of RNAs regulate many cellular functions. Image-based transcriptomic approaches provide powerful means to measure both expression and spatial information of RNAs in individual cells within their native environment. Among these approaches, multiplexed error-robust fluorescence in situ hybridization (MERFISH) has achieved spatially resolved RNA quantification at transcriptome scale by massively multiplexing single-molecule FISH measurements. Here, we increased the gene throughput of MERFISH and demonstrated simultaneous measurements of RNA transcripts from ∼10,000 genes in individual cells with ∼80% detection efficiency and ∼4% misidentification rate. We combined MERFISH with cellular structure imaging to determine subcellular compartmentalization of RNAs. We validated this approach by showing enrichment of secretome transcripts at the endoplasmic reticulum, and further revealed enrichment of long noncoding RNAs, RNAs with retained introns, and a subgroup of protein-coding mRNAs in the cell nucleus. Leveraging spatially resolved RNA profiling, we developed an approach to determine RNA velocity in situ using the balance of nuclear versus cytoplasmic RNA counts. We applied this approach to infer pseudotime ordering of cells and identified cells at different cell-cycle states, revealing ∼1,600 genes with putative cell cycle-dependent expression and a gradual transcription profile change as cells progress through cell-cycle stages. Our analysis further revealed cell cycle-dependent and cell cycle-independent spatial heterogeneity of transcriptionally distinct cells. We envision that the ability to perform spatially resolved, genome-wide RNA profiling with high detection efficiency and accuracy by MERFISH could help address a wide array of questions ranging from the regulation of gene expression in cells to the development of cell fate and organization in tissues.

Journal Article
17 Apr 2019-Nature
TL;DR: A CBE with rat APOBEC1 is shown to cause extensive transcriptome-wide deamination of RNA cytosines in human cells, inducing tens of thousands of C-to-U edits; the results suggest the need to more fully define and characterize the RNA off-target effects of deaminase enzymes in base editor platforms.
Abstract: CRISPR-Cas base-editor technology enables targeted nucleotide alterations, and is being increasingly used for research and potential therapeutic applications. The most widely used cytosine base editors (CBEs) induce deamination of DNA cytosines using the rat APOBEC1 enzyme, which is targeted by a linked Cas protein-guide RNA complex. Previous studies of the specificity of CBEs have identified off-target DNA edits in mammalian cells. Here we show that a CBE with rat APOBEC1 can cause extensive transcriptome-wide deamination of RNA cytosines in human cells, inducing tens of thousands of C-to-U edits with frequencies ranging from 0.07% to 100% in 38-58% of expressed genes. CBE-induced RNA edits occur in both protein-coding and non-protein-coding sequences and generate missense, nonsense, splice site, and 5' and 3' untranslated region mutations. We engineered two CBE variants bearing mutations in rat APOBEC1 that substantially decreased the number of RNA edits (by more than 390-fold and more than 3,800-fold) in human cells. These variants also showed more precise on-target DNA editing than the wild-type CBE and, for most guide RNAs tested, no substantial reduction in editing efficiency. Finally, we show that an adenine base editor can also induce transcriptome-wide RNA edits. These results have implications for the use of base editors in both research and clinical settings, illustrate the feasibility of engineering improved variants with reduced RNA editing activities, and suggest the need to more fully define and characterize the RNA off-target effects of deaminase enzymes in base editor platforms.

Journal Article
10 Jan 2019-Cell
TL;DR: This work describes a multiplex, expression quantitative trait locus (eQTL)-inspired framework for mapping enhancer-gene pairs by introducing random combinations of CRISPR/Cas9-mediated perturbations to each of many cells, followed by single-cell RNA sequencing (RNA-seq).

Journal Article
TL;DR: The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF–gene co-expression from RNA-seq studies, TF–target associations from ChIP-seq experiments, and TF–gene co-occurrence computed from crowd-submitted gene lists; integrating these libraries illuminates general transcription factor properties such as whether the TF behaves as an activator or a repressor.
Abstract: Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF-gene co-expression from RNA-seq studies, TF-target associations from ChIP-seq experiments, and TF-gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves the prediction of the correct upstream TF compared to ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties such as whether the TF behaves as an activator or a repressor. The ChEA3 web-server is available from https://amp.pharm.mssm.edu/ChEA3.
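
To make the idea of a composite rank concrete, here is a minimal sketch that averages each TF's rank across several libraries. The abstract does not spell out the integration formula, so averaging ranks over the libraries in which a TF appears is an assumption for illustration, as are the function name `mean_rank`, the library names, and the placeholder TF names.

```python
def mean_rank(library_rankings):
    """Combine per-library TF rankings into one composite ranking.

    library_rankings maps a library name to a list of TFs ordered from
    most to least enriched. Each TF's composite score is its average
    rank over the libraries in which it appears (lower is better).
    """
    ranks = {}
    for ordered_tfs in library_rankings.values():
        for position, tf in enumerate(ordered_tfs, start=1):
            ranks.setdefault(tf, []).append(position)
    composite = {tf: sum(r) / len(r) for tf, r in ranks.items()}
    # Sort by composite score, breaking ties alphabetically.
    return sorted(composite.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical enrichment rankings from three library types.
library_rankings = {
    "coexpression": ["TF_A", "TF_B", "TF_C"],
    "chipseq": ["TF_B", "TF_A"],
    "cooccurrence": ["TF_B", "TF_C", "TF_A"],
}
composite = mean_rank(library_rankings)
for tf, score in composite:
    print(f"{tf}: mean rank {score:.2f}")
```

In this toy input TF_B is never ranked first by every library, yet it comes out on top because it ranks well in all three, which is the intuition behind integrating distinct evidence sources.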

Journal Article
TL;DR: Paddy trials showed that genome-edited SWEET promoters endow rice lines with robust, broad-spectrum resistance to all Xanthomonas bacterial blight strains tested.
Abstract: Bacterial blight of rice is an important disease in Asia and Africa. The pathogen, Xanthomonas oryzae pv. oryzae (Xoo), secretes one or more of six known transcription-activator-like effectors (TALes) that bind specific promoter sequences and induce, at minimum, one of the three host sucrose transporter genes SWEET11, SWEET13 and SWEET14, the expression of which is required for disease susceptibility. We used CRISPR-Cas9-mediated genome editing to introduce mutations in all three SWEET gene promoters. Editing was further informed by sequence analyses of TALe genes in 63 Xoo strains, which revealed multiple TALe variants for SWEET13 alleles. Mutations were also created in SWEET14, which is also targeted by two TALes from an African Xoo lineage. A total of five promoter mutations were simultaneously introduced into the rice line Kitaake and the elite mega varieties IR64 and Ciherang-Sub1. Paddy trials showed that genome-edited SWEET promoters endow rice lines with robust, broad-spectrum resistance.

Journal ArticleDOI
28 Aug 2019-Nature
TL;DR: Structural and microscopy studies of gene transcription underpin a model in which phosphorylation controls the shuttling of RNA polymerase II between promoter and gene-body condensates to regulate transcription initiation and elongation.
Abstract: The regulated transcription of genes determines cell identity and function. Recent structural studies have elucidated mechanisms that govern the regulation of transcription by RNA polymerases during the initiation and elongation phases. Microscopy studies have revealed that transcription involves the condensation of factors in the cell nucleus. A model is emerging for the transcription of protein-coding genes in which distinct transient condensates form at gene promoters and in gene bodies to concentrate the factors required for transcription initiation and elongation, respectively. The transcribing enzyme RNA polymerase II may shuttle between these condensates in a phosphorylation-dependent manner. Molecular principles are being defined that rationalize transcriptional organization and regulation, and that will guide future investigations.

Journal ArticleDOI
TL;DR: A tomato pan-genome constructed using genome sequences of 725 phylogenetically and geographically representative accessions captures 4,873 genes absent from the reference genome and identifies a rare allele of TomLoxC regulating fruit flavor.
Abstract: Modern tomatoes have narrow genetic diversity limiting their improvement potential. We present a tomato pan-genome constructed using genome sequences of 725 phylogenetically and geographically representative accessions, revealing 4,873 genes absent from the reference genome. Presence/absence variation analyses reveal substantial gene loss and intense negative selection of genes and promoters during tomato domestication and improvement. Lost or negatively selected genes are enriched for important traits, especially disease resistance. We identify a rare allele in the TomLoxC promoter selected against during domestication. Quantitative trait locus mapping and analysis of transgenic plants reveal a role for TomLoxC in apocarotenoid production, which contributes to desirable tomato flavor. In orange-stage fruit, accessions harboring both the rare and common TomLoxC alleles (heterozygotes) have higher TomLoxC expression than those homozygous for either and are resurgent in modern tomatoes. The tomato pan-genome adds depth and completeness to the reference genome, and is useful for future biological discovery and breeding.

Journal ArticleDOI
13 Jun 2019-Cell
TL;DR: This work systematically quantified ligand-induced interactions between 148 GPCRs and all 11 unique Gα subunit C termini, and identified sequence-based coupling specificity features, inside and outside the transmembrane domain, which were used to develop a coupling predictor that outperforms previous methods.

Journal ArticleDOI
TL;DR: Tumors with TP53 mutations differ from their non-mutated counterparts in RNA, miRNA, and protein expression patterns, with mutant TP53 tumors displaying enhanced expression of cell cycle progression genes and proteins.

Journal ArticleDOI
18 Mar 2019-Nature
TL;DR: In this paper, optical reconstruction of chromatin architecture (ORCA) is used to trace the DNA path in single cells with nanoscale accuracy and genomic resolution reaching two kilobases.
Abstract: The establishment of cell types during development requires precise interactions between genes and distal regulatory sequences. We have a limited understanding of how these interactions look in three dimensions, vary across cell types in complex tissue, and relate to transcription. Here we describe optical reconstruction of chromatin architecture (ORCA), a method that can trace the DNA path in single cells with nanoscale accuracy and genomic resolution reaching two kilobases. We used ORCA to study a Hox gene cluster in cryosectioned Drosophila embryos and labelled around 30 RNA species in parallel. We identified cell-type-specific physical borders between active and Polycomb-repressed DNA, and unexpected Polycomb-independent borders. Deletion of Polycomb-independent borders led to ectopic enhancer-promoter contacts, aberrant gene expression, and developmental defects. Together, these results illustrate an approach for high-resolution, single-cell DNA domain analysis in vivo, identify domain structures that change with cell identity, and show that border elements contribute to the formation of physical domains in Drosophila.

Journal ArticleDOI
TL;DR: A single-cell chromatin immunoprecipitation followed by sequencing approach paves the way to study the role of chromatin heterogeneity, not just in cancer but in other diseases and healthy systems, notably during cellular differentiation and development.
Abstract: Modulation of chromatin structure via histone modification is a major epigenetic mechanism and regulator of gene expression. However, the contribution of chromatin features to tumor heterogeneity and evolution remains unknown. Here we describe a high-throughput droplet microfluidics platform to profile chromatin landscapes of thousands of cells at single-cell resolution. Using patient-derived xenograft models of acquired resistance to chemotherapy and targeted therapy in breast cancer, we found that a subset of cells within untreated drug-sensitive tumors share a common chromatin signature with resistant cells, undetectable using bulk approaches. These cells, and cells from the resistant tumors, have lost chromatin marks (H3K27me3, which is associated with stable transcriptional repression) for genes known to promote resistance to treatment. This single-cell chromatin immunoprecipitation followed by sequencing approach paves the way to study the role of chromatin heterogeneity, not just in cancer but in other diseases and healthy systems, notably during cellular differentiation and development.

Journal ArticleDOI
01 May 2019-Nature
TL;DR: The number of codons used to encode the canonical amino acids can be reduced through genome-wide substitution of target codons by defined synonyms, as demonstrated in an Escherichia coli variant whose synthetic genome was built by high-fidelity convergent total synthesis.
Abstract: Nature uses 64 codons to encode the synthesis of proteins from the genome, and chooses 1 sense codon (out of up to 6 synonyms) to encode each amino acid. Synonymous codon choice has diverse and important roles, and many synonymous substitutions are detrimental. Here we demonstrate that the number of codons used to encode the canonical amino acids can be reduced, through the genome-wide substitution of target codons by defined synonyms. We create a variant of Escherichia coli with a four-megabase synthetic genome through a high-fidelity convergent total synthesis. Our synthetic genome implements a defined recoding and refactoring scheme (with simple corrections at just seven positions) to replace every known occurrence of two sense codons and a stop codon in the genome. Thus, we recode 18,214 codons to create an organism with a 61-codon genome; this organism uses 59 codons to encode the 20 amino acids, and enables the deletion of a previously essential transfer RNA.
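
The substitution of target codons by defined synonyms, and the codon counts quoted in this abstract, can be sketched on a toy sequence. The abstract does not name the codons involved, so the mapping below (two serine codons swapped for synonymous serine codons, and the TAG stop swapped for the TAA stop) is an assumption made for illustration, as are the function name and the toy sequence. Real genomes also contain overlapping genes, which this in-frame sketch ignores.

```python
# Assumed recoding map (not stated in the abstract): two serine codons are
# replaced by synonymous serine codons, and the TAG stop by the TAA stop.
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_orf(orf):
    """Replace target codons in an in-frame open reading frame."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(RECODING.get(codon, codon) for codon in codons)


# Toy ORF (hypothetical): Met-Ser-Ser-Lys-stop.
toy_orf = "ATGTCGTCAAAATAG"
recoded = recode_orf(toy_orf)

# Codon bookkeeping from the abstract: removing 3 of the 64 codons leaves 61,
# and the 2 remaining stop codons leave 59 sense codons.
total_codons = 64 - len(RECODING)
sense_codons = total_codons - 2
```

Because each replacement is synonymous, the encoded protein is unchanged while the three target codons no longer occur in the sequence.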

Journal ArticleDOI
10 Jun 2019-Nature
TL;DR: The deaminases integral to commonly used DNA base editors often bind to RNA; the authors quantitatively evaluated the RNA single nucleotide variations (SNVs) induced by cytosine base editors (CBEs) or adenine base editors (ABEs).
Abstract: Recently developed DNA base editing methods enable the direct generation of desired point mutations in genomic DNA without generating any double-strand breaks [1-3], but the issue of off-target edits has limited the application of these methods. Although several previous studies have evaluated off-target mutations in genomic DNA [4-8], it is now clear that the deaminases that are integral to commonly used DNA base editors often bind to RNA [9-13]. For example, the cytosine deaminase APOBEC1, which is used in cytosine base editors (CBEs), targets both DNA and RNA [12], and the adenine deaminase TadA, which is used in adenine base editors (ABEs), induces site-specific inosine formation on RNA [9,11]. However, any potential RNA mutations caused by DNA base editors have not been evaluated. Adeno-associated viruses are the most common delivery system for gene therapies that involve DNA editing; these viruses can sustain long-term gene expression in vivo, so the extent of potential RNA mutations induced by DNA base editors is of great concern [14-16]. Here we quantitatively evaluated RNA single nucleotide variations (SNVs) that were induced by CBEs or ABEs. Both the cytosine base editor BE3 and the adenine base editor ABE7.10 generated tens of thousands of off-target RNA SNVs. Subsequently, by engineering deaminases, we found that three CBE variants and one ABE variant showed a reduction in off-target RNA SNVs to the baseline while maintaining efficient DNA on-target activity. This study reveals a previously overlooked aspect of off-target effects in DNA editing and also demonstrates that such effects can be eliminated by engineering deaminases.

Journal ArticleDOI
TL;DR: The first annotated chromosome-level reference genome assembly for pea, Gregor Mendel’s original genetic model, provides insights into legume genome evolution and the molecular basis of agricultural traits for pea improvement.
Abstract: We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel’s original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.