
Showing papers on "Gene" published in 2018


Journal Article
TL;DR: A significant expansion in the database size and inclusion of the new web tool for TF prioritization mean that TRRUST v2 will be a versatile database for the study of the transcriptional regulation involved in human diseases.
Abstract: Transcription factors (TFs) are major trans-acting factors in transcriptional regulation. Therefore, elucidating TF-target interactions is a key step toward understanding the regulatory circuitry underlying complex traits such as human diseases. We previously published a reference TF-target interaction database for humans, TRRUST (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining), which was constructed using sentence-based text mining, followed by manual curation. Here, we present TRRUST v2 (www.grnpedia.org/trrust) with a significant improvement from the previous version, including a significantly increased size of the database consisting of 8444 regulatory interactions for 800 TFs in humans. More importantly, TRRUST v2 also contains a database for TF-target interactions in mice, including 6552 TF-target interactions for 828 mouse TFs. TRRUST v2 is also substantially more comprehensive and less biased than other TF-target interaction databases. We also improved the web interface, which now enables prioritization of key TFs for a physiological condition depicted by a set of user-input transcriptional responsive genes. With the significant expansion in the database size and inclusion of the new web tool for TF prioritization, we believe that TRRUST v2 will be a versatile database for the study of the transcriptional regulation involved in human diseases.

1,055 citations


Journal Article
17 Jan 2018-Nature
TL;DR: It is shown that chromosomally unstable tumour cells co-opt chronic activation of innate immune pathways to spread to distant organs by sustaining a tumour cell-autonomous response to cytosolic DNA.
Abstract: Chromosomal instability is a hallmark of cancer that results from ongoing errors in chromosome segregation during mitosis. Although chromosomal instability is a major driver of tumour evolution, its role in metastasis has not been established. Here we show that chromosomal instability promotes metastasis by sustaining a tumour cell-autonomous response to cytosolic DNA. Errors in chromosome segregation create a preponderance of micronuclei whose rupture spills genomic DNA into the cytosol. This leads to the activation of the cGAS-STING (cyclic GMP-AMP synthase-stimulator of interferon genes) cytosolic DNA-sensing pathway and downstream noncanonical NF-κB signalling. Genetic suppression of chromosomal instability markedly delays metastasis even in highly aneuploid tumour models, whereas continuous chromosome segregation errors promote cellular invasion and metastasis in a STING-dependent manner. By subverting lethal epithelial responses to cytosolic DNA, chromosomally unstable tumour cells co-opt chronic activation of innate immune pathways to spread to distant organs.

878 citations


Journal Article
TL;DR: Long intergenic non-coding RNA genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity.
Abstract: Long intergenic non-coding RNA (lincRNA) genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity. Some genes currently annotated as encoding lincRNAs include small open reading frames (smORFs) and encode functional peptides and thus may be more properly classified as coding RNAs. lincRNAs may broadly serve to fine-tune the expression of neighbouring genes with remarkable tissue specificity through a diversity of mechanisms, highlighting our rapidly evolving understanding of the non-coding genome.

829 citations


Journal Article
TL;DR: It is reported that genome editing by CRISPR–Cas9 induces a p53-mediated DNA damage response and cell cycle arrest in immortalized human retinal pigment epithelial cells, leading to a selection against cells with a functional p53 pathway, suggesting that p53 inhibition may improve the efficiency of genome editing of untransformed cells.
Abstract: Here, we report that genome editing by CRISPR–Cas9 induces a p53-mediated DNA damage response and cell cycle arrest in immortalized human retinal pigment epithelial cells, leading to a selection against cells with a functional p53 pathway. Inhibition of p53 prevents the damage response and increases the rate of homologous recombination from a donor template. These results suggest that p53 inhibition may improve the efficiency of genome editing of untransformed cells and that p53 function should be monitored when developing cell-based therapies utilizing CRISPR–Cas9. CRISPR–Cas9-induced DNA damage triggers p53 to limit the efficiency of gene editing in immortalized human retinal pigment epithelial cells.

793 citations


Journal Article
TL;DR: This Review analyses how molecular interactions and changes in gene copy numbers modulate the activity of DNMTs in diverse gene regulatory functions, including transcriptional silencing, transcriptional activation and post-transcriptional regulation by DNMT2-dependent tRNA methylation; this mechanistic diversity enables the DNMT family to function as a versatile toolkit for epigenetic regulation.
Abstract: The DNA methyltransferase (DNMT) family comprises a conserved set of DNA-modifying enzymes that have a central role in epigenetic gene regulation. Recent studies have shown that the functions of the canonical DNMT enzymes - DNMT1, DNMT3A and DNMT3B - go beyond their traditional roles of establishing and maintaining DNA methylation patterns. This Review analyses how molecular interactions and changes in gene copy numbers modulate the activity of DNMTs in diverse gene regulatory functions, including transcriptional silencing, transcriptional activation and post-transcriptional regulation by DNMT2-dependent tRNA methylation. This mechanistic diversity enables the DNMT family to function as a versatile toolkit for epigenetic regulation.

792 citations


Journal Article
26 Oct 2018-Science
TL;DR: These chromatin accessibility profiles identify cancer- and tissue-specific DNA regulatory elements that enable classification of tumor subtypes with newly recognized prognostic importance, and identify distinct TF activities in cancer based on differences in the inferred patterns of TF-DNA interaction and gene expression.
Abstract: INTRODUCTION: Cancer is one of the leading causes of death worldwide. Although the 2% of the human genome that encodes proteins has been extensively studied, much remains to be learned about the noncoding genome and gene regulation in cancer. Genes are turned on and off in the proper cell types and cell states by transcription factor (TF) proteins acting on DNA regulatory elements that are scattered over the vast noncoding genome and exert long-range influences. The Cancer Genome Atlas (TCGA) is a global consortium that aims to accelerate the understanding of the molecular basis of cancer. TCGA has systematically collected DNA mutation, methylation, RNA expression, and other comprehensive datasets from primary human cancer tissue. TCGA has served as an invaluable resource for the identification of genomic aberrations, altered transcriptional networks, and cancer subtypes. Nonetheless, the gene regulatory landscapes of these tumors have largely been inferred through indirect means. RATIONALE: A hallmark of active DNA regulatory elements is chromatin accessibility. Eukaryotic genomes are compacted in chromatin, a complex of DNA and proteins, and only the active regulatory elements are accessible by the cell’s machinery such as TFs. The assay for transposase-accessible chromatin using sequencing (ATAC-seq) quantifies DNA accessibility through the use of transposase enzymes that insert sequencing adapters at these accessible chromatin sites. ATAC-seq enables the genome-wide profiling of TF binding events that orchestrate gene expression programs and give a cell its identity. RESULTS: We generated high-quality ATAC-seq data in 410 tumor samples from TCGA, identifying diverse regulatory landscapes across 23 cancer types. These chromatin accessibility profiles identify cancer- and tissue-specific DNA regulatory elements that enable classification of tumor subtypes with newly recognized prognostic importance. We identify distinct TF activities in cancer based on differences in the inferred patterns of TF-DNA interaction and gene expression. Genome-wide correlation of gene expression and chromatin accessibility predicts tens of thousands of putative interactions between distal regulatory elements and gene promoters, including key oncogenes and targets in cancer immunotherapy, such as MYC, SRC, BCL2, and PDL1. Moreover, these regulatory interactions inform known genetic risk loci linked to cancer predisposition, nominating biochemical mechanisms and target genes for many cancer-linked genetic variants. Lastly, integration with mutation profiling by whole-genome sequencing identifies cancer-relevant noncoding mutations that are associated with altered gene expression. A single-base mutation located 12 kilobases upstream of the FGD4 gene, a regulator of the actin cytoskeleton, generates a putative de novo binding site for an NKX TF and is associated with an increase in chromatin accessibility and a concomitant increase in FGD4 gene expression. CONCLUSION: The accessible genome of primary human cancers provides a wealth of information on the susceptibility, mechanisms, prognosis, and potential therapeutic strategies of diverse cancer types. Prediction of interactions between DNA regulatory elements and gene promoters sets the stage for future integrative gene regulatory network analyses. The discovery of hundreds of noncoding somatic mutations that exhibit allele-specific regulatory effects suggests a pervasive mechanism for cancer cells to manipulate gene expression and increase cellular fitness. These data may serve as a foundational resource for the cancer research community.

774 citations


Journal Article
27 Jul 2018-Science
TL;DR: Live-cell single-molecule imaging revealed that TF LCDs interact to form local high-concentration hubs at both synthetic DNA arrays and endogenous genomic loci, suggesting that under physiological conditions, rapid, reversible, and selective multivalent LCD-LCD interactions occur between TFs and the RNA Pol II machinery to activate transcription.
Abstract: Many eukaryotic transcription factors (TFs) contain intrinsically disordered low-complexity sequence domains (LCDs), but how these LCDs drive transactivation remains unclear. We used live-cell single-molecule imaging to reveal that TF LCDs form local high-concentration interaction hubs at synthetic and endogenous genomic loci. TF LCD hubs stabilize DNA binding, recruit RNA polymerase II (RNA Pol II), and activate transcription. LCD-LCD interactions within hubs are highly dynamic, display selectivity with binding partners, and are differentially sensitive to disruption by hexanediols. Under physiological conditions, rapid and reversible LCD-LCD interactions occur between TFs and the RNA Pol II machinery without detectable phase separation. Our findings reveal fundamental mechanisms underpinning transcriptional control and suggest a framework for developing single-molecule imaging screens for drugs targeting gene regulatory interactions implicated in disease.

710 citations


Journal Article
02 Mar 2018-Science
TL;DR: This study comprehensively identifies and experimentally verifies new defense systems based on their enrichment within defense islands, in an attempt to systematically map the arsenal of defense tools that microbes have at their disposal in their fight against phages.
Abstract: The arms race between bacteria and phages led to the development of sophisticated antiphage defense systems, including CRISPR-Cas and restriction-modification systems. Evidence suggests that unknown defense systems are located in “defense islands” in microbial genomes. We comprehensively characterized the bacterial defensive arsenal by examining gene families that are clustered next to known defense genes in prokaryotic genomes. Candidate defense systems were systematically engineered and validated in model bacteria for their antiphage activities. We report nine previously unknown antiphage systems and one antiplasmid system that are widespread in microbes and strongly protect against foreign invaders. These include systems that adopted components of the bacterial flagella and condensin complexes. Our data also suggest a common, ancient ancestry of innate immunity components shared between animals, plants, and bacteria.

650 citations


Journal Article
28 Sep 2018-Science
TL;DR: By applying sci-CAR to lung adenocarcinoma cells and mouse kidney tissue, the authors demonstrate precision in assessing expression and genome accessibility at a genome-wide scale and provide an improvement over bulk analysis, which can be confounded by differing cellular subgroups.
Abstract: Although we can increasingly measure transcription, chromatin, methylation, and other aspects of molecular biology at single-cell resolution, most assays survey only one aspect of cellular biology. Here we describe sci-CAR, a combinatorial indexing–based coassay that jointly profiles chromatin accessibility and mRNA (CAR) in each of thousands of single cells. As a proof of concept, we apply sci-CAR to 4825 cells, including a time series of dexamethasone treatment, as well as to 11,296 cells from the adult mouse kidney. With the resulting data, we compare the pseudotemporal dynamics of chromatin accessibility and gene expression, reconstruct the chromatin accessibility profiles of cell types defined by RNA profiles, and link cis-regulatory sites to their target genes on the basis of the covariance of chromatin accessibility and transcription across large numbers of single cells.
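The linking step described in this abstract (pairing cis-regulatory sites with target genes by how their accessibility and transcription co-vary across cells) can be illustrated with a minimal sketch. This is not the sci-CAR pipeline: the function names, the toy per-cell values, the use of a plain Pearson correlation, and the 0.8 threshold are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def link_sites_to_genes(accessibility, expression, threshold=0.8):
    """Return (site, gene, r) triples whose correlation across cells is >= threshold.

    accessibility: dict of site name -> per-cell accessibility values (hypothetical)
    expression:    dict of gene name -> per-cell expression values (hypothetical)
    """
    links = []
    for site, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= threshold:
                links.append((site, gene, r))
    # Strongest links first.
    return sorted(links, key=lambda link: -link[2])


# Toy data: six cells, two candidate sites, two genes (all values invented).
accessibility = {
    "site_A": [0, 1, 1, 0, 1, 0],
    "site_B": [1, 0, 0, 1, 0, 1],
}
expression = {
    "gene_X": [1, 5, 6, 0, 4, 1],
    "gene_Y": [3, 3, 2, 3, 3, 2],
}

links = link_sites_to_genes(accessibility, expression)
for site, gene, r in links:
    print(f"{site} -> {gene}: r = {r:.3f}")
```

In this toy input only site_A and gene_X rise and fall together across cells, so that pair is the only one reported. A real analysis would also restrict candidate pairs by genomic distance and correct for multiple testing, neither of which is shown here.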

627 citations


Journal Article
04 May 2018-Science
TL;DR: Saturation-scale mutagenesis allows prioritization of intervention targets in the genome of the most important cause of malaria, and confirms the proteasome-degradation pathway is a high-value druggable target.
Abstract: INTRODUCTION: Malaria remains a devastating global parasitic disease, with the majority of malaria deaths caused by the highly virulent Plasmodium falciparum. The extreme AT-bias of the P. falciparum genome has hampered genetic studies through targeted approaches such as homologous recombination or CRISPR-Cas9, and only a few hundred P. falciparum mutants have been experimentally generated in the past decades. In this study, we have used high-throughput piggyBac transposon insertional mutagenesis and quantitative insertion site sequencing (QIseq) to reach saturation-level mutagenesis of this parasite. RATIONALE: Our study exploits the AT-richness of the P. falciparum genome, which provides numerous piggyBac transposon insertion targets within both gene coding and noncoding flanking sequences, to generate more than 38,000 P. falciparum mutants. At this level of mutagenesis, we could distinguish essential genes as nonmutable and dispensable genes as mutable. Subsequently, we identified 2680 genes essential for in vitro asexual blood-stage growth. RESULTS: We calculated mutagenesis index scores (MISs) and mutagenesis fitness scores (MFSs) in order to functionally define the relative fitness cost of disruption for 5399 genes. A competitive growth phenotype screen confirmed that MIS and MFS were predictive of the fitness cost for in vitro asexual growth. Genes predicted to be essential included genes implicated in drug resistance (such as the “K13” Kelch propeller, mdr, and dhfr-ts) as well as targets considered to be high value for drug development, such as pkg and cdpk5. The screen revealed essential genes that are specific to human Plasmodium parasites but absent from rodent-infective species, such as lipid metabolic genes that may be crucial to transmission commitment in human infections. MIS and MFS profiling provides a clear ranking of the relative essentiality of gene ontology (GO) functions in P. falciparum. GO pathways associated with translation, RNA metabolism, and cell cycle control are more essential, whereas genes associated with protein phosphorylation, virulence factors, and transcription are more likely to be dispensable. Last, we confirm that the proteasome-degradation pathway is a high-value druggable target on the basis of its high ratio of essential to dispensable genes, and by functionally confirming its link to the mode of action of artemisinin, the current front-line antimalarial. CONCLUSION: Saturation-scale mutagenesis allows prioritization of intervention targets in the genome of the most important cause of malaria. The identification of more than 2680 essential genes, including ~1000 Plasmodium-conserved essential genes, will be valuable for antimalarial therapeutic research.

622 citations


Journal Article
TL;DR: It is found that CRISPR–Cas9-targeted disruption of the intron 4–exon 5 boundary aimed at blocking the formation of functional AgdsxF did not affect male development or fertility, whereas females homozygous for the disrupted allele showed an intersex phenotype and complete sterility.
Abstract: In the human malaria vector Anopheles gambiae, the gene doublesex (Agdsx) encodes two alternatively spliced transcripts, dsx-female (AgdsxF) and dsx-male (AgdsxM), that control differentiation of the two sexes. The female transcript, unlike the male, contains an exon (exon 5) whose sequence is highly conserved in all Anopheles mosquitoes so far analyzed. We found that CRISPR-Cas9-targeted disruption of the intron 4-exon 5 boundary aimed at blocking the formation of functional AgdsxF did not affect male development or fertility, whereas females homozygous for the disrupted allele showed an intersex phenotype and complete sterility. A CRISPR-Cas9 gene drive construct targeting this same sequence spread rapidly in caged mosquitoes, reaching 100% prevalence within 7-11 generations while progressively reducing egg production to the point of total population collapse. Owing to functional constraint of the target sequence, no selection of alleles resistant to the gene drive occurred in these laboratory experiments. Cas9-resistant variants arose in each generation at the target site but did not block the spread of the drive.

Journal Article
TL;DR: GSCALite is a user-friendly web server for dynamic analysis and visualization of gene set in cancer and drug sensitivity correlation, which will be of broad utilities to cancer researchers.
Abstract: Summary: The availability of cancer genomic data makes it possible to analyze genes related to cancer. Cancer is usually the result of a set of genes, and the signal of a single gene could be covered by background noise. Here, we present a web server named Gene Set Cancer Analysis (GSCALite) to analyze a set of genes in cancers with the following functional modules: (i) differential expression in tumor versus normal, and the survival analysis; (ii) genomic variations and their survival analysis; (iii) gene expression associated cancer pathway activity; (iv) miRNA regulatory network for genes; (v) drug sensitivity for genes; (vi) normal tissue expression and eQTL for genes. GSCALite is a user-friendly web server for dynamic analysis and visualization of gene sets in cancer and drug sensitivity correlation, which will be of broad utility to cancer researchers. Availability and implementation: GSCALite is available on http://bioinfo.life.hust.edu.cn/web/GSCALite/. Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article
TL;DR: SAVER (single-cell analysis via expression recovery) is an expression recovery method for unique molecule index (UMI)-based scRNA-seq data that borrows information across genes and cells to provide accurate expression estimates for all genes.
Abstract: In single-cell RNA sequencing (scRNA-seq) studies, only a small fraction of the transcripts present in each cell are sequenced. This leads to unreliable quantification of genes with low or moderate expression, which hinders downstream analysis. To address this challenge, we developed SAVER (single-cell analysis via expression recovery), an expression recovery method for unique molecule index (UMI)-based scRNA-seq data that borrows information across genes and cells to provide accurate expression estimates for all genes.

Journal Article
TL;DR: The current understanding of the biogenesis and gene regulatory mechanisms of circRNAs is provided, recent studies on circRNAs as potential diagnostic and prognostic biomarkers are summarized, and the major advantages and limitations of circRNAs as novel biomarkers based on existing knowledge are highlighted.

Journal Article
TL;DR: A meta-analysis of genome-wide association studies with ~16 million genetic variants in 62,892 T2D cases and 596,424 controls of European ancestry identifies 139 common and 4 rare variants associated with type 2 diabetes, 42 of which (39 common and 3 rare variants) are independent of the known variants.
Abstract: Type 2 diabetes (T2D) is a very common disease in humans. Here we conduct a meta-analysis of genome-wide association studies (GWAS) with ~16 million genetic variants in 62,892 T2D cases and 596,424 controls of European ancestry. We identify 139 common and 4 rare variants associated with T2D, 42 of which (39 common and 3 rare variants) are independent of the known variants. Integration of the gene expression data from blood (n = 14,115 and 2765) with the GWAS results identifies 33 putative functional genes for T2D, 3 of which were targeted by approved drugs. A further integration of DNA methylation (n = 1980) and epigenomic annotation data highlight 3 genes (CAMK1D, TP53INP1, and ATP5G1) with plausible regulatory mechanisms, whereby a genetic variant exerts an effect on T2D through epigenetic regulation of gene expression. Our study uncovers additional loci, proposes putative genetic regulatory mechanisms for T2D, and provides evidence of purifying selection for T2D-associated variants.

Journal Article
29 Nov 2018-Cell
TL;DR: The DICE project identified cis-eQTLs for a total of 12,254 unique genes, which represent 61% of all protein-coding genes expressed in these cell types and found that biological sex is associated with major differences in immune cell gene expression in a highly cell-specific manner.

Journal Article
TL;DR: A meta-analysis of 314 cloned R genes reveals nine molecular mechanisms by which R proteins can elevate or trigger disease resistance; a clearer understanding of these mechanisms is emerging and will be crucial for rational engineering and deployment of novel R genes.
Abstract: Plants have many, highly variable resistance (R) gene loci, which provide resistance to a variety of pathogens. The first R gene to be cloned, maize (Zea mays) Hm1, was published over 25 years ago, and since then, many different R genes have been identified and isolated. The encoded proteins have provided clues to the diverse molecular mechanisms underlying immunity. Here, we present a meta-analysis of 314 cloned R genes. The majority of R genes encode cell surface or intracellular receptors, and we distinguish nine molecular mechanisms by which R proteins can elevate or trigger disease resistance: direct (1) or indirect (2) perception of pathogen-derived molecules on the cell surface by receptor-like proteins and receptor-like kinases; direct (3) or indirect (4) intracellular detection of pathogen-derived molecules by nucleotide binding, leucine-rich repeat receptors, or detection through integrated domains (5); perception of transcription activator-like effectors through activation of executor genes (6); and active (7), passive (8), or host reprogramming-mediated (9) loss of susceptibility. Although the molecular mechanisms underlying the functions of R genes are only understood for a small proportion of known R genes, a clearer understanding of mechanisms is emerging and will be crucial for rational engineering and deployment of novel R genes.

Journal Article
16 Mar 2018-Science
TL;DR: The daily expression rhythms in >80% of protein-coding genes, encoding diverse biochemical and cellular functions, constitute by far the largest regulatory mechanism that integrates diverse biochemical functions within and across cell types.
Abstract: INTRODUCTION: The interaction among cell-autonomous circadian oscillators (daily cycles of activity–rest and feeding–fasting) produces diurnal rhythms in gene expression in almost all animal tissues. These rhythms control the timing of a wide range of functions across different organs and brain regions, affording optimal fitness. Chronic disruption of these rhythms predisposes to, and is a hallmark of, numerous diseases and affective disorders. RATIONALE: Time-series gene expression studies in a limited number of tissues from rodents have shown that 10 to 40% of the genome exhibits a ~24-hour rhythm in expression in a tissue-specific manner. However, rhythmic expression data from diverse tissues and brain regions from humans or our closest primate relatives are rare. Such multitissue diurnal gene expression data are necessary for gaining mechanistic understanding of how spatiotemporal orchestration of gene expression maintains normal physiology and behavior. We used an RNA sequencing technique to assess gene expression in major tissues and brain regions from baboons (a primate closely related to humans) housed under a defined 24-hour light–dark and feeding–fasting schedule. RESULTS: We assessed gene expression in 64 different tissues and brain regions of male baboons, collected every 2 hours over the 24-hour day. Tissue-specific transcriptomes in baboon were comparable with those from humans (Human GTEx data set). We detected >25,000 expressed transcripts, including protein-coding and noncoding RNAs. Nearly 11,000 genes were commonly expressed in all tissues. These universally expressed genes (UEGs) encoded basic cellular functions such as transcription, RNA processing, DNA repair, protein homeostasis, and cellular metabolism. The remainder were expressed in distinct sets of tissues, with ~1500 genes expressed exclusively in a single tissue. Rhythmic transcripts were found in all tissues, but the number of cycling transcripts varied from ~200 to >3000 in a given tissue, with only limited overlap in the repertoire of rhythmic transcripts between tissues. Of the 11,000 UEGs, the vast majority (96.6%) showed 24-hour rhythmicity in at least one tissue. A majority (>80%) of the 18,000 protein-coding genes detected also exhibited 24-hour rhythms in expression. The most enriched rhythmic transcripts across tissues were core clock components and their immediate output targets. However, their relative abundance and robustness of daily rhythms varied across tissues. Considered at the organismal level, global rhythmic transcription in 64 tissues organized into bursts of peak transcription, during early morning and late afternoon (when 11,000 transcripts reach their peak level). By contrast, during a relative “quiescent phase” in early evening that coincides with the onset of sleep and no food intake, only 700 rhythmic transcripts reach their peak expression level. CONCLUSION: The daily expression rhythms in >80% of protein-coding genes, encoding diverse biochemical and cellular functions, constitute by far the largest regulatory mechanism that integrates diverse biochemical functions within and across cell types. From a translational point of view, rhythmicity may have a major impact in health because 82.2% of genes coding for proteins that are identified as druggable targets by the US Food and Drug Administration show cyclic changes in transcription.

Journal Article
TL;DR: Recon3D, a computational resource that includes three-dimensional metabolite and protein structure data and enables integrated analyses of metabolic functions in humans, is presented and used to functionally characterize mutations associated with disease and to identify metabolic response signatures caused by exposure to certain drugs.
Abstract: Genome-scale network reconstructions have helped uncover the molecular basis of metabolism. Here we present Recon3D, a computational resource that includes three-dimensional (3D) metabolite and protein structure data and enables integrated analyses of metabolic functions in humans. We use Recon3D to functionally characterize mutations associated with disease, and identify metabolic response signatures that are caused by exposure to certain drugs. Recon3D represents the most comprehensive human metabolic network model to date, accounting for 3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures. These data provide a unique resource for investigating molecular mechanisms of human metabolism. Recon3D is available at http://vmh.life.

Journal Article
TL;DR: The authors review the role of genetic structural variation in disease and the pathogenic potential of changes to the 3D genome.
Abstract: Structural and quantitative chromosomal rearrangements, collectively referred to as structural variation (SV), contribute to a large extent to the genetic diversity of the human genome and thus are of high relevance for cancer genetics, rare diseases and evolutionary genetics. Recent studies have shown that SVs can not only affect gene dosage but also modulate basic mechanisms of gene regulation. SVs can alter the copy number of regulatory elements or modify the 3D genome by disrupting higher-order chromatin organization such as topologically associating domains. As a result of these position effects, SVs can influence the expression of genes distant from the SV breakpoints, thereby causing disease. The impact of SVs on the 3D genome and on gene expression regulation has to be considered when interpreting the pathogenic potential of these variant types.

Journal Article
31 Jan 2018-Nature
TL;DR: It is shown that the pervasive presence of multiple enhancers with similar activities near the same gene confers phenotypic robustness to loss-of-function mutations in individual enhancers.
Abstract: Distant-acting tissue-specific enhancers, which regulate gene expression, vastly outnumber protein-coding genes in mammalian genomes, but the functional importance of this regulatory complexity remains unclear. Here we show that the pervasive presence of multiple enhancers with similar activities near the same gene confers phenotypic robustness to loss-of-function mutations in individual enhancers. We used genome editing to create 23 mouse deletion lines and inter-crosses, including both single and combinatorial enhancer deletions at seven distinct loci required for limb development. Unexpectedly, none of the ten deletions of individual enhancers caused noticeable changes in limb morphology. By contrast, the removal of pairs of limb enhancers near the same gene resulted in discernible phenotypes, indicating that enhancers function redundantly in establishing normal morphology. In a genetic background sensitized by reduced baseline expression of the target gene, even single enhancer deletions caused limb abnormalities, suggesting that functional redundancy is conferred by additive effects of enhancers on gene expression levels. A genome-wide analysis integrating epigenomic and transcriptomic data from 29 developmental mouse tissues revealed that mammalian genes are very commonly associated with multiple enhancers that have similar spatiotemporal activity. Systematic exploration of three representative developmental structures (limb, brain and heart) uncovered more than one thousand cases in which five or more enhancers with redundant activity patterns were found near the same gene. Together, our data indicate that enhancer redundancy is a remarkably widespread feature of mammalian genomes that provides an effective regulatory buffer to prevent deleterious phenotypic consequences upon the loss of individual enhancers.

Journal ArticleDOI
TL;DR: An overview of molecular mechanisms underlying the function and regulation of core promoters and their emerging functional diversity, which defines distinct transcription programmes and can explain the nature and outcome of transcription initiation at gene start sites and at enhancers is provided.
Abstract: RNA polymerase II (Pol II) core promoters are specialized DNA sequences at transcription start sites of protein-coding and non-coding genes that support the assembly of the transcription machinery and transcription initiation. They enable the highly regulated transcription of genes by selectively integrating regulatory cues from distal enhancers and their associated regulatory proteins. In this Review, we discuss the defining properties of gene core promoters, including their sequence features, chromatin architecture and transcription initiation patterns. We provide an overview of molecular mechanisms underlying the function and regulation of core promoters and their emerging functional diversity, which defines distinct transcription programmes. On the basis of the established properties of gene core promoters, we discuss transcription start sites within enhancers and integrate recent results obtained from dedicated functional assays to propose a functional model of transcription initiation. This model can explain the nature and function of transcription initiation at gene starts and at enhancers and can explain the different roles of core promoters, of Pol II and its associated factors and of the activating cues provided by enhancers and the transcription factors and cofactors they recruit.


Journal ArticleDOI
TL;DR: In this article, the authors report transcription of genes involved in aerobic and anaerobic benzene degradation pathways in a benzene-degrading denitrifying continuous culture.
Abstract: In this study, we report transcription of genes involved in aerobic and anaerobic benzene degradation pathways in a benzene-degrading denitrifying continuous culture. Transcripts associated with the family Peptococcaceae dominated all samples (21-36% relative abundance), indicating their key role in the community. We found a highly transcribed gene cluster encoding a presumed anaerobic benzene carboxylase (AbcA and AbcD) and a benzoate-coenzyme A ligase (BzlA). Predicted gene products showed >96% amino acid identity and similar gene order to the corresponding benzene degradation gene cluster described previously, providing further evidence for anaerobic benzene activation via carboxylation. For subsequent benzoyl-CoA dearomatization, bam-like genes analogous to the ones found in other strict anaerobes were transcribed, whereas gene transcripts involved in downstream benzoyl-CoA degradation were mostly analogous to the ones described in facultative anaerobes. The concurrent transcription of genes encoding enzymes involved in oxygenase-mediated aerobic benzene degradation suggested the presence of oxygen in the culture, possibly formed via a recently identified nitric oxide dismutase (Nod). Although we were unable to detect transcription of Nod-encoding genes, the response of the continuous culture to the addition of nitrite and formate gave an indication of oxygen production. Such oxygen production would enable aerobic microbes to thrive in oxygen-depleted and nitrate-containing subsurface environments contaminated with hydrocarbons.

Journal ArticleDOI
16 May 2018-Nature
TL;DR: A large-scale mutagenesis screen identifies mutant phenotypes for over 11,000 protein-coding genes in bacteria that had previously not been assigned a specific function, demonstrating the scalability of microbial genetics and its utility for improving gene annotations.
Abstract: One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.

Journal ArticleDOI
TL;DR: This study uniformly analyzed whole-exome sequencing of 249 tumors and matched normal tissue from patients with clinically annotated outcomes to immune checkpoint therapy across multiple cancer types to examine additional tumor genomic features that contribute to selective response.
Abstract: Tumor mutational burden correlates with response to immune checkpoint blockade in multiple solid tumors, although in microsatellite-stable tumors this association is of uncertain clinical utility. Here we uniformly analyzed whole-exome sequencing (WES) of 249 tumors and matched normal tissue from patients with clinically annotated outcomes to immune checkpoint therapy, including radiographic response, across multiple cancer types to examine additional tumor genomic features that contribute to selective response. Our analyses identified genomic correlates of response beyond mutational burden, including somatic events in individual driver genes, certain global mutational signatures, and specific HLA-restricted neoantigens. However, these features were often interrelated, highlighting the complexity of identifying genetic driver events that generate an immunoresponsive tumor environment. This study lays a path forward in analyzing large clinical cohorts in an integrated and multifaceted manner to enhance the ability to discover clinically meaningful predictive features of response to immune checkpoint blockade.

Journal ArticleDOI
07 Nov 2018-Nature
TL;DR: This study establishes an approach for precise, template-free genome editing using a machine-learning algorithm to predict the spectrum of CRISPR–Cas9-nuclease-mediated DNA repair outcomes at human genomic target sites.
Abstract: Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r = 0.87) in five human and mouse cell lines. inDelphi predicts that 5–11% of Cas9 guide RNAs targeting the human genome are ‘precise-50’, yielding a single genotype comprising greater than or equal to 50% of all major editing products. We experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky–Pudlak syndrome and Menkes disease. This study establishes an approach for precise, template-free genome editing. The authors use a machine-learning algorithm to predict the spectrum of CRISPR–Cas9-nuclease-mediated DNA repair outcomes at human genomic target sites.

Journal ArticleDOI
TL;DR: This article summarizes the current knowledge about the “splicing mutations” and methods that help to identify such changes in clinical diagnosis and recommends bioinformatic algorithms as a tool to assess the possible effect of the identified changes.
Abstract: Precise pre-mRNA splicing, essential for appropriate protein translation, depends on the presence of consensus “cis” sequences that define exon-intron boundaries and regulatory sequences recognized by splicing machinery. Point mutations at these consensus sequences can cause improper exon and intron recognition and may result in the formation of an aberrant transcript of the mutated gene. A splicing mutation may occur in both introns and exons and disrupt existing splice sites or splicing regulatory sequences (intronic and exonic splicing silencers and enhancers), create new ones, or activate cryptic ones. Usually such mutations result in errors during the splicing process and may lead to improper intron removal and thus cause alterations of the open reading frame. Recent research has underlined the abundance and importance of splicing mutations in the etiology of inherited diseases. The application of modern techniques has allowed the identification of synonymous and nonsynonymous variants as well as deep intronic mutations that affect pre-mRNA splicing. Bioinformatic algorithms can be applied as a tool to assess the possible effect of the identified changes. However, it should be underlined that the results of such tests are only predictive, and the exact effect of a specific mutation should be verified in functional studies. This article summarizes the current knowledge about “splicing mutations” and methods that help to identify such changes in clinical diagnosis.

Journal ArticleDOI
24 Jan 2018-Nature
TL;DR: The sequencing and assembly of the 32-gigabase-pair axolotl genome is reported using an approach that combined long-read sequencing, optical mapping and development of a new genome assembler (MARVEL).
Abstract: Salamanders serve as important tetrapod models for developmental, regeneration and evolutionary studies. An extensive molecular toolkit makes the Mexican axolotl (Ambystoma mexicanum) a key representative salamander for molecular investigations. Here we report the sequencing and assembly of the 32-gigabase-pair axolotl genome using an approach that combined long-read sequencing, optical mapping and development of a new genome assembler (MARVEL). We observed a size expansion of introns and intergenic regions, largely attributable to multiplication of long terminal repeat retroelements. We provide evidence that intron size in developmental genes is under constraint and that species-restricted genes may contribute to limb regeneration. The axolotl genome assembly does not contain the essential developmental gene Pax3. However, mutation of the axolotl Pax3 paralogue Pax7 resulted in an axolotl phenotype that was similar to those seen in Pax3-/- and Pax7-/- mutant mice. The axolotl genome provides a rich biological resource for developmental and evolutionary studies.

Journal ArticleDOI
14 Jun 2018-Cell
TL;DR: Using integrative genomic analysis of 360 metastatic castration-resistant prostate cancer samples, a novel subtype of prostate cancer typified by biallelic loss of CDK12 is identified that is mutually exclusive with tumors driven by DNA repair deficiency, ETS fusions, and SPOP mutations.