Journal ArticleDOI

Widespread and extensive lengthening of 3′ UTRs in the mammalian brain

01 May 2013-Genome Research (Genome Res)-Vol. 23, Iss: 5, pp 812-825
TL;DR: Deep mammalian RNA-seq data was analyzed using conservative criteria, and 2035 mouse and 1847 human genes that utilize substantially distal novel 3' UTRs were identified, which greatly expand the scope of post-transcriptional regulatory networks in mammals, and have particular impact on the central nervous system.
Abstract: Remarkable advances in techniques for gene expression profiling have radically changed our knowledge of the transcriptome. Recently, the mammalian brain was reported to express many long intergenic noncoding RNAs (lincRNAs) from loci downstream from protein-coding genes. Our experimental tests failed to validate specific accumulation of lincRNA transcripts, and instead revealed strongly distal 3' UTRs generated by alternative cleavage and polyadenylation (APA). With this perspective in mind, we analyzed deep mammalian RNA-seq data using conservative criteria, and identified 2035 mouse and 1847 human genes that utilize substantially distal novel 3' UTRs. Each of these extends at least 500 bases past the most distal 3' termini available in Ensembl v65, and collectively they add 6.6 Mb and 5.1 Mb to the mRNA space of mouse and human, respectively. Extensive Northern analyses validated stable accumulation of distal APA isoforms, including transcripts bearing exceptionally long 3' UTRs (many >10 kb and some >18 kb in length). The Northern data further illustrate that the extensions we annotated were not due to unprocessed transcriptional run-off events. Global tissue comparisons revealed that APA events yielding these extensions were most prevalent in the mouse and human brain. Finally, these extensions collectively contain thousands of conserved miRNA binding sites, and these are strongly enriched for many well-studied neural miRNAs. Altogether, these new 3' UTR annotations greatly expand the scope of post-transcriptional regulatory networks in mammals, and have particular impact on the central nervous system.
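
The length criterion stated in the abstract (an extension of at least 500 bases past the most distal annotated 3' terminus) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the tuple layout, and the example coordinates are all hypothetical, and the sketch covers only the length threshold, not the other conservative criteria the abstract mentions.

```python
MIN_EXTENSION = 500  # bases; the threshold stated in the abstract


def extension_length(annotated_end, observed_end, strand):
    """Return how far the observed 3' end lies past the annotated 3' end.

    Coordinates are genomic positions. On the minus strand the 3' end sits
    at the smaller coordinate, so the subtraction is reversed.
    """
    if strand == "+":
        return observed_end - annotated_end
    if strand == "-":
        return annotated_end - observed_end
    raise ValueError(f"unknown strand: {strand!r}")


def is_novel_distal_utr(annotated_end, observed_end, strand,
                        min_extension=MIN_EXTENSION):
    """True when the observed 3' end extends at least min_extension bases."""
    return extension_length(annotated_end, observed_end, strand) >= min_extension


# Hypothetical records: (gene, annotated 3' end, observed 3' end, strand)
genes = [
    ("geneA", 10_000, 10_650, "+"),  # 650-base extension
    ("geneB", 20_000, 20_300, "+"),  # 300-base extension, below threshold
    ("geneC", 50_000, 48_900, "-"),  # 1,100-base extension on the minus strand
]

novel = [name for name, ann, obs, strand in genes
         if is_novel_distal_utr(ann, obs, strand)]
print(novel)  # ['geneA', 'geneC']
```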


Citations
Journal ArticleDOI
12 Aug 2015-eLife
TL;DR: It is shown that recently reported non-canonical sites do not mediate repression despite binding the miRNA, which indicates that the vast majority of functional sites are canonical.
Abstract: Proteins are built by using the information contained in molecules of messenger RNA (mRNA). Cells have several ways of controlling the amounts of different proteins they make. For example, a so-called ‘microRNA’ molecule can bind to an mRNA molecule to cause it to be more rapidly degraded and less efficiently used, thereby reducing the amount of protein built from that mRNA. Indeed, microRNAs are thought to help control the amount of protein made from most human genes, and biologists are working to predict the amount of control imparted by each microRNA on each of its mRNA targets. All RNA molecules are made up of a sequence of bases, each commonly known by a single letter—‘A’, ‘U’, ‘C’ or ‘G’. These bases can each pair up with one specific other base—‘A’ pairs with ‘U’, and ‘C’ pairs with ‘G’. To direct the repression of an mRNA molecule, a region of the microRNA known as a ‘seed’ binds to a complementary sequence in the target mRNA. ‘Canonical sites’ are regions in the mRNA that contain the exact sequence of partner bases for the bases in the microRNA seed. Some canonical sites are more effective at mRNA control than others. ‘Non-canonical sites’ also exist in which the pairing between the microRNA seed and mRNA does not completely match. Previous work has suggested that many non-canonical sites can also control mRNA degradation and usage. Agarwal et al. first used large experimental datasets from many sources to investigate microRNA activity in more detail. As expected, when mRNAs had canonical sites that matched the microRNA, mRNA levels and usage tended to drop. However, no effect was observed when the mRNAs only had recently identified non-canonical sites. This suggests that microRNAs primarily bind to canonical sites to control protein production. Based on these results, Agarwal et al. further developed a statistical model that predicts the effects of microRNAs binding to canonical sites. 
The updated model considers 14 different features of the microRNA, microRNA site, or mRNA—including the mRNA sequence around the site—to predict which sites within mRNAs are most effectively targeted by microRNAs. Tests showed that Agarwal et al.'s model was as good as experimental approaches at identifying the effective target sites, and was better than existing computational models. The model has been used to power the latest version of a freely available resource called TargetScan, and so could prove a valuable resource for researchers investigating the many important roles of microRNAs in controlling protein production.
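
The base-pairing rule described in this abstract ('A' pairs with 'U', 'C' pairs with 'G') can be sketched in a few lines of Python. This is only an illustration of exact seed complementarity, not the TargetScan model: the function names and the example sequences are hypothetical, and treating miRNA positions 2 to 8 as the seed region is an assumption made here for the example.

```python
# Watson-Crick partners for RNA bases, as described in the abstract.
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def seed_match_site(mirna, start=2, end=8):
    """Return the mRNA sequence that pairs exactly with the miRNA seed.

    start and end are 1-based, inclusive miRNA positions (assumed 2..8 here).
    The strands pair antiparallel, so the site is the reverse complement.
    """
    seed = mirna[start - 1:end]
    return "".join(PAIR[base] for base in reversed(seed))


def find_canonical_sites(mrna, mirna):
    """Return 0-based start positions of exact seed matches in the mRNA."""
    site = seed_match_site(mirna)
    positions = []
    i = mrna.find(site)
    while i != -1:
        positions.append(i)
        i = mrna.find(site, i + 1)
    return positions


# Illustrative sequences (hypothetical 3' UTR fragment).
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAGCUACCUCAUUGGCUACCUCGG"

print(seed_match_site(mirna))            # CUACCUC
print(find_canonical_sites(utr, mirna))  # [3, 15]
```

A site with a mismatch inside the seed region would not be reported by this sketch, which corresponds loosely to the 'non-canonical' sites the abstract describes.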

5,365 citations


Cites background from "Widespread and extensive lengthenin..."

  • ...…miRNA isoforms that have different seeds, either because both strands of the miRNA duplex load into Argonaute with near-equal efficiencies or because processing heterogeneity gives rise to alternative 5′ termini (Azuma-Mukai et al., 2008; Morin et al., 2008; Wu et al., 2009; Chiang et al., 2010)....

    [...]

Journal ArticleDOI
TL;DR: This work exploits massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs, which exhibit commonalities and distinctions from mammalian circles.

791 citations


Cites methods from "Widespread and extensive lengthenin..."

  • ...Northern blots, cDNA preparation, and RT-PCRs were performed as described (Miura et al., 2013)....

    [...]

  • ...To degrade linear RNA, we treated 60 mg of total RNA with 120 U RNase R (Epicenter) for 45 min at 37 °C. Northern blots, cDNA preparation, and RT-PCRs were performed as described (Miura et al., 2013)....

    [...]

Journal ArticleDOI
TL;DR: How to determine whether any given ncRNA has a function is discussed, and it is advocated that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.
Abstract: The genomes of large multicellular eukaryotes are mostly comprised of non-protein coding DNA. Although there has been much agreement that a small fraction of these genomes has important biological functions, there has been much debate as to whether the rest contributes to development and/or homeostasis. Much of the speculation has centered on the genomic regions that are transcribed into RNA at some low level. Unfortunately these RNAs have been arbitrarily assigned various names, such as “intergenic RNA”, “long non-coding RNAs” etc., which have led to some confusion in the field. Many researchers believe that these transcripts represent a vast, uncharted world of functional non-coding RNAs (ncRNAs), simply because they exist. However, there are reasons to question this Panglossian view because it ignores our current understanding of how evolution shapes eukaryotic genomes and how the gene expression machinery works in eukaryotic cells. Although there are undoubtedly many more functional ncRNAs yet to be discovered and characterized, it is also likely that many of these transcripts are simply junk. Here we discuss how to determine whether any given ncRNA has a function. Importantly, we advocate that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.

580 citations


Cites background from "Widespread and extensive lengthenin..."

  • ...Finally, it is also worth pointing out that a significant fraction of these lncRNAs may actually be misannotated untranslated regions of known mRNAs (Miura et al., 2013)....

    [...]

Journal ArticleDOI
TL;DR: A novel bioinformatics algorithm (DaPars) is developed for the de novo identification of dynamic APAs from standard RNA-seq that implicate CstF64, an essential polyadenylation factor, as a master regulator of 3'-UTR shortening across multiple tumour types.
Abstract: Alternative polyadenylation (APA) is a pervasive mechanism in the regulation of most human genes, and its implication in diseases including cancer is only beginning to be appreciated. Since conventional APA profiling has not been widely adopted, global cancer APA studies are very limited. Here we develop a novel bioinformatics algorithm (DaPars) for the de novo identification of dynamic APAs from standard RNA-seq. When applied to 358 TCGA Pan-Cancer tumour/normal pairs across seven tumour types, DaPars reveals 1,346 genes with recurrent and tumour-specific APAs. Most APA genes (91%) have shorter 3'-untranslated regions (3' UTRs) in tumours that can avoid microRNA-mediated repression, including glutaminase (GLS), a key metabolic enzyme for tumour proliferation. Interestingly, selected APA events add strong prognostic power beyond common clinical and molecular variables, suggesting their potential as novel prognostic biomarkers. Finally, our results implicate CstF64, an essential polyadenylation factor, as a master regulator of 3'-UTR shortening across multiple tumour types.

393 citations

Journal ArticleDOI
19 Jun 2014-Nature
TL;DR: CFIm25 is identified as a broad repressor of proximal poly(A) site usage that, when depleted, increases cell proliferation and is revealed as a previously unknown connection between CFIm25 and glioblastoma tumorigenicity.
Abstract: CFIm25 is identified as a factor that prevents messenger RNAs being shortened due to altered 3′ polyadenylation, which typically occurs when cells undergo high proliferation and correlates with increased tumorigenic activity in glioblastoma tumours. Cells undergoing high proliferation display mRNAs that are shortened as a result of decreased 3′ polyadenylation. Eric Wagner and colleagues have identified CFIm25 (a 25-kilodalton component of the cleavage factor Im complex involved in pre-mRNA 3′-processing) as a factor that prevents polyadenylation shortening. In its absence, poly(A) tails are shorter and proliferation is enhanced in about 11% of expressed mRNAs in HeLa cells. This polyadenylation shortening correlates with the upregulation of several oncogenes, and in glioblastoma cells with higher tumorigenic activity. The global shortening of messenger RNAs through alternative polyadenylation (APA) that occurs during enhanced cellular proliferation represents an important, yet poorly understood mechanism of regulated gene expression. The 3′ untranslated region (UTR) truncation of growth-promoting mRNA transcripts that relieves intrinsic microRNA- and AU-rich-element-mediated repression has been observed to correlate with cellular transformation; however, the importance to tumorigenicity of RNA 3′-end-processing factors that potentially govern APA is unknown. Here we identify CFIm25 as a broad repressor of proximal poly(A) site usage that, when depleted, increases cell proliferation. Applying a regression model on standard RNA-sequencing data for novel APA events, we identified at least 1,450 genes with shortened 3′ UTRs after CFIm25 knockdown, representing 11% of significantly expressed mRNAs in human cells. Marked increases in the expression of several known oncogenes, including cyclin D1, are observed as a consequence of CFIm25 depletion.
Importantly, we identified a subset of CFIm25-regulated APA genes with shortened 3′ UTRs in glioblastoma tumours that have reduced CFIm25 expression. Downregulation of CFIm25 expression in glioblastoma cells enhances their tumorigenic properties and increases tumour size, whereas CFIm25 overexpression reduces these properties and inhibits tumour growth. These findings identify a pivotal role of CFIm25 in governing APA and reveal a previously unknown connection between CFIm25 and glioblastoma tumorigenicity.

344 citations

References
Journal ArticleDOI
TL;DR: This protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results, which takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.
Abstract: Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

10,913 citations

Journal ArticleDOI
27 Nov 2008-Nature
TL;DR: An in-depth analysis of 15 diverse human tissue and cell line transcriptomes on the basis of deep sequencing of complementary DNA fragments yielding a digital inventory of gene and mRNA isoform expression suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.
Abstract: Through alternative processing of pre-messenger RNAs, individual mammalian genes often produce multiple mRNA and protein isoforms that may have related, distinct or even opposing functions. Here we report an in-depth analysis of 15 diverse human tissue and cell line transcriptomes on the basis of deep sequencing of complementary DNA fragments, yielding a digital inventory of gene and mRNA isoform expression. Analyses in which sequence reads are mapped to exon-exon junctions indicated that 92-94% of human genes undergo alternative splicing, 86% with a minor isoform frequency of 15% or more. Differences in isoform-specific read densities indicated that most alternative splicing and alternative cleavage and polyadenylation events vary between tissues, whereas variation between individuals was approximately twofold to threefold less common. Extreme or 'switch-like' regulation of splicing between tissues was associated with increased sequence conservation in regulatory regions and with generation of full-length open reading frames. Patterns of alternative splicing and alternative cleavage and polyadenylation were strongly correlated across tissues, suggesting coordinated regulation of these processes, and sequence conservation of a subset of known regulatory motifs in both alternative introns and 3' untranslated regions suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.

4,711 citations


"Widespread and extensive lengthenin..." refers background or methods in this paper

  • ...Microarray analysis of assorted tissues indicated that the mammalian brain broadly utilizes distal 3′ UTR species (Sandberg et al. 2008; Wang et al. 2008)....

    [...]

  • ...Signal-to-noise ratios of all 7-mers were calculated according to the method previously described (Wang et al. 2008)....

    [...]

  • ...average, compared with other tissues (Stark et al. 2005; Wang et al. 2008; Ramskold et al. 2009)....

    [...]

Journal ArticleDOI
29 Jun 2007-Cell
TL;DR: The transcriptional landscape of the four human HOX loci is characterized at five base pair resolution in 11 anatomic sites and 231 HOX ncRNAs are identified that extend known transcribed regions by more than 30 kilobases, suggesting transcription of ncRNA may demarcate chromosomal domains of gene silencing at a distance.

4,003 citations


"Widespread and extensive lengthenin..." refers background in this paper

  • ...We are confident that the myriad and unexpected roles for long noncoding RNAs are just beginning to be unraveled (Rinn et al. 2007; Khalil et al. 2009; Gendrel and Heard 2011; Guttman et al. 2011)....

    [...]

Journal ArticleDOI
TL;DR: A model in which some lincRNAs guide chromatin-modifying complexes to specific genomic loci to regulate gene expression is proposed, and it is shown that siRNA-mediated depletion of certain lincRNAs associated with PRC2 leads to changes in gene expression.
Abstract: We recently showed that the mammalian genome encodes >1,000 large intergenic noncoding (linc)RNAs that are clearly conserved across mammals and, thus, functional. Gene expression patterns have implicated these lincRNAs in diverse biological processes, including cell-cycle regulation, immune surveillance, and embryonic stem cell pluripotency. However, the mechanism by which these lincRNAs function is unknown. Here, we expand the catalog of human lincRNAs to ≈3,300 by analyzing chromatin-state maps of various human cell types. Inspired by the observation that the well-characterized lincRNA HOTAIR binds the polycomb repressive complex (PRC)2, we tested whether many lincRNAs are physically associated with PRC2. Remarkably, we observe that ≈20% of lincRNAs expressed in various cell types are bound by PRC2, and that additional lincRNAs are bound by other chromatin-modifying complexes. Also, we show that siRNA-mediated depletion of certain lincRNAs associated with PRC2 leads to changes in gene expression, and that the up-regulated genes are enriched for those normally silenced by PRC2. We propose a model in which some lincRNAs guide chromatin-modifying complexes to specific genomic loci to regulate gene expression.

2,738 citations


"Widespread and extensive lengthenin..." refers background in this paper

  • ...We are confident that the myriad and unexpected roles for long noncoding RNAs are just beginning to be unraveled (Rinn et al. 2007; Khalil et al. 2009; Gendrel and Heard 2011; Guttman et al. 2011)....

    [...]

Journal ArticleDOI
15 Sep 2011-Nature
TL;DR: It is shown that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators.
Abstract: Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ES cell state.

1,790 citations


"Widespread and extensive lengthenin..." refers background in this paper

  • ...We are confident that the myriad and unexpected roles for long noncoding RNAs are just beginning to be unraveled (Rinn et al. 2007; Khalil et al. 2009; Gendrel and Heard 2011; Guttman et al. 2011)....

    [...]