Rapid and Efficient Co-Transcriptional Splicing Enhances Mammalian Gene Expression
Summary
INTRODUCTION
- Transcription and pre-mRNA processing steps – 5′ end capping, splicing, base modification, and 3′ end cleavage – required for eukaryotic gene expression are each carried out by macromolecular machines.
- Spliceosomes may not assemble on all of the introns at the same time, because promoter-proximal introns are synthesized before promoter-distal introns.
- Co-transcriptional splicing also demands that the constellation of splicing factors capable of regulating a splicing event bind the nascent RNA coordinately with the timing imposed by transcription and in a relevant spatial window.
- Previous studies of co-transcriptional splicing did not explore coupling to 3′ end formation.
- Whether 3′ end cleavage efficiency contributes to gene expression levels in mammalian cells is currently unknown.
PacBio Long-read Sequencing of Nascent RNA Yields High Read Coverage
- Murine erythroleukemia (MEL) cells are immortalized at the proerythroblast stage and can be induced to enter terminal erythroid differentiation by treatment with 2% DMSO for five days (Antoniou, 1991).
- Phenotypic changes include decreased cell volume, increased levels of β-globin, and visible hemoglobinization.
- Chromatin purification under stringent washing conditions allows release of contaminating RNAs and retains the stable ternary complex formed by elongating Pol II, DNA, and nascent RNA.
- To generate libraries for long-read sequencing (LRS), the authors established the protocol outlined in Figure 1A.
- More than 7,500 genes were represented by more than 10 reads per gene in each condition.
LRS Reveals Rapid and Efficient Co-transcriptional Splicing
- Each long-read provides two critical pieces of information: the 3′ end reveals the position of Pol II when the RNA was isolated; the splice junctions reveal if splicing has occurred and which splice sites were chosen.
- To validate this finding, the authors examined the read length distribution for reads of each splicing status.
- One explanation for the relatively short distances observed between splice junctions and Pol II may be that Pol II pauses just downstream of an intron, allowing time for splicing to occur before elongation continues.
- To control for the possibility that high PRO-seq density from TSS peaks might bleed through to the first 5′SS, first introns were independently analyzed.
- To determine what features of specific introns might lead to increased splicing intermediates, the authors counted and normalized the number of splicing intermediates observed for each intron.
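Both pieces of information that each long-read provides (its 3′ end and its splice junctions) can be read off a standard SAM alignment record. The sketch below is a hypothetical helper for illustration, not the authors' pipeline: it walks a CIGAR string, records every `N` (skipped-region) operation as a spliced intron, and reports the genomic coordinate of the read's 3′ end as a proxy for Pol II position. It assumes `strand` is the strand of the nascent RNA and that coordinates are 0-based.

```python
import re

# SAM CIGAR operations that consume reference bases: M, D, N, =, X.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '50M200N30M' into (length, op) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings the regex only partially matched (e.g. '*' or '12Q').
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def junctions_and_3prime_end(ref_start, cigar, strand):
    """Return (introns, three_prime_end) for one aligned long-read.

    ref_start: 0-based leftmost aligned reference position (SAM POS - 1).
    strand: '+' or '-', the strand of the nascent RNA.
    introns: half-open (start, end) reference intervals skipped by 'N' ops.
    three_prime_end: 0-based coordinate of the read's 3'-most aligned base.
    """
    pos = ref_start
    introns = []
    for length, op in parse_cigar(cigar):
        if op == "N":
            introns.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    # Soft clips (S) do not consume reference, so they never move the 3' end.
    three_prime_end = pos - 1 if strand == "+" else ref_start
    return introns, three_prime_end


print(junctions_and_3prime_end(1000, "50M200N30M", "+"))
# ([(1050, 1250)], 1279)
```

For a minus-strand RNA the 3′ end is the leftmost aligned base, which is why the function returns `ref_start` in that case.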
Unspliced Transcripts Display Poor Cleavage at Gene Ends
- Consistent with physiological terminal erythroid differentiation, the induced MEL cells shifted to maximal expression of the α- and β-globin genes, each of which contains two introns.
- To their surprise, a large fraction of individual β-globin long-reads in the induced condition had 3′ ends that were up to 2.5 kb downstream of the annotated polyA site (PAS), indicating that these transcripts failed to undergo 3′ end cleavage at the PAS.
- Notably, PRO-seq reads are commonly detected well past gene 3′ ends, because Pol II continues transcribing downstream of the PAS before transcription terminates (Core et al., 2008).
- Coverage of all unspliced reads was globally higher in the region downstream of a PAS than it was for partially spliced or all spliced reads.
- This genome-wide decrease in splicing efficiency associated with impaired 3′ end cleavage confirmed the coordination between splicing and 3′ end processing prominently observed in the globin genes.
A β-thalassemia Mutation Enhances Splicing and 3′ End Cleavage Efficiencies
- To investigate how mutations in splice sites alter co-transcriptional splicing efficiency, the authors took advantage of a known β-thalassemia allele.
- This thalassemia-causing mutation, known as IVS-110, generates an HBB mRNA with an in-frame stop codon, resulting in a 90% reduction in functional HBB protein through nonsense-mediated decay (Spritz et al., 1981; Vadolas et al., 2006).
- To rigorously test the possibility that changes in co-transcriptional splicing efficiency determine 3′ end cleavage, read coverage downstream of the HBB PAS was used to detect uncleaved long-reads for each category of splicing status.
- All-unspliced HBB reads were detected up to 4 kb past the PAS, similar to endogenous mouse globin genes.
- When only intron 2 was spliced, cleavage in MEL-HBB WT and MEL-HBB IVS-110(G>A) cells was similar.
DISCUSSION
- This study reveals functional relationships between co-transcriptional RNA processing events through genome-wide analysis of individual nascent transcripts purified from differentiating mammalian erythroid cells.
- Thus, spliceosome assembly and the transition to catalysis often occur when the spliceosome is physically close to Pol II.
- The authors conclude that splicing more typically occurs when Pol II is close to the intron.
- The authors identified spliced reads within the PRO-seq data, validating the observations made with LRS of purified nascent RNA with an independent method.
- The fraction of efficiently spliced β-globin transcripts increased in the thalassemia allele the authors studied, even though the cryptic 3′SS yields an out-of-frame mRNA that, like the products of many β-globin thalassemia alleles, will be degraded by nonsense-mediated decay (Kurosaki et al., 2019).
LIMITATIONS
- First, the lengths of long-reads depend on reverse transcriptase processivity when copying RNA into cDNA.
- While the authors have taken steps to enrich for full-length transcripts in their library generation, some RNAs are likely not fully reverse transcribed and captured in this dataset.
- Second, the authors have not directly addressed the ultimate fate of unspliced and uncleaved nascent RNA in these cells.
- Finally, a more rigorous test of their proposed mechanism linking splicing and 3′ end cleavage would require tools to probe inhibition of both processes.
ACKNOWLEDGMENTS
- The authors thank P Patsali for sharing the MEL-HBB WT and MEL-HBB IVS-110(G>A) cell lines, M Antoniou for sharing an annotation of the GLOBE vector, and J Conboy for advice on erythroblast fractionation.
- The authors thank E Brown for help with preparation of LRS figures, J Gordon for technical assistance, and H Tilgner, T Carrocci, D Phizicky, T Alpert, T Henriques, and B Martin for helpful discussions and comments on the manuscript.
- This work was initiated through pilot funding from NIDDK under Grant U54DK106857 to the Yale Cooperative Center of Excellence in Hematology (to K.M.N.).
- Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
- K.A.R. is supported by a Postgraduate Scholarship from the Natural Sciences and Engineering Research Council of Canada and a Gruber Science Fellowship, and C.M. is supported by a National Science Foundation Graduate Research Fellowship (DGE1745303).
DECLARATION OF INTERESTS
- Solid line represents the mean coverage of three biological replicates, and shaded windows represent standard error of the mean.
- (B) PRO-seq 3′ end coverage aligned to 5′SSs for all introns from active transcripts (dark purple), first introns only (light purple), middle introns (light orange), and terminal introns (dark orange).
Data and Code Availability
- Raw and processed long-read sequencing and PRO-seq data generated in this study are deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE144205.
- Raw image data associated with this manuscript are available on Mendeley (http://dx.doi.org/10.17632/5vrtbpnj4k.1).
- All code supporting the long-read sequencing data analysis in this manuscript is available at https://github.com/NeugebauerLab/MEL_LRS.
- NanoCOP data from Drexler et al. 2020 analyzed in this manuscript can be found at GEO with accession number GSE123191, and total RNA-seq from MEL cells analyzed in this study can be found at Mouse ENCODE (http://www.mouseencode.org/) with accession number ENCSR000CWE.
Subcellular Fractionation
- Subcellular fractionation was adapted from previously published protocols (Mayer and Churchman, 2017; Pandya-Jones and Black, 2009), with modifications to centrifugation speeds in order to retain intact nuclei (Reimer and Neugebauer, 2020).
- All steps were performed on ice, and all buffers contained 25 μM α-amanitin, 40 U/ml SUPERase.IN, and 1x Roche cOmplete protease inhibitor mix.
- The supernatant (cytoplasm fraction) was removed, and the pellet was rinsed once with 500 μl PBS/1 mM EDTA.
- Chromatin was immediately dissolved in 100 μl PBS and 300 μl TRIzol Reagent.
Nascent RNA Isolation
- RNA was purified from chromatin pellets in TRIzol Reagent using the RNeasy Mini kit according to the manufacturer’s protocol, including the on-column DNase I digestion.
- For genome-wide nascent RNA-seq, samples were depleted of polyA(+) RNA three times using the Dynabeads mRNA DIRECT Micro Purification Kit, keeping the supernatant each time, and then depleted of ribosomal RNA using the Ribo-Zero Gold rRNA Removal Kit.
- For targeted nascent RNA-seq, polyA(+) and rRNA depletion were omitted.
Western Blotting
- Cytoplasm, nucleoplasm, and chromatin fractions from cell fractionation were adjusted to an equal volume with PBS.
- Nucleoplasm and chromatin fractions were homogenized by sonication, and all samples were spun at 14,000 rpm for 10 min at 4°C before gel loading.
- For primers used to amplify Hbb-b1 and Gapdh, see Table S2.
- qPCR reactions were assembled using iQ SYBR Green Supermix and quantified on a Stratagene MX3000P qPCR machine.
- Expression fold changes were calculated using the ΔΔCt method.
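The ΔΔCt calculation can be written out as a short function. This is a generic sketch of the standard 2^-ΔΔCt formula, which assumes roughly equal, near-100% amplification efficiency for the target and reference primer pairs; treating Gapdh as the reference gene and Hbb-b1 as the target is an assumption based on the primers listed above, and the Ct values in the example are invented for illustration.

```python
def ddct_fold_change(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Fold change of a target gene in `sample` relative to `control`.

    Each delta-Ct normalizes the target to a reference gene measured in the
    same RNA; the delta-delta-Ct then compares sample with control.
    Assumes ~100% amplification efficiency (one doubling per cycle).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target (e.g. Hbb-b1) and reference (e.g. Gapdh)
# in induced (sample) versus uninduced (control) cells.
print(ddct_fold_change(18.0, 15.0, 24.0, 15.0))  # 64.0
```

A target that reaches threshold 6 cycles earlier (after reference normalization) is reported as 2^6 = 64-fold higher.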
Microscopy
- Live cells were imaged in bright field on an Olympus CKX41 microscope.
- For total RNA samples, RNA was extracted with TRIzol Reagent according to the manufacturer’s protocol from approximately 5 million cells treated with Pladienolide B as described above.
- RNA was further depleted from this sample as described above.
- PCR was performed using Phusion High-Fidelity DNA Polymerase (NEB) according to the manufacturer’s protocol.
- For the list of intron-flanking primers used in these experiments, see Table S2.
Genome-wide nascent RNA sequencing
- Mapped reads in SAM format were filtered with a custom script (available on GitHub) to remove reads that contained a polyA tail.
- Briefly, mapped reads that had soft-clipped bases at the 3′ end were discarded if the soft-clipped region of the read contained 4 or more A’s and the fraction of A’s was greater than 0.9.
- Similarly, reads with soft-clipped bases at the 5′ end (resulting from minus strand reads) containing at least 4 T’s and having a fraction of T’s greater than 0.9 were discarded.
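The polyA filter can be sketched as follows. The thresholds (at least 4 A's or T's, with a fraction above 0.9) come from the text; the function names are hypothetical, and this is an illustrative re-implementation rather than the authors' GitHub script. It assumes reads are examined in SAM orientation, where reverse-strand alignments are stored reverse-complemented, which is why a polyA tail appears as a leading run of T's on those reads.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths, ignoring hard clips."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return leading, trailing


def mostly_base(bases, base, min_count=4, min_fraction=0.9):
    """True if `bases` has at least min_count copies of `base` and the
    fraction of `base` is strictly greater than min_fraction."""
    count = bases.upper().count(base)
    return count >= min_count and count / len(bases) > min_fraction


def has_polyA_softclip(seq, cigar):
    """Flag a mapped read whose soft clip looks like a polyA tail.

    seq is the SAM SEQ field, so a polyA tail shows up as trailing A's on
    forward alignments and as leading T's on reverse alignments.
    """
    leading, trailing = soft_clip_lengths(cigar)
    if trailing and mostly_base(seq[len(seq) - trailing:], "A"):
        return True
    if leading and mostly_base(seq[:leading], "T"):
        return True
    return False


print(has_polyA_softclip("ACGTACGTAC" + "AAAAAA", "10M6S"))  # True
print(has_polyA_softclip("ACGTACGTAC" + "AAAG", "10M4S"))    # False
```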
HBB targeted nascent RNA sequencing
- Additional parameters were added to the above criteria for removing polyA-containing reads from targeted data mapped to the HBB locus based on empirical observation.
- Because the HBB locus is integrated at a random site in the MEL genome, long uncleaved transcripts that extend past the annotated HBB locus read into random genomic regions, producing long stretches of mismatched, soft-clipped bases.
- A custom script (available on GitHub) was used to filter out polyA-containing reads while retaining uncleaved transcripts.
- Uncleaved reads with long stretches of soft-clipped bases that passed this filtering were then recoded to contain a match in the CIGAR string downstream of the PAS in order to include these regions of the long-reads in coverage calculations.
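The CIGAR recoding step can be illustrated with a small helper. The exact rules in the authors' script are not given here, so this is a hypothetical sketch: it assumes the caller has already decided that a read is uncleaved and that its clipped bases lie downstream of the PAS, and it converts that downstream soft clip into a match. For a minus-strand transcript the downstream clip sits at the start of the alignment, so POS is shifted left, because POS must point at the first aligned base.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def recode_downstream_softclip(pos, cigar, strand):
    """Turn the 3' (downstream) soft clip of an uncleaved read into a match.

    pos: 1-based SAM POS; strand: strand of the transcript ('+' or '-').
    Returns (new_pos, new_cigar). Reads ending in a hard clip are left as is.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if strand == "+" and ops and ops[-1][1] == "S":
        ops[-1] = (ops[-1][0], "M")
    elif strand == "-" and ops and ops[0][1] == "S":
        clip = ops[0][0]
        ops[0] = (clip, "M")
        pos -= clip  # the alignment now starts `clip` bases further left
        if pos < 1:
            raise ValueError("recoded alignment would start before position 1")
    # Merge neighboring identical operations, e.g. 30M + 20M -> 50M.
    merged = []
    for length, op in ops:
        if merged and merged[-1][1] == op:
            merged[-1] = (merged[-1][0] + length, op)
        else:
            merged.append((length, op))
    return pos, "".join(f"{length}{op}" for length, op in merged)


print(recode_downstream_softclip(100, "50M200N30M20S", "+"))
# (100, '50M200N50M')
```

Once the clip is recoded as a match, coverage tools count those bases, which is the stated purpose of the step.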
PRO-seq Data Preprocessing
- Cutadapt was used to trim paired-end reads to 40 nt, removing adapter sequence and low-quality 3′ ends and discarding reads shorter than 20 nt (-m20 -q 1).
- Trimmed paired-end reads were first mapped to the Drosophila dm3 reference genome using Bowtie, and subsequent uniquely mapped reads to the dm3 genome were used to determine percent spike-in return across all samples.
- Paired-end reads that failed to align to the dm3 genome were mapped to the mm10 reference genome.
- Due to the “forward/reverse” orientation of Illumina paired-end sequencing, the “+” and “-” stranded bedGraph files were switched at the end of the pipeline (Mahat et al., 2016).
- Since the spike-in return was comparable between biological replicates within a treatment type, and no comparisons were made between the two treatment conditions, no further normalizations were performed.
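Percent spike-in return is not defined explicitly above; a common definition, assumed here, is the share of uniquely mapped reads that align to the Drosophila (dm3) spike-in genome rather than to the mouse (mm10) genome. The function names and read counts below are hypothetical. The max/min ratio is one simple way to check whether replicates are comparable; no threshold from the study is implied.

```python
def spike_in_return(dm3_unique, mm10_unique):
    """Percent of uniquely mapped reads that come from the dm3 spike-in."""
    total = dm3_unique + mm10_unique
    if total == 0:
        raise ValueError("no mapped reads")
    return 100.0 * dm3_unique / total


def replicate_spread(values):
    """Ratio of the largest to the smallest value (1.0 = identical)."""
    return max(values) / min(values)


# Hypothetical counts for two biological replicates of one condition.
rep1 = spike_in_return(50_000, 950_000)
rep2 = spike_in_return(55_000, 945_000)
print(rep1, rep2, replicate_spread([rep1, rep2]))
```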
PRO-seq and total RNA-seq Data Analysis
- A list of active transcripts in MEL cells was first generated using PRO-seq signal within a 300 nt window around annotated TSSs in the GENCODE mm10 vM20 annotation.
- Additionally, if two intron annotations shared a 5′SS or 3′SS, the annotation with the most spliced reads was kept.
- Violin plots evaluating PRO-seq 3′ end or RNA-seq read coverage were generated by summing the signal at the indicated positions with respect to the 5′SS, 3′SS or PAS.
- P-values were calculated using either the Mann-Whitney test or the Wilcoxon matched-pairs signed-rank test. Resulting reads were filtered with pysam to discard reads containing an “N” (skipped-region) CIGAR operation longer than 10,000 nt, removing poorly mapped reads and reads mapped across very large introns.
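Two of the steps above can be sketched briefly. The first helper applies the 10,000 nt cutoff to a read's CIGAR operations in the `(operation, length)` tuple form returned by pysam's `AlignedSegment.cigartuples`, where operation code 3 is `N`; the second sums per-nucleotide signal in a strand-aware window around a landmark such as a 5′SS, 3′SS, or PAS. The function names, the dictionary representation of signal, and the window sizes are assumptions for illustration.

```python
BAM_CREF_SKIP = 3  # numeric CIGAR code for 'N' in htslib/pysam


def skips_within_limit(cigartuples, max_skip=10_000):
    """True if no 'N' (skipped region) in the alignment exceeds max_skip nt.

    cigartuples: list of (operation, length), as in pysam's
    AlignedSegment.cigartuples. Unmapped reads (None) are rejected.
    """
    if cigartuples is None:
        return False
    return all(length <= max_skip
               for op, length in cigartuples if op == BAM_CREF_SKIP)


def window_sum(signal, landmark, upstream, downstream, strand="+"):
    """Sum signal from `upstream` nt 5' of `landmark` to `downstream` nt 3'
    of it (inclusive), in transcript orientation.

    signal: dict mapping genomic position -> value (missing = 0).
    """
    if strand == "+":
        start, end = landmark - upstream, landmark + downstream
    else:
        start, end = landmark - downstream, landmark + upstream
    return sum(signal.get(p, 0) for p in range(start, end + 1))


print(skips_within_limit([(0, 50), (3, 12_000), (0, 30)]))  # False

# With pysam (not run here; file name is hypothetical):
# import pysam
# with pysam.AlignmentFile("reads.bam", "rb") as bam:
#     kept = [r for r in bam if skips_within_limit(r.cigartuples)]
```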
Splicing Status Classification and Co-transcriptional Splicing Efficiency (CoSE) Calculation
- The annotation of introns contained in active transcripts (described above for PRO-seq) was first filtered for unique intron start and end coordinates.
- If the junction was not present in the read, a 10 nt window was included in the search for the junction to allow for slight mismatches in alignments.
- If the junction was not found, the intron was classified as unspliced.
- To classify splicing status of each read, the number of spliced introns was compared to the total number of introns that was overlapped.
- Introns with identical 5′SS or 3′SS were filtered to keep only the intron with the most total reads.
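The classification steps above can be sketched in a few functions. The ±10 nt tolerance interprets the 10 nt search window as a tolerance on both splice sites, the three status labels follow the categories used in the Results, and the greedy filter is one way to keep the intron with the most reads when annotations share a splice site. CoSE is not defined in this excerpt, so `cose` assumes it is the fraction of reads spanning an intron that are spliced at that intron. All function names are hypothetical.

```python
from collections import Counter


def junction_present(read_introns, intron, tolerance=10):
    """True if any junction in the read matches the annotated intron
    (start, end) within +/- tolerance nt at both splice sites."""
    start, end = intron
    return any(abs(r_start - start) <= tolerance and abs(r_end - end) <= tolerance
               for r_start, r_end in read_introns)


def classify_read(n_spliced, n_overlapped):
    """Splicing status of a read from the introns it spans."""
    if n_overlapped == 0:
        return None  # the read spans no annotated intron
    if n_spliced == n_overlapped:
        return "all spliced"
    if n_spliced == 0:
        return "all unspliced"
    return "partially spliced"


def cose(observations):
    """ASSUMED definition: fraction of reads spanning an intron that are
    spliced at it. observations: iterable of (intron_id, is_spliced)."""
    spliced, total = Counter(), Counter()
    for intron_id, is_spliced in observations:
        total[intron_id] += 1
        spliced[intron_id] += int(is_spliced)
    return {i: spliced[i] / total[i] for i in total}


def dedupe_shared_sites(intron_reads):
    """Greedy filter: among introns sharing a start or an end coordinate,
    keep the one with the most reads. intron_reads: {(start, end): reads}."""
    kept, used_starts, used_ends = [], set(), set()
    for (start, end), _ in sorted(intron_reads.items(),
                                  key=lambda kv: (-kv[1], kv[0])):
        if start in used_starts or end in used_ends:
            continue
        kept.append((start, end))
        used_starts.add(start)
        used_ends.add(end)
    return sorted(kept)


print(classify_read(1, 3))  # partially spliced
```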
Distance from Splice Junction to 3′ End Calculation
- Splicing intermediates (defined below) were filtered out of the long-read data for this analysis, because their 3′ ends do not represent the position of Pol II but rather the end of an upstream exon released between step I and step II of splicing.
- All remaining reads containing at least one splice junction were retained, and the last “block size”, which represents the distance from the most distal splice junction to the 3′ end of the read, was calculated.
- Coordinates of the last spliced intron were also recorded, and each intron was matched to a transcript and categorized by gene biotype using the mygene Python package.
- To determine whether certain genes exhibited longer or shorter distances between the last splice junction and Pol II, the distances were split into three equal-size categories, and transcript IDs from each category were entered into the online PANTHER classification system; no significant enrichment was found.
- Introns considered in this analysis were the same set of introns considered for CoSE as described above.
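The distance measurement and the three-way split can be sketched as below, assuming reads are stored in BED12 format (where "block size" is the standard field name). Taking the first listed block as the 3′-most one for minus-strand reads, and splitting into tertiles by number of reads, are assumptions here; the function names are hypothetical.

```python
def distal_exon_length(bed12_line):
    """Length of the aligned block downstream of a read's last splice junction.

    BED12 lists blocks in genomic order, so for '+' reads the 3'-most block is
    the last one and for '-' reads it is the first one. Returns None for
    unspliced reads (a single block).
    """
    fields = bed12_line.rstrip("\n").split("\t")
    strand = fields[5]
    block_count = int(fields[9])
    sizes = [int(s) for s in fields[10].rstrip(",").split(",")]
    if block_count < 2:
        return None
    return sizes[-1] if strand == "+" else sizes[0]


def split_into_thirds(items, key):
    """Sort items by key and cut them into three groups of (near-)equal size."""
    ordered = sorted(items, key=key)
    n = len(ordered)
    return ordered[: n // 3], ordered[n // 3 : 2 * n // 3], ordered[2 * n // 3 :]


line = "chr1\t100\t500\tread1\t0\t+\t100\t500\t0\t2\t50,120,\t0,280,"
print(distal_exon_length(line))  # 120
```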
Long-read Coverage
- Transcript coordinates associated with active TSSs (as described above) were obtained from UCSC.
- Transcripts were then grouped by the parent Gene ID, and the largest range of start and end coordinates from the grouped transcripts was kept.
- Library depth was then calculated using bedtools coverage across this file of collapsed active gene coordinates.
- For coverage downstream of the PAS, long-reads were separated by splicing status (as described above), and coverage was then calculated using bedtools within a window around the PASs of transcripts with active TSSs or, for HBB, within a window around the HBB PAS.
- Coverage at all positions was normalized to the coverage at the position 100 nt upstream of the PAS.
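The gene-collapsing and normalization steps can be sketched in plain Python. The input tuples stand in for UCSC transcript records, and coverage is assumed to be a list of per-position values aligned to offsets relative to the PAS (negative values are upstream); both representations and the function names are hypothetical. Dividing by the value 100 nt upstream puts every splicing-status group on a common scale, so readthrough shows up as normalized values that stay high past offset 0.

```python
def collapse_by_gene(transcripts):
    """Merge transcripts of the same gene into one span covering all of them.

    transcripts: iterable of (gene_id, chrom, start, end, strand).
    Returns {gene_id: (chrom, min_start, max_end, strand)}.
    """
    genes = {}
    for gene_id, chrom, start, end, strand in transcripts:
        if gene_id in genes:
            c, s, e, st = genes[gene_id]
            genes[gene_id] = (c, min(s, start), max(e, end), st)
        else:
            genes[gene_id] = (chrom, start, end, strand)
    return genes


def normalize_to_upstream(coverage, offsets, ref_offset=-100):
    """Divide coverage at every offset by coverage at `ref_offset`."""
    ref = coverage[offsets.index(ref_offset)]
    if ref == 0:
        raise ValueError("no coverage at the reference position")
    return [value / ref for value in coverage]


print(normalize_to_upstream([10, 8, 2], [-100, 0, 100]))  # [1.0, 0.8, 0.2]
```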
Uncleaved Transcripts Analysis
- Bedtools intersect was used to identify long-reads with 5′ ends originating in a gene body of active transcripts (as described above).
- Reads were then categorized as uncleaved transcripts if their 3′ ends were more than 50 nt downstream of the PAS of the gene with which their 5′ end overlapped.
- Splicing status classification of uncleaved transcripts was carried out as described above.
- For long-reads derived from MEL-HBB IVS-110(G>A) cells, only reads that were spliced at intron 1 using the cryptic splice site were analyzed; the rare reads with a splice junction at the canonical splice site were discarded.
- Splicing status classification, counting of splicing intermediates, and calculating coverage downstream of the PAS were performed as described above but with the custom HBB annotation coordinates.
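The uncleaved-read rule can be expressed directly. This sketch assumes 0-based, half-open read coordinates and that the gene (and hence its PAS and strand) has already been assigned from the read's 5′ end, for example by the bedtools intersect step described above; the function names are hypothetical.

```python
def read_ends(start, end, strand):
    """Map a 0-based half-open alignment [start, end) to (5' end, 3' end)."""
    return (start, end - 1) if strand == "+" else (end - 1, start)


def is_uncleaved(three_prime_end, pas, strand, min_distance=50):
    """True if the read's 3' end lies more than min_distance nt downstream
    of the PAS, in transcript orientation."""
    downstream = three_prime_end - pas if strand == "+" else pas - three_prime_end
    return downstream > min_distance


five_prime, three_prime = read_ends(900, 1100, "+")
print(is_uncleaved(three_prime, pas=1000, strand="+"))  # True (99 nt past the PAS)
```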
QUANTIFICATION AND STATISTICAL ANALYSIS
- All information about statistical testing for individual experiments can be found in figure legends, including statistical tests used, number of replicates, and number of observations.
- Sequencing read counts:

| Sample | Sequencing Protocol | Raw read number | Mapped read number | PolyA-filtered read number |
|---|---|---|---|---|
| MEL_LRS_uninduced | PacBio LRS | 583,632 | 545,477 | 538,452 |
| MEL_LRS_induced | PacBio LRS | | | |

- Oligonucleotides: RT primer for targeted first-strand synthesis (barcode 1), AAGCAGTGGTATCAACGCAGAGTACCACATATCAGAGTGCGGAT; RT-PCR primer F (C1qbp), GACGTGTGCTCTTCCGATCTCACAGATTCCCTGGACTGG.
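As a worked example, the MEL_LRS_uninduced read counts listed above can be converted into the fraction of reads retained at each step. This is simple arithmetic on the reported numbers, not an additional analysis from the study; the function name is hypothetical.

```python
def retention(raw, mapped, polya_filtered):
    """Fraction of reads kept at each processing step."""
    return {
        "mapped/raw": mapped / raw,
        "filtered/mapped": polya_filtered / mapped,
        "filtered/raw": polya_filtered / raw,
    }


stats = retention(583_632, 545_477, 538_452)
for label, frac in stats.items():
    print(f"{label}: {frac:.1%}")
# mapped/raw: 93.5%
# filtered/mapped: 98.7%
# filtered/raw: 92.3%
```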
KEY RESOURCES TABLE
| Reagent or resource | Source | Identifier |
|---|---|---|
| Antibodies | | |
| Rabbit polyclonal anti-GAPDH | Santa Cruz Biotechnology | FL-335/sc-25778 |
| Mouse monoclonal anti-Pol II | Santa Cruz Biotechnology | CTD4H8/sc-47701 |
| Bacterial and Virus Strains | | |
| Biological Samples | | |
| Chemicals, Peptides, and Recombinant Proteins | | |
| DMEM + GlutaMAX | Gibco | 10569-010 |
| Fetal Bovine Serum (FBS) | Gibco | 16000-044 |
| Penicillin Streptomycin | Gibco | 15140-122 |
| α-Amanitin | Sigma | A2263 |
| SUPERase. | | |
| | This paper | N/A |
| Recombinant DNA | | |
| Software and Algorithms | | |
| Porechop v0.2.4 | N/A | |
Frequently Asked Questions
What future work is mentioned in the paper “Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis”?
Future studies of these enigmatic new players may reveal a role for 3′SS diversity in the regulation of splicing by stalling between catalytic steps. Investigation of these mechanisms awaits future studies that would afford single-transcript evaluation of the residence time of intron-bound inhibitory factors (e.g., U1 snRNP) coupled with splicing and cleavage outcome. Less efficient splicing can inhibit 3′ end cleavage (Cooke et al., 1999; Davidson and West, 2013; Martins et al., 2011), suggesting that introns retained in transcripts that display readthrough harbor an inhibitory activity that represses 3′ end cleavage (Figure 7E). The authors speculate that this inhibitory activity persists longer on inefficiently spliced transcripts, potentially binding and inactivating 3′ end cleavage factors (Deng et al., 2020; So et al., 2019).