
Rapid and Efficient Co-Transcriptional Splicing Enhances Mammalian Gene Expression

Abstract
Pre-mRNA splicing is tightly coordinated with transcription in yeasts, and introns can be removed soon after they emerge from RNA polymerase II (Pol II). To determine if splicing is similarly rapid and efficient in mammalian cells, we performed long read sequencing of nascent RNA during mouse erythropoiesis. Remarkably, 50% of splicing occurred while Pol II was within 150 nucleotides of 3′ splice sites. PRO-seq revealed that Pol II does not pause around splice sites, confirming that mammalian and yeast spliceosomes can act equally rapidly. Two exceptions were observed. First, several hundred introns displayed abundant splicing intermediates, suggesting that the spliceosome can stall after the first catalytic step. Second, some genes – notably globins – displayed poor splicing coupled to readthrough transcription. Remarkably, a patient-derived mutation in β-globin that causes thalassemia improves splicing efficiency and proper termination, revealing co-transcriptional splicing efficiency is a determinant of productive gene output.


Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis
Kirsten A. Reimer¹, Claudia Mimoso², Karen Adelman², and Karla M. Neugebauer¹*
¹ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
² Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
*Correspondence: karla.neugebauer@yale.edu
ABSTRACT
Pre-mRNA processing steps are tightly coordinated with transcription in many organisms. To determine
how co-transcriptional splicing is integrated with transcription elongation and 3′ end formation in
mammalian cells, we performed long-read sequencing of individual nascent RNAs and PRO-seq during
mouse erythropoiesis. Splicing was not accompanied by transcriptional pausing and was detected when
RNA polymerase II (Pol II) was within 75–300 nucleotides of 3′ splice sites (3′SSs), often during
transcription of the downstream exon. Interestingly, several hundred introns displayed abundant splicing
intermediates, suggesting that splicing delays can take place between the two catalytic steps. Overall,
splicing efficiencies were correlated among introns within the same transcript, and intron retention was
associated with inefficient 3′ end cleavage. Remarkably, a thalassemia patient-derived mutation
introducing a cryptic 3′SS improves both splicing and 3′ end cleavage of individual β-globin transcripts,
demonstrating functional coupling between the two co-transcriptional processes as a determinant of
productive gene output.
Keywords: nascent RNA, erythropoiesis, globin, co-transcriptional splicing, PacBio, long read sequencing
bioRxiv preprint doi: https://doi.org/10.1101/2020.02.11.944595; this version posted December 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

INTRODUCTION
Transcription and pre-mRNA processing steps – 5′ end capping, splicing, base modification, and 3′ end
cleavage – required for eukaryotic gene expression are each carried out by macromolecular machines.
The spliceosome assembles de novo on each intron, recognizing the 5′ and 3′ splice sites (SSs) that
demarcate intron boundaries and then catalyzing two transesterification reactions to excise introns and
ligate exons together (Wilkinson et al., 2020). In mammalian cells, genes typically encode pre-mRNAs
containing 8-10 introns of variable lengths, creating a high cellular demand for spliceosomes relative to
all of the other machineries, which only act once per transcript. Splicing is also a highly-regulated process;
it is influenced by environmental factors, developmental cues, and factors in the local pre-messenger
RNA (pre-mRNA) environment, such as RNA secondary structure and RNA-binding protein occupancy
(Baralle and Giudice, 2017; Jeong, 2017; Lin et al., 2016; Pai and Luca, 2019). The influence of trans-
acting factors on the selection of 5′ and 3′SSs is thought to explain how constitutive and alternative splice
sites are chosen. These working models still largely rely on in vitro biochemistry and often do not explain
changes in alternative splicing or overall gene expression observed upon experimental perturbation or
disease-associated mutations of splicing factors (Joshi et al., 2017; Manning and Cooper, 2017). Thus,
despite detailed knowledge of modulatory factors, the mechanisms underlying the gene regulatory
potential of pre-mRNA splicing are not fully understood in vivo.
Across species, tissues, and cell types, splicing occurs during pre-mRNA synthesis by Pol II (Custodio
and Carmo-Fonseca, 2016; Neugebauer, 2019). Thus, spliceosome assembly occurs as the nascent
RNA is growing longer and more diverse in sequence and structure. Spliceosomes may not assemble on
all of the introns at the same time, because promoter-proximal introns are synthesized before promoter-
distal introns. The questions of whether introns are spliced in the order they are transcribed and how
splicing of individual introns within a given transcript might be coordinated are currently the subject of
intense investigation. Co-transcriptional splicing also demands that the constellation of splicing factors
capable of regulating a splicing event bind the nascent RNA coordinately with the timing imposed by
transcription and in a relevant spatial window. For example, a splicing inhibitor element in a given nascent
RNA would only be influential if it were transcribed before the target intron was removed.
Recently, the Neugebauer lab has used single-molecule sequencing approaches to determine how
splicing progresses as a function of transcription in budding and fission yeasts, where introns are
removed shortly after synthesis (Alpert et al., 2020; Carrillo Oesterreich et al., 2016; Herzel et al., 2018).
The approaches mark the nascent RNA’s 3′ end, which is present in the catalytic center of Pol II, to
determine the position of Pol II when splicing occurs and define the sequence of the pre-mRNA substrate
acted on by the spliceosome. These data show that only a small portion of the downstream exon may be
needed for 3′SS identification and splicing. Interestingly, altering the rate of Pol II elongation affects
splicing outcomes, including widespread changes in alternative splicing (Aslanzadeh et al., 2018; Braberg
et al., 2013; Carrillo Oesterreich et al., 2016; de la Mata et al., 2003; Fong et al., 2014; Ip et al., 2011;
Jonkers and Lis, 2015; Schor et al., 2013). Taken together, these findings suggest that transcription
elongation rate may govern the amount of downstream RNA available for cis regulation at the time that
splicing takes place. This in turn would determine which trans-acting regulatory factors could be recruited
to the nascent RNA to modulate splicing. To obtain mechanistic insights into these processes, we need
to understand how mammalian cells with many more introns per gene and vastly increased levels of
alternative splicing compared to yeast coordinate co-transcriptional splicing with transcription
elongation.
Another issue raised by co-transcriptional RNA processing is how splicing is coordinated with other pre-
mRNA processing steps (Bentley, 2014; Herzel et al., 2017). In recent long-read sequencing studies in
budding and fission yeasts (Alpert et al., 2020; Herzel et al., 2018), “all or none” splicing of individual
nascent transcripts was discovered, suggesting positive and negative cooperativity among neighboring
introns and polyA cleavage sites. Indeed, crosstalk among introns was observed in human cells at the
same time by others (Kim et al., 2017; Tilgner et al., 2018). However, those studies did not explore
coupling to 3′ end formation. Cleavage of the nascent RNA by the cleavage and polyadenylation
machinery at polyA sites (PAS) releases the RNA from Pol II and the RNA is subsequently polyadenylated
(Kumar et al., 2019). Coupling between splicing and 3′ end cleavage is important, because uncleaved
transcripts are degraded by the nuclear exosome in S. pombe (Herzel et al., 2018; Meola et al., 2016;
Zhou et al., 2015). Whether 3′ end cleavage efficiency contributes to gene expression levels in
mammalian cells is currently unknown.
Here we report our analysis of nascent RNA transcription and splicing in murine erythroleukemia (MEL)
cells undergoing erythroid differentiation, a developmental program that exhibits well-known, drastic
changes in gene expression (An et al., 2014; Reimer and Neugebauer, 2018). We have employed two
single-molecule sequencing approaches to directly measure co-transcriptional splicing of nascent RNA:
(i) Long-read sequencing (LRS), which enables genome-wide analysis of splicing with respect to Pol II
position and (ii) Precision Run-On sequencing (PRO-seq), enabling the assessment of Pol II density at
these sites. We rigorously determine the spatial window in which co-transcriptional splicing occurs and
define co-transcriptional splicing efficiency for thousands of mouse introns, Pol II elongation behavior
across splice junctions, and the effects of efficient co-transcriptional splicing on 3′ end cleavage. These
findings identify the pre-mRNA substrates of splicing and show that splicing of multiple introns within
individual transcripts is coordinated with 3′ end cleavage. In particular, the demonstration of highly
efficient splicing in the absence of transcriptional pausing causes us to rethink key features of splicing
regulation in mammalian cells.
RESULTS
PacBio Long-read Sequencing of Nascent RNA Yields High Read Coverage
Murine erythroleukemia (MEL) cells are immortalized at the proerythroblast stage and can be induced to
enter terminal erythroid differentiation by treatment with 2% DMSO for five days (Antoniou, 1991).
Phenotypic changes include decreased cell volume, increased levels of β-globin, and visible
hemoglobinization (Figures S1A-C). We used chromatin purification of uninduced and induced MEL cells
to enrich for nascent RNA (Figure 1A). Chromatin purification under stringent washing conditions allows
release of contaminating RNAs and retains the stable ternary complex formed by elongating Pol II, DNA,
and nascent RNA (Figure S1D; Wuarin and Schibler, 1994). Importantly, spliceosome assembly does
not continue during chromatin fractionation or RNA isolation, because the presence of the splicing
inhibitor Pladienolide B throughout the purification process does not change splicing levels (Figure S2).
To generate libraries for LRS, we established the protocol outlined in Figure 1A. Two biological
replicates, each with two technical replicates, were sequenced using PacBio RSII and Sequel flow cells,
yielding a total of 1,155,629 mappable reads (Table S1). Reads containing a non-templated polyA tail
comprised only 1.7% of the total reads (Table S1) and were removed bioinformatically along with
abundant 7SK RNA reads. Of the remaining reads, the average read length was 710 and 733 nucleotides
(nt), and the average coverage in reads per gene was 8.4 and 4.8 for uninduced and induced samples,
respectively (Figure 1B-C). More than 7,500 genes were represented by more than 10 reads per gene
in each condition (Figure 1C). Coverage of 5′ ends was focused at annotated transcription start sites
(TSSs), with 18.3% of 5′ ends within 50 bp of an active TSS across all samples. As expected, 3′ end
coverage was distributed more evenly throughout gene bodies, with an increase just upstream of
annotated transcription end sites (TESs) and a drop after TESs (Figure S1E).
LRS Reveals Rapid and Efficient Co-transcriptional Splicing
Each long-read provides two critical pieces of information: the 3′ end reveals the position of Pol II when
the RNA was isolated; the splice junctions reveal if splicing has occurred and which splice sites were
chosen. Here, we present our LRS data in a format that highlights 3′ end position and the associated
splicing status (Figure 2A&B; Figure S3A). Each transcript was categorized and colored according to
its splicing status, which can be either “all spliced”, “partially spliced”, “all unspliced”, or “NA” (transcripts
that did not span an entire intron or a 3′SS). For each gene, we calculated the fraction of long-reads that
were all spliced, partially spliced, or all unspliced (Figure 2A; bar plot far right), enabling a survey of
splicing behaviors within individual transcripts (Alpert et al., 2020; Herzel et al., 2018; Kim et al., 2017).
Splicing status of individual transcripts varied from gene to gene. For example, the gene Actb had mostly
all spliced reads (78% and 75% of reads in uninduced and induced cells respectively), while Calr and
Eif1 had a greater fraction of all unspliced reads (Figure 2B). Genome-wide, the majority of long-reads
were all spliced (Figure 2C; 68.0% and 73.8% for uninduced and induced cells, respectively), with an
average of 88% of all introns being spliced. Therefore, the majority of introns are removed co-
transcriptionally. To validate this finding, we examined the read length distribution for reads of each
splicing status (Figure S3B). As expected, partially spliced and all unspliced reads were longer than all
spliced reads due to the presence of introns, suggesting that the efficient shortening of nascent RNA due
to splicing limits the lengths of long-reads.
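
The read categorization described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study; it assumes a simplified, hypothetical representation in which each read is reduced to its aligned start and end coordinates plus the list of splice junctions it contains, and in which a read is only informative for introns it spans entirely (the handling of reads that cover a 3′SS without spanning the whole intron is omitted). The function names `classify_read` and `status_fractions` are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) of an intron or of a junction gap


def classify_read(read_start: int, read_end: int,
                  read_junctions: List[Interval],
                  introns: List[Interval]) -> str:
    """Assign a splicing status to one long read (simplified assumption:
    only introns lying entirely inside the read are considered)."""
    spanned = [i for i in introns if read_start < i[0] and i[1] < read_end]
    if not spanned:
        return "NA"
    junctions = set(read_junctions)
    n_spliced = sum(1 for i in spanned if i in junctions)
    if n_spliced == len(spanned):
        return "all spliced"
    if n_spliced == 0:
        return "all unspliced"
    return "partially spliced"


def status_fractions(statuses: List[str]) -> Dict[str, float]:
    """Fraction of informative (non-NA) reads in each splicing category."""
    informative = [s for s in statuses if s != "NA"]
    counts = Counter(informative)
    return {s: n / len(informative) for s, n in counts.items()}
```

Applied per gene, `status_fractions` would give the kind of per-gene summary shown as bar plots, under the simplifications stated above.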
To quantify co-transcriptional splicing for each intron detected by at least 10 long-reads, we defined a
metric termed the Co-transcriptional Splicing Efficiency (CoSE), tabulated as the number of spliced reads
that span the intron divided by the total number of reads (spliced + unspliced) that span the intron (Figure
2D). A higher CoSE value indicates a higher fraction of co-transcriptional splicing. To validate this metric,
we analyzed an independently generated total RNA-seq dataset in uninduced MEL cells (downloaded
from ENCODE; (Davis et al., 2018)). Although nascent RNA is rare in total RNA, the density of reads
mapping to a given intron is expected to be inversely proportional to splicing efficiency. The ratio of intron-
mapping reads relative to the flanking exon-mapping reads was calculated for each intron and compared
to CoSE levels. As expected, higher CoSE corresponded to lower relative intron coverage in the total
RNA-seq data (Figure S3C). Thus, this independent data set validates the CoSE metric. CoSE values
also remained stable across all levels of read coverage (Figure S3D).
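
As a worked illustration of the CoSE definition given above (spliced reads spanning an intron divided by all spanning reads, for introns covered by at least 10 long-reads), a minimal Python sketch follows. The function name `cose` and the choice to return `None` for introns below the coverage threshold are assumptions made for illustration, not part of the published analysis code.

```python
from typing import Optional


def cose(spliced: int, unspliced: int, min_reads: int = 10) -> Optional[float]:
    """Co-transcriptional Splicing Efficiency for one intron.

    spliced / (spliced + unspliced), counting only reads that span the
    intron. Introns with fewer than `min_reads` spanning reads are skipped
    (returned as None), mirroring the 10-read threshold stated in the text.
    """
    total = spliced + unspliced
    if total < min_reads:
        return None
    return spliced / total
```

For example, an intron spanned by 8 spliced and 2 unspliced reads would have a CoSE of 0.8, whereas an intron spanned by only 5 reads would not be scored.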
To determine if intron splicing events are coordinated within the same transcript, we asked how similar
CoSE values were between introns in the same transcript. To do so, transcripts containing at least 3
introns with recorded CoSE values (n = 2,028) were compiled. We found that the variance in CoSE
between introns within the same transcript was significantly smaller than the variance in CoSE for the
same number of randomly assorted introns (Figure 2E); these differences persisted when we analyzed
transcripts containing 3, 4, or 5 introns supported by long-reads (Figure S3E). Taken together, these
results suggest that most introns are well-spliced co-transcriptionally, and that splicing is coordinated in
mammalian multi-intron transcripts expressed by both uninduced and induced MEL cells.
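
The comparison between within-transcript CoSE variance and the variance of randomly assorted introns can be sketched as follows. This is only an illustration under stated assumptions: the input is a hypothetical mapping from transcript identifiers to lists of CoSE values, the random sets are drawn without replacement from the pooled introns of the retained transcripts, and population variance is used. The text above does not specify these details or the statistical test, so none of these choices should be read as the method used in this study.

```python
import random
from statistics import pvariance
from typing import Dict, List, Tuple


def within_vs_shuffled(cose_by_transcript: Dict[str, List[float]],
                       min_introns: int = 3,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (observed, shuffled) lists of CoSE variances.

    observed: variance among introns of the same transcript.
    shuffled: variance among the same number of introns drawn at random
              from the pool of all retained introns (assumed null model).
    """
    rng = random.Random(seed)
    # Keep transcripts with at least `min_introns` scored introns.
    kept = [v for v in cose_by_transcript.values() if len(v) >= min_introns]
    pool = [x for v in kept for x in v]
    observed = [pvariance(v) for v in kept]
    shuffled = [pvariance(rng.sample(pool, len(v))) for v in kept]
    return observed, shuffled
```

If splicing is coordinated within transcripts, the observed variances would be expected to be smaller on average than the shuffled ones; the two distributions could then be compared with a test of the analyst's choosing.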
The frequency of all-spliced nascent transcripts implies that splicing in mammalian cells is rapid enough
to match the rate of transcription. A direct way to address this is to measure the position of Pol II on
nascent RNA when ligated exons are observed. Observing Pol II downstream of a spliced junction
indicates that the active spliceosome has assembled and catalyzed splicing in the time it took for Pol II
to translocate the measured distance. Therefore, we determined the distance in nucleotides between the
3′ end of each read and the nearest spliced exon-exon junction (Figure 3A). To eliminate 3′ ends that
arise from splicing intermediates and not from active transcription, reads with 3′ ends mapping precisely
to the last nt of exons were removed from this analysis. Although the longest distances between splice
junctions and elongating Pol II were just over 6 kb, these were rare. Instead, 75% of splice junctions were
within ~300 nt of a 3′ end, and the median distance was 154 nt in uninduced cells and 128 nt in induced
cells (Figure 3B). Therefore, changes in the gene expression program during erythropoiesis did not alter
the dynamic relationship between transcription and splicing. Consistent with this, CoSE values were
similar when comparing induced to uninduced cells (Figure 3C; Spearman’s rho = 0.56). In fact, only 66
introns with improved splicing, and 42 introns with reduced splicing displayed > 2-fold change in CoSE
upon induction. Taken together, this analysis shows that although global changes in gene expression
take place between these two timepoints, the relationship between transcription and splicing remains the
same. Overall, these two measurements do not support major changes in splicing efficiency during
erythroid differentiation. Moreover, the distance from Pol II to the nearest splice junction was independent
of GO category or intron length (Figure S4B; GO analysis not shown). Because median exon size in the
mouse genome is 151 nt (Waterston et al., 2002), our data indicate that active spliceosomes can be fully
assembled and functional when Pol II is within or just downstream of the next transcribed exon. Recent
direct sequencing of nascent RNA seemed to reveal less rapid splicing (Drexler et al., 2020). However,
when we analyzed this dataset in the same manner as our own, the cumulative distance from Pol II to
the nearest splice junction is similarly close across organisms and cell types (median distance in human

References
Journal ArticleDOI

A Complex of U1 snRNP with Cleavage and Polyadenylation Factors Controls Telescripting, Regulating mRNA Transcription in Human Cells

TL;DR: This work captured a complex, comprising U1 and CPA factors (U1-CPAFs), that binds intronic PASs and suppresses PCPA, demonstrating U1's unique role as central regulator of pre-mRNA processing and transcription.
Journal ArticleDOI

Nascent RNA and the Coordination of Splicing with Transcription.

TL;DR: Three major methods that enable us to track the conversion of precursor messenger RNA (pre-mRNA) to messengerRNA (m RNA) products in vivo are discussed: live-cell imaging, metabolic labeling of RNA, and RNA-seq of purified nascent RNA.
Journal ArticleDOI

Mechanistic insights into mRNA 3'-end processing

TL;DR: In this paper, the authors describe new molecular insights into pre-mRNA recognition, cleavage, and polyadenylation in the 3'-UTR of eukaryotic mRNAs.
Journal ArticleDOI

The kinetics of pre-mRNA splicing in the Drosophila genome and the influence of gene architecture.

TL;DR: The data suggest that developmental and stress response genes may have preferentially evolved exon definition in order to enhance the rate or accuracy of splicing, and multiple gene level variables associated with splicing rate are identified.
Journal ArticleDOI

Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome.

TL;DR: A new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k-200k partitions of 10-200 molecules at a time, enabling analysis of 10 to 100 million RNA molecules, providing a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.
Frequently Asked Questions (2)
Q1. What are the contributions in "Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis" ?

To determine how co-transcriptional splicing is integrated with transcription elongation and 3′ end formation in mammalian cells, the authors performed long-read sequencing of individual nascent RNAs and PRO-seq during mouse erythropoiesis. Interestingly, several hundred introns displayed abundant splicing intermediates, suggesting that splicing delays can take place between the two catalytic steps. 

Future studies of these enigmatic new players may reveal a role for 3′SS diversity in the regulation of splicing by stalling between catalytic steps. Investigation of these mechanisms awaits future studies that would afford single transcript evaluation of the residence time of intron-bound inhibitory factors (e.g. U1 snRNP) coupled with splicing and cleavage outcome. Less efficient splicing can inhibit 3′ end cleavage (Cooke et al., 1999; Davidson and West, 2013; Martins et al., 2011), suggesting that introns retained in transcripts that display readthrough harbor an inhibitory activity that represses 3′ end cleavage (Figure 7E). The authors speculate that this inhibitory activity persists longer on inefficiently spliced transcripts, potentially binding and inactivating 3′ end cleavage factors (Deng et al., 2020; So et al., 2019).