Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes
Abstract: Alternative splicing (AS) is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, it remains unclear which isoforms are differentially used or expressed and how this affects cellular differentiation. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of AS processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line and to characterise isoform expression and usage across differentiation. We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation, and identify a putative molecular regulator underlying this state change. Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.
Summary
- The complex suite of processes that occur during transcription gives rise to a staggering diversity of protein structures, molecular interactions and cell fates.
- The authors identified novel transcriptomic features, performed differential expression and usage analyses to find transcripts that vary during differentiation, and identified a novel putative molecular regulator underlying this state change.
RESULTS AND DISCUSSION
- ONT reads accurately detect differential isoform expression. Using the Oxford Nanopore GridION platform, the authors generated on average 10,691,538 QC-passed reads per sample (± 1,751,518.6 SD).
- It is therefore important to assess the performance of long read vs short read sequencing in both transcript quantification and its application to differential expression studies (Sessegolo et al. 2019).
- This suggests that RBM5 may play a role in splicing regulation during differentiation of SH-SY5Y cells.
Sampling and Sequencing
- Cell culture and neuronal differentiation. A total of 10 technical replicates of human neuroblastoma SH-SY5Y cells were cultured in neurobasal media (Gibco 21103-049) supplemented with B-27 Plus.
- Retinoic acid was added to five replicates at a final concentration of 10 mM to induce differentiation to a neuron-like state, whilst the other five replicates were cultured to confluence in standard media.
- Cells were washed with phosphate buffered saline and harvested in QIAzol to preserve RNA, before being stored at -80°C until RNA extraction.
RNA extraction and spike-in control
- Total RNA was purified from the 10 replicate cell cultures using a Direct-zol RNA Miniprep Plus kit (Zymo Research), according to the manufacturer’s instructions.
- The whole second-strand reaction was then mixed and incubated at 42°C for 90 minutes.
- The cDNA was quantified using High Sensitivity Qubit assays (ThermoFisher, Q32854) and sized using the 2100 Bioanalyzer instrument (Agilent Technologies, cat. no. G2939BA) High Sensitivity DNA assay (Agilent, 5067-4626).
- The custom TALON GTF contains only features supported by reads in this dataset, so a complete custom transcriptome annotation was compiled by merging the reference and TALON GTFs.
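The merge of the reference and TALON annotations can be sketched as below. This is a minimal illustration, not the authors' procedure (which is not specified here): it assumes both files are standard 9-column GTFs in which known features carry the same gene and transcript IDs, keeps every reference record, and appends only TALON records whose location, feature type and IDs are not already present. The helper names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Matches quoted GTF attributes, e.g. gene_id "ENSG00000007402";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def feature_key(line: str) -> tuple:
    """Identity of one GTF record: location, feature type and gene/transcript IDs.

    The source column (column 2) is ignored on purpose, so a known transcript
    reported by both the reference and TALON is treated as the same feature.
    """
    cols = line.rstrip("\n").split("\t")
    attrs = dict(ATTR_RE.findall(cols[8]))
    return (cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6],
            attrs.get("gene_id"), attrs.get("transcript_id"))


def merge_gtf(reference: Iterable[str], talon: Iterable[str]) -> Iterator[str]:
    """Yield all reference records, then TALON records not already present (hypothetical helper)."""
    seen = set()
    for source in (reference, talon):
        for line in source:
            if not line.strip() or line.startswith("#"):
                continue  # blank and header/comment lines are dropped in this sketch
            key = feature_key(line)
            if key not in seen:
                seen.add(key)
                yield line
```

With open file handles this would be used as `out.writelines(merge_gtf(ref_handle, talon_handle))`. Keeping reference records first means any attribute differences for shared features resolve in favour of the reference annotation.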
Differential expression analyses
- Sequin spike-in detection & ONT DE sensitivity. Sensitivity in detecting isoform DE using ONT was assessed by (a) finding the threshold of detection for each Sequin mix, (b) comparing observed vs expected logFC and (c) comparing with short read data.
- These assessments were performed on both the full short read data and a version downsampled, using bedtools, to the equivalent ONT average nucleotide coverage.
- The authors then utilised a standard differential expression pipeline (detailed below).
- For the Sequin analysis, the differential expression regression model was specified by grouping samples into Sequin Mix A and Mix B.
- Transcript-level counts were then obtained by importing the Salmon results with the edgeR function catchSalmon, which uses the bootstrap replicates to estimate an overdispersion factor for each transcript; each transcript's counts were then divided by this factor.
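Step (b) of the Sequin sensitivity assessment, comparing observed with expected log fold changes, amounts to a regression and correlation between the two. A minimal pure-Python sketch follows. It assumes logFC is defined as log2(Mix B / Mix A), that per-Sequin input concentrations are taken from the Sequin mix design, and that observed logFCs come from the DE fit; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def expected_logfc(conc_mix_a: float, conc_mix_b: float) -> float:
    """Expected log2 fold change of one Sequin (assumes logFC = log2(Mix B / Mix A))."""
    return math.log2(conc_mix_b / conc_mix_a)


def compare_logfc(expected: Sequence[float],
                  observed: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope and intercept of observed ~ expected, plus Pearson's r."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

A slope near 1 with an intercept near 0 indicates unbiased fold-change estimates, while r reflects how consistently observed values track the expected ones.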
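The idea behind the catchSalmon overdispersion correction can be illustrated with a simplified sketch: for each transcript, pool across samples the squared deviations of the bootstrap counts divided by their bootstrap mean, divide by the pooled degrees of freedom, and then divide the counts by that factor. edgeR's actual implementation additionally moderates these estimates with an empirical Bayes prior, which this sketch omits; the function names and the nested-list input layout are assumptions.

```python
from typing import List, Sequence


def bootstrap_overdispersion(boot: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """boot[s][t] holds the bootstrap counts of transcript t in sample s.

    Simplified, unmoderated estimate; floored at 1 so counts are never inflated.
    Transcripts with no usable bootstrap information get a factor of 1.
    """
    n_tx = len(boot[0])
    pooled = [0.0] * n_tx  # sum over samples of (squared deviations / bootstrap mean)
    df = [0] * n_tx        # pooled degrees of freedom
    for sample in boot:
        for t, reps in enumerate(sample):
            mean = sum(reps) / len(reps)
            if mean > 0:
                pooled[t] += sum((x - mean) ** 2 for x in reps) / mean
                df[t] += len(reps) - 1
    return [max(p / d, 1.0) if d > 0 else 1.0 for p, d in zip(pooled, df)]


def scale_counts(counts: Sequence[Sequence[float]], od: Sequence[float]) -> List[List[float]]:
    """Divide each transcript's counts (one row per transcript) by its overdispersion factor."""
    return [[c / o for c in row] for row, o in zip(counts, od)]
```

Transcripts whose reads are ambiguous between isoforms show larger bootstrap variability, so their counts are shrunk more strongly before downstream DE testing.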
Differential usage analyses
- Differential transcript usage (DTU) was assessed using the R package IsoformSwitchAnalyzeR v.1.11.3 (Vitting-Seerup & Sandelin 2019) on the same transcript quantification input used for differential transcript expression (DTE) and differential gene expression (DGE).
- TPM abundances were imported with tximport using the scaledTPM option (countsFromAbundance = "scaledTPM") and then loaded into IsoformSwitchAnalyzeR.
- The DTU analysis was run in two parts. In the first, non-expressed isoforms were removed, isoform switches were tested for each gene using DEXSeq (Anders et al. 2012), and nucleotide and peptide sequences were exported for each gene for protein-level assessment.
- Transcripts were assessed for coding potential with CPAT (Wang et al. 2013), protein domain assignment with Pfam (Punta et al. 2012), signal peptide prediction with SignalP v.5.0 (Armenteros et al. 2019), and intrinsically disordered regions and binding regions with IUPred2A (Mészáros et al. 2018), using default parameters as specified in the IsoformSwitchAnalyzeR workflow.
- The second part of the IsoformSwitchAnalyzeR DTU analysis then used these results to identify isoform switches with potential functional consequences and to visualise them using the default functions.
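Two quantities underpin this DTU workflow: tximport's scaledTPM values (TPMs rescaled so that each sample sums to its library size) and the isoform fraction (IF) compared by IsoformSwitchAnalyzeR, where IF is an isoform's share of its parent gene's expression and dIF is the change in IF between conditions. The sketch below computes both on plain dictionaries; it illustrates the definitions rather than reproducing either package's code, and it assumes condition-level expression (for example, replicate means) is supplied for the IF step. Function names are hypothetical.

```python
from typing import Dict

Expr = Dict[str, float]


def scaled_tpm(tpm: Expr, counts: Expr) -> Expr:
    """Rescale one sample's TPMs so they sum to that sample's total estimated count."""
    lib_size = sum(counts.values())
    tpm_total = sum(tpm.values())
    return {tx: value * lib_size / tpm_total for tx, value in tpm.items()}


def isoform_fractions(expr: Expr, tx2gene: Dict[str, str]) -> Expr:
    """IF = isoform expression / total expression of its gene (NaN if the gene is unexpressed)."""
    gene_total: Expr = {}
    for tx, value in expr.items():
        gene = tx2gene[tx]
        gene_total[gene] = gene_total.get(gene, 0.0) + value
    return {
        tx: value / gene_total[tx2gene[tx]] if gene_total[tx2gene[tx]] > 0 else float("nan")
        for tx, value in expr.items()
    }


def delta_if(if_condition1: Expr, if_condition2: Expr) -> Expr:
    """dIF = IF in condition 2 minus IF in condition 1."""
    return {tx: if_condition2[tx] - if_condition1[tx] for tx in if_condition1}
```

An isoform switch shows up as paired isoforms of the same gene with dIF values of opposite sign, as in a gene whose dominant isoform changes between undifferentiated and differentiated cells.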
Hypergeometric enrichment tests
- The set of putative functionally consequential DTUs was checked for RNA-binding proteins (RBPs) by intersection with a set of known RBPs assayed as part of the ENCODE project (Van Nostrand et al. 2020), revealing the presence of RBM5.
- Corresponding narrow-peak eCLIP bed data for RBM5 were accessed using the ENCODE portal (Davis et al. 2018) for both HepG2 isogenic replicates in the ENCODE repository (RBM5 accessions ENCFF176RGG and ENCFF998ACW downloaded 29/09/2020) and intersected using bedtools to find the most supported subset of binding targets.
- The overlaps of the significant DTUs (N = 104) and of all genes assessed (N = 32,325) with this replicate-supported subset of eCLIP binding targets were then determined.
- A hypergeometric test for enrichment was performed using the phyper function from the base R package stats v.4.0.2 (R Core Team 2016).
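Keeping only eCLIP peaks supported by both HepG2 replicates can be sketched as an interval-overlap filter similar in spirit to `bedtools intersect -u`, which reports each replicate-1 peak once if it overlaps any replicate-2 peak. The exact bedtools options used are not stated, so this is an assumption. Like bedtools' default, the sketch ignores strand, uses BED's 0-based half-open coordinates and requires at least 1 bp of overlap; it uses a simple per-chromosome scan for clarity, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def parse_bed(lines: Iterable[str]) -> List[Peak]:
    """Read the first three columns of BED/narrowPeak lines, skipping headers."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks.append(Peak(chrom, int(start), int(end)))
    return peaks


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def reproducible_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Keep each rep1 peak once if it overlaps any rep2 peak."""
    by_chrom: Dict[str, List[Peak]] = {}
    for p in rep2:
        by_chrom.setdefault(p.chrom, []).append(p)
    return [a for a in rep1 if any(overlaps(a, b) for b in by_chrom.get(a.chrom, []))]
```

Because coordinates are half-open, peaks that merely touch (one ending at position 400, the next starting at 400) are not counted as overlapping.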
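The enrichment test asks how likely it is to see at least the observed number of RBM5 targets among the 104 DTU genes if those genes had been drawn at random from the 32,325 genes assessed. In R this is conventionally written as `phyper(q - 1, m, n, k, lower.tail = FALSE)`, with q the overlap, m the number of targets among all assessed genes, n the remaining genes and k the number of DTUs. The pure-Python sketch below computes the same upper-tail probability exactly with integer arithmetic; the parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, n_targets: int, n_total: int, n_selected: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(n_total genes, n_targets targets, n_selected drawn).

    Matches R's phyper(overlap - 1, n_targets, n_total - n_targets, n_selected, lower.tail = FALSE).
    """
    upper = min(n_targets, n_selected)  # X cannot exceed either the targets or the draws
    if overlap > upper:
        return 0.0
    # Sum exact integer counts of favourable draws; math.comb returns 0 when k > n.
    favourable = sum(
        comb(n_targets, i) * comb(n_total - n_targets, n_selected - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favourable / comb(n_total, n_selected)
```

For this study the call would be `hypergeom_enrichment_p(overlap, n_targets, 32325, 104)`, with the overlap and target counts taken from the eCLIP intersections described above (they are not reported in this summary).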
Ontology and functional association
- To interpret the differentially expressed or used gene sets, the authors assessed gene ontology and known associations with neurologically relevant biology.
- The authors used the GENE2FUNC function in FUMA (Watanabe et al. 2017) to annotate the gene sets within a biological context.
- For transcripts, the corresponding Ensembl gene ID was used.
- In each case, the default thresholds of significance and ontology enrichment were applied.
- Analyses focused on tissue specificity across the 30 GTEx v.8 tissue types and on Gene Ontology (GO) Biological Processes.
Frequently Asked Questions
Q1. What contributions have the authors mentioned in the paper "Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes" ?
Here, the authors utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line and to characterise isoform expression and usage across differentiation. The authors show differential expression and usage of transcripts during differentiation, and identify a putative molecular regulator underlying this state change. Their work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.
Q2. What are the future works mentioned in the paper "Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes" ?
Future work needs to further investigate how changes in RBM5 regulation impact isoform expression, through experimental confirmation of RBM5 binding targets in SH-SY5Y cells and mutagenesis of RBM5 and its binding sites. The authors' findings indicate that changes in RBM5 expression profiles may act as a molecular mechanism coordinating these changes, paving the way for future functional studies.