Showing papers on "Exon published in 2014"


Journal ArticleDOI
01 Jan 2014-Nature
TL;DR: In this paper, the authors report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses.
Abstract: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.

4,104 citations


01 Jul 2014
TL;DR: High rates of somatic mutation were seen, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification, and MAPK and PI(3)K pathway activity was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation.
Abstract: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.

2,847 citations


Journal ArticleDOI
TL;DR: Evidence that animal circRNAs are generated cotranscriptionally and that their production rate is mainly determined by intronic sequences is provided and it is demonstrated that circularization and splicing compete against each other.

2,225 citations


Journal ArticleDOI
25 Sep 2014-Cell
TL;DR: It is demonstrated that exon circularization is dependent on flanking intronic complementary sequences in human introns and that alternative formation of inverted repeated Alu pairs can lead to alternative circularization, resulting in multiple circular RNA transcripts produced from a single gene.

1,451 citations


Journal ArticleDOI
TL;DR: It is concluded that SAD1 dynamically controls splicing efficiency and splice-site recognition in Arabidopsis, and it is proposed that this may contribute to SAD1-mediated stress tolerance through the metabolism of transcripts expressed from stress-responsive genes.
Abstract: Sm-like proteins are highly conserved proteins that form the core of the U6 ribonucleoprotein and function in several mRNA metabolism processes, including pre-mRNA splicing. Despite their wide occurrence in all eukaryotes, little is known about the roles of Sm-like proteins in the regulation of splicing. Here, through comprehensive transcriptome analyses, we demonstrate that depletion of the Arabidopsis supersensitive to abscisic acid and drought 1 gene (SAD1), which encodes Sm-like protein 5 (LSm5), promotes an inaccurate selection of splice sites that leads to a genome-wide increase in alternative splicing. In contrast, overexpression of SAD1 strengthens the precision of splice-site recognition and globally inhibits alternative splicing. Further, SAD1 modulates the splicing of stress-responsive genes, particularly under salt-stress conditions. Finally, we find that overexpression of SAD1 in Arabidopsis improves salt tolerance in transgenic plants, which correlates with an increase in splicing accuracy and efficiency for stress-responsive genes. We conclude that SAD1 dynamically controls splicing efficiency and splice-site recognition in Arabidopsis, and propose that this may contribute to SAD1-mediated stress tolerance through the metabolism of transcripts expressed from stress-responsive genes. Our study not only provides novel insights into the function of Sm-like proteins in splicing, but also uncovers new means to improve splicing efficiency and to enhance stress tolerance in a higher eukaryote.

1,160 citations


Journal ArticleDOI
TL;DR: Some of the emerging rules that govern the highly context-dependent and combinatorial nature of alternative splicing regulation are described.
Abstract: Sequence-specific RNA-binding proteins (RBPs) bind to pre-mRNA to control alternative splicing, but it is not yet possible to read the 'splicing code' that dictates splicing regulation on the basis of genome sequence. Each alternative splicing event is controlled by multiple RBPs, the combined action of which creates a distribution of alternatively spliced products in a given cell type. As each cell type expresses a distinct array of RBPs, the interpretation of regulatory information on a given RNA target is exceedingly dependent on the cell type. RBPs also control each other's functions at many levels, including by mutual modulation of their binding activities on specific regulatory RNA elements. In this Review, we describe some of the emerging rules that govern the highly context-dependent and combinatorial nature of alternative splicing regulation.

820 citations


Journal ArticleDOI
TL;DR: Findings provide compelling evidence that FTO-dependent m6A demethylation functions as a novel regulatory mechanism of RNA processing and plays a critical role in the regulation of adipogenesis.
Abstract: The role of Fat Mass and Obesity-associated protein (FTO) and its substrate N6-methyladenosine (m6A) in mRNA processing and adipogenesis remains largely unknown. We show that FTO expression and m6A levels are inversely correlated during adipogenesis. FTO depletion blocks differentiation and only catalytically active FTO restores adipogenesis. Transcriptome analyses in combination with m6A-seq revealed that gene expression and mRNA splicing of grouped genes are regulated by FTO. M6A is enriched in exonic regions flanking 5′- and 3′-splice sites, spatially overlapping with mRNA splicing regulatory serine/arginine-rich (SR) protein exonic splicing enhancer binding regions. Enhanced levels of m6A in response to FTO depletion promote the RNA binding ability of SRSF2 protein, leading to increased inclusion of target exons. FTO controls exonic splicing of adipogenic regulatory factor RUNX1T1 by regulating m6A levels around splice sites and thereby modulates differentiation. These findings provide compelling evidence that FTO-dependent m6A demethylation functions as a novel regulatory mechanism of RNA processing and plays a critical role in the regulation of adipogenesis.

805 citations


Journal ArticleDOI
TL;DR: Detailed and generalizable models that explain how the splicing machinery determines whether to produce a circular noncoding RNA or a linear mRNA are suggested.
Abstract: Recent deep sequencing studies have revealed thousands of circular noncoding RNAs generated from protein-coding genes. These RNAs are produced when the precursor messenger RNA (pre-mRNA) splicing machinery "backsplices" and covalently joins, for example, the two ends of a single exon. However, the mechanism by which the spliceosome selects only certain exons to circularize is largely unknown. Using extensive mutagenesis of expression plasmids, we show that miniature introns containing the splice sites along with short (∼ 30- to 40-nucleotide) inverted repeats, such as Alu elements, are sufficient to allow the intervening exons to circularize in cells. The intronic repeats must base-pair to one another, thereby bringing the splice sites into close proximity to each other. More than simple thermodynamics is clearly at play, however, as not all repeats support circularization, and increasing the stability of the hairpin between the repeats can sometimes inhibit circular RNA biogenesis. The intronic repeats and exonic sequences must collaborate with one another, and a functional 3' end processing signal is required, suggesting that circularization may occur post-transcriptionally. These results suggest detailed and generalizable models that explain how the splicing machinery determines whether to produce a circular noncoding RNA or a linear mRNA.

729 citations


Journal ArticleDOI
07 Mar 2014-PLOS ONE
TL;DR: It is reported that circular RNA isoforms are found in diverse species whose most recent common ancestor existed more than one billion years ago (fungi, a plant, and protists), suggesting that circular RNA may be an ancient, conserved feature of eukaryotic gene expression programs.
Abstract: An unexpectedly large fraction of genes in metazoans (human, mouse, zebrafish, worm, fruit fly) express high levels of circularized RNAs containing canonical exons. Here we report that circular RNA isoforms are found in diverse species whose most recent common ancestor existed more than one billion years ago: fungi (Schizosaccharomyces pombe and Saccharomyces cerevisiae), a plant (Arabidopsis thaliana), and protists (Plasmodium falciparum and Dictyostelium discoideum). For all species studied to date, including those in this report, only a small fraction of the theoretically possible circular RNA isoforms from a given gene are actually observed. Unlike metazoans, Arabidopsis, D. discoideum, P. falciparum, S. cerevisiae, and S. pombe have very short introns (∼100 nucleotides or shorter), yet they still produce circular RNAs. A minority of genes in S. pombe and P. falciparum have documented examples of canonical alternative splicing, making it unlikely that all circular RNAs are by-products of alternative splicing or ‘piggyback’ on signals used in alternative RNA processing. In S. pombe, the relative abundance of circular to linear transcript isoforms changed in a gene-specific pattern during nitrogen starvation. Circular RNA may be an ancient, conserved feature of eukaryotic gene expression programs.

595 citations


Journal ArticleDOI
30 Jan 2014-Nature
TL;DR: The initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child) is reported, which provides a comprehensive RSS map of human coding and non-coding RNAs.
Abstract: In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program. However, the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child). This provides a comprehensive RSS map of human coding and non-coding RNAs. We identify unique RSS signatures that demarcate open reading frames and splicing junctions, and define authentic microRNA-binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1,900 transcribed single nucleotide variants (approximately 15% of all transcribed single nucleotide variants) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSSs. Selective depletion of 'riboSNitches' versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3' untranslated regions, binding sites of microRNAs and RNA-binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.

512 citations


Journal ArticleDOI
TL;DR: The data show that the path from genetic variation (SNP) to gene expression is more complex than hitherto often assumed, and that genetic variation can also influence function of a gene by influencing exon usage or splice isoforms (sQTL), allelic imbalance, RNA editing, and expression of noncoding RNAs.
Abstract: Genetic variation can modulate gene expression, and thereby phenotypic variation and susceptibility to complex diseases such as type 2 diabetes (T2D). Here we harnessed the potential of DNA and RNA sequencing in human pancreatic islets from 89 deceased donors to identify genes of potential importance in the pathogenesis of T2D. We present a catalog of genetic variants regulating gene expression (eQTL) and exon use (sQTL), including many long noncoding RNAs, which are enriched in known T2D-associated loci. Of 35 eQTL genes, whose expression differed between normoglycemic and hyperglycemic individuals, siRNA of tetraspanin 33 (TSPAN33), 5′-nucleotidase, ecto (NT5E), transmembrane emp24 protein transport domain containing 6 (TMED6), and p21 protein activated kinase 7 (PAK7) in INS1 cells resulted in reduced glucose-stimulated insulin secretion. In addition, we provide a genome-wide catalog of allelic expression imbalance, which is also enriched in known T2D-associated loci. Notably, allelic imbalance in paternally expressed gene 3 (PEG3) was associated with its promoter methylation and T2D status. Finally, RNA editing events were less common in islets than previously suggested in other tissues. Taken together, this study provides new insights into the complexity of gene regulation in human pancreatic islets and better understanding of how genetic variation can influence glucose metabolism.

Journal ArticleDOI
TL;DR: It is found that integrating a number of computational methods to detect genes with differentially retained introns provides a strategy to enrich for alternatively spliced exons in mammalian RNA-seq data, when complemented by RNA-seq analysis of purified cells with experimentally perturbed RNA-binding proteins.
Abstract: Retention of a subset of introns in spliced polyadenylated mRNA is emerging as a frequent, unexplained finding from RNA deep sequencing in mammalian cells. Here we analyze intron retention in T lymphocytes by deep sequencing polyadenylated RNA. We show a developmentally regulated RNA-binding protein, hnRNPLL, induces retention of specific introns by sequencing RNA from T cells with an inactivating Hnrpll mutation and from B lymphocytes that physiologically downregulate Hnrpll during their differentiation. In Ptprc mRNA encoding the tyrosine phosphatase CD45, hnRNPLL induces selective retention of introns flanking exons 4 to 6; these correspond to the cassette exons containing hnRNPLL binding sites that are skipped in cells with normal, but not mutant or low, hnRNPLL. We identify similar patterns of hnRNPLL-induced differential intron retention flanking alternative exons in 14 other genes, representing novel elements of the hnRNPLL-induced splicing program in T cells. Retroviral expression of a normally spliced cDNA for one of these targets, Senp2, partially corrects the survival defect of Hnrpll-mutant T cells. We find that integrating a number of computational methods to detect genes with differentially retained introns provides a strategy to enrich for alternatively spliced exons in mammalian RNA-seq data, when complemented by RNA-seq analysis of purified cells with experimentally perturbed RNA-binding proteins. Our findings demonstrate that intron retention in mRNA is induced by specific RNA-binding proteins and suggest a biological significance for this process in marking exons that are poised for alternative splicing.

Journal ArticleDOI
04 Sep 2014-Nature
TL;DR: Measurement of the functional consequences of large numbers of mutations with saturation genome editing will potentially facilitate high-resolution functional dissection of both cis-regulatory elements and trans-acting factors, as well as the interpretation of variants of uncertain significance observed in clinical sequencing.
Abstract: Saturation mutagenesis--coupled to an appropriate biological assay--represents a fundamental means of achieving a high-resolution understanding of regulatory and protein-coding nucleic acid sequences of interest. However, mutagenized sequences introduced in trans on episomes or via random or "safe-harbour" integration fail to capture the native context of the endogenous chromosomal locus. This shortcoming markedly limits the interpretability of the resulting measurements of mutational impact. Here, we couple CRISPR/Cas9 RNA-guided cleavage with multiplex homology-directed repair using a complex library of donor templates to demonstrate saturation editing of genomic regions. In exon 18 of BRCA1, we replace a six-base-pair (bp) genomic region with all possible hexamers, or the full exon with all possible single nucleotide variants (SNVs), and measure strong effects on transcript abundance attributable to nonsense-mediated decay and exonic splicing elements. We similarly perform saturation genome editing of a well-conserved coding region of an essential gene, DBR1, and measure relative effects on growth that correlate with functional impact. Measurement of the functional consequences of large numbers of mutations with saturation genome editing will potentially facilitate high-resolution functional dissection of both cis-regulatory elements and trans-acting factors, as well as the interpretation of variants of uncertain significance observed in clinical sequencing.

Journal ArticleDOI
25 Sep 2014-Cell
TL;DR: It is demonstrated that exon circularization and linear splicing compete with each other in a tissue-specific fashion, and several types of circular RNA transcripts can be produced from a single gene.

Journal ArticleDOI
TL;DR: The data suggest that thousands of neurexin isoforms are physiologically generated, consistent with the notion that α-neurexins represent transsynaptic protein-interaction scaffolds that mediate diverse functions and are regulated by alternative splicing at multiple independent sites.
Abstract: Neurexins are evolutionarily conserved presynaptic cell-adhesion molecules that are essential for normal synapse formation and synaptic transmission. Indirect evidence has indicated that extensive alternative splicing of neurexin mRNAs may produce hundreds if not thousands of neurexin isoforms, but no direct evidence for such diversity has been available. Here we use unbiased long-read sequencing of full-length neurexin (Nrxn)1α, Nrxn1β, Nrxn2β, Nrxn3α, and Nrxn3β mRNAs to systematically assess how many sites of alternative splicing are used in neurexins with a significant frequency, and whether alternative splicing events at these sites are independent of each other. In sequencing more than 25,000 full-length mRNAs, we identified a novel, abundantly used alternatively spliced exon of Nrxn1α and Nrxn3α (referred to as alternatively spliced sequence 6) that encodes a 9-residue insertion in the flexible hinge region between the fifth LNS (laminin-α, neurexin, sex hormone-binding globulin) domain and the third EGF-like sequence. In addition, we observed several larger-scale events of alternative splicing that deleted multiple domains and were much less frequent than the canonical six sites of alternative splicing in neurexins. All of the six canonical events of alternative splicing appear to be independent of each other, suggesting that neurexins may exhibit an even larger isoform diversity than previously envisioned and comprise thousands of variants. Our data are consistent with the notion that α-neurexins represent extracellular protein-interaction scaffolds in which different LNS and EGF domains mediate distinct interactions that affect diverse functions and are independently regulated by independent events of alternative splicing.

Journal ArticleDOI
TL;DR: It is suggested that an optimal rate of transcriptional elongation is required for normal cotranscriptional pre-mRNA splicing.
Abstract: Alternative splicing modulates expression of most human genes. The kinetic model of cotranscriptional splicing suggests that slow elongation expands and that fast elongation compresses the “window of opportunity” for recognition of upstream splice sites, thereby increasing or decreasing inclusion of alternative exons. We tested the model using RNA polymerase II mutants that change average elongation rates genome-wide. Slow and fast elongation affected constitutive and alternative splicing, frequently altering exon inclusion and intron retention in ways not predicted by the model. Cassette exons included by slow and excluded by fast elongation (type I) have weaker splice sites, shorter flanking introns, and distinct sequence motifs relative to “slow-excluded” and “fast-included” exons (type II). Many rate-sensitive exons are misspliced in tumors. Unexpectedly, slow and fast elongation often both increased or both decreased inclusion of a particular exon or retained intron. These results suggest that an optimal rate of transcriptional elongation is required for normal cotranscriptional pre-mRNA splicing.


Journal ArticleDOI
TL;DR: This work sequenced the lymphoblastoid transcriptomes of three family members by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that.
Abstract: Personal transcriptomes in which all of an individual’s genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV—in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.

Journal ArticleDOI
TL;DR: This study reveals that a large number of genes are alternatively spliced in the soybean genome and that variations in gene structure, genomic environment, and gene transcriptional level may play important roles in regulating alternative splicing.
Abstract: Alternative splicing (AS) is common in higher eukaryotes and plays an important role in gene posttranscriptional regulation. It has been suggested that AS varies dramatically among species, tissues, and duplicated gene families of different sizes. However, the genomic forces that govern AS variation remain poorly understood. Here, through genome-wide identification of AS events in the soybean (Glycine max) genome using high-throughput RNA sequencing of 28 samples from different developmental stages, we found that more than 63% of multiexonic genes underwent AS. More AS events occurred in the younger developmental stages than in the older developmental stages for the same type of tissue, and the four main AS types, exon skipping, intron retention, alternative donor sites, and alternative acceptor sites, exhibited different characteristics. Global computational analysis demonstrated that the variations of AS frequency and AS types were significantly correlated with the changes of gene features and gene transcriptional level. Further investigation suggested that the decrease of AS within the genome-wide duplicated genes were due to the diminution of intron length, exon number, and transcriptional level. Altogether, our study revealed that a large number of genes were alternatively spliced in the soybean genome and that variations in gene structure and transcriptional level may play important roles in regulating AS.

Journal ArticleDOI
TL;DR: This study provided a comprehensive view of AS under salt stress and revealed novel insights into the potential roles of AS in plant response to salt stress, suggesting a complex loop in AS regulation for stress adaptation.
Abstract: Alternative splicing (AS) of precursor mRNA (pre-mRNA) is an important gene regulation process that potentially regulates many physiological processes in plants, including the response to abiotic stresses such as salt stress. To analyze global changes in AS under salt stress, we obtained high-coverage (~200 times) RNA sequencing data from Arabidopsis thaliana seedlings that were treated with different concentrations of NaCl. We detected that ~49% of all intron-containing genes were alternatively spliced under salt stress, 10% of which experienced significant differential alternative splicing (DAS). Furthermore, AS increased significantly under salt stress compared with under unstressed conditions. We demonstrated that most DAS genes were not differentially regulated by salt stress, suggesting that AS may represent an independent layer of gene regulation in response to stress. Our analysis of functional categories suggested that DAS genes were associated with specific functional pathways, such as the pathways for the responses to stresses and RNA splicing. We revealed that serine/arginine-rich (SR) splicing factors were frequently and specifically regulated in AS under salt stresses, suggesting a complex loop in AS regulation for stress adaptation. We also showed that alternative splicing site selection (SS) occurred most frequently at 4 nucleotides upstream or downstream of the dominant sites and that exon skipping tended to link with alternative SS. Our study provided a comprehensive view of AS under salt stress and revealed novel insights into the potential roles of AS in plant response to salt stress.

Journal ArticleDOI
TL;DR: It is anticipated that iSS-PseDNC may become a useful tool for identifying splice sites and that the six DNA local structural properties described in this paper may provide novel insights for in-depth investigations into the mechanism of RNA splicing.
Abstract: In eukaryotic genes, exons are generally interrupted by introns. Accurately removing introns and joining exons together are essential processes in eukaryotic gene expression. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapid and effective detection of splice sites that play important roles in gene structure annotation and even in RNA splicing. Although a series of computational methods were proposed for splice site identification, most of them neglected the intrinsic local structural properties. In the present study, a predictor called “iSS-PseDNC” was developed for identifying splice sites. In the new predictor, the sequences were formulated by a novel feature-vector called “pseudo dinucleotide composition” (PseDNC) into which six DNA local structural properties were incorporated. It was observed by the rigorous cross-validation tests on two benchmark datasets that the overall success rates achieved by iSS-PseDNC in identifying splice donor site and splice acceptor site were 85.45% and 87.73%, respectively. It is anticipated that iSS-PseDNC may become a useful tool for identifying splice sites and that the six DNA local structural properties described in this paper may provide novel insights for in-depth investigations into the mechanism of RNA splicing.
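As a rough illustration of the kind of sequence encoding this abstract refers to, the sketch below computes only the plain dinucleotide composition of a DNA sequence (16 normalized frequencies). It is not the iSS-PseDNC feature vector: the "pseudo" components built from the six DNA local structural properties are not specified in the abstract and are deliberately left out. The names `DINUCLEOTIDES` and `dinucleotide_composition` are hypothetical and chosen here for illustration.

```python
from itertools import product

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq: str) -> list[float]:
    """Return the 16 normalized dinucleotide frequencies of a DNA sequence.

    Assumption: overlapping pairs are counted, and pairs containing
    characters other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid dinucleotide found (empty or very short input).
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]
```

A vector like this could serve as the base of a sequence descriptor to which additional, property-derived components are appended before a classifier is trained on donor or acceptor site windows.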

Journal ArticleDOI
TL;DR: It is shown here that slowing down elongation can also cause exon skipping by promoting the recruitment of the negative factor ETR-3 onto the UG-repeat at the E9 3' splice site, which displaces the constitutive splicing factor U2AF65 from the overlapping polypyrimidine tract.

Journal ArticleDOI
TL;DR: Genetic rescue and in vitro splicing show that the RNA ligase activity of RTCB is directly required for the splicing of XBP1 mRNA, demonstrating that RtcB is the long-sought RNA ligase that catalyzes unconventional RNA splicing during the mammalian UPR.

Journal ArticleDOI
01 Oct 2014-eLife
TL;DR: This study uses dual-color single-molecule RNA imaging in living human cells to construct a complete kinetic profile of transcription and splicing of the β-globin gene and finds that kinetic competition results in multiple competing pathways for pre-mRNA splicing.
Abstract: To make a protein, part of a DNA sequence is copied to make a messenger RNA (or mRNA) molecule in a process known as transcription. The enzyme that builds an mRNA molecule first binds to a start point on a DNA strand, and then uses the DNA sequence to build a ‘pre-mRNA’ molecule until a stop signal is reached. To make the final mRNA molecule, sections called introns are removed from the pre-mRNA molecules, and the parts left behind—known as exons—are then joined together. This process is called splicing. However, it is not fully understood how the splicing process is coordinated with the other stages of transcription. For example, does splicing occur after the pre-mRNA molecule is completed or while it is still being built? And what controls the order in which these processes occur? One theory about how the different mRNA-making processes are coordinated is called kinetic competition. This theory states that the fastest process is the most likely to occur, even if the other processes use less energy and so might be expected to be preferred. Alternatively, the different steps may be started and stopped by ‘checkpoints’ that cause the different processes to follow on from each other in a set order. Coulon et al. used fluorescence microscopy to investigate how mRNA molecules are made during the transcription of a human gene that makes a hemoglobin protein. To make the RNA visible, two different fluorescent markers were introduced into the pre-mRNA that cause different regions of the mRNA to glow in different colors. Coulon et al. made the introns fluoresce red and the exons glow green. Unspliced pre-mRNA molecules contain both introns and exons and so fluoresce in both colors, whereas spliced mRNA molecules contain only exons and so only glow with a green color. By looking at both the red and green fluorescence signals at the same time, Coulon et al. could see when an intron was spliced out of the pre-mRNA. 
This revealed that in normal cells, splicing can occur either before or after the RNA is released from where it is transcribed. Thus, splicing and transcription do not follow a set pattern, suggesting that checkpoints do not control the sequence of events. Instead, the fact that a spliced mRNA molecule can be formed in different ways suggests that kinetic competition controls the process. In some cancer cells, there are defects in the cellular machinery that controls splicing. When looking at cells with such a defect, Coulon et al. found that splicing only occurred after transcription was completed. This study thus provides insight into the complex workings of mRNA synthesis and establishes a blueprint for understanding how splicing is impaired in diseases such as cancer.
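The kinetic competition idea described in this abstract can be illustrated with a toy calculation. The sketch below is not the model used by Coulon et al.; it assumes, purely for illustration, that splicing and transcript release are each a single memoryless step with hypothetical rates `k_splice` and `k_release`. Under that assumption the faster step usually wins, and the chance that splicing happens before release is `k_splice / (k_splice + k_release)`.

```python
import random


def analytic_fraction(k_splice, k_release):
    """Probability that splicing fires first when both steps are
    independent exponential processes (an illustrative assumption)."""
    return k_splice / (k_splice + k_release)


def simulated_fraction(k_splice, k_release, n=100_000, seed=0):
    """Estimate the same probability by drawing waiting times.

    k_splice and k_release are hypothetical rates (events per unit time);
    they are not values measured in the study.
    """
    rng = random.Random(seed)
    spliced_first = 0
    for _ in range(n):
        t_splice = rng.expovariate(k_splice)    # waiting time until splicing
        t_release = rng.expovariate(k_release)  # waiting time until release
        if t_splice < t_release:
            spliced_first += 1
    return spliced_first / n


if __name__ == "__main__":
    # Release three times faster than splicing: most transcripts are
    # released unspliced, but a minority are still spliced first.
    print(analytic_fraction(1.0, 3.0))
    print(simulated_fraction(1.0, 3.0))
```

With comparable rates both orders of events occur in a sizeable fraction of transcripts, which is the qualitative behavior the abstract attributes to kinetic competition rather than to a fixed checkpoint order.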

Journal ArticleDOI
TL;DR: This work proposes a model whereby var gene sequence polymorphism is mainly generated during the asexual part of the life cycle, indicating that millions of new antigenic structures could potentially be generated each day in a single infected individual.
Abstract: The most polymorphic gene family in P. falciparum is the ∼60 var genes distributed across parasite chromosomes, both in the subtelomeres and in internal regions. They encode hypervariable surface proteins known as P. falciparum erythrocyte membrane protein 1 (PfEMP1) that are critical for pathogenesis and immune evasion in Plasmodium falciparum. How var gene sequence diversity is generated is not currently completely understood. To address this, we constructed large clone trees and performed whole genome sequence analysis to study the generation of novel var gene sequences in asexually replicating parasites. While single nucleotide polymorphisms (SNPs) were scattered across the genome, structural variants (deletions, duplications, translocations) were focused in and around var genes, with considerable variation in frequency between strains. Analysis of more than 100 recombination events involving var exon 1 revealed that the average nucleotide sequence identity of two recombining exons was only 63% (range: 52.7-72.4%) yet the crossovers were error-free and occurred in such a way that the resulting sequence was in frame and domain architecture was preserved. Var exon 1, which encodes the immunologically exposed part of the protein, recombined in up to 0.2% of infected erythrocytes in vitro per life cycle. The high rate of var exon 1 recombination indicates that millions of new antigenic structures could potentially be generated each day in a single infected individual. We propose a model whereby var gene sequence polymorphism is mainly generated during the asexual part of the life cycle.

Journal ArticleDOI
TL;DR: It is found that changes in chromatin accessibility associated with SETD2 mutations occurred primarily within actively transcribed genes and were linked with widespread alterations in RNA processing, including intron retention and aberrant splicing, affecting ∼25% of all expressed genes.
Abstract: Comprehensive sequencing of human cancers has identified recurrent mutations in genes encoding chromatin regulatory proteins. For clear cell renal cell carcinoma (ccRCC), three of the five commonly mutated genes encode the chromatin regulators PBRM1, SETD2, and BAP1. How these mutations alter the chromatin landscape and transcriptional program in ccRCC or other cancers is not understood. Here, we identified alterations in chromatin organization and transcript profiles associated with mutations in chromatin regulators in a large cohort of primary human kidney tumors. By associating variation in chromatin organization with mutations in SETD2, which encodes the enzyme responsible for H3K36 trimethylation, we found that changes in chromatin accessibility occurred primarily within actively transcribed genes. This increase in chromatin accessibility was linked with widespread alterations in RNA processing, including intron retention and aberrant splicing, affecting ∼25% of all expressed genes. Furthermore, decreased nucleosome occupancy proximal to misspliced exons was observed in tumors lacking H3K36me3. These results directly link mutations in SETD2 to chromatin accessibility changes and RNA processing defects in cancer. Detecting the functional consequences of specific mutations in chromatin regulatory proteins in primary human samples could ultimately inform the therapeutic application of an emerging class of chromatin-targeted compounds.

Journal ArticleDOI
TL;DR: Combined transcriptome-wide crosslinking immunoprecipitation, RNA-seq, and quantitative proteomics analyses identify RBM20-regulated targets and provide insight into the pathogenesis of human heart failure.
Abstract: Mutations in the gene encoding the RNA-binding protein RBM20 have been implicated in dilated cardiomyopathy (DCM), a major cause of chronic heart failure, presumably through altering cardiac RNA splicing. Here, we combined transcriptome-wide crosslinking immunoprecipitation (CLIP-seq), RNA-seq, and quantitative proteomics in cell culture and rat and human hearts to examine how RBM20 regulates alternative splicing in the heart. Our analyses revealed the presence of a distinct RBM20 RNA-recognition element that is predominantly found within intronic binding sites and linked to repression of exon splicing with RBM20 binding near 3' and 5' splice sites. Proteomic analysis determined that RBM20 interacts with both U1 and U2 small nuclear ribonucleoprotein particles (snRNPs) and suggested that RBM20-dependent splicing repression occurs through spliceosome stalling at complex A. Direct RBM20 targets included several genes previously shown to be involved in DCM as well as genes not typically associated with this disease. In failing human hearts, reduced expression of RBM20 affected alternative splicing of several direct targets, indicating that differences in RBM20 expression may affect cardiac function. Together, these findings identify RBM20-regulated targets and provide insight into the pathogenesis of human heart failure.

Journal ArticleDOI
TL;DR: This work uncovers a mechanism by which DDX5 and DDX17 cooperate with heterogeneous nuclear ribonucleoprotein H/F splicing factors to define epithelial- and myoblast-specific splicing subprograms, and proposes naming these proteins "master orchestrators" of differentiation that dynamically orchestrate several layers of gene expression.

Journal ArticleDOI
TL;DR: It is reported that U2AF has the capacity to directly define ~88% of functional 3′ splice sites in the human genome, but numerous U2AF binding events also occur in intronic locations.
Abstract: The U2AF heterodimer has been well studied for its role in defining functional 3' splice sites in pre-mRNA splicing, but many fundamental questions still remain unaddressed regarding the function of U2AF in mammalian genomes. Through genome-wide analysis of U2AF-RNA interactions, we report that U2AF has the capacity to directly define ~88% of functional 3' splice sites in the human genome, but numerous U2AF binding events also occur in intronic locations. Mechanistic dissection reveals that upstream intronic binding events interfere with the immediate downstream 3' splice site associated either with the alternative exon, to cause exon skipping, or with the competing constitutive exon, to induce exon inclusion. We further demonstrate partial functional impairment with leukemia-associated mutations in U2AF35, but not U2AF65, in regulated splicing. These findings reveal the genomic function and regulatory mechanism of U2AF in both normal and disease states.

Journal ArticleDOI
TL;DR: This letter reports the use of the RNA-guided Cas9 system to effectively generate targeted mutations in rabbit embryos and to produce KO rabbits.
Abstract: Dear Editor, Recently, zinc finger nuclease, transcription activator-like effector nuclease, and RNA-guided Cas9 endonuclease (Cas9) have emerged as powerful means for genome editing (Conklin, 2013; Gaj et al., 2013). These nucleases are efficient in generating double-strand breaks in the genome that can be repaired by error-prone nonhomologous end joining, leading to a functional knockout (KO) of the targeted gene, or used to integrate a DNA sequence at a specific locus through homologous recombination. Although the Cas9 system has been shown to be highly efficient in generating genetically engineered mice and rats (Li et al., 2013a, b; Wang et al., 2013), its feasibility in rabbits still needs to be determined. Here we report the use of the Cas9 system to effectively generate targeted mutations in rabbit embryos and the production of KO rabbits. The rabbit is a classic model animal species. It is useful for the study of many human diseases such as atherosclerosis, cystic fibrosis, and acquired immunodeficiency syndrome. However, production of gene targeted transgenic (GTT) rabbits has been an extreme challenge. This is mainly due to the lack of germline transmitting embryonic stem cells and the very low efficiency of somatic cell nuclear transfer in rabbits (Chesne et al., 2002). In the present work, we used the Cas9 system to target the rabbit genome. Because embryo transfer work in large animal species is costly, we first established an in vitro system to test the efficacy of the RNA-guided Cas9 nucleases. Individual single guide RNAs (sgRNAs) were designed (Figure 1A and Supplementary Table S1) to target nine rabbit genes: apolipoprotein E (APOE), cluster of differentiation 36 (CD36), cystic fibrosis transmembrane conductance regulator, low-density lipoprotein receptor (LDLR), apolipoprotein CIII, scavenger receptor class B, member 1 (SCARB1), leptin, leptin receptor, and ryanodine receptor 2 (RyR2). 
For each gene, an RNA mixture of Cas9 constructs (150 ng/μl Cas9 mRNA plus 6 ng/μl sgRNA) was microinjected into the cytoplasm of pronuclear stage rabbit embryos (n = 290). We chose a concentration of 6 ng/μl for sgRNA because higher concentrations (12, 18, or 24 ng/μl) did not significantly improve the mutation rates (data not shown), and we speculate that a higher quantity of sgRNA may increase the frequency of off-target events. Embryos were cultured in vitro, collected at the blastocyst (BL) stage, and subjected to single embryo PCR and sequencing (n = 116) to identify mutations in the corresponding target locus. All nine sgRNAs generated mutations at their corresponding target loci with efficiencies ranging from 10% to nearly 100% (Figure 1D). A high percentage (four out of nine) of the sgRNAs resulted in mutation rates higher than 50%; interestingly, bi-allelic mutations were also identified in these four, but not in those where mutation rates were 50% or lower (Figure 1D). [Figure 1: Generation of KO rabbits with RNA-guided Cas9 nucleases. (A) Constructs of Cas9 RNA system used in this study. NLS, nuclear localization signal; bGH-pA, bovine growth hormone poly-A; PAM, protospacer adjacent motif. (B) Representative T7 endonuclease 1 ...] After validating the in vitro gene targeting capacity of these Cas9 constructs, we continued to use four of them (APOE, CD36, LDLR, and RyR2) to produce KO rabbits. CD36, LDLR, and APOE KO rabbits are useful to study lipid metabolism and atherosclerosis. The fourth line (RyR2 KO) will be used as a model to study heart arrhythmia. This is in recognition of the increasing demand for non-murine models for cardiovascular diseases in the research community. In a number of key areas, the cardiac physiology of the rabbit mimics that of the human better than that of the mouse does (Fan and Watanabe, 2003). 
For example, cholesteryl ester transfer protein, which plays a central role in the atherosclerotic process, is abundant in both human and rabbit plasma but absent in the mouse. Like humans, rabbits are very susceptible to diet-induced atherosclerosis, whereas wild-type (WT) mice do not develop atherosclerosis naturally. A total of 301 embryos were injected with one of the four Cas9 constructs and transferred to 10 pseudo-pregnant recipient rabbits (20–35 embryos per recipient). After 1 month of gestation, nine (90%) recipients gave birth to 68 live kits (7.6/L), out of which 38 were identified as positive KO after initial T7 endonuclease assay and final confirmation by PCR sequencing (Figure 1B, C, and E). The term rate, calculated as total term kits/total embryos, is 22.6% (68/301). The KO rate, calculated as total KO kits/total term kits, is 55.9% (38/68). Consistent with the prediction based on the in vitro results, three (i.e. APOE, CD36, and RyR2) out of the four Cas9 constructs resulted in higher than 50% mutation rates and bi-allelic mutations. It remains to be tested whether mutations in these founder animals will be faithfully transmitted to the next generation. In agreement with previous reports, Δ15 mutant alleles were repeatedly discovered in six out of nine LDLR KO founder rabbits (Figure 1C, left lower panel), likely caused by microhomology-mediated end joining (Wang et al., 2013). One main concern with the Cas9 system for gene targeting is the off-target effects (Fu et al., 2013; Hsu et al., 2013). Recently, Hsu et al. (2013) examined Cas9-induced off-target mutation events in human 293T and 293FT cells. They found that sgRNA can tolerate as many as 4 nt changes in the seed sequence, and that the change of the protospacer adjacent motif (PAM) sequence from NGG to NAG does not totally abolish the targeting capacity. Their work suggested that there may be up to hundreds of potential off-target loci in a mammalian genome for one particular Cas9 seed sequence. 
In the present work, we examined off-target effects in all the KO founders (LDLR, RyR2, CD36, and APOE), following strategies similar to those described by Wang et al. (2013) in their mouse work. We used BLASTn to identify exact matches to the 15 nt sequence (12 nt seed region and 3 nt NGG). A total of 160 potential off-target loci were identified, of which 9 were within an exon region (Supplementary Table S2). Considering the fact that mutations in an exon region are more likely to cause gene mutation and phenotype changes in the animals, we looked at these nine loci (Supplementary Table S3). None of the 38 founders contained mutations in these exon regions. This finding substantially alleviated the concerns of using Cas9-based gene targeting in rabbits. Notably, we used the most stringent criterion (i.e. exact match of the 12 nt seed sequence plus the 3 nt NGG in the PAM), yet failed to identify any off-target events. Several factors may have contributed to the low frequency of off-target events. First, we used a very low concentration of the sgRNA (6 ng/μl). In contrast, 12.5–50 ng/μl were used in similar mouse and rat studies (Li et al., 2013a; Wang et al., 2013). Secondly, we used RNA, whereas Hsu et al. (2013) used plasmid DNA. The much shorter half-life of RNAs may also reduce the off-target frequencies. Thirdly, we worked in an in vivo system of a different species, quite different from the in vitro system using immortalized human cell lines (e.g. 293T and 293FT) (Hsu et al., 2013). Consistent with our findings, recent reports of GTT mice and rats generated via the Cas9 system also had very few detectable off-target mutations (Li et al., 2013b; Wang et al., 2013). Nevertheless, we admit that additional work may be necessary to examine all potential off-target loci. It is also noteworthy that the current assembly of the rabbit genome is still incomplete. 
On the other hand, in the context of transgenic animal production, we argue that the initial focus could be narrowed to those that fall in an exon region, therefore making off-target examination feasible and affordable. Researchers who use these novel rabbit models in their future studies should however bear in mind that there may be mutations, however unlikely, in other regions of the animal genome. To our knowledge, our work represents the first report of successful generation of GTT rabbits using RNA-guided Cas9 nucleases. We used 10 donor rabbits to harvest embryos, transferred 301 embryos to 10 recipient rabbits, and generated 38 KO founders. On average, for a given targeted gene, five rabbits (donors and recipients) were used and nine founder KO kits were produced in as few as two months. Such high success rates and time efficiency make production of GTT rabbits technically and economically feasible for biomedical research. In the near future, the Cas9 system will likely contribute to the generation of knock-in, multiplex GTT, and conditional GTT rabbits too, as has been demonstrated in mice (Wang et al., 2013; Yang et al., 2013). Taken together, we have established a highly effective Cas9-based system to produce KO rabbits. We achieved 100% success rates in all nine genes and generated four novel lines of KO rabbits (CD36, LDLR, RyR2, and APOE). Analysis of potential off-targets showed no effects in the exon regions of KO rabbits, supporting the application of the Cas9 technology to generating GTT rabbits. Importantly, this success positions rabbits as a brand new animal model amenable to gene targeting in the translational biomedical research arena. [Supplementary material is available at Journal of Molecular Cell Biology online. This work was funded by grants from the National Institutes of Health (HL117491, HL088391, HL114038, and HL068878 to Y.E.C). 
This work utilized Core Services supported by Center for Advanced Models for Translational Sciences and Therapeutics (CAMTraST) at the University of Michigan Medical Center.]
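The off-target criterion described in the letter above (an exact match to the 12 nt seed followed by a 3 nt NGG PAM) can be sketched in a few lines. The authors report using BLASTn; the snippet below is a stand-in that uses a plain regular-expression scan over both strands of a sequence held in memory. The function names and the example seed are hypothetical and are not taken from the letter.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
SEED_LEN = 12  # seed length stated in the letter
PAM_LEN = 3    # NGG


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_seed_pam_sites(genome, seed):
    """Return (strand, start) for each exact seed + NGG match.

    start is the 0-based position, on the forward strand, of the
    left-most base of the 15 nt window. This is an illustrative scan,
    not a reimplementation of BLASTn.
    """
    genome = genome.upper()
    seed = seed.upper()
    if len(seed) != SEED_LEN:
        raise ValueError("seed must be 12 nt long")
    # Lookahead so that overlapping sites are all reported.
    pattern = re.compile("(?=" + re.escape(seed) + "[ACGT]GG)")
    hits = [("+", m.start()) for m in pattern.finditer(genome)]
    window = SEED_LEN + PAM_LEN
    n = len(genome)
    for m in pattern.finditer(reverse_complement(genome)):
        # Map the minus-strand coordinate back onto the forward strand.
        hits.append(("-", n - (m.start() + window)))
    return hits


if __name__ == "__main__":
    seed = "GATTACAGATTA"                 # hypothetical 12 nt seed
    toy_genome = "CC" + seed + "TGG" + "AA"
    print(find_seed_pam_sites(toy_genome, seed))
```

In practice each reported position would then be compared against exon annotations, mirroring the letter's choice to examine only those candidate loci that fall within exons.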