
Showing papers on "Exon published in 2012"


Journal ArticleDOI
01 Feb 2012-PLOS ONE
TL;DR: By deep sequencing of RNA from a variety of normal and malignant human cells, this work suggests that a non-canonical mode of RNA splicing, resulting in a circular RNA isoform, is a general feature of the gene expression program in human cells.
Abstract: Most human pre-mRNAs are spliced into linear molecules that retain the exon order defined by the genomic sequence. By deep sequencing of RNA from a variety of normal and malignant human cells, we found RNA transcripts from many human genes in which the exons were arranged in a non-canonical order. Statistical estimates and biochemical assays provided strong evidence that a substantial fraction of the spliced transcripts from hundreds of genes are circular RNAs. Our results suggest that a non-canonical mode of RNA splicing, resulting in a circular RNA isoform, is a general feature of the gene expression program in human cells.

1,989 citations
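As a rough illustration of what "exons arranged in a non-canonical order" means, the sketch below flags splice junctions whose acceptor exon does not lie downstream of the donor exon. The representation (exons numbered 1..n in transcript order, a junction as a `(donor_exon, acceptor_exon)` pair) and the function names are assumptions made for this example, not the authors' pipeline. A scrambled junction is only a candidate: the abstract relies on statistical estimates and biochemical assays to argue that such transcripts are circular.

```python
from typing import List, Tuple

# A junction is (donor_exon, acceptor_exon); exons are numbered 1..n in
# transcript (5'->3') order. This encoding is a hypothetical convention
# chosen for the example.
Junction = Tuple[int, int]


def is_scrambled(junction: Junction) -> bool:
    """Return True if the junction joins exons in non-canonical order.

    Canonical linear splicing joins the donor of exon i to the acceptor of
    a downstream exon j > i. A junction with j <= i is out of genomic order
    and is therefore a candidate backsplice.
    """
    donor, acceptor = junction
    return acceptor <= donor


def scrambled_junctions(junctions: List[Junction]) -> List[Junction]:
    """Keep only the junctions that are in non-canonical order."""
    return [j for j in junctions if is_scrambled(j)]


if __name__ == "__main__":
    observed = [(1, 2), (3, 2), (2, 2), (2, 4)]
    print(scrambled_junctions(observed))
```

Here `(3, 2)` and `(2, 2)` are flagged, while `(1, 2)` and the exon-skipping junction `(2, 4)` are treated as canonical.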


Journal ArticleDOI
TL;DR: DEXSeq is presented, a statistical method to test for differential exon usage in RNA-seq data that uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account.
Abstract: RNA-seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity genes, and in many cases exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.

1,332 citations


Journal ArticleDOI
21 Dec 2012-Science
TL;DR: The findings suggest that the evolution of alternative splicing has for the most part been very rapid and that alternative splicing patterns of most organs more strongly reflect the identity of the species than the organ type; alternative splicing complexity is highest in primates.
Abstract: How species with similar repertoires of protein-coding genes differ so markedly at the phenotypic level is poorly understood. By comparing organ transcriptomes from vertebrate species spanning ~350 million years of evolution, we observed significant differences in alternative splicing complexity between vertebrate lineages, with the highest complexity in primates. Within 6 million years, the splicing profiles of physiologically equivalent organs diverged such that they are more strongly related to the identity of a species than they are to organ type. Most vertebrate species-specific splicing patterns are cis-directed. However, a subset of pronounced splicing changes are predicted to remodel protein interactions involving trans-acting regulators. These events likely further contributed to the diversification of splicing and other transcriptomic changes that underlie phenotypic differences among vertebrate species.

921 citations


Journal ArticleDOI
21 Dec 2012-Science
TL;DR: While tissue-specific gene expression programs are largely conserved across mammals, alternative splicing is well conserved in only a subset of tissues and is frequently lineage-specific.
Abstract: Most mammalian genes produce multiple distinct messenger RNAs through alternative splicing, but the extent of splicing conservation is not clear. To assess tissue-specific transcriptome variation across mammals, we sequenced complementary DNA from nine tissues from four mammals and one bird in biological triplicate, at unprecedented depth. We find that while tissue-specific gene expression programs are largely conserved, alternative splicing is well conserved in only a subset of tissues and is frequently lineage-specific. Thousands of previously unknown, lineage-specific, and conserved alternative exons were identified; widely conserved alternative exons had signatures of binding by MBNL, PTB, RBFOX, STAR, and TIA family splicing factors, implicating them as ancestral mammalian splicing regulators. Our data also indicate that alternative splicing often alters protein phosphorylatability, delimiting the scope of kinase signaling.

763 citations


Journal ArticleDOI
TL;DR: H19’s main physiological role is in limiting growth of the placenta before birth, by regulated processing of miR-675, which may also allow rapid inhibition of cell proliferation in response to cellular stress or oncogenic signals.
Abstract: The H19 large intergenic non-coding RNA (lincRNA) is one of the most highly abundant and conserved transcripts in mammalian development, being expressed in both embryonic and extra-embryonic cell lineages, yet its physiological function is unknown. Here we show that miR-675, a microRNA (miRNA) embedded in H19's first exon, is expressed exclusively in the placenta from the gestational time point when placental growth normally ceases, and placentas that lack H19 continue to grow. Overexpression of miR-675 in a range of embryonic and extra-embryonic cell lines results in their reduced proliferation; targets of the miRNA are upregulated in the H19 null placenta, including the growth-promoting insulin-like growth factor 1 receptor (Igf1r) gene. Moreover, the excision of miR-675 from H19 is dynamically regulated by the stress-response RNA-binding protein HuR. These results suggest that H19's main physiological role is in limiting growth of the placenta before birth, by regulated processing of miR-675. The controlled release of miR-675 from H19 may also allow rapid inhibition of cell proliferation in response to cellular stress or oncogenic signals.

711 citations


Journal ArticleDOI
TL;DR: Using RNA-seq of a normalized cDNA library made from Arabidopsis seedlings and flowers, this work identifies a large number of new splice junctions and shows that at least 61% of intron-containing genes are alternatively spliced under normal growth conditions.
Abstract: Alternative splicing (AS) is a widespread mechanism which increases transcriptome and proteome complexity and controls developmental programs and responses to the environment in higher eukaryotes. The splicing process, removal of introns and ligation of exons, is performed by a large RNA-protein complex, the spliceosome, consisting of five small nuclear RNAs (snRNAs) and about 180 proteins with different functions (Wahl et al. 2009). Assembly of the spliceosome on introns in a precursor messenger RNA (pre-mRNA) is directed by cis elements and trans-acting factors (Black 2003; Stamm et al. 2005). The cis sequences include the splice sites, branchpoint, and polypyrimidine tract which have degenerate consensus sequences in higher eukaryotes. While many splice sites are selected in all transcripts (constitutive splicing), others are used to various levels, resulting in alternative transcripts. Selection of such alternative splice sites is affected by auxiliary cis elements located within exonic and intronic sequences, termed splicing enhancers and silencers. These elements are binding sites for trans-acting splicing factors, for example, hnRNP and SR proteins. These proteins, in addition to their functions in constitutive splicing, play a key role in AS by inhibition or promotion of selection of particular splice sites. The presence and abundance of different splicing factors in different cell types, tissues, developmental stages, and environmental conditions determines the AS profiles of expressed genes and ultimately shapes the transcriptome. In addition, alternative transcripts can code for protein isoforms with altered amino acid and domain composition affecting their activity, interaction capacity, localization, and stability, thus affecting the proteome (Stamm et al. 2005). Alternative splicing was first described in 1977 as peculiar rearrangements in the adenovirus type 2 mRNA (Berget et al. 1977; Chow et al. 1977). 
Since the discovery of the first example of AS in an endogenous mammalian gene coding for calcitonin (Rosenfeld et al. 1981), the alignment of expressed sequence tag (EST) contigs to genomic DNA allowed the identification of a large number (∼35%) of alternatively spliced genes in humans (Mironov et al. 1999). Estimates of AS in many different organisms have been made using EST/cDNA libraries (Okazaki et al. 2002; Zavolan et al. 2003; Iida et al. 2004; Cusack and Wolfe 2005; Wakamatsu et al. 2009). With the advent of tiling arrays and high-throughput sequencing, the number of genes which undergo AS has continued to increase (Jones-Rhoades et al. 2007; Weber et al. 2007; Kwan et al. 2008; Mortazavi et al. 2008; Pan et al. 2008). In particular, the application of high-throughput sequencing to transcriptomes (RNA-seq) has now demonstrated that AS occurs in ∼95% of intron-containing genes in human (Pan et al. 2008). In plants, estimates of the occurrence of AS have been hampered by a low number of ESTs (Brett et al. 2002). However, the levels of AS have continued to increase with greater EST/cDNA coverage: 1.2% (Zhu et al. 2003), 5% (Zhu et al. 2003), 11.6% (Iida et al. 2004), 21.8% (Wang and Brendel 2006), 29% (Xiao et al. 2005), and >30% (Campbell et al. 2006). Many transcriptome studies using high-throughput sequencing have been performed in plants, but few have been used to examine AS (Weber et al. 2007; Filichkin et al. 2010; Lu et al. 2010; Zhang et al. 2010). The most recent estimate based on RNA-seq is that ∼42% of Arabidopsis intron-containing genes undergo AS (Filichkin et al. 2010). In terms of identifying AS in plants, the expression profile itself influences the representation of many transcripts in databases. For example, an Arabidopsis transcriptome study using 454 Life Sciences (Roche) sequencing (Weber et al. 2007) showed that the top 10 most highly expressed genes represent 25% of the total mapped reads, thus tremendously compromising the representation of less abundant transcripts. To improve gene representation and discovery of AS events in Arabidopsis, we have used RNA-seq of a normalized cDNA library made from Arabidopsis seedlings and flowers. We have shown that normalization significantly increases the coverage of reads across the genes, and we have identified a large number (∼47 k) of new splice junctions. Taking advantage of a high-resolution RT-PCR panel (Simpson et al. 2008a,b), we were able to validate many novel AS events. Altogether, our results show that at least 61% of intron-containing genes are alternatively spliced under normal growth conditions, which indicates a high complexity of the Arabidopsis transcriptome.

669 citations


01 Dec 2012
TL;DR: While tissue-specific gene expression programs are largely conserved, alternative splicing is well conserved in only a subset of tissues and is frequently lineage-specific.
Abstract: Most mammalian genes produce multiple distinct messenger RNAs through alternative splicing, but the extent of splicing conservation is not clear. To assess tissue-specific transcriptome variation across mammals, we sequenced complementary DNA from nine tissues from four mammals and one bird in biological triplicate, at unprecedented depth. We find that while tissue-specific gene expression programs are largely conserved, alternative splicing is well conserved in only a subset of tissues and is frequently lineage-specific. Thousands of previously unknown, lineage-specific, and conserved alternative exons were identified; widely conserved alternative exons had signatures of binding by MBNL, PTB, RBFOX, STAR, and TIA family splicing factors, implicating them as ancestral mammalian splicing regulators. Our data also indicate that alternative splicing often alters protein phosphorylatability, delimiting the scope of kinase signaling.

609 citations


Journal ArticleDOI
TL;DR: The first large scale RNA sequencing study of lung adenocarcinoma is presented, demonstrating its power to identify somatic point mutations as well as transcriptional variants such as gene fusions, alternative splicing events, and expression outliers.
Abstract: All cancers harbor molecular alterations in their genomes. The transcriptional consequences of these somatic mutations have not yet been comprehensively explored in lung cancer. Here we present the first large scale RNA sequencing study of lung adenocarcinoma, demonstrating its power to identify somatic point mutations as well as transcriptional variants such as gene fusions, alternative splicing events, and expression outliers. Our results reveal the genetic basis of 200 lung adenocarcinomas in Koreans including deep characterization of 87 surgical specimens by transcriptome sequencing. We identified driver somatic mutations in cancer genes including EGFR, KRAS, NRAS, BRAF, PIK3CA, MET, and CTNNB1. Candidates for novel driver mutations were also identified in genes newly implicated in lung adenocarcinoma such as LMTK2, ARID1A, NOTCH2, and SMARCA4. We found 45 fusion genes, eight of which were chimeric tyrosine kinases involving ALK, RET, ROS1, FGFR2, AXL, and PDGFRA. Among 17 recurrent alternative splicing events, we identified exon 14 skipping in the proto-oncogene MET as highly likely to be a cancer driver. The number of somatic mutations and expression outliers varied markedly between individual cancers and was strongly correlated with smoking history of patients. We identified genomic blocks within which gene expression levels were consistently increased or decreased that could be explained by copy number alterations in samples. We also found an association between lymph node metastasis and somatic mutations in TP53. These findings broaden our understanding of lung adenocarcinoma and may also lead to new diagnostic and therapeutic approaches.

538 citations


Journal ArticleDOI
TL;DR: This review summarises the preclinical and clinical data on EGFR exon 20 insertions, with an emphasis on their structural, molecular, and clinical implications.
Abstract: Lung cancer is the leading cause of cancer-related death. The identification of epidermal growth factor receptor (EGFR) somatic mutations defined a new, molecularly classified subgroup of non-small-cell lung cancer (NSCLC). Classic EGFR activating mutations, such as in-frame deletions in exon 19 or the Leu858Arg (L858R) point mutation in exon 21, are associated with sensitivity to first-generation quinazoline reversible EGFR tyrosine kinase inhibitors (TKIs). EGFR exon 20 insertion mutations, which are typically located after the C-helix of the tyrosine kinase domain of EGFR, may account for up to 4% of all EGFR mutations. Preclinical models have shown that the most prevalent EGFR exon 20 insertion mutated proteins are resistant to clinically achievable doses of reversible (gefitinib, erlotinib) and irreversible (neratinib, afatinib, PF00299804) EGFR TKIs. Growing clinical experience with patients whose tumours harbour EGFR exon 20 insertions corresponds with the preclinical data; very few patients have had responses to EGFR TKIs. Despite the prevalence and biological importance of EGFR exon 20 insertions, few reports have summarised all preclinical and clinical data on these mutations. Here, we review the literature and provide an update with an emphasis on the structural, molecular, and clinical implications of EGFR exon 20 insertions.

482 citations


Journal ArticleDOI
TL;DR: The coSI measure, based on RNA-seq reads mapping to exon junctions and borders, is introduced to assess the degree of splicing completion around internal exons, and spliceosomal snRNAs are found to be significantly enriched in chromatin-associated RNA compared with other cellular RNA fractions and with other nonspliceosomal snRNAs.
Abstract: Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.

448 citations
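The abstract describes coSI only as a measure built from RNA-seq reads mapping to exon junctions and to exon borders, and does not state the formula. The sketch below therefore computes a simplified, coSI-like ratio under an explicit assumption: splicing completion around an exon is taken as the fraction of informative reads that span a spliced exon-exon junction rather than an unspliced exon-intron border. The function names and the exact ratio are illustrative and should not be read as the published definition.

```python
from statistics import median
from typing import Iterable, Optional, Tuple


def splicing_completion(junction_reads: int, border_reads: int) -> Optional[float]:
    """Simplified, coSI-like ratio for one internal exon (an assumption).

    junction_reads: reads spanning a spliced exon-exon junction.
    border_reads:   reads spanning an unspliced exon-intron border.
    Returns a value in [0, 1], or None when no informative reads exist.
    """
    total = junction_reads + border_reads
    if total == 0:
        return None
    return junction_reads / total


def median_completion(counts: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Median of the ratio over exons, skipping exons without reads."""
    values = []
    for junction_reads, border_reads in counts:
        value = splicing_completion(junction_reads, border_reads)
        if value is not None:
            values.append(value)
    return median(values) if values else None


if __name__ == "__main__":
    # Hypothetical read counts for three exons: (junction, border).
    exons = [(30, 10), (10, 0), (0, 10)]
    print(median_completion(exons))
```

Under this simplification, a value of 1 corresponds to an exon whose surrounding introns have been fully removed, 0 to an exon for which no intron removal is observed, and intermediate values to a mixture of spliced and unspliced molecules.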


Journal ArticleDOI
17 Aug 2012-Cell
TL;DR: The muscleblind-like (Mbnl) family of RNA-binding proteins plays important roles in muscle and eye development and in myotonic dystrophy (DM), in which expanded CUG or CCUG repeats functionally deplete Mbnl proteins.

Journal ArticleDOI
06 Jul 2012-Cell
TL;DR: It is shown that the rapid and transient transcriptional upregulation inherent to neuronal activation creates a shortage of U1 relative to pre-mRNAs. Additional experiments suggest that cotranscriptional premature cleavage and polyadenylation (PCPA) is counteracted by U1 association with nascent transcripts, a process the authors term telescripting, which ensures transcriptome integrity and regulates mRNA length.

Journal ArticleDOI
TL;DR: In this paper, the authors observed that brain and other tissue-regulated exons are significantly enriched in flexible regions of proteins that likely form conserved interaction surfaces, including Bridging Integrator 1 (Bin1)/Amphiphysin II and Dynamin 2 (Dnm2).

Journal ArticleDOI
TL;DR: In this article, the authors show that seed shattering in sorghum is controlled by a single gene, Shattering1 (Sh1), which encodes a YABBY transcription factor.
Abstract: A key step during crop domestication is the loss of seed shattering. Here, we show that seed shattering in sorghum is controlled by a single gene, Shattering1 (Sh1), which encodes a YABBY transcription factor. Domesticated sorghums harbor three different mutations at the Sh1 locus. Variants at regulatory sites in the promoter and intronic regions lead to a low level of expression, a 2.2-kb deletion causes a truncated transcript that lacks exons 2 and 3, and a GT-to-GG splice-site variant in intron 4 results in removal of exon 4. The distributions of these non-shattering haplotypes among sorghum landraces suggest three independent origins. The function of the rice ortholog (OsSh1) was subsequently validated with a shattering-resistant mutant, and two maize orthologs (ZmSh1-1 and ZmSh1-5.1+ZmSh1-5.2) were verified with a large mapping population. Our results indicate that Sh1 genes for seed shattering were under parallel selection during sorghum, rice and maize domestication.

Journal ArticleDOI
TL;DR: MATS (multivariate analysis of transcript splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data, is developed and shown to be an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.
Abstract: Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (multivariate analysis of transcript splicing), a bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P-value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypotheses of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT-PCR validation rate of 86% for differential exon skipping events with a MATS FDR of <10%. Additionally, over the full list of RT-PCR tested exons, the MATS FDR estimates matched well with the experimental validation rate. Our results demonstrate that MATS is an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.

Journal ArticleDOI
TL;DR: It is argued that the increased presence of SR and hnRNP proteins promoted the evolution of alternative splicing through relaxation of the sequence requirements of splice junctions.
Abstract: The splicing of pre-mRNAs is an essential step of gene expression in eukaryotes. Introns are removed from split genes through the activities of the spliceosome, a large ribonuclear machine that is conserved throughout the eukaryotic lineage. While unicellular eukaryotes are characterized by less complex splicing, pre-mRNA splicing of multicellular organisms is often associated with extensive alternative splicing that significantly enriches their proteome. The alternative selection of splice sites and exons permits multicellular organisms to modulate gene expression patterns in a cell type-specific fashion, thus contributing to their functional diversification. Alternative splicing is a regulated process that is mainly influenced by the activities of splicing regulators, such as SR proteins or hnRNPs. These modular factors have evolved from a common ancestor through gene duplication events to a diverse group of splicing regulators that mediate exon recognition through their sequence-specific binding to pre-mRNAs. Given the strong correlations between intron expansion, the complexity of pre-mRNA splicing, and the emergence of splicing regulators, it is argued that the increased presence of SR and hnRNP proteins promoted the evolution of alternative splicing through relaxation of the sequence requirements of splice junctions.


Journal ArticleDOI
TL;DR: It is shown that the association of nsp10 with nsp14 stimulates the ExoN activity of the latter by >35-fold while having no effect on N7-MTase activity, which indicates an RNA processing function potentially connected to a replicative mismatch repair mechanism.
Abstract: The replication/transcription complex of severe acute respiratory syndrome coronavirus is composed of at least 16 nonstructural proteins (nsp1–16) encoded by the ORF-1a/1b. This complex includes replication enzymes commonly found in positive-strand RNA viruses, but also a set of RNA-processing activities unique to some nidoviruses. The nsp14 protein carries both exoribonuclease (ExoN) and (guanine-N7)-methyltransferase (N7-MTase) activities. The nsp14 ExoN activity ensures a yet-uncharacterized function in the virus life cycle and must be regulated to avoid nonspecific RNA degradation. In this work, we show that the association of nsp10 with nsp14 stimulates >35-fold the ExoN activity of the latter while playing no effect on N7-MTase activity. Nsp10 mutants unable to interact with nsp14 are not proficient for ExoN activation. The nsp10/nsp14 complex hydrolyzes double-stranded RNA in a 3′ to 5′ direction as well as a single mismatched nucleotide at the 3′-end mimicking an erroneous replication product. In contrast, di-, tri-, and longer unpaired ribonucleotide stretches, as well as 3′-modified RNAs, resist nsp10/nsp14-mediated excision. In addition to the activation of nsp16-mediated 2′-O-MTase activity, nsp10 also activates nsp14 in an RNA processing function potentially connected to a replicative mismatch repair mechanism.

Journal ArticleDOI
TL;DR: It is demonstrated that the PWWP domain of the chromatin-associated protein Psip1/Ledgf can specifically recognize tri-methylated H3K36 and that, like this histone modification, the Psip1 short (p52) isoform is enriched at active genes.
Abstract: Increasing evidence suggests that chromatin modifications have important roles in modulating constitutive or alternative splicing. Here we demonstrate that the PWWP domain of the chromatin-associated protein Psip1/Ledgf can specifically recognize tri-methylated H3K36 and that, like this histone modification, the Psip1 short (p52) isoform is enriched at active genes. We show that the p52, but not the long (p75), isoform of Psip1 co-localizes and interacts with Srsf1 and other proteins involved in mRNA processing. The level of H3K36me3 associated Srsf1 is reduced in Psip1 mutant cells and alternative splicing of specific genes is affected. Moreover, we show altered Srsf1 distribution around the alternatively spliced exons of these genes in Psip1 null cells. We propose that Psip1/p52, through its binding to both chromatin and splicing factors, might act to modulate splicing.

Journal ArticleDOI
TL;DR: There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns, and introns were a major factor of evolution throughout the history of eukaryotes.
Abstract: Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.

Journal ArticleDOI
TL;DR: An unbiased, genome-wide bioinformatic screen for gene fusions using Affymetrix Exon array expression data is developed, and the novel HEY1-NCOA2 fusion it identified appears to be the defining and diagnostic gene fusion in mesenchymal chondrosarcomas.
Abstract: Cancer gene fusions that encode a chimeric protein are often characterized by an intragenic discontinuity in the RNA expression levels of the exons that are 5' or 3' to the fusion point in one or both of the fusion partners due to differences in the levels of activation of their respective promoters. Based on this, we developed an unbiased, genome-wide bioinformatic screen for gene fusions using Affymetrix Exon array expression data. Using a training set of 46 samples with different known gene fusions, we developed a data analysis pipeline, the "Fusion Score (FS) model", to score and rank genes for intragenic changes in expression. In a separate discovery set of 41 tumor samples with possible unknown gene fusions, the FS model generated a list of 552 candidate genes. The transcription factor gene NCOA2 was one of the candidates identified in a mesenchymal chondrosarcoma. A novel HEY1-NCOA2 fusion was identified by 5' RACE, representing an in-frame fusion of HEY1 exon 4 to NCOA2 exon 13. RT-PCR or FISH evidence of this HEY1-NCOA2 fusion was present in all additional mesenchymal chondrosarcomas tested with a definitive histologic diagnosis and adequate material for analysis (n = 9) but was absent in 15 samples of other subtypes of chondrosarcomas. We also identified a NUP107-LGR5 fusion in a dedifferentiated liposarcoma but analysis of 17 additional samples did not confirm it as a recurrent event in this sarcoma type. The novel HEY1-NCOA2 fusion appears to be the defining and diagnostic gene fusion in mesenchymal chondrosarcomas.
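The idea of an intragenic discontinuity in exon expression can be made concrete with a toy score. The sketch below is not the authors' Fusion Score (FS) model, whose details the abstract does not give. It is a hypothetical stand-in that, for one gene in one sample, tries every break between consecutive exons and reports the break that maximizes the absolute difference between the mean expression of the 5' exons and the mean expression of the 3' exons. The function name and input format are assumptions.

```python
from statistics import fmean
from typing import Sequence, Tuple


def best_expression_break(exon_levels: Sequence[float]) -> Tuple[int, float]:
    """Find the exon boundary with the largest 5'/3' expression step.

    exon_levels: per-exon expression values in 5'->3' order (hypothetical
    input, e.g. log-scale exon array signals).
    Returns (k, score): the break lies between exon k and exon k + 1
    (1-based), and score is |mean(5' exons) - mean(3' exons)|.
    """
    if len(exon_levels) < 2:
        raise ValueError("need at least two exons to place a break")
    best_k, best_score = 1, -1.0
    for k in range(1, len(exon_levels)):
        five_prime = fmean(exon_levels[:k])
        three_prime = fmean(exon_levels[k:])
        score = abs(five_prime - three_prime)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    # Hypothetical gene whose last two exons are strongly upregulated,
    # as might happen 3' to a fusion point driven by a partner promoter.
    print(best_expression_break([1.0, 1.0, 1.0, 5.0, 5.0]))
```

Ranking genes by such a score would surface candidates with a sharp step in expression, which would still need experimental confirmation of the kind the abstract describes (5' RACE, RT-PCR or FISH).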

Journal ArticleDOI
TL;DR: It is found that both FUS and TDP-43 regulate genes that function in neuronal development, and a saw-tooth binding pattern in long genes demonstrated that FUS remains bound to pre-mRNAs until splicing is completed.
Abstract: Fused in sarcoma (FUS) and TAR DNA-binding protein 43 (TDP-43) are RNA-binding proteins pathogenetically linked to amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD), but it is not known if they regulate the same transcripts. We addressed this question using crosslinking and immunoprecipitation (iCLIP) in mouse brain, which showed that FUS binds along the whole length of the nascent RNA with limited sequence specificity to GGU and related motifs. A saw-tooth binding pattern in long genes demonstrated that FUS remains bound to pre-mRNAs until splicing is completed. Analysis of FUS−/− brain demonstrated a role for FUS in alternative splicing, with increased crosslinking of FUS in introns around the repressed exons. We did not observe a significant overlap in the RNA binding sites or the exons regulated by FUS and TDP-43. Nevertheless, we found that both proteins regulate genes that function in neuronal development.

Journal ArticleDOI
04 Jan 2012-PLOS ONE
TL;DR: Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes.
Abstract: The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.

Journal ArticleDOI
TL;DR: The results reveal that differential exon-intron GC content is a previously unidentified determinant of exon selection and argue that the two GC content architectures reflect the two mechanisms by which splicing signals are recognized: exon definition and intron definition.

Journal ArticleDOI
TL;DR: The observations from the Drosophila model point toward an evolutionarily conserved role of RNA methylation in normal cognitive development, suggesting that mutations in this gene might even induce a syndromic form of ID.
Abstract: With a prevalence between 1 and 3%, hereditary forms of intellectual disability (ID) are among the most important problems in health care. Particularly, autosomal-recessive forms of the disorder have a very heterogeneous molecular basis, and genes with an increased number of disease-causing mutations are not common. Here, we report on three different mutations (two nonsense mutations, c.679C>T [p.Gln227∗] and c.1114C>T [p.Gln372∗], as well as one splicing mutation, g.6622224A>C [p.Ile179Argfs∗192]) that cause a loss of the tRNA-methyltransferase-encoding NSUN2 main transcript in homozygotes. We identified the mutations by sequencing exons and exon-intron boundaries within the genomic region where the linkage intervals of three independent consanguineous families of Iranian and Kurdish origin overlapped with the previously described MRT5 locus. In order to gain further evidence concerning the effect of a loss of NSUN2 on memory and learning, we constructed a Drosophila model by deleting the NSUN2 ortholog, CG6133, and investigated the mutants by using molecular and behavioral approaches. When the Drosophila melanogaster NSUN2 ortholog was deleted, severe short-term-memory (STM) deficits were observed; STM could be rescued by re-expression of the wild-type protein in the nervous system. The humans homozygous for NSUN2 mutations showed an overlapping phenotype consisting of moderate to severe ID and facial dysmorphism (which includes a long face, characteristic eyebrows, a long nose, and a small chin), suggesting that mutations in this gene might even induce a syndromic form of ID. Moreover, our observations from the Drosophila model point toward an evolutionarily conserved role of RNA methylation in normal cognitive development.

Journal ArticleDOI
TL;DR: It is demonstrated that engineered inactivation of severe acute respiratory syndrome-CoV ExoN activity results in a stable mutator phenotype with profoundly decreased fidelity in vivo and attenuation of pathogenesis in young, aged and immunocompromised mice.
Abstract: Live, attenuated RNA virus vaccines are efficacious but subject to reversion to virulence. Among RNA viruses, replication fidelity is recognized as a key determinant of virulence and escape from antiviral therapy; increased fidelity is attenuating for some viruses. Coronavirus (CoV) replication fidelity is approximately 20-fold greater than that of other RNA viruses and is mediated by a 3'→5' exonuclease (ExoN) activity that probably functions in RNA proofreading. In this study we demonstrate that engineered inactivation of severe acute respiratory syndrome (SARS)-CoV ExoN activity results in a stable mutator phenotype with profoundly decreased fidelity in vivo and attenuation of pathogenesis in young, aged and immunocompromised mice. The ExoN inactivation genotype and mutator phenotype are stable and do not revert to virulence, even after serial passage or long-term persistent infection in vivo. ExoN inactivation has potential for broad applications in the stable attenuation of CoVs and, perhaps, other RNA viruses.

Journal ArticleDOI
TL;DR: Coexistence of PIK3CA (the PI3K p110α subunit) exon 9 and 20 mutations, but not PIK3CA mutation in either exon 9 or 20 alone, is associated with poor prognosis of colorectal cancer patients.
Abstract: Purpose: Mutations in PIK3CA [the gene encoding the p110α catalytic subunit of phosphatidylinositide-3-kinase (PI3K)] play an important role in colorectal carcinogenesis. Experimental evidence suggests that PIK3CA exon 9 and exon 20 mutations trigger different biologic effects, and that concomitant mutations in both exons 9 and 20 synergistically enhance tumorigenic effects. Thus, we hypothesized that PIK3CA exon 9 and exon 20 mutations might have differential effects on clinical outcome in colorectal cancer, and that concomitant PIK3CA exon 9 and 20 mutations might confer aggressive tumor behavior. Experimental Design: We sequenced PIK3CA by pyrosequencing in 1,170 rectal and colon cancers in two prospective cohort studies, and found 189 (16%) PIK3CA mutated tumors. Mortality HR according to PIK3CA status was computed using Cox proportional hazards model, adjusting for clinical and molecular features, including microsatellite instability, CpG island methylator phenotype, LINE-1 methylation, and BRAF and KRAS mutations. Results: Compared with PIK3CA wild-type cases, patients with concomitant PIK3CA mutations in exons 9 and 20 experienced significantly worse cancer-specific survival [log-rank P = 0.031; multivariate HR = 3.51; 95% confidence interval (CI): 1.28–9.62] and overall survival (log-rank P = 0.0008; multivariate HR = 2.68; 95% CI: 1.24–5.77). PIK3CA mutation in either exon 9 or 20 alone was not significantly associated with patient survival. No significant interaction of PIK3CA mutation with BRAF or KRAS mutation was observed in survival analysis. Conclusion: Coexistence of PIK3CA (the PI3K p110α subunit) exon 9 and 20 mutations, but not PIK3CA mutation in either exon 9 or 20 alone, is associated with poor prognosis of colorectal cancer patients. Clin Cancer Res; 18(8); 2257–68. ©2012 AACR.

Journal ArticleDOI
TL;DR: A new role for 5-hmC in RNA splicing and synaptic function in the brain is suggested as well as substantial tissue-specific differential distributions of these DNA modifications at the exon-intron boundary in human and mouse tissues.
Abstract: The 5-methylcytosine (5-mC) derivative 5-hydroxymethylcytosine (5-hmC) is abundant in the brain for unknown reasons. Here we characterize the genomic distribution of 5-hmC and 5-mC in human and mouse tissues. We assayed 5-hmC by using glucosylation coupled with restriction-enzyme digestion and microarray analysis. We detected 5-hmC enrichment in genes with synapse-related functions in both human and mouse brain. We also identified substantial tissue-specific differential distributions of these DNA modifications at the exon-intron boundary in human and mouse. This boundary change was mainly due to 5-hmC in the brain but due to 5-mC in non-neural contexts. This pattern was replicated in multiple independent data sets and with single-molecule sequencing. Moreover, in human frontal cortex, constitutive exons contained higher levels of 5-hmC relative to alternatively spliced exons. Our study suggests a new role for 5-hmC in RNA splicing and synaptic function in the brain.

Journal ArticleDOI
TL;DR: DNMT1 is a widely expressed DNA methyltransferase maintaining methylation patterns in development, and mediating transcriptional repression by direct binding to HDAC2, also highly expressed in immune cells and required for the differentiation of CD4+ into T regulatory cells.
Abstract: Autosomal dominant cerebellar ataxia, deafness and narcolepsy (ADCA-DN) is characterized by late onset (30-40 years old) cerebellar ataxia, sensory neuronal deafness, narcolepsy-cataplexy and dementia. We performed exome sequencing in five individuals from three ADCA-DN kindreds and identified DNMT1 as the only gene with mutations found in all five affected individuals. Sanger sequencing confirmed the de novo mutation p.Ala570Val in one family, and showed co-segregation of p.Val606Phe and p.Ala570Val, with the ADCA-DN phenotype, in two other kindreds. An additional ADCA-DN kindred with a p.Gly605Ala mutation was subsequently identified. Narcolepsy and deafness were the first symptoms to appear in all pedigrees, followed by ataxia. DNMT1 is a widely expressed DNA methyltransferase maintaining methylation patterns in development, and mediating transcriptional repression by direct binding to HDAC2. It is also highly expressed in immune cells and required for the differentiation of CD4+ into T regulatory cells. Mutations in exon 20 of this gene were recently reported to cause hereditary sensory neuropathy with dementia and hearing loss (HSAN1). Our mutations are all located in exon 21 and in very close spatial proximity, suggesting distinct phenotypes depending on mutation location within this gene.

Journal ArticleDOI
TL;DR: This work reanalyzed single-base resolution bisulfite sequence data from Arabidopsis thaliana to differentiate body-methylated genes from unmethylated genes using a probabilistic approach and found that body-methylated genes tend to be longer and to be more functionally important, as measured by phenotypic effects of insertional mutants and by gene expression, than unmethylated genes.
Abstract: DNA methylation of coding regions, known as gene body methylation, is conserved across eukaryotic lineages. The function of body methylation is not known, but it may either prevent aberrant expression from intragenic promoters or enhance the accuracy of splicing. Given these putative functions, we hypothesized that body-methylated genes would be both longer and more functionally important than unmethylated genes. To test these hypotheses, we reanalyzed single-base resolution bisulfite sequence data from Arabidopsis thaliana to differentiate body-methylated genes from unmethylated genes using a probabilistic approach. Contrasting genic characteristics between the two groups, we found that body-methylated genes tend to be longer and to be more functionally important, as measured by phenotypic effects of insertional mutants and by gene expression, than unmethylated genes. We also found that methylated genes evolve more slowly than unmethylated genes, despite the potential for increased mutation rates in methylated CpG dinucleotides. We propose that slower rates in body-methylated genes are a function of higher selective constraint, lower nucleosome occupancy, and a lower proportion of CpG dinucleotides.