Topic

splice

About: Splice is a research topic. Over its lifetime, 2,708 publications have been published within this topic, receiving 97,622 citations.


Papers
Journal Article
TL;DR: The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.
Abstract: Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.

11,473 citations

Journal Article
TL;DR: Human Splicing Finder (HSF) is a tool designed to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence; in an evaluation on mutations known to cause splicing defects, the mutation effect was correctly predicted in almost all cases.
Abstract: Thousands of mutations are identified yearly. Although many directly affect protein expression, an increasing proportion of mutations is now believed to influence mRNA splicing. They mostly affect existing splice sites, but synonymous, non-synonymous or nonsense mutations can also create or disrupt splice sites or auxiliary cis-splicing sequences. To facilitate the analysis of the different mutations, we designed Human Splicing Finder (HSF), a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-beta Serine-Arginine proteins and the hnRNP A1 ribonucleoprotein. We also developed new Position Weight Matrices to assess the strength of 5' and 3' splice sites and branch points. We evaluated HSF efficiency using a set of 83 intronic and 35 exonic mutations known to result in splicing defects. We showed that the mutation effect was correctly predicted in almost all cases. HSF could thus represent a valuable resource for research, diagnostic and therapeutic (e.g. therapeutic exon skipping) purposes as well as for global studies, such as the GEN2PHEN European Project or the Human Variome Project.

2,300 citations

Journal Article
TL;DR: A striking similarity among the rare splice junctions which do not contain AG at the 3' splice site or GT at the 5' splice site indicates the existence of special mechanisms to recognize them, and that these unique signals may be involved in crucial gene-regulation events and in differentiation.
Abstract: A systematic analysis of the RNA splice junction sequences of eukaryotic protein coding genes was carried out using the GENBANK databank. Nucleotide frequencies obtained for the highly conserved regions around the splice sites for different categories of organisms closely agree with each other. A striking similarity among the rare splice junctions which do not contain AG at the 3' splice site or GT at the 5' splice site indicates the existence of special mechanisms to recognize them, and that these unique signals may be involved in crucial gene-regulation events and in differentiation. A method was developed to predict potential exons in a bare sequence, using a scoring and ranking scheme based on nucleotide weight tables. This method was used to find a majority of the exons in selected known genes, and also predicted potential new exons which may be used in alternative splicing situations.

2,235 citations
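
The abstract above mentions scoring and ranking candidate sites with nucleotide weight tables. A minimal sketch of that general idea follows. It is not the authors' method: the training windows, the 9-base window width, the uniform background, and the names `TRAINING_SITES`, `build_weight_table`, `score_window`, and `rank_candidates` are all assumptions made for illustration.

```python
from collections import Counter
import math

# Hypothetical toy 5' splice-site windows (3 exonic + 6 intronic bases).
# These strings are illustrative only; they are not data from the paper.
TRAINING_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "TAGGTAAGT",
    "CAGGTGAGC",
]


def build_weight_table(sites, pseudocount=1.0):
    """Build per-position log2-odds weights against a uniform background."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    table = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        table.append({
            base: math.log2(((counts[base] + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return table


def score_window(window, table):
    """Sum the position-specific weights for one candidate window."""
    return sum(table[i][base] for i, base in enumerate(window))


def rank_candidates(sequence, table):
    """Score every window in the sequence; return (score, offset), best first."""
    width = len(table)
    scored = [
        (score_window(sequence[i:i + width], table), i)
        for i in range(len(sequence) - width + 1)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    table = build_weight_table(TRAINING_SITES)
    for score, offset in rank_candidates("TTTTTCAGGTAAGTTTTTT", table)[:3]:
        print(offset, round(score, 2))
```

The pseudocount keeps unseen bases from producing an undefined logarithm; a real weight table would be estimated from many annotated junctions rather than five toy strings.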

Journal Article
TL;DR: Evidence is presented that indicates that, at least for 5′ splice site mutations, cryptic splice site usage is favoured under conditions where (1) a number of such sites are present in the immediate vicinity and (2) these sites exhibit sufficient homology to the splice site consensus sequence for them to be able to compete successfully with the mutated splice site.
Abstract: A total of 101 different examples of point mutations, which lie in the vicinity of mRNA splice junctions, and which have been held to be responsible for a human genetic disease by altering the accuracy or efficiency of mRNA splicing, have been collated. These data comprise 62 mutations at 5′ splice sites, 26 at 3′ splice sites and 13 that result in the creation of novel splice sites. It is estimated that up to 15% of all point mutations causing human genetic disease result in an mRNA splicing defect. Of the 5′ splice site mutations, 60% involved the invariant GT dinucleotide; mutations were found to be non-randomly distributed with an excess over expectation at positions +1 and +2, and apparent deficiencies at positions −1 and −2. Of the 3′ splice site mutations, 87% involved the invariant AG dinucleotide; an excess of mutations over expectation was noted at position −2. This non-randomness of mutation reflects the evolutionary conservation apparent in splice site consensus sequences drawn up previously from primate genes, and is most probably attributable to detection bias resulting from the differing phenotypic severity of specific lesions. The spectrum of point mutations was also drastically skewed: purines were significantly overrepresented as substituting nucleotides, perhaps because of steric hindrance (e.g. in U1 snRNA binding at 5′ splice sites). Furthermore, splice sites affected by point mutations resulting in human genetic disease were markedly different from the splice site consensus sequences. When similarity was quantified by a ‘consensus value’, both extremely low and extremely high values were notably absent from the wild-type sequences of the mutated splice sites. Splice sites of intermediate similarity to the consensus sequence may thus be more prone to the deleterious effects of mutation. Regarding the phenotypic effects of mutations on mRNA splicing, exon skipping occurred more frequently than cryptic splice site usage. Evidence is presented that indicates that, at least for 5′ splice site mutations, cryptic splice site usage is favoured under conditions where (1) a number of such sites are present in the immediate vicinity and (2) these sites exhibit sufficient homology to the splice site consensus sequence for them to be able to compete successfully with the mutated splice site. The novel concept of a “potential for cryptic splice site usage” value was introduced in order to quantify these characteristics, and to predict the relative proportion of exon skipping vs cryptic splice site utilization consequent to the introduction of a mutation at a normal splice site.

1,310 citations

Journal Article
TL;DR: MapSplice, a second generation splice detection algorithm focused on high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency, is introduced; the combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions.
Abstract: The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥ 75 bp). MapSplice is not dependent on splice site features or intron length, consequently it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice.

1,173 citations
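
The abstract above distinguishes canonical from non-canonical splices. As a rough illustration of what that distinction means at the sequence level, the sketch below labels an already-detected intron by its flanking dinucleotides. It is not part of MapSplice, which the abstract says does not depend on splice site features; the function name `classify_junction`, the 0-based end-exclusive coordinate convention, the forward-strand assumption, and the "semi-canonical" label for GC-AG and AT-AC introns are all assumptions of this example.

```python
def classify_junction(genome, intron_start, intron_end):
    """Label an intron by its first and last two bases.

    Assumes 0-based, end-exclusive intron coordinates on the forward strand.
    """
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    # GC-AG and AT-AC are the commonly described minor intron classes;
    # the label used for them here is a choice made for this sketch.
    if (donor, acceptor) in {("GC", "AG"), ("AT", "AC")}:
        return "semi-canonical"
    return "non-canonical"


if __name__ == "__main__":
    # Toy sequence: 6 exonic bases, a 14-base intron, 5 exonic bases.
    toy_genome = "AAACAG" + "GTAAGTTTTTTCAG" + "GGCTT"
    print(classify_junction(toy_genome, 6, 20))
```

An aligner that reports junctions on both strands would need to reverse-complement the flanking bases before applying this check.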


Network Information
Related Topics (5)
Regulation of gene expression: 85.4K papers, 5.8M citations, 75% related
Gene expression: 113.3K papers, 5.5M citations, 74% related
Signal transduction: 122.6K papers, 8.2M citations, 73% related
RNA: 111.6K papers, 5.4M citations, 73% related
Gene: 211.7K papers, 10.3M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  489
2022  279
2021  71
2020  90
2019  76
2018  83