TopHat: discovering splice junctions with RNA-Seq

doi:10.1093/BIOINFORMATICS/BTP120

Cole Trapnell, +2 more

- 01 May 2009 -

Bioinformatics

- Vol. 25, Iss: 9, pp 1105-1111

TLDR

The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.

Abstract:

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu

Contact: cole@cs.umd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
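
The throughput quoted in the abstract can be turned into a rough runtime estimate. The sketch below is only a back-of-the-envelope calculation, not part of TopHat: the function name and the example read count are hypothetical, and the only figure taken from the abstract is the rate of nearly 2.2 million reads per CPU hour.

```python
# Throughput reported in the abstract: nearly 2.2 million reads per CPU hour.
READS_PER_CPU_HOUR = 2.2e6


def estimated_cpu_hours(n_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Return the approximate CPU hours needed to map n_reads reads.

    This is a linear estimate; it assumes the reported rate holds
    regardless of read length, genome size, or hardware.
    """
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return n_reads / reads_per_cpu_hour


if __name__ == "__main__":
    # Hypothetical experiment size, chosen only for illustration.
    example_reads = 44_000_000
    hours = estimated_cpu_hours(example_reads)
    print(f"{example_reads} reads -> about {hours:.1f} CPU hours")
```

Under this assumed read count the estimate stays below 24 CPU hours, which is in line with the abstract's claim that an experiment can be processed in less than a day on a desktop computer.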

Citations

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin, +8 more

- 01 Jan 2013 -

Bioinformatics

TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Manfred Grabherr, +22 more

- 01 Jul 2011 -

Nature Biotechnology

TL;DR: The authors present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li, +1 more

- 04 Aug 2011 -

BMC Bioinformatics

TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features

Yang Liao, +2 more

- 01 Apr 2014 -

Bioinformatics

TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Cole Trapnell, +9 more

- 01 May 2010 -

Nature Biotechnology

TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

References

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead, +3 more

- 04 Mar 2009 -

Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi, +4 more

- 29 Jun 2008 -

Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino, +1 more

- 01 May 2008 -

Genome Research

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; results on real data without read pairs are in close agreement with simulated results without read-pair information.

BLAT—The BLAST-Like Alignment Tool

W. James Kent

- 01 Apr 2002 -

Genome Research

TL;DR: This paper describes how BLAT was optimized. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments, and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.

Alternative Isoform Regulation in Human Tissue Transcriptomes

Eric T. Wang, +9 more

- 27 Nov 2008 -

Nature

TL;DR: An in-depth analysis of 15 diverse human tissue and cell line transcriptomes, based on deep sequencing of complementary DNA fragments and yielding a digital inventory of gene and mRNA isoform expression, suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.
