scispace - formally typeset
Search or ask a question
Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired- end reads, depending on the number of possible splice forms for each gene.
Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations


Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

  • ...RNA-Seq is a powerful technology for analyzing transcriptomes that is predicted to replace microarrays [1]....

    [...]

  • ...While RNASeq is generally accepted as being a more accurate quantification technology than microarrays [1], it 0 20 40 60 80 100 0 20 40 60 80 10 0...

    [...]

  • ...In contrast with IsoEM, RSEM is capable of modeling non-uniform sequence-independent read start position distributions (RSPDs), such as 3’-biased distributions that are produced by some RNA-Seq protocols [1]....

    [...]

Journal ArticleDOI
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...For example, in gene-expression studies microarrays are now being replaced by seq-based methods , which can identify and quantify rare transcripts without prior knowledge of a particular gene and can provide information regarding alternative splicing and sequence variation in identified gene...

    [...]

Journal ArticleDOI
TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

6,369 citations

Journal ArticleDOI
TL;DR: A simple and effective method for performing normalization is outlined and dramatically improved results for inferring differential expression in simulated and publicly available data sets are shown.
Abstract: The fine detail provided by sequencing-based transcriptome surveys suggests that RNA-seq is likely to become the platform of choice for interrogating steady state RNA. In order to discover biologically important changes in expression, we show that normalization continues to be an essential step in the analysis. We outline a simple and effective method for performing normalization and show dramatically improved results for inferring differential expression in simulated and publicly available data sets.

6,042 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...It has been suggested that ‘One particularly powerful advantage of RNA-seq is that it can capture transcriptome dynamics across different tissues or conditions without sophisticated normalization of data sets’ [5]....

    [...]

Journal ArticleDOI
Zefang Tang1, Chenwei Li1, Boxi Kang1, Ge Gao1, Cheng Li1, Zemin Zhang 
TL;DR: GEPIA (Gene Expression Profiling Interactive Analysis) fills in the gap between cancer genomics big data and the delivery of integrated information to end users, thus helping unleash the value of the current data resources.
Abstract: Tremendous amount of RNA sequencing data have been produced by large consortium projects such as TCGA and GTEx, creating new opportunities for data mining and deeper understanding of gene functions. While certain existing web servers are valuable and widely used, many expression analysis functions needed by experimental biologists are still not adequately addressed by these tools. We introduce GEPIA (Gene Expression Profiling Interactive Analysis), a web-based tool to deliver fast and customizable functionalities based on TCGA and GTEx data. GEPIA provides key interactive and customizable functions including differential expression analysis, profiling plotting, correlation analysis, patient survival analysis, similar gene detection and dimensionality reduction analysis. The comprehensive expression analyses with simple clicking through GEPIA greatly facilitate data mining in wide research areas, scientific discussion and the therapeutic discovery process. GEPIA fills in the gap between cancer genomics big data and the delivery of integrated information to end users, thus helping unleash the value of the current data resources. GEPIA is available at http://gepia.cancer-pku.cn/.

5,980 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...High-throughput RNA sequencing (RNA-Seq) has emerged as a powerful method for transcriptomic analysis (1), widely used for understanding gene functions and biological patterns, finding candidate drug targets and identifying biomarkers for disease classification and diagnosis (2)....

    [...]

References
More filters
Journal ArticleDOI
03 May 2002-Science
TL;DR: Quantitative polymerase chain reactions confirm the array-based finding that Prp17p and Prp18p are not dispensable for removal of introns with short branchpoint-to-3′ splice site distances.
Abstract: Introns interrupt almost every eukaryotic protein-coding gene, yet how the splicing apparatus interprets the genome during messenger RNA (mRNA) synthesis is poorly understood. We designed microarrays to distinguish spliced from unspliced RNA for each intron-containing yeast gene and measured genomewide effects on splicing caused by loss of 18 different mRNA processing factors. After accommodating changes in transcription and decay by using gene-specific indexes, functional relationships between mRNA processing factors can be identified through their common effects on spliced and unspliced RNA. Groups of genes with different dependencies on mRNA processing factors are also apparent. Quantitative polymerase chain reactions confirm the array-based finding that Prp17p and Prp18p are not dispensable for removal of introns with short branchpoint-to-3' splice site distances.

421 citations


"RNA-Seq: a revolutionary tool for t..." refers background in this paper

  • ...Specialized microarrays have also been designed; for example, arrays with probes spanning exon junctions can be used to detect and quantify distinct spliced isoform...

    [...]

Journal ArticleDOI
TL;DR: The RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping, indicates that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base- call quality scores.
Abstract: Background Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores.

385 citations

Journal ArticleDOI
TL;DR: A massively parallel pyro-sequencing technology commercialized by 454 Life Sciences Corporation was used to sequence the transcriptomes of shoot apical meristems isolated from two inbred lines of maize using laser capture microdissection (LCM) as mentioned in this paper.
Abstract: A massively parallel pyro-sequencing technology commercialized by 454 Life Sciences Corporation was used to sequence the transcriptomes of shoot apical meristems isolated from two inbred lines of maize using laser capture microdissection (LCM). A computational pipeline that uses the POLYBAYES polymorphism detection system was adapted for 454 ESTs and used to detect SNPs (single nucleotide polymorphisms) between the two inbred lines. Putative SNPs were computationally identified using 260 000 and 280 000 454 ESTs from the B73 and Mo17 inbred lines, respectively. Over 36 000 putative SNPs were detected within 9980 unique B73 genomic anchor sequences (MAGIs). Stringent post-processing reduced this number to > 7000 putative SNPs. Over 85% (94/110) of a sample of these putative SNPs were successfully validated by Sanger sequencing. Based on this validation rate, this pilot experiment conservatively identified > 4900 valid SNPs within > 2400 maize genes. These results demonstrate that 454-based transcriptome sequencing is an excellent method for the high-throughput acquisition of gene-associated SNPs.

373 citations

Journal ArticleDOI
TL;DR: It is concluded that the coupling of LCM and 454 sequencing technologies facilitates the discovery of rare, possibly cell-type-specific transcripts.
Abstract: 454 DNA sequencing technology achieves significant throughput relative to traditional approaches. More than 261,000 ESTs were generated by 454 Life Sciences from cDNA isolated using laser capture microdissection (LCM) from the developmentally important shoot apical meristem (SAM) of maize (Zea mays L.). This single sequencing run annotated >25,000 maize genomic sequences and also captured ∼400 expressed transcripts for which homologous sequences have not yet been identified in other species. Approximately 70% of the ESTs generated in this study had not been captured during a previous EST project conducted using a cDNA library constructed from hand-dissected apex tissue that is highly enriched for SAMs. In addition, at least 30% of the 454-ESTs do not align to any of the ∼648,000 extant maize ESTs using conservative alignment criteria. These results indicate that the combination of LCM and the deep sequencing possible with 454 technology enriches for SAM transcripts not present in current EST collections. RT-PCR was used to validate the expression of 27 genes whose expression had been detected in the SAM via LCM-454 technology, but that lacked orthologs in GenBank. Significantly, transcripts from ∼74% (20/27) of these validated SAM-expressed “orphans” were not detected in meristem-rich immature ears. We conclude that the coupling of LCM and 454 sequencing technologies facilitates the discovery of rare, possibly cell-type-specific transcripts. [The sequence data from this study have been submitted to GenBank under accession nos. DW724699–DW985434.]

350 citations

Journal ArticleDOI
TL;DR: This work compares and contrasts new sequencing platforms in terms of stage of development, instrument configuration, template format, sequencing chemistry, throughput capability, operating cost, data handling issues, and error models, to extend the utility of these platforms for genome analysis.
Abstract: DNA sequencing is in a period of rapid change, in which capillary sequencing is no longer the technology of choice for most ultra-high-throughput applications. A new generation of instruments that utilize primed synthesis in flow cells to obtain, simultaneously, the sequence of millions of different DNA templates has changed the field. We compare and contrast these new sequencing platforms in terms of stage of development, instrument configuration, template format, sequencing chemistry, throughput capability, operating cost, data handling issues, and error models. While these platforms outperform capillary instruments in terms of bases per day and cost per base, the short length of sequence reads obtained from most instruments and the limited number of samples that can be run simultaneously imposes some practical constraints on sequencing applications. However, recently developed methods for paired-end sequencing and for array-based direct selection of desired templates from complex mixtures extend the utility of these platforms for genome analysis. Given the ever increasing demand for DNA sequence information, we can expect continuous improvement of this new generation of instruments and their eventual replacement by even more powerful technology.

253 citations


Additional excerpts

  • ...In principle, any high-throughput sequencing technolog...

    [...]