Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations
Journal ArticleDOI
TL;DR: In this review, recent findings from UHTS are summarized and potential opportunities and challenges for broad adoption of these technologies in the plant science community are discussed.
Abstract: Ultra high-throughput sequencing (UHTS) technologies offer the potential to interrogate transcriptomes in detail that has traditionally been restricted to single gene surveys. For instance, it is now possible to globally define transcription start sites, polyadenylation signals, alternative splice sites and generate quantitative data on gene transcript accumulation in single tissues or cell types. These technologies are thus paving the way for whole genome transcriptomics and will undoubtedly lead to novel insights into plant development and biotic and abiotic stress responses. However, several challenges exist to making this technology broadly accessible to the plant research community. These include the current need for a computationally intensive analysis of data sets, a lack of standardized alignment and formatting procedures and a relatively small number of analytical software packages to interpret UHTS outputs. In this review we summarize recent findings from UHTS and discuss potential opportunities and challenges for broad adoption of these technologies in the plant science community.

122 citations


Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

  • ...modifications [3]....

    [...]

  • ...[3, 6]....

    [...]

  • ...The later method provides good coverage of 3′ sequences but biases against the body of the transcript [3]....

    [...]

  • ...These newly developed ‘ultrahigh-throughput’ sequencing technologies promise to provide a much more detailed view of plant transcriptomes and to revolutionize the way eukaryotic transcriptomes are analyzed [3]....

    [...]

  • ...However, this approach is deemed less useful when reads are generated from repetitive regions with high copy numbers [3]....

    [...]

Journal ArticleDOI
TL;DR: An extensive transcriptome dataset was obtained by RNA-Seq, giving a comprehensive overview of the root transcriptomes at tillering and heading stages in a heterotic rice cross and providing a useful resource for the rice research community.
Abstract: Heterosis is a phenomenon in which hybrids exhibit superior performance relative to parental phenotypes. In addition to the heterosis of above-ground agronomic traits on which most existing studies have focused, root heterosis is also an indispensable component of heterosis in the entire plant and of major importance to plant breeding. Consequently, systematic investigations of root heterosis, particularly in reproductive-stage rice, are needed. The recent advent of RNA sequencing technology (RNA-Seq) provides an opportunity to conduct in-depth transcript profiling for heterosis studies. Using the Illumina HiSeq 2000 platform, the root transcriptomes of the super-hybrid rice variety Xieyou 9308 and its parents were analyzed at tillering and heading stages. Approximately 391 million high-quality paired-end reads (100-bp in size) were generated and aligned against the Nipponbare reference genome. We found that 38,872 of 42,081 (92.4%) annotated transcripts were represented by at least one sequence read. A total of 829 and 4186 transcripts that were differentially expressed between the hybrid and its parents (DGHP) were identified at tillering and heading stages, respectively. Out of the DGHP, 66.59% were down-regulated at the tillering stage and 64.41% were up-regulated at the heading stage. At the heading stage, the DGHP were significantly enriched in pathways related to processes such as carbohydrate metabolism and plant hormone signal transduction, with most of the key genes that are involved in the two pathways being up-regulated in the hybrid. Several significant DGHP that could be mapped to quantitative trait loci (QTLs) for yield and root traits are also involved in carbohydrate metabolism and plant hormone signal transduction pathways. 
An extensive transcriptome dataset was obtained by RNA-Seq, giving a comprehensive overview of the root transcriptomes at tillering and heading stages in a heterotic rice cross and providing a useful resource for the rice research community. Using comparative transcriptome analysis, we detected DGHP and identified a group of potential candidate transcripts. The changes in the expression of the candidate transcripts may lay a foundation for future studies on molecular mechanisms underlying root heterosis.

122 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...Next-generation high-throughput RNA sequencing technology (RNA-Seq) is a recently-developed method for discovering, profiling, and quantifying RNA transcripts with several advantages over other expression profiling technologies including higher sensitivity and the ability to detect splicing isoforms and somatic mutations [16]....

    [...]

Journal ArticleDOI
Yajing Hao, Wei Wu, Hui Li, Jiao Yuan, Jianjun Luo, Yi Zhao, Runsheng Chen
01 Jan 2016-Database
TL;DR: The NPInter database is updated to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules and new web services are added, including a local UCSC Genome Browser to visualize binding sites.
Abstract: Despite the fact that a large quantity of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs' functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA-miRNA interactions from in silico predictions supported by AGO CLIP-seq data. When compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions on data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/

121 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Over the past decade, numerous noncoding RNAs (ncRNAs) have been identified in human (1), mouse (2) and other organisms (3–5) due to the advances in high-throughput sequencing (6)....

    [...]

Journal ArticleDOI
TL;DR: It is found that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression.
Abstract: Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs. Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center. Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.

121 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...It has excelled at determining exon boundaries and as a corollary, at detecting and quantifying alternative splicing [9,13,15-17]....

    [...]

Journal ArticleDOI
TL;DR: This work has expanded GeneFriends, an online database that allows users to identify genes co-expressed with one or more user-defined genes; the updated tool allows candidate transcripts to be linked to diseases and processes using a guilt-by-association approach.
Abstract: Co-expression networks have proven effective at assigning putative functions to genes based on the functional annotation of their co-expressed partners, in candidate gene prioritization studies and in improving our understanding of regulatory networks. The growing number of genome resequencing efforts and genome-wide association studies often identify loci containing novel genes and there is a need to infer their functions and interaction partners. To facilitate this we have expanded GeneFriends, an online database that allows users to identify co-expressed genes with one or more user-defined genes. This expansion entails an RNA-seq-based co-expression map that includes genes and transcripts that are not present in the microarray-based co-expression maps, including over 10 000 non-coding RNAs. The results users obtain from GeneFriends include a coexpression network as well as a summary of the functional enrichment among the co-expressed genes. Novel insights can be gathered from this database for different splice variants and ncRNAs, such as microRNAs and lincRNAs. Furthermore, our updated tool allows candidate transcripts to be linked to diseases and processes using a guilt-by-association approach. GeneFriends is freely available from http://www.GeneFriends.org and can be used to quickly identify and rank candidate targets relevant to the process or disease under study.

121 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Transcriptome sequencing (RNA-seq) is a powerful and emerging technology that allows researchers to measure differential expression of genes more accurately than when using microarrays (21)....

    [...]

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist.
These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations
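
The mapping quality idea in the MAQ abstract above can be illustrated with a short sketch. It assumes the conventional Phred scale, Q = -10 log10(P), where P is the probability that the read is placed at the wrong position. The function names and the toy posterior in `mapping_quality` are hypothetical simplifications for illustration; they are not MAQ's actual model, which also draws on base qualities and mate-pair information.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality into the probability of being wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float, cap: int = 99) -> int:
    """Convert an error probability into a rounded Phred-scaled quality.

    The cap (an arbitrary choice here) avoids an infinite score when p is 0.
    """
    if p <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p)))


def mapping_quality(best_weight: float, other_weights: Sequence[float]) -> int:
    """Toy mapping quality: Phred-scaled probability that the best hit is wrong.

    Each weight is an unnormalised likelihood of the read originating from one
    candidate position (hypothetical inputs, not MAQ's internal quantities).
    """
    total = best_weight + sum(other_weights)
    p_wrong = 1 - best_weight / total
    return error_prob_to_phred(p_wrong)


if __name__ == "__main__":
    # A unique hit gets the capped maximum; a read that fits two positions
    # almost equally well gets a low score.
    print(mapping_quality(1.0, []))
    print(mapping_quality(0.5, [0.5]))
```

Under this scale a read with mapping quality 30 has roughly a 1 in 1,000 chance of being misplaced, which is why downstream tools commonly filter on a minimum mapping quality.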

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]