Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
TL;DR: This review discusses the bird’s eye view of Pol II transcription in the genome, as well as insights provided by detailed mechanistic studies. It focuses mostly on metazoan systems, although some studies from yeast are also described for comparative purposes.
Abstract: Regulation of gene expression is critical in determining cell identity, development and responses to the cellular environment. DNA is the inherited source of genetic information and regulation of gene expression starts with the selection of which genes will undergo transcription. RNA, the product of transcription, is then utilized to generate functional products, including being translated into protein or processed into functional RNA. In eukaryotes, protein coding genes are transcribed by RNA polymerase II (Pol II) into messenger RNAs (mRNA). These short-lived RNA species have a variety of characteristics and are extensively regulated from production to degradation [1]. With the assistance of methods such as those using microarrays and high-throughput sequencing, the scale and depth of Pol II transcription studies have exploded. The sheer volume and complexity of data from many sources have even triggered a call for careful rethinking of the methods used for analysis and interpretation [2]. It is doubtless, though, that regulation of transcription critically affects gene expression and thus cell state and cellular identity [3]. Pol II transcription starts with the assembly of a pre-initiation complex (PIC) with general transcription factors (GTFs) that recognize DNA sequence elements around the promoter and recruit Pol II [4]. This process also requires the multi-subunit Mediator complex that could be viewed as a platform for transcription [5]. In the PIC, the two strands of DNA are separated and the template strand migrates into the active center of Pol II, thereby allowing the synthesis of RNA from the transcription start site (TSS) [6]. Although initiation could be viewed as the “on” switch for Pol II, much of mRNA production is regulated at the elongation phase [7]. Pioneering studies on MYC [8], HIV [9], and HSP70 [10] transcription have indicated that Pol II can be transcriptionally engaged in the 5′ end of genes without generating full-length mRNA prior to induction.
Genome wide analyses showed that a large fraction of human and Drosophila genes have poised Pol II about 50 nt downstream of the transcription start site (TSS) [11]. Under various activation conditions, Pol II is released from promoter proximal positions to produce full length transcripts and subsequently increase mRNA level [12]. The factor required to trigger Pol II to enter productive elongation is P-TEFb [13]. Productive elongation has a high elongation rate that ranges from 1.1 to 4.3 kb/min as measured by many different methods [14]. During productive elongation the RNA is co-transcriptionally spliced and polyadenylated to generate mature mRNAs [15]. Mirroring the dramatic differences in properties, productive elongation complexes have significantly different protein compositions than early elongation complexes [16]. Transcription termination is crucial for recycling Pol II after a round of transcription and globally releasing Pol II from chromatin prior to cell division [17]. It also helps to prevent interference of promoter function by transcription from neighboring genes [18]. In metazoans, Pol II termination downstream of the 3′ end of almost all protein coding genes requires a functional Poly(A) signal and is always coupled with 3′ end processing [19]. Because termination is the end of transcription elongation and by definition is a very transient state, it has been notoriously difficult to study, especially in vivo [20]. The steps in transcription have been traditionally studied individually in great depth using specific genes. The development of new technologies has allowed transcription to be viewed and studied on a global scale. This review discusses the bird’s eye view of Pol II transcription in the genome as well as insights provided by detailed mechanistic studies. Recent studies are emphasized, but initial discoveries are also described to provide a historical perspective.
We mostly focus on metazoan systems, although some studies from yeast are also described for comparative purposes. Our goal is to cover topics in multiple levels so that beginning scientists as well as experienced researchers will find the review useful.

114 citations

Journal ArticleDOI
TL;DR: It is shown how Variance-Stabilizing Transformed RNA-seq data samples are the most similar to microarray ones, with respect to inter-sample variation, correlation coefficient distribution and network topological architecture, and how betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data.
Abstract: Motivation: Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions. They have been used for hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess the potential for biological investigation of coexpression networks derived from this novel technique in a condition-independent dataset. Results: We collected 65 publicly available Illumina RNA-seq high quality Arabidopsis thaliana samples and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. We show how Variance-Stabilizing Transformed (VST) RNA-seq data samples are the most similar to microarray ones, with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks show a slightly higher score in biology-derived quality assessments such as overlap with the known protein–protein interaction network and edge ontological agreement. Different coexpression network centralities are investigated; in particular, we show how betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data. In the end, we focus on a specific gene network case, showing that although microarray data seem more suited for gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome. Contact: fgiorgi@appliedgenomics.org Supplementary information: Supplementary data are available at Bioinformatics online.

114 citations
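
The abstract above mentions building Pearson correlation coexpression networks from expression samples. As a rough illustration of that general idea only (not the authors' pipeline), the sketch below computes pairwise Pearson correlations between gene expression profiles and keeps pairs above a cutoff as network edges. The gene names, the toy values, the `coexpression_edges` helper, and the 0.9 threshold are all hypothetical choices made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant profile: correlation is undefined; treated as 0 in this sketch.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    genes = sorted(expression)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r = pearson(expression[gene_a], expression[gene_b])
            if abs(r) >= threshold:
                edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical toy data: three made-up genes measured across four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(toy, threshold=0.9)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b}: r = {r:.2f}")
```

In this toy input only the geneA and geneB profiles are strongly correlated, so only that pair becomes an edge. A real analysis would first normalize or transform the counts (the abstract names a variance-stabilizing transformation) and would typically use a numerical library rather than nested loops.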

Journal ArticleDOI
26 Jan 2012-PLOS ONE
TL;DR: An approach that allows cell type-specific transcriptional profiling of distinct target cells, which are rare and difficult to access, with unprecedented sensitivity and resolution is presented and it is shown that this approach can be applied to most eukaryotic organisms.
Abstract: The acquisition of distinct cell fates is central to the development of multicellular organisms and is largely mediated by gene expression patterns specific to individual cells and tissues. A spatially and temporally resolved analysis of gene expression facilitates the elucidation of transcriptional networks linked to cellular identity and function. We present an approach that allows cell type-specific transcriptional profiling of distinct target cells, which are rare and difficult to access, with unprecedented sensitivity and resolution. We combined laser-assisted microdissection (LAM), linear amplification starting from <1 ng of total RNA, and RNA-sequencing (RNA-Seq). As a model we used the central cell of the Arabidopsis thaliana female gametophyte, one of the female gametes harbored in the reproductive organs of the flower. We estimated the number of expressed genes to be more than twice the number reported previously in a study using LAM and ATH1 microarrays, and identified several classes of genes that were systematically underrepresented in the transcriptome measured with the ATH1 microarray. Among them are many genes that are likely to be important for developmental processes and specific cellular functions. In addition, we identified several intergenic regions, which are likely to be transcribed, and describe a considerable fraction of reads mapping to introns and regions flanking annotated loci, which may represent alternative transcript isoforms. Finally, we performed a de novo assembly of the transcriptome and show that the method is suitable for studying individual cell types of organisms lacking reference sequence information, demonstrating that this approach can be applied to most eukaryotic organisms.

114 citations


Cites background or result from "RNA-Seq: a revolutionary tool for t..."

  • ...The bias was likely due to the oligo-dT primed cDNA generation, which has been reported to preferentially represent the 3′ ends of transcripts when compared to direct RNA fragmentation [8,15]....

  • ...potential to overcome these limitations [8,9] and offers a variety of new possibilities such as the transcriptional profiling of organisms...

  • ...Given that RNA-Seq is highly accurate [8,9,21,27], the results demonstrate the superior...

  • ...reliance upon existing knowledge about the genome sequence [8]....

Journal ArticleDOI
TL;DR: The combination of single-cell genome sequencing and a novel low-input metatranscriptomics protocol is used to reveal the intricate metabolic capabilities and microbial interactions of an alkane-degrading methanogenic community.
Abstract: Microbial interactions have a key role in global geochemical cycles. Although we possess significant knowledge about the general biochemical processes occurring in microbial communities, we are often unable to decipher key functions of individual microorganisms within the environment in part owing to the inability to cultivate or study them in isolation. Here, we circumvent this shortcoming through the use of single-cell genome sequencing and a novel low-input metatranscriptomics protocol to reveal the intricate metabolic capabilities and microbial interactions of an alkane-degrading methanogenic community. This methanogenic consortium oxidizes saturated hydrocarbons under anoxic conditions through a thus-far-uncharacterized biochemical process. The genome sequence of a dominant bacterial member of this community, belonging to the genus Smithella, was sequenced and served as the basis for subsequent analysis through metabolic reconstruction. Metatranscriptomic data generated from less than 500 pg of mRNA highlighted metabolically active genes during anaerobic alkane oxidation in comparison with growth on fatty acids. These data sets suggest that Smithella is not activating hexadecane by fumarate addition. Differential expression assisted in the identification of hypothetical proteins with no known homology that may be involved in hexadecane activation. Additionally, the combination of 16S rDNA sequence and metatranscriptomic data enabled the study of other prevalent organisms within the consortium and their interactions with Smithella, thus yielding a comprehensive characterization of individual constituents at the genome scale during methanogenic alkane oxidation.

114 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Furthermore, unlike hybridization-based approaches, entire transcriptomes can be characterized without the knowledge of existing reference genomes before sequencing (Wang et al., 2009)....

Journal ArticleDOI
TL;DR: This review assesses the impact of transcriptomics, proteomics and metabolomics on fungal plant pathology over the last decade and discusses their futures.
Abstract: Peer-reviewed literature is today littered with exciting new tools and techniques that are being used in all areas of biology and medicine. Transcriptomics, proteomics and, more recently, metabolomics are three of these techniques that have impacted on fungal plant pathology. Used individually, each of these techniques can generate a plethora of data that could occupy a laboratory for years. When used in combination, they have the potential to comprehensively dissect a system at the transcriptional and translational level. Transcriptomics, or quantitative gene expression profiling, is arguably the most familiar to researchers in the field of fungal plant pathology. Microarrays have been the primary technique for the last decade, but others are now emerging. Proteomics has also been exploited by the fungal phytopathogen community, but perhaps not to its potential. A lack of genome sequence information has frustrated proteomics researchers and has largely contributed to this technique not fulfilling its potential. The coming of the genome sequencing era has partially alleviated this problem. Metabolomics is the most recent of these techniques to emerge and is concerned with the non-targeted profiling of all metabolites in a given system. Metabolomics studies on fungal plant pathogens are only just beginning to appear, although its potential to dissect many facets of the pathogen and disease will see its popularity increase quickly. This review assesses the impact of transcriptomics, proteomics and metabolomics on fungal plant pathology over the last decade and discusses their futures. Each of the techniques is described briefly with further reading recommended. Key examples highlighting the application of these technologies to fungal plant pathogens are also reviewed.

114 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Transcriptomics is the quantification of the transcriptome, the complete set of transcripts in a cell, and their abundance, for a specific developmental stage or physiological condition (Wang et al., 2009)....

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranslated regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranslated regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist.
These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations
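
The abstract above introduces mapping quality as a measure of confidence that a read really comes from the position it was aligned to. Such qualities are commonly reported on a Phred scale, Q = -10 · log10(P), where P is the probability that the alignment is wrong. The sketch below assumes that convention and only shows the conversion in both directions; it does not reproduce how MAQ estimates P, and the function names are hypothetical.

```python
import math


def phred_from_error_prob(p_wrong):
    """Convert an alignment error probability into a Phred-scaled quality."""
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_wrong)


def error_prob_from_phred(quality):
    """Convert a Phred-scaled quality back into an error probability."""
    return 10.0 ** (-quality / 10.0)


# Illustrative values: a 1-in-1000 chance of a wrong placement, and quality 20.
print(round(phred_from_error_prob(0.001), 2))
print(error_prob_from_phred(20))
```

On this scale each additional 10 points corresponds to a tenfold drop in the probability that the read is misplaced, which is why a fixed quality cutoff is a convenient way to filter ambiguous alignments.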

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...
