RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article•DOI•

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article•DOI•

RNA Polymerase II Transcription Elongation Control

Jiannan Guo¹, David H. Price¹

University of Iowa¹

06 Aug 2013-Chemical Reviews

TL;DR: This review discusses the bird’s eye view of Pol II transcription in the genome as well as insights provided by detailed mechanistic studies; it focuses mostly on metazoan systems, with some studies from yeast described for comparative purposes.

Abstract: Regulation of gene expression is critical in determining cell identity, development and responses to the cellular environment. DNA is the inherited source of genetic information and regulation of gene expression starts with the selection of which genes will undergo transcription. RNA, the product of transcription, is then utilized to generate functional products, including being translated into protein or processed into functional RNA. In eukaryotes, protein coding genes are transcribed by RNA polymerase II (Pol II) into messenger RNAs (mRNA). These short-lived RNA species have a variety of characteristics and are extensively regulated from production to degradation [1]. With the assistance of methods such as those using microarrays and high-throughput sequencing, the scale and depth of Pol II transcription studies have exploded. The sheer volume and complexity of data from many sources have even triggered a call for careful rethinking of the methods used for analysis and interpretation [2]. It is doubtless, though, that regulation of transcription critically affects gene expression and thus cell state and cellular identity [3]. Pol II transcription starts with the assembly of a pre-initiation complex (PIC) with general transcription factors (GTFs) that recognize DNA sequence elements around the promoter and recruit Pol II [4]. This process also requires the multi-subunit Mediator complex that could be viewed as a platform for transcription [5]. In the PIC, the two strands of DNA are separated and the template strand migrates into the active center of Pol II, thereby allowing the synthesis of RNA from the transcription start site (TSS) [6]. Although initiation could be viewed as the “on” switch for Pol II, much of mRNA production is regulated at the elongation phase [7]. Pioneering studies on MYC [8], HIV [9], and HSP70 [10] transcription have indicated that Pol II can be transcriptionally engaged in the 5′ end of genes without generating full-length mRNA prior to induction. Genome wide analyses showed that a large fraction of human and Drosophila genes have poised Pol II about 50 nt downstream of the transcription start site (TSS) [11]. Under various activation conditions, Pol II is released from promoter proximal positions to produce full length transcripts and subsequently increase mRNA level [12]. The factor required to trigger Pol II to enter productive elongation is P-TEFb [13]. Productive elongation has a high elongation rate that ranges from 1.1 to 4.3 kb/min as measured by many different methods [14]. During productive elongation the RNA is co-transcriptionally spliced and polyadenylated to generate mature mRNAs [15]. Mirroring the dramatic differences in properties, productive elongation complexes have significantly different protein compositions than early elongation complexes [16]. Transcription termination is crucial for recycling Pol II after a round of transcription and globally releasing Pol II from chromatin prior to cell division [17]. It also helps to prevent interference of promoter function by transcription from neighboring genes [18]. In metazoans, Pol II termination downstream of the 3′ end of almost all protein coding genes requires a functional Poly(A) signal and is always coupled with 3′ end processing [19]. Because termination is the end of transcription elongation and by definition is a very transient state, it has been notoriously difficult to study, especially in vivo [20]. The steps in transcription have been traditionally studied individually in great depth using specific genes. The development of new technologies has allowed transcription to be viewed and studied on a global scale. This review discusses the bird’s eye view of Pol II transcription in the genome as well as insights provided by detailed mechanistic studies. Recent studies are emphasized, but initial discoveries are also described to provide a historical perspective. We mostly focus on metazoan systems, although some studies from yeast are also described for comparative purposes. Our goal is to cover topics in multiple levels so that beginning scientists as well as experienced researchers will find the review useful.
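To give a feel for the productive elongation rates quoted in the abstract above (1.1 to 4.3 kb/min), the following minimal sketch converts a gene length into a range of transcription times. The function name and the 100 kb gene are hypothetical choices made for illustration; the sketch also assumes a constant rate along the gene and ignores pausing, which is a simplification.

```python
# Rate bounds quoted in the abstract above, in kb per minute.
SLOW_RATE_KB_PER_MIN = 1.1
FAST_RATE_KB_PER_MIN = 4.3


def elongation_time_range(gene_length_kb):
    """Return (fastest, slowest) time in minutes to transcribe a gene.

    Assumes a constant elongation rate and no pausing (illustrative only).
    """
    if gene_length_kb < 0:
        raise ValueError("gene length must be non-negative")
    fastest = gene_length_kb / FAST_RATE_KB_PER_MIN
    slowest = gene_length_kb / SLOW_RATE_KB_PER_MIN
    return fastest, slowest


# Hypothetical 100 kb gene.
fastest, slowest = elongation_time_range(100.0)
print(f"100 kb gene: {fastest:.1f} to {slowest:.1f} minutes")
```

Under these assumptions a 100 kb gene would take roughly 23 to 91 minutes to transcribe.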

114 citations

Journal Article•DOI•

Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana

Federico M. Giorgi¹, Cristian Del Fabbro¹, Francesco Licausi¹

Sant'Anna School of Advanced Studies¹

01 Mar 2013-Bioinformatics

TL;DR: Variance-Stabilizing Transformed RNA-seq data samples are shown to be the most similar to microarray ones with respect to inter-sample variation, correlation coefficient distribution and network topological architecture, and betweenness centrality is shown to be generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data.

Abstract: Motivation: Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions. They have been used for hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess the potential for biological investigation of coexpression networks derived from this novel technique in a condition-independent dataset. Results: We collected 65 publicly available Illumina RNA-seq high quality Arabidopsis thaliana samples and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. We show how Variance-Stabilizing Transformed (VST) RNA-seq data samples are the most similar to microarray ones, with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks show a slightly higher score in biology-derived quality assessments such as overlap with the known protein–protein interaction network and edge ontological agreement. Different coexpression network centralities are investigated; in particular, we show how betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data. In the end, we focus on a specific gene network case, showing that although microarray data seem more suited for gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome. Contact: fgiorgi@appliedgenomics.org Supplementary information: Supplementary data are available at Bioinformatics online.
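As a rough illustration of what a Pearson correlation coexpression network is, the sketch below computes pairwise Pearson coefficients between gene expression profiles and keeps the pairs whose absolute correlation passes a cutoff. It is not the authors' pipeline: the gene names, expression values, the 0.9 threshold and the use of the absolute value are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold.

    `expression` maps a gene name to its values across samples.
    The threshold is an arbitrary illustrative choice.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values across four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
for g1, g2, r in coexpression_edges(toy):
    print(g1, g2, round(r, 3))
```

In this toy data only geneA and geneB are linked, because their profiles rise together across samples while geneC follows neither.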

114 citations

Journal Article•DOI•

A powerful method for transcriptional profiling of specific cell types in eukaryotes: laser-assisted microdissection and RNA sequencing.

Marc W. Schmid¹, Anja Schmidt¹, Ulrich C. Klostermeier², Matthias Barann², Philip Rosenstiel², Ueli Grossniklaus¹

University of Zurich¹, University of Kiel²

26 Jan 2012-PLOS ONE

TL;DR: An approach that allows cell type-specific transcriptional profiling of distinct target cells, which are rare and difficult to access, with unprecedented sensitivity and resolution is presented and it is shown that this approach can be applied to most eukaryotic organisms.

Abstract: The acquisition of distinct cell fates is central to the development of multicellular organisms and is largely mediated by gene expression patterns specific to individual cells and tissues. A spatially and temporally resolved analysis of gene expression facilitates the elucidation of transcriptional networks linked to cellular identity and function. We present an approach that allows cell type-specific transcriptional profiling of distinct target cells, which are rare and difficult to access, with unprecedented sensitivity and resolution. We combined laser-assisted microdissection (LAM), linear amplification starting from <1 ng of total RNA, and RNA-sequencing (RNA-Seq). As a model we used the central cell of the Arabidopsis thaliana female gametophyte, one of the female gametes harbored in the reproductive organs of the flower. We estimated the number of expressed genes to be more than twice the number reported previously in a study using LAM and ATH1 microarrays, and identified several classes of genes that were systematically underrepresented in the transcriptome measured with the ATH1 microarray. Among them are many genes that are likely to be important for developmental processes and specific cellular functions. In addition, we identified several intergenic regions, which are likely to be transcribed, and describe a considerable fraction of reads mapping to introns and regions flanking annotated loci, which may represent alternative transcript isoforms. Finally, we performed a de novo assembly of the transcriptome and show that the method is suitable for studying individual cell types of organisms lacking reference sequence information, demonstrating that this approach can be applied to most eukaryotic organisms.

114 citations

Cites background or result from "RNA-Seq: a revolutionary tool for t..."

...The bias was likely due to the oligo-dT primed cDNA generation, which has been reported to preferentially represent the 3′ ends of transcripts when compared to direct RNA fragmentation [8,15]....
...potential to overcome these limitations [8,9] and offers a variety of new possibilities such as the transcriptional profiling of organisms...
...Given that RNA-Seq is highly accurate [8,9,21,27], the results demonstrate the superior...
...reliance upon existing knowledge about the genome sequence [8]....

Journal Article•DOI•

Single-cell genome and metatranscriptome sequencing reveal metabolic interactions of an alkane-degrading methanogenic community.

Mallory Embree¹, Harish Nagarajan¹, Narjes S. Movahedi², Hamidreza Chitsaz², Karsten Zengler¹

University of California, San Diego¹, Wayne State University²

01 Apr 2014-The ISME Journal

TL;DR: The combination of single-cell genome sequencing and a novel low-input metatranscriptomics protocol is used to reveal the intricate metabolic capabilities and microbial interactions of an alkane-degrading methanogenic community.

Abstract: Microbial interactions have a key role in global geochemical cycles. Although we possess significant knowledge about the general biochemical processes occurring in microbial communities, we are often unable to decipher key functions of individual microorganisms within the environment in part owing to the inability to cultivate or study them in isolation. Here, we circumvent this shortcoming through the use of single-cell genome sequencing and a novel low-input metatranscriptomics protocol to reveal the intricate metabolic capabilities and microbial interactions of an alkane-degrading methanogenic community. This methanogenic consortium oxidizes saturated hydrocarbons under anoxic conditions through a thus-far-uncharacterized biochemical process. The genome sequence of a dominant bacterial member of this community, belonging to the genus Smithella, was sequenced and served as the basis for subsequent analysis through metabolic reconstruction. Metatranscriptomic data generated from less than 500 pg of mRNA highlighted metabolically active genes during anaerobic alkane oxidation in comparison with growth on fatty acids. These data sets suggest that Smithella is not activating hexadecane by fumarate addition. Differential expression assisted in the identification of hypothetical proteins with no known homology that may be involved in hexadecane activation. Additionally, the combination of 16S rDNA sequence and metatranscriptomic data enabled the study of other prevalent organisms within the consortium and their interactions with Smithella, thus yielding a comprehensive characterization of individual constituents at the genome scale during methanogenic alkane oxidation.

114 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Furthermore, unlike hybridization-based approaches, entire transcriptomes can be characterized without the knowledge of existing reference genomes before sequencing (Wang et al., 2009)....

Journal Article•DOI•

Assessing the impact of transcriptomics, proteomics and metabolomics on fungal Phytopathology

Kar-Chun Tan¹, Simon V. S. Ipcho¹, Robert D. Trengove¹, Richard P. Oliver¹, Peter S. Solomon¹

Murdoch University¹

01 Sep 2009-Molecular Plant Pathology

TL;DR: This review assesses the impact of transcriptomics, proteomics and metabolomics on fungal plant pathology over the last decade and discusses their futures.

Abstract: Peer-reviewed literature is today littered with exciting new tools and techniques that are being used in all areas of biology and medicine. Transcriptomics, proteomics and, more recently, metabolomics are three of these techniques that have impacted on fungal plant pathology. Used individually, each of these techniques can generate a plethora of data that could occupy a laboratory for years. When used in combination, they have the potential to comprehensively dissect a system at the transcriptional and translational level. Transcriptomics, or quantitative gene expression profiling, is arguably the most familiar to researchers in the field of fungal plant pathology. Microarrays have been the primary technique for the last decade, but others are now emerging. Proteomics has also been exploited by the fungal phytopathogen community, but perhaps not to its potential. A lack of genome sequence information has frustrated proteomics researchers and has largely contributed to this technique not fulfilling its potential. The coming of the genome sequencing era has partially alleviated this problem. Metabolomics is the most recent of these techniques to emerge and is concerned with the non-targeted profiling of all metabolites in a given system. Metabolomics studies on fungal plant pathogens are only just beginning to appear, although its potential to dissect many facets of the pathogen and disease will see its popularity increase quickly. This review assesses the impact of transcriptomics, proteomics and metabolomics on fungal plant pathology over the last decade and discusses their futures. Each of the techniques is described briefly with further reading recommended. Key examples highlighting the application of these technologies to fungal plant pathogens are also reviewed.

114 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Transcriptomics is the quantification of the transcriptome, the complete set of transcripts in a cell, and their abundance, for a specific developmental stage or physiological condition (Wang et al., 2009)....

References

Journal Article•DOI•

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

Patent•DOI•

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
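The abstract above describes mapping quality as a measure of confidence that a read really comes from the position it was aligned to. A common way to express such a confidence is on the Phred scale, Q = -10 log10(P), where P is the probability that the alignment is wrong. The sketch below shows only that conversion; it does not reproduce how MAQ estimates P, and the function names and the cap of 99 are assumptions made for the example.

```python
import math


def mapping_quality(p_wrong, cap=99):
    """Convert an alignment error probability to a Phred-scaled score.

    `cap` bounds the score when p_wrong is zero or tiny (arbitrary here).
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap
    return min(cap, round(-10.0 * math.log10(p_wrong)))


def error_probability(quality):
    """Inverse conversion: Phred-scaled score back to a probability."""
    return 10.0 ** (-quality / 10.0)


# A read with a 1-in-1000 chance of being misplaced scores 30.
print(mapping_quality(0.001))
# A score of 20 corresponds to a 1% chance of a wrong placement.
print(error_probability(20))
```

On this scale every 10 points means a tenfold drop in the chance of a wrong placement, which makes scores easy to threshold when filtering alignments.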

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations