RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article•DOI•

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article•DOI•

Challenges of sequencing human genomes

Daniel C. Koboldt¹, Li Ding, Elaine R. Mardis, Richard K. Wilson•Institutions (1)

Washington University in St. Louis¹

02 Jun 2010-Briefings in Bioinformatics

TL;DR: The state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans are described.

Abstract: Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35-250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.

157 citations

Additional excerpts

...NGS technologies [31]....

Journal Article•DOI•

Genome-wide annotation of genes and noncoding RNAs of foxtail millet in response to simulated drought stress by deep sequencing

Xin Qi¹, Shaojun Xie¹, Yuwei Liu¹, Fei Yi¹, Jingjuan Yu¹•Institutions (1)

University of Minnesota¹

17 Jul 2013-Plant Molecular Biology

TL;DR: A deep sequencing approach was used to generate a genome-wide transcriptome of foxtail millet after exposure to simulated drought stress, and it was found that the reduced levels of 24-nt siRNA flanking genes were associated, for the most part, with proximal up-regulated genes, indicating a potential effect of 24-nt siRNAs on drought-regulated gene expression.

Abstract: Drought is a major abiotic stress that affects plant growth, production, and survival. Plants have evolved sophisticated and highly complex reactions to drought stress, including large-scale transcriptome reconfiguration. Foxtail millet (Setaria italica) is a member of the Poaceae family. Because of its outstanding tolerance to drought stress foxtail millet has the potential to become a new model organism. To enrich our knowledge of the processes that contribute to drought resistance, we have used a deep sequencing approach to generate a genome-wide transcriptome of foxtail millet after exposure to simulated drought stress. A large number of differentially expressed genes were characterized; in particular, we examined the roles of small interfering RNAs (siRNAs) and long noncoding RNAs (lncRNAs) in response to a water-deficit condition. These RNAs have remained largely unexplored in previous studies of stress-induced transcriptomes. We found that the reduced levels of 24-nt siRNA flanking genes were associated, for the most part, with proximal up-regulated genes, indicating a potential effect of 24-nt siRNAs on drought-regulated gene expression. Several lncRNAs that responded to the simulated drought stress were also identified, and we found that one of them shared sequence conservation and colinearity with its counterpart in sorghum (Sorghum bicolor). Our findings provide new insights into drought-induced changes in the foxtail millet transcriptome.

157 citations

Cites methods from "RNA-Seq: a revolutionary tool for t..."

...Here, we used deep sequencing technology (Nobuta et al. 2007; Wang et al. 2009; Li et al. 2010; Lu et al. 2010; Metzker 2010; Trapnell et al. 2010; Wang et al. 2010a; Kakumanu et al. 2012) to investigate the genome-wide transcriptome reconfiguration of foxtail millet challenged by polyethylene glycol-simulated drought stress in a high-throughput manner....

Journal Article•DOI•

A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq.

Lin Wang¹, Yaqing Si², Lauren K. Dedow¹, Ying Shao³, Peng Liu², Thomas P. Brutnell¹•Institutions (3)

Boyce Thompson Institute for Plant Research¹, Iowa State University², Cornell University³

19 Oct 2011-PLOS ONE

TL;DR: A low-cost and robust protocol to produce Illumina-compatible (GAIIx and HiSeq2000 platforms) RNA-seq libraries by combining several recent improvements is developed and significance tests for determining differential gene expression and intron retention events are applied.

Abstract: The emergence of NextGen sequencing technology has generated much interest in the exploration of transcriptomes. Currently, Illumina Inc. (San Diego, CA) provides one of the most widely utilized sequencing platforms for gene expression analysis. While Illumina reagents and protocols perform adequately in RNA-sequencing (RNA-seq), alternative reagents and protocols promise a higher throughput at a much lower cost. We have developed a low-cost and robust protocol to produce Illumina-compatible (GAIIx and HiSeq2000 platforms) RNA-seq libraries by combining several recent improvements. First, we designed balanced adapter sequences for multiplexing of samples; second, dUTP incorporation in second-strand synthesis was used to enforce strand-specificity; third, we simplified RNA purification, fragmentation and library size-selection steps, thus drastically reducing the time and increasing throughput of library construction; fourth, we included an RNA spike-in control for validation and normalization purposes. To streamline informatics analysis for the community, we established a pipeline within the iPlant Collaborative. These scripts are easily customized to meet specific research needs and improve on existing informatics and statistical treatments of RNA-seq data. In particular, we apply significance tests for determining differential gene expression and intron retention events. To demonstrate the potential of both the library-construction protocol and data-analysis pipeline, we characterized the transcriptome of the rice leaf. Our data support novel gene models and can be used to improve current rice genome annotation. Additionally, using the rice transcriptome data, we compared different methods of calculating gene expression and discuss the advantages of a strand-specific approach to detect bona fide antisense transcripts and to detect intron retention events. Our results demonstrate the potential of this low-cost and robust method for RNA-seq library construction and data analysis.

156 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...This advancement in sequencing technology has led to new opportunities to explore global genomic and transcriptomic landscapes; such studies include whole-genome de novo/re-sequencing [3,4], bisulfite-sequencing [5,6], chromatin immuno-precipitation-sequencing (ChIP-seq) [7,8], and RNA sequencing (RNA-seq) [9,10]....

Journal Article•DOI•

IVT-seq reveals extreme bias in RNA sequencing

Nicholas F. Lahens¹, Ibrahim Halil Kavakli², Ray Zhang¹, Katharina E. Hayer¹, Michael B. Black, Hannah Dueck¹, Angel Pizarro³, Junhyong Kim¹, Rafael A. Irizarry⁴, Russell S. Thomas, Gregory R. Grant¹, John B. Hogenesch¹•Institutions (4)

University of Pennsylvania¹, Koç University², Amazon.com³, Johns Hopkins University⁴

30 Jun 2014-Genome Biology

TL;DR: It is found rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation, which suggest exon-level expression analysis may be inadvisable, and the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq is shown.

Abstract: Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.
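The fold-difference figures in this abstract compare sequencing coverage across positions within a single transcript. The Python sketch below shows one way such a statistic could be computed. The max/min definition, the pseudocount, the function names, and the toy coverage values are all illustrative assumptions; the abstract does not specify the authors' exact metric.

```python
def coverage_fold_difference(coverage, pseudocount=1.0):
    """Ratio of the highest to the lowest per-base coverage in one transcript.

    `coverage` is a list of read depths, one per base. The pseudocount
    (an assumption of this sketch) avoids division by zero at uncovered bases.
    """
    if not coverage:
        raise ValueError("coverage must contain at least one position")
    return (max(coverage) + pseudocount) / (min(coverage) + pseudocount)


def fraction_exceeding(transcripts, threshold):
    """Fraction of transcripts whose within-transcript fold difference exceeds `threshold`."""
    folds = [coverage_fold_difference(cov) for cov in transcripts.values()]
    return sum(fold > threshold for fold in folds) / len(folds)


# Toy data (hypothetical): one unevenly covered and one evenly covered transcript.
toy_transcripts = {
    "uneven": [9, 19, 39],
    "even": [9, 9, 9],
}
share_over_twofold = fraction_exceeding(toy_transcripts, 2.0)
```

With a full-length in vitro transcribed template, every base would ideally have the same depth, so a fold difference well above 1 points to technical rather than biological variation.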

155 citations

Cites methods from "RNA-Seq: a revolutionary tool for t..."

...Using RNA-seq, not only can we perform traditional differential gene expression analysis with better resolution, we can now comprehensively study alternative splicing, RNA editing, allele specific expression, and identify novel transcripts, both coding and non-coding RNAs [1–3]....
...We created a pool of > 1000 in vitro transcribed (IVT) RNAs from a full-length human cDNA library and sequenced them with poly-A and total RNA-seq, the most common protocols....
...Following a DNase I treatment to remove the DNA template and RNA purification, a pool of 1062 different human RNAs derived from fully sequenced plasmids was produced....
...We created a pool of in vitro transcribed RNAs from a collection of full length human cDNAs, followed by high-throughput sequencing (Figure 1)....
...When we examined the different libraries, we saw that fragments from all of the RNAseq data showed nucleotide frequencies characteristic of random priming bias (Additional file 6: Figure S6)....

Journal Article•DOI•

Chronic cocaine-regulated epigenomic changes in mouse nucleus accumbens

Jian Feng¹, Matthew Wilkinson¹, Xiaochuan Liu¹, Immanuel Purushothaman¹, Deveroux Ferguson¹, Vincent Vialou¹, Ian Maze¹, Ning-Yi Shao¹, Pamela J. Kennedy¹, Ja Wook Koo¹, Caroline Dias¹, Benjamin M. Laitman¹, Victoria Stockman¹, Quincey LaPlant¹, Michael E. Cahill¹, Eric J. Nestler¹, Li Shen¹•Institutions (1)

Icahn School of Medicine at Mount Sinai¹

22 Apr 2014-Genome Biology

TL;DR: This delineation of the cocaine-induced epigenome in the nucleus accumbens reveals several novel modes of regulation by which cocaine alters the brain, and serves as a template for the analysis of other systems to reveal new transcriptional and epigenetic mechanisms of neuronal regulation.

Abstract: Increasing evidence supports a role for altered gene expression in mediating the lasting effects of cocaine on the brain, and recent work has demonstrated the involvement of chromatin modifications in these alterations. However, all such studies to date have been restricted by their reliance on microarray technologies that have intrinsic limitations. We use next generation sequencing methods, RNA-seq and ChIP-seq for RNA polymerase II and several histone methylation marks, to obtain a more complete view of cocaine-induced changes in gene expression and associated adaptations in numerous modes of chromatin regulation in the mouse nucleus accumbens, a key brain reward region. We demonstrate an unexpectedly large number of pre-mRNA splicing alterations in response to repeated cocaine treatment. In addition, we identify combinations of chromatin changes, or signatures, that correlate with cocaine-dependent regulation of gene expression, including those involving pre-mRNA alternative splicing. Through bioinformatic prediction and biological validation, we identify one particular splicing factor, A2BP1(Rbfox1/Fox-1), which is enriched at genes that display certain chromatin signatures and contributes to drug-induced behavioral abnormalities. Together, this delineation of the cocaine-induced epigenome in the nucleus accumbens reveals several novel modes of regulation by which cocaine alters the brain. We establish combinatorial chromatin and transcriptional profiles in mouse nucleus accumbens after repeated cocaine treatment. These results serve as an important resource for the field and provide a template for the analysis of other systems to reveal new transcriptional and epigenetic mechanisms of neuronal regulation.

155 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Third, genome-wide characterizations of gene expression in brain have to date relied mainly on microarrays, as opposed to RNA-seq, which provides unprecedented advantages such as more precise measurement of levels of transcripts and their splicing isoforms [16]....
...RNA-seq provides unique advantages for alternative splicing analysis [16]....

References

Journal Article•DOI•

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹•Institutions (1)

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other…
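The "digital measure" of transcript prevalence described in this abstract starts from counting the reads that map to each gene. Raw counts grow with both transcript length and sequencing depth, so they are usually normalized before comparison; reads per kilobase per million mapped reads (RPKM) is one such normalization. The Python sketch below assumes RPKM; the function name and the example numbers are hypothetical and are not taken from the paper's data.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library of 40 million mapped reads.
example_rpkm = rpkm(500, 2000, 40_000_000)
```

Dividing by length keeps long transcripts from looking more abundant only because they yield more fragments, and dividing by library size makes samples sequenced to different depths comparable.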

12,293 citations

Patent•DOI•

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²•Institutions (2)

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: Problem to be solved: To provide a method for preparing a short nucleotide sequence (tag) that is useful for identifying a cDNA oligonucleotide and is derived from a defined position in an mRNA or a cDNA. Solution: The method comprises preparing a cDNA oligonucleotide with 5' and 3' termini; cutting it with a restriction enzyme at a first restriction endonuclease site to produce cDNA fragments; isolating the fragment that bears the 5' or 3' terminus of the cDNA oligonucleotide; and ligating an oligonucleotide linker to that isolated fragment. The linker contains the recognition site of a second restriction endonuclease, which cuts the isolated cDNA fragment at a position separated from its recognition site to yield the tag that identifies the cDNA oligonucleotide.

4,437 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
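Mapping quality, as introduced in this abstract, expresses confidence in a read's placement and is reported on a Phred scale: Q = -10 log10(P(alignment is wrong)). The Python sketch below illustrates only that scaling. The posterior calculation is a deliberately simplified assumption (it normalizes hypothetical per-position likelihoods) and is not MAQ's actual model; the function names and the cap of 99 are likewise illustrative.

```python
import math


def posterior_wrong(best_likelihood, other_likelihoods):
    """Probability that the best-scoring position is NOT the true origin.

    Simplified assumption: candidate positions are equally likely a priori,
    so the posterior is the normalized likelihood.
    """
    total = best_likelihood + sum(other_likelihoods)
    if total <= 0:
        raise ValueError("likelihoods must sum to a positive value")
    return 1.0 - best_likelihood / total


def phred_scaled_quality(p_wrong, cap=99):
    """Convert an error probability to a Phred-scaled integer quality."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability")
    if p_wrong == 0.0:
        return cap  # avoid log10(0); report the maximum allowed quality
    return min(cap, round(-10 * math.log10(p_wrong)))


# A read with one strong hit and one weak alternative hit (hypothetical values).
p = posterior_wrong(0.999, [0.001])
quality = phred_scaled_quality(p)
```

On this scale a quality of 30 corresponds to roughly a 1-in-1,000 chance that the read is misplaced, and a read that fits two positions equally well gets a quality near 3.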

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad•Institutions (1)

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²•Institutions (2)

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations