RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2009, Nature Reviews Genetics (Nature Publishing Group), Vol. 10, Iss. 1, pp. 57–63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article

Harnessing cloud computing with Galaxy Cloud.

Enis Afgan¹, Dannon Baker¹, Nate Coraor², Hiroki Goto², Ian M. Paul², Kateryna D. Makova², Anton Nekrutenko², James Taylor¹

Emory University¹, Pennsylvania State University²

08 Nov 2011, Nature Biotechnology

TL;DR: Galaxy Cloud provides a solution that retains user control and privacy, makes complex analysis accessible and enables the use of practically limitless on-demand computing resources.

Abstract: As next-generation sequencing becomes an indispensable tool for biomedical research, it is crucial to provide analysis solutions that are usable and cost effective for biomedical researchers. Galaxy Cloud addresses this by combining the accessible Galaxy interface with automated management of cloud computing resources. Unlike purpose-built solutions, Galaxy allows users either to use existing tested best practices in the form of workflows or to construct their own analyses for novel tasks. Galaxy Cloud instances are owned and controlled entirely by the user who created them and can be used effectively in secure private clouds. Thus, Galaxy Cloud provides a solution that retains user control and privacy, makes complex analysis accessible and enables the use of practically limitless on-demand computing resources.

126 citations

Journal Article

Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens

Ying Wang, Noushin Ghaffari¹, Charles D. Johnson¹, Ulisses Braga-Neto¹, Hui-Hui Wang², Rui-rui Chen², Huaijun Zhou

Texas A&M University¹, Baylor College of Medicine²

18 Oct 2011, BMC Bioinformatics

TL;DR: It is demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs, and RNA-Seq at this depth can serve as a replacement of microarray technology.

Abstract: Background: RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome in any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample of RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective to multiplex several samples and sequence them in a single lane with sufficient transcriptome coverage. The objective of this analysis is to evaluate what sequencing depth is sufficient to profile gene expression in the chicken by RNA-Seq.

Results: Two cDNA libraries from chicken lungs were sequenced initially, generating 4.9 million (M) and 1.6 M (60 bp) reads, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced, yielding 29.6 M and 28.7 M (75 bp) reads. More than 90% of annotated genes were detected in the data sets with 28.7 to 29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. Correlation coefficients increased significantly from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant ones. No significant improvement was observed from 10 M to 20 M (75 bp) reads.

Conclusion: The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology. Furthermore, sequencing depth had a significant impact on measuring the expression of low-abundance genes. Finally, combining experimental and simulation approaches is a powerful way to address the relationship between sequencing depth and transcriptome coverage.
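
The random-sampling idea in this abstract can be sketched as a saturation curve: shuffle the mapped reads once, take growing prefixes as simulated shallower runs, and count the genes that reach a detection threshold. This is a minimal sketch, not the authors' procedure. The input format (one gene ID per mapped read), the `min_reads` threshold and the toy data are hypothetical, and the study also compared expression correlations between depths, which this sketch omits.

```python
import random
from collections import Counter


def saturation_curve(read_genes, depths, min_reads=1, seed=0):
    """Return {depth: number of genes detected} for nested random subsamples.

    read_genes: one gene ID per mapped read (hypothetical input format).
    A gene counts as detected when it has at least `min_reads` reads.
    """
    if any(d < 0 or d > len(read_genes) for d in depths):
        raise ValueError("each depth must be between 0 and the number of reads")
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)  # shuffle once, so smaller depths are prefixes of larger ones
    curve = {}
    for depth in sorted(depths):
        counts = Counter(shuffled[:depth])
        curve[depth] = sum(1 for c in counts.values() if c >= min_reads)
    return curve


if __name__ == "__main__":
    # Toy library: 200 genes whose read counts span 1 to 512 (2**0 .. 2**9).
    reads = [f"gene{i}" for i in range(200) for _ in range(2 ** (i % 10))]
    for depth, detected in saturation_curve(reads, [100, 1_000, 10_000, len(reads)]).items():
        print(depth, detected)
```

Because each smaller subsample is a prefix of the larger one, the curve can only rise with depth, which mirrors sequencing the same library more deeply rather than drawing unrelated samples.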

126 citations

Reference Entry

Ribosomal RNA depletion for efficient use of RNA-seq capacity.

Dominic O'Neil, Heike Glowatz, Martin Schlumpberger

01 Jul 2013, Current Protocols in Molecular Biology

TL;DR: This unit describes an rRNA depletion method based on selective hybridization of oligonucleotides to rRNA, recognition with a hybrid-specific antibody, and removal of the antibody-hybrid complex on magnetic beads.

Abstract: Ribosomal RNA (rRNA) is the most abundant component of RNA, comprising the large majority (more than 80% to 90%) of the molecules present in a total RNA sample. Depletion of this rRNA fraction is desirable prior to performing an RNA-seq reaction, so that sequencing capacity can be focused on more informative parts of the transcriptome. This unit describes an rRNA depletion method based on selective hybridization of oligonucleotides to rRNA, recognition with a hybrid-specific antibody, and removal of the antibody-hybrid complex on magnetic beads.
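
The payoff of depletion is simple arithmetic on library composition. A minimal sketch, assuming a starting rRNA fraction and a depletion efficiency (the share of rRNA molecules removed) as hypothetical inputs, and assuming that non-rRNA molecules are fully retained and that reads sample molecules uniformly, computes the rRNA share of the library after depletion:

```python
def residual_rrna_fraction(rrna_fraction, depletion_efficiency):
    """Fraction of reads still expected to come from rRNA after depletion.

    rrna_fraction: share of rRNA in the total RNA before depletion (0..1).
    depletion_efficiency: share of rRNA molecules removed (0..1); hypothetical input.
    Assumes non-rRNA molecules are fully retained and reads sample molecules uniformly.
    """
    for value in (rrna_fraction, depletion_efficiency):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    remaining_rrna = rrna_fraction * (1.0 - depletion_efficiency)
    other = 1.0 - rrna_fraction
    total = remaining_rrna + other
    if total == 0.0:
        raise ValueError("nothing left to sequence")
    return remaining_rrna / total


if __name__ == "__main__":
    # 90% rRNA before depletion; 99% of it removed (illustrative numbers only).
    rest = residual_rrna_fraction(0.9, 0.99)
    print(f"rRNA share after depletion: {rest:.3f}")
    print(f"informative share: {1 - rest:.3f}")
```

With these illustrative numbers, the share of reads available for the rest of the transcriptome rises from about 10% to about 92%.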

126 citations

Cites background from "RNA-Seq: a revolutionary tool for transcriptomics"

…of the transcriptome by sequencing cDNA (RNA-seq; UNIT 4.11) is of great interest, as it provides vast amounts of data on the transcriptome, allowing single-nucleotide resolution of transcript sequences, detection of novel RNAs, and excellent dynamic range for gene expression (Wang et al., 2009)…

Journal Article

Comprehensive multi-center assessment of small RNA-seq methods for quantitative miRNA profiling.

María Dolores Giráldez¹, Ryan M. Spengler¹, Alton Etheridge², Paula M. Godoy³, Andrea J. Barczak³, Srimeenakshi Srinivasan⁴, Peter De Hoff⁴, Kahraman Tanriverdi⁵, Amanda Courtright⁶, Shulin Lu⁷, Joseph A. Khoory⁷, Renee Rubio⁸, David Baxter⁹, Tom A. P. Driedonks¹⁰, Henk P. J. Buermans¹¹, Esther N. M. Nolte-’t Hoen¹⁰, Hui Jiang¹, Kai Wang⁹, Ionita Ghiran⁷, Yaoyu E. Wang⁸, Kendall Van Keuren-Jensen⁶, Jane E. Freedman⁵, Prescott G. Woodruff³, Louise C. Laurent⁴, David J. Erle³, David J. Galas², Muneesh Tewari

University of Michigan¹, Pacific Northwest Diabetes Research Institute², University of California, San Francisco³, University of California, San Diego⁴, University of Massachusetts Medical School⁵, Translational Genomics Research Institute⁶, Beth Israel Deaconess Medical Center⁷, Harvard University⁸, Institute for Systems Biology⁹, Utrecht University¹⁰, Leiden University Medical Center¹¹

16 Jul 2018, Nature Biotechnology

TL;DR: Results obtained by a consortium of nine labs that independently sequenced reference, 'ground truth' samples of synthetic small RNAs and human plasma-derived RNA found that microRNA relative quantification between samples using small RNA-seq was accurate and reproducible across laboratories and methods.

Abstract: RNA-seq is increasingly used for quantitative profiling of small RNAs (for example, microRNAs, piRNAs and snoRNAs) in diverse sample types, including isolated cells, tissues and cell-free biofluids. The accuracy and reproducibility of the currently used small RNA-seq library preparation methods have not been systematically tested. Here we report results obtained by a consortium of nine labs that independently sequenced reference, 'ground truth' samples of synthetic small RNAs and human plasma-derived RNA. We assessed three commercially available library preparation methods that use adapters of defined sequence and six methods using adapters with degenerate bases. Both protocol- and sequence-specific biases were identified, including biases that reduced the ability of small RNA-seq to accurately measure adenosine-to-inosine editing in microRNAs. We found that these biases were mitigated by library preparation methods that incorporate adapters with degenerate bases. MicroRNA relative quantification between samples using small RNA-seq was accurate and reproducible across laboratories and methods.

126 citations

Journal Article

Cooperation, conflict, and the evolution of queen pheromones

Sarah D. Kocher¹,², Christina M. Grozinger²

Harvard University¹, Pennsylvania State University²

15 Nov 2011, Journal of Chemical Ecology

TL;DR: Overall, these studies suggest that queen-worker pheromone communication is a multi-component, labile dialog between the castes, rather than a simple, fixed signal-response system.

Abstract: While chemical communication regulates individual behavior in a wide variety of species, these communication systems are most elaborated in insect societies. In these complex systems, pheromones produced by the reproductive individuals (queens) are critical in establishing and maintaining dominant reproductive status over hundreds to thousands of workers. The proximate and ultimate mechanisms by which these intricate pheromone communication systems evolved are largely unknown, though there has been much debate over whether queen pheromones function as a control mechanism or as an honest signal facilitating cooperation. Here, we summarize results from recent studies in honey bees, bumble bees, wasps, ants and termites. We further discuss evolutionary mechanisms by which queen pheromone communication systems may have evolved. Overall, these studies suggest that queen-worker pheromone communication is a multi-component, labile dialog between the castes, rather than a simple, fixed signal-response system. We also discuss future approaches that can shed light on the proximate and ultimate mechanisms that underlie these complex systems by focusing on the development of increasingly sophisticated genomic tools and their potential applications to examine the molecular mechanisms that regulate pheromone production and perception.

125 citations

Cites background from "RNA-Seq: a revolutionary tool for transcriptomics"

…In such cases, finding candidate genes can be accelerated by monitoring expression patterns associated with a behavioral or physiological state using either microarray technology (Gibson 2003) or RNA-sequencing (RNA-seq; Wang et al. 2009)…

References

Journal Article

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹

California Institute of Technology¹

29 Jun 2008, Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.

The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts¹⁻³, dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other…
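
The "digital measure" described here is a read count per gene. The same paper also introduced RPKM (reads per kilobase of exon model per million mapped reads) so that counts can be compared across genes of different lengths and runs of different depths. A minimal sketch of that normalization follows. The gene names and numbers are hypothetical, and `rpkm_table` takes the sum of per-gene counts as the total number of mapped reads, which is a simplifying assumption (reads that map outside the listed genes are ignored).

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (per kilobase of exon) * 1e6 (per million mapped reads)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def rpkm_table(counts, exon_lengths):
    """Apply rpkm() to every gene; total mapped reads = sum of the given counts (assumption)."""
    total = sum(counts.values())
    return {gene: rpkm(n, exon_lengths[gene], total) for gene, n in counts.items()}


if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 1_000}     # hypothetical read counts
    lengths = {"geneA": 2_000, "geneB": 4_000}  # hypothetical exon-model lengths in bp
    print(rpkm_table(counts, lengths))
```

In the usage example, geneB has twice the reads of geneA but also twice the exon length, so both receive the same RPKM; this length correction is the point of the normalization.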

12,293 citations

Patent

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000, Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) that is derived from a defined position in an mRNA or cDNA and can be used to identify a cDNA oligonucleotide. SOLUTION: A method of preparing a tag that identifies a cDNA oligonucleotide. The method comprises preparing a cDNA oligonucleotide with 5' and 3' termini; cleaving it with a first restriction endonuclease at that enzyme's site to produce cDNA fragments; isolating the fragment that carries the 5' or 3' terminus of the cDNA oligonucleotide; and ligating an oligonucleotide linker to the isolated fragment. The linker contains the recognition site of a second restriction endonuclease, which cleaves the cDNA fragment at a position separated from its recognition site, releasing the tag that identifies the cDNA oligonucleotide.
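
Because the second (tagging) enzyme cuts at a fixed distance from its recognition site, each tag is a fixed-length stretch of cDNA next to the first (anchoring) enzyme's site, and counting tags gives a quantitative catalog of expressed genes. The sketch below simulates only which sequence ends up as the tag, not the enzymology. It assumes, as hypothetical defaults the patent abstract does not specify, a CATG anchoring site (NlaIII-style), a 10-base tag, and use of the 3'-most site with the tag read immediately 3' of it.

```python
from collections import Counter


def extract_sage_tag(cdna, site="CATG", tag_len=10):
    """Return the tag immediately 3' of the 3'-most anchoring site, or None.

    cdna: sense-strand sequence written 5'->3'.
    site and tag_len are assumptions (an NlaIII-style CATG site and a 10-base tag).
    """
    seq = cdna.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None  # no site: this transcript yields no tag
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # too close to the end: no full tag


def tag_counts(cdnas, site="CATG", tag_len=10):
    """Count tags across a collection of cDNA sequences, skipping tagless ones."""
    tags = (extract_sage_tag(s, site, tag_len) for s in cdnas)
    return Counter(t for t in tags if t is not None)


if __name__ == "__main__":
    # Hypothetical sequences: two copies of one transcript and one without a site.
    print(tag_counts(["GGGCATGACGTACGTACTTTT", "gggcatgacgtacgtacaaaa", "AAAA"]))
```

Taking a single, position-defined tag per transcript is what lets tag frequency stand in for transcript abundance.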

4,437 citations

Journal Article

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin

Wellcome Trust Sanger Institute¹

01 Nov 2008, Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
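
MAQ reports mapping quality on the Phred scale, Q = -10 log10 P(read is wrongly mapped). A minimal sketch of that conversion in both directions follows. The `naive_multihit_mapq` helper is hypothetical and deliberately simplistic: it assumes a read with n equally good hits is equally likely to come from any of them, which is not MAQ's model (MAQ also uses base quality scores and suboptimal hits). The cap of 60 is likewise an arbitrary illustrative choice.

```python
import math


def mapq_from_error_prob(p_wrong, cap=60):
    """Phred-scaled mapping quality: round(-10 * log10(P(wrong))), capped."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability")
    if p_wrong == 0.0:
        return cap  # log10(0) is undefined; report the cap instead
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_prob_from_mapq(mapq):
    """Inverse conversion: P(wrong) = 10 ** (-Q / 10)."""
    return 10 ** (-mapq / 10)


def naive_multihit_mapq(n_best_hits, cap=60):
    """Hypothetical model: a read with n equally good hits is wrong with P = 1 - 1/n."""
    if n_best_hits < 1:
        raise ValueError("need at least one hit")
    return mapq_from_error_prob(1.0 - 1.0 / n_best_hits, cap)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, naive_multihit_mapq(n))
```

Under this toy model a unique hit receives the cap, two equally good hits give Q = 3 (a 50% chance the placement is wrong), and four give Q = 1, which is why repetitive reads end up with mapping qualities near zero.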

2,927 citations

Journal Article

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad

University of Chicago¹

01 Sep 2008, Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008, Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new-generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program, SOAP, for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new-generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations