RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article•DOI•

A systems-level approach for metabolic engineering of yeast cell factories

Il-Kwon Kim¹, António Roldão¹, Verena Siewers¹, Jens Nielsen¹•Institutions (1)

Chalmers University of Technology¹

01 Mar 2012-FEMS Yeast Research

TL;DR: Examples of how systems and synthetic biology brought yeast metabolic engineering closer to industrial biotechnology are described in this review, and these examples should demonstrate the potential of a systems-level approach for fast and efficient generation of yeast cell factories.

Abstract: The generation of novel yeast cell factories for production of high-value industrial biotechnological products relies on three metabolic engineering principles: design, construction, and analysis. In the last two decades, considerable effort has gone into developing faster and more efficient strategies and/or technologies for each one of these principles. For design and construction, three major strategies are described in this review: (1) rational metabolic engineering; (2) inverse metabolic engineering; and (3) evolutionary strategies. Independent of the selected strategy, the process of designing yeast strains involves five decision points: (1) choice of product, (2) choice of chassis, (3) identification of target genes, (4) regulating the expression level of target genes, and (5) network balancing of the target genes. At the construction level, several molecular biology tools have been developed through the concept of synthetic biology and applied for the generation of novel, engineered yeast strains. For comprehensive and quantitative analysis of constructed strains, systems biology tools are commonly used, and with a multi-omics approach key information about the biological system can be revealed (for example, genetic regulatory mechanisms and competitive pathways), thereby assisting the in silico design of metabolic engineering strategies for improving strain performance. Examples of how systems and synthetic biology brought yeast metabolic engineering closer to industrial biotechnology are described in this review, and these examples should demonstrate the potential of a systems-level approach for fast and efficient generation of yeast cell factories.

119 citations

Cites methods from "RNA-Seq: a revolutionary tool for t..."

...RNA deep sequencing (RNA-seq) techniques such as Illumina and SOLiD sequencing are commonly used techniques for high-throughput transcriptome analysis (further details on these and other RNA sequencing techniques are reviewed in Wang et al., 2009)....
[...]

Journal Article•DOI•

Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species

Axel Künstner¹, Jochen B. W. Wolf¹, Niclas Backström¹, Osceola Whitney², Christopher N. Balakrishnan³, Lainy B. Day⁴, Scott V. Edwards⁵, Daniel E. Janes⁵, Barney A. Schlinger⁶, Richard K. Wilson⁷, Erich D. Jarvis², Wesley C. Warren⁷, Hans Ellegren¹•Institutions (7)

Uppsala University¹, Duke University², University of Illinois at Urbana–Champaign³, University of Mississippi⁴, Harvard University⁵, University of California, Los Angeles⁶, Washington University in St. Louis⁷

01 Mar 2010-Molecular Ecology

TL;DR: Overall, this study demonstrates the usefulness of next‐generation sequencing for obtaining genomic resources for comparative genomic analysis of non‐model organisms.

Abstract: Next-generation sequencing technology provides an attractive means to obtain large-scale sequence data necessary for comparative genomic analysis. To analyse the patterns of mutation rate variation and selection intensity across the avian genome, we performed brain transcriptome sequencing using Roche 454 technology of 10 different non-model avian species. Contigs from de novo assemblies were aligned to the two available avian reference genomes, chicken and zebra finch. In total, we identified 6499 different genes across all 10 species, with approximately 1000 genes found in each full run per species. We found evidence for a higher mutation rate of the Z chromosome than of autosomes (male-biased mutation) and a negative correlation between the neutral substitution rate (d(S)) and chromosome size. Analyses of the mean d(N)/d(S) ratio (omega) of genes across chromosomes supported the Hill-Robertson effect (the effect of selection at linked loci) and point at stochastic problems with omega as an independent measure of selection. Overall, this study demonstrates the usefulness of next-generation sequencing for obtaining genomic resources for comparative genomic analysis of non-model organisms.

118 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Importantly, the generation of large amounts of DNA sequence data from related species will allow comparative genomic approaches for the identification of trait loci, and this is particularly so with transcriptome sequencing (‘RNA-seq’; Wang et al. 2009)....
[...]

Journal Article•DOI•

RNA editing: a driving force for adaptive evolution?

Willemijn Maria Gommans¹, Sean P. Mullen¹, Stefan Maas¹•Institutions (1)

Lehigh University¹

01 Oct 2009-BioEssays

TL;DR: It is proposed that higher organisms have evolved to systems with increasing RNA editing activity and, as a result, to more complex systems.

Abstract: Genetic variability is considered a key to the evolvability of species. The conversion of an adenosine (A) to inosine (I) in primary RNA transcripts can result in an amino acid change in the encoded protein, a change in secondary structure of the RNA, creation or destruction of a splice consensus site, or otherwise alter RNA fate. Substantial transcriptome and proteome variability is generated by A-to-I RNA editing through site-selective post-transcriptional recoding of single nucleotides. We posit that this epigenetic source of phenotypic variation is an unrecognized mechanism of adaptive evolution. The genetic variation introduced through editing occurs at low evolutionary cost since predominant production of the wild-type protein is retained. This property even allows exploration of sequence space that is inaccessible through mutation, leading to increased phenotypic plasticity and provides an evolutionary advantage for acclimatization as well as long-term adaptation. Furthermore, continuous probing for novel RNA editing sites throughout the transcriptome is an intrinsic property of the editing machinery and represents the molecular basis for increased adaptability. We propose that higher organisms have therefore evolved to systems with increasing RNA editing activity and, as a result, to more complex systems.

118 citations

Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

...The 454 direct sequencing approach is also characterized by a smaller intrinsic error rate than conventional Sanger-based sequencing technology (66,67)....
[...]
...With a single sequencing run, several hundred gene-specific PCR amplicons together with several hundred genomic control fragments can be analyzed obtaining coverage of about 1,000 reads per cDNA fragment (66,67). This would allow for the detection of editing events with a sub-percentage penetrance....
[...]
...For example, the 454 KS GLS platform (Roche) with average read lengths of 250 bp is well suited for this purpose (66,67). With a single sequencing run, several hundred gene-specific PCR amplicons together with several hundred genomic control fragments can be analyzed obtaining coverage of about 1,000 reads per cDNA fragment....
[...]

Journal Article•DOI•

RNA-Seq quantification of the human small airway epithelium transcriptome

Neil R. Hackett¹, Marcus W. Butler¹, Renat Shaykhiev¹, Jacqueline Salit¹, Larsson Omberg¹, Juan L. Rodriguez-Flores¹, Jason G. Mezey¹, Yael Strulovici-Barel¹, Guoqing Wang¹, Lukas Didon¹, Ronald G. Crystal¹•Institutions (1)

Cornell University¹

29 Feb 2012-BMC Genomics

TL;DR: Quantification of the absolute smoking-induced changes in SAE gene expression revealed that, compared to ubiquitous genes, more SAE-enriched genes responded to smoking with up-regulation, and those with the highest basal expression levels showed most dramatic changes.

Abstract: The small airway epithelium (SAE), the cell population that covers the human airway surface from the 6th generation of airway branching to the alveoli, is the major site of lung disease caused by smoking. The focus of this study is to provide quantitative assessment of the SAE transcriptome in the resting state and in response to chronic cigarette smoking using massive parallel mRNA sequencing (RNA-Seq). The data demonstrate that 48% of SAE expressed genes are ubiquitous, shared with many tissues, with 52% enriched in this cell population. The most highly expressed gene, SCGB1A1, is characteristic of Clara cells, the cell type unique to the human SAE. Among other genes expressed by the SAE are those related to Clara cell differentiation, secretory mucosal defense, and mucociliary differentiation. The high sensitivity of RNA-Seq permitted quantification of gene expression related to infrequent cell populations such as neuroendocrine cells and epithelial stem/progenitor cells. Quantification of the absolute smoking-induced changes in SAE gene expression revealed that, compared to ubiquitous genes, more SAE-enriched genes responded to smoking with up-regulation, and those with the highest basal expression levels showed the most dramatic changes. Smoking had no effect on SAE gene splicing, but was associated with a shift in molecular pattern from Clara cell-associated towards the mucus-secreting cell differentiation pathway with multiple features of cancer-associated molecular phenotype. These observations provide insights into the unique biology of human SAE by providing quantitative assessment of the global transcriptome under physiological conditions and in response to the stress of chronic cigarette smoking.

118 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...The advent of RNA-Seq technology, in which the entire polyadenylated transcriptome is sequenced [19-24], is capable of building on this microarray data to provide additional insights into the transcriptome of the airway epithelium and its response to cigarette smoke....
[...]
...The development of massive parallel RNA sequencing (RNA-Seq) technology permits quantitative assessment of poly(A) mRNA levels to a high degree of sensitivity [19-24]....
[...]
...Because RNA-Seq provides direct sequencing information of all polyadenylated mRNAs and is not limited by probe design, RNA-Seq data has inherently less noise and higher specificity, and, importantly, provides quantitative information on mRNA transcript number [19]....
[...]

Journal Article•DOI•

Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles

Shaogui Guo¹, Jingan Liu, Yi Zheng¹, Mingyun Huang¹, Haiying Zhang, Guoyi Gong, Hongju He, Yi Ren, Silin Zhong¹, Zhangjun Fei¹,², Yong Xu•Institutions (2)

Boyce Thompson Institute for Plant Research¹, Ithaca College²

21 Sep 2011-BMC Genomics

TL;DR: A large collection of watermelon ESTs is generated, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species.

Abstract: Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed a half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidate molecular mechanisms governing these important quality-related traits during watermelon fruit development. We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.

118 citations

Cites methods from "RNA-Seq: a revolutionary tool for t..."

...Digital expression profiling (or RNA-seq) is a powerful and efficient approach for large-scale gene expression analysis [23]....
[...]

References

Journal Article•DOI•

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹•Institutions (1)

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

Patent•DOI•

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²•Institutions (2)

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad•Institutions (1)

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²•Institutions (2)

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations