RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article

Effect of the down-regulation of the high Grain Protein Content (GPC) genes on the wheat transcriptome during monocarpic senescence

Dario Cantu¹, Stephen Pearce¹, Assaf Distelfeld¹,², Michael W Christiansen¹,³, Cristobal Uauy⁴, Eduard Akhunov⁵, Tzion Fahima⁶, Jorge Dubcovsky¹

University of California, Davis¹, Tel Aviv University², Aarhus University³, John Innes Centre⁴, Kansas State University⁵, University of Haifa⁶

07 Oct 2011-BMC Genomics

TL;DR: This work used high-throughput mRNA-seq technologies to characterize the effect of the GPC down-regulation on the wheat flag-leaf transcriptome 12 days after anthesis and identified a set of 691 genes differentially regulated by GPC, which includes transporters, hormone regulated genes, and transcription factors.

Abstract: Background: Increasing the nutrient concentration of wheat grains is important to ameliorate nutritional deficiencies in many parts of the world. Proteins and nutrients in the wheat grain are largely derived from the remobilization of degraded leaf molecules during monocarpic senescence. The down-regulation of the NAC transcription factor Grain Protein Content (GPC) in transgenic wheat plants delays senescence (>3 weeks) and reduces the concentration of protein, Zn and Fe in the grain (>30%), linking senescence and nutrient remobilization. Based on the early and rapid up-regulation of GPC in wheat flag leaves after anthesis, we hypothesized that this transcription factor is an early regulator of monocarpic senescence. To test this hypothesis, we used high-throughput mRNA-seq technologies to characterize the effect of the GPC down-regulation on the wheat flag-leaf transcriptome 12 days after anthesis. At this early stage of senescence GPC transcript levels are significantly lower in transgenic GPC-RNAi plants than in the wild type, but there are still no visible phenotypic differences between genotypes. Results: We generated 1.4 million 454 reads from early senescing flag leaves (average ~350 nt) and assembled 1.2 million into 30,497 contigs that were used as a reference to map 145 million Illumina reads from three wild type and four GPC-RNAi plants. Following normalization and statistical testing, we identified a set of 691 genes differentially regulated by GPC (431 ≥ 2-fold change). Transcript level ratios between transgenic and wild type plants showed a high correlation (R = 0.83) between qRT-PCR and Illumina results, providing independent validation of the mRNA-seq approach. A set of differentially expressed genes were analyzed across an early senescence time-course. 
Conclusions: Monocarpic senescence is an active process characterized by large-scale changes in gene expression which begins considerably before the appearance of visual symptoms of senescence. The mRNA-seq approach used here was able to detect small differences in transcript levels during the early stages of senescence. This resulted in an extensive list of GPC-regulated genes, which includes transporters, hormone regulated genes, and transcription factors. These GPC-regulated genes, particularly those up-regulated during senescence, provide valuable entry points to dissect the early stages of monocarpic senescence and nutrient remobilization in wheat.

86 citations

Cites methods from "RNA-Seq: a revolutionary tool for t..."

...Direct cDNA sequencing approaches (mRNA-seq) for transcriptome profiling using Next Generation Sequencing technologies provide high-resolution methods for quantifying gene expression levels on a genome-wide scale [30]....
[...]

Journal Article

Somatic sex-specific transcriptome differences in Drosophila revealed by whole transcriptome sequencing

Peter L. Chang¹, Joseph P. Dunham¹, Sergey V. Nuzhdin¹, Michelle N. Arbeitman¹,²

University of Southern California¹, Florida State University²

14 Jul 2011-BMC Genomics

TL;DR: Deep RNA sequencing is used to gain insight into how the Drosophila sex hierarchy generates somatic sex differences, by examining gene and transcript isoform expression differences between the sexes in adult head tissues and identifies thousands of genes that show sex-specific differences in overall gene expression levels.

Abstract: Understanding animal development and physiology at a molecular-biological level has been advanced by the ability to determine at high resolution the repertoire of mRNA molecules by whole transcriptome resequencing. This includes the ability to detect and quantify rare abundance transcripts and isoform-specific mRNA variants produced from a gene. The sex hierarchy consists of a pre-mRNA splicing cascade that directs the production of sex-specific transcription factors that specify nearly all sexual dimorphism. We have used deep RNA sequencing to gain insight into how the Drosophila sex hierarchy generates somatic sex differences, by examining gene and transcript isoform expression differences between the sexes in adult head tissues. Here we find 1,381 genes that differ in overall expression levels and 1,370 isoform-specific transcripts that differ between males and females. Additionally, we find 512 genes not regulated downstream of transformer that are significantly more highly expressed in males than females. These 512 genes are enriched on the X chromosome and reside adjacent to dosage compensation complex entry sites, which taken together suggests that their residence on the X chromosome might be sufficient to confer male-biased expression. There are no transcription unit structural features, from a set of features, that are robustly significantly different in the genes with significant sex differences in the ratio of isoform-specific transcripts, as compared to random isoform-specific transcripts, suggesting that there is no single molecular mechanism that generates isoform-specific transcript differences between the sexes, even though the sex hierarchy is known to include three pre-mRNA splicing factors. We identify thousands of genes that show sex-specific differences in overall gene expression levels, and identify hundreds of additional genes that have differences in the abundance of isoform-specific transcripts.
No transcription unit structural feature was robustly enriched in the sex-differentially expressed transcript isoforms. Additionally, we found that many genes with male-biased expression were enriched on the X chromosome and reside adjacent to dosage compensation entry sites, suggesting that differences in sex chromosome composition contributes to dimorphism in gene expression. Taken together, this study provides new insight into the molecular underpinnings of sexual differentiation.

86 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...It is not unexpected to have more variability in the number of sequence reads from the ends of a transcription unit [reviewed in [37]]....
[...]

Journal Article

eQTL Mapping Using RNA-seq Data

Wei Sun¹, Yi-Juan Hu²

University of North Carolina at Chapel Hill¹, Emory University²

01 May 2013-Statistics in Biosciences

TL;DR: Current methods for eQTL mapping using ASE are reviewed and some future directions are discussed; existing works that use RNA-seq data to study RNA-isoform expression are also reviewed, along with the gaps between these works and isoform-specific eQTL mapping.

Abstract: As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data to study RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing works that use RNA-seq data to study RNA-isoform expression and we discuss the gaps between these works and isoform-specific eQTL mapping.

86 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...while requiring less RNA materials [90]....
[...]
...RNA-seq data provide unprecedentedly rich information to study alternative splicing events [54,76,85,90]....
[...]

Journal Article

Linear amplification for deep sequencing.

Wieteke A. M. Hoeijmakers¹, Richárd Bártfai¹, Kees-Jan Francoijs¹, Hendrik G. Stunnenberg¹

Radboud University Nijmegen¹

01 Jun 2011-Nature Protocols

TL;DR: Linear amplification for deep sequencing (LADS) is an amplification method that produces representative libraries for Illumina next-generation sequencing within 2 d. The method relies on attaching two different sequencing adapters to blunt-end repaired and A-tailed DNA fragments, wherein one of the adapters is extended with the sequence for the T7 RNA polymerase promoter.

Abstract: Linear amplification for deep sequencing (LADS) is an amplification method that produces representative libraries for Illumina next-generation sequencing within 2 d. The method relies on attaching two different sequencing adapters to blunt-end repaired and A-tailed DNA fragments, wherein one of the adapters is extended with the sequence for the T7 RNA polymerase promoter. Ligated and size-selected DNA fragments are transcribed in vitro with high RNA yields. Subsequent cDNA synthesis is initiated from a primer complementary to the first adapter, ensuring that the library will only contain full-length fragments with two distinct adapters. Contrary to the severely biased representation of AT- or GC-rich fragments in standard PCR-amplified libraries, the sequence coverage in T7-amplified libraries is indistinguishable from that of nonamplified libraries. Moreover, in contrast to amplification-free methods, LADS can generate sequencing libraries from a few nanograms of DNA, which is essential for all applications in which the starting material is limited.

86 citations

Journal Article

Organization and Maintenance of Molecular Domains in Myelinated Axons

Elizabeth D. Buttermore¹, Courtney Thaxton¹, Manzoor A. Bhat

University of North Carolina at Chapel Hill¹

01 May 2013-Journal of Neuroscience Research

TL;DR: Recent advances on the molecular nature and functions of some of the components of each axonal domain, and their roles in axonal domain organization and maintenance for proper neuronal communication, are highlighted.

Abstract: Over a century ago, Ramon y Cajal first proposed the idea of a directionality involved in nerve conduction and neuronal communication. Decades later, it was discovered that myelin, produced by glial cells, insulated axons with periodic breaks where nodes of Ranvier (nodes) form to allow for saltatory conduction. In the peripheral nervous system (PNS), Schwann cells are the glia that can either individually myelinate the axon from one neuron or ensheath axons of many neurons. In the central nervous system (CNS), oligodendrocytes are the glia that myelinate axons from different neurons. Review of more recent studies revealed that this myelination created polarized domains adjacent to the nodes. However, the molecular mechanisms responsible for the organization of axonal domains are only now beginning to be elucidated. The molecular domains in myelinated axons include the axon initial segment (AIS), where various ion channels are clustered and action potentials are initiated; the node, where sodium channels are clustered and action potentials are propagated; the paranode, where myelin loops contact with the axolemma; the juxtaparanode (JXP), where delayed-rectifier potassium channels are clustered; and the internode, where myelin is compactly wrapped. Each domain contains a unique subset of proteins critical for the domain’s function. However, the roles of these proteins in axonal domain organization are not fully understood. In this review, we highlight recent advances on the molecular nature and functions of some of the components of each axonal domain and their roles in axonal domain organization and maintenance for proper neuronal communication.

86 citations

References

Journal Article

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist.
These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

Patent

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
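The mapping-quality concept described in this abstract can be illustrated with a minimal sketch. The abstract states only that mapping quality measures the confidence that a read truly comes from its aligned position. The Phred-style scaling used below (Q = -10 log10 of the probability that the alignment is wrong) is the conventional way such scores are reported and is assumed here. The function names and the cap of 99 are hypothetical choices for illustration, not MAQ's actual interface.

```python
import math


def mapping_quality(p_wrong: float, cap: int = 99) -> int:
    """Convert the probability that an alignment is wrong into a
    Phred-scaled score. `cap` is an illustrative upper bound (assumption)."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        # log10(0) is undefined, so a certain alignment gets the cap.
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))


def error_probability(q: float) -> float:
    """Inverse transform: Phred-scaled score back to an error probability."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01, 0.001):
        print(f"P(wrong)={p:<6} -> Q={mapping_quality(p)}")
```

On this scale, each additional 10 points corresponds to a tenfold drop in the chance that the read is misplaced, which is why a simple threshold on the score is a common way to filter ambiguous alignments.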

2,927 citations

Journal Article

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations