RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article•DOI•

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: RNA-Seq, an approach to transcriptome profiling that uses deep-sequencing technologies, provides a far more precise measurement of the levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article•DOI•

Integrated Analysis of Transcriptomic and Proteomic Data

Saad Haider¹, Ranadip Pal¹

Texas Tech University¹

31 Mar 2013-Current Genomics

TL;DR: This article reviews the existing major approaches for joint analysis of transcriptomic and proteomic data and categorizes the different approaches into eight main categories based on the initial algorithm and final analysis goal.

Abstract: Until recently, understanding the regulatory behavior of cells has been pursued through independent analysis of the transcriptome or the proteome. Based on the central dogma, it was generally assumed that there exists a direct correspondence between mRNA transcripts and generated protein expressions. However, recent studies have shown that the correlation between mRNA and protein expressions can be low due to various factors such as different half-lives and post-transcription machinery. Thus, a joint analysis of the transcriptomic and proteomic data can provide useful insights that may not be deciphered from individual analysis of mRNA or protein expressions. This article reviews the existing major approaches for joint analysis of transcriptomic and proteomic data. We categorize the different approaches into eight main categories based on the initial algorithm and final analysis goal. We further present analogies with other domains and discuss the existing research problems in this area.

328 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...The most recent technology for transcriptomic profiling is RNA-Seq [23] which is considered as a revolutionary tool for this purpose....
...Overall comparison of existing technologies and most recent RNA-Seq technology can be found in recent reviews by Nicole Roy et al. [31] and Schirmer et al. [32]....
...RNA-Seq technology shows clear advantages over existing profiling technologies in terms of amount of sequence coverage, revealing new transcriptomic insights, accuracy of defining transcription level, etc....

Journal Article•DOI•

Homoeolog expression bias and expression level dominance in allopolyploid cotton

Mi-Jeong Yoo¹, Emmanuel Szadkowski¹, Jonathan F. Wendel¹

Iowa State University¹

01 Feb 2013-Heredity

TL;DR: Gene expression patterns in interspecific hybrid F1, and synthetic and natural allopolyploid cotton using RNA-Seq reads from leaf transcriptomes suggest that natural selection reconciles the regulatory mismatches caused by initial genomic merger, while new gene expression conditions are generated for evaluation by selection.

Abstract: Allopolyploidy is an evolutionary and mechanistically intriguing process, in that it entails the reconciliation of two or more sets of diverged genomes and regulatory interactions. In this study, we explored gene expression patterns in interspecific hybrid F1, and synthetic and natural allopolyploid cotton using RNA-Seq reads from leaf transcriptomes. We determined how the extent and direction of expression level dominance (total level of expression for both homoeologs) and homoeolog expression bias (relative contribution of homoeologs to the transcriptome) changed from hybridization through evolution at the polyploid level and following cotton domestication. Genome-wide expression level dominance was biased toward the A-genome in the diploid hybrid and natural allopolyploids, whereas the direction was reversed in the synthetic allopolyploid. This biased expression level dominance was mainly caused by up- or downregulation of the homoeolog from the 'non-dominant' parent. Extensive alterations in homoeolog expression bias and expression level dominance accompany the initial merger of two diverged diploid genomes, suggesting a combination of regulatory (cis or trans) and epigenetic interactions that may arise and propagate through the transcriptome network. The extent of homoeolog expression bias and expression level dominance increases over time, from genome merger through evolution at the polyploid level. Higher rates of transgressive and novel gene expression patterns as well as homoeolog silencing were observed in natural allopolyploids than in F1 hybrid and synthetic allopolyploid cottons. These observations suggest that natural selection reconciles the regulatory mismatches caused by initial genomic merger, while new gene expression conditions are generated for evaluation by selection.

328 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...key advantages for transcriptome profiling, including the lack of a priori information on genome sequences, no upper limit for quantification, higher accuracy for distinguishing and quantifying expression levels of homoeologous copies, and a high level of reproducibility (Wang et al., 2009)....

Journal Article•DOI•

Arabidopsis REF6 is a histone H3 lysine 27 demethylase

Falong Lu¹, Xia Cui¹, Shuaibin Zhang¹, Thomas Jenuwein², Xiaofeng Cao¹

Chinese Academy of Sciences¹, Max Planck Society²

01 Jul 2011-Nature Genetics

TL;DR: It is shown that RELATIVE OF EARLY FLOWERING 6 (REF6), also known as Jumonji domain–containing protein 12 (JMJ12), specifically demethylates H3K27me3 and H3K27me2, whereas its metazoan counterparts, the KDM4 proteins, are H3K9 and H3K36 demethylases.

Abstract: Xiaofeng Cao and colleagues report that REF6 is a histone H3 lysine 27 demethylase in Arabidopsis. REF6 demethylates H3K27me3 and H3K27me2 and ref6 mutant plants resemble mutations in H3K27me3-mediated gene silencing. Polycomb group (PcG)-mediated histone H3 lysine 27 trimethylation (H3K27me3) has a key role in gene repression and developmental regulation [1,2,3,4]. There is evidence that H3K27me3 is actively removed in plants [5,6,7,8], but it is not known how this occurs. Here we show that RELATIVE OF EARLY FLOWERING 6 (REF6), also known as Jumonji domain–containing protein 12 (JMJ12), specifically demethylates H3K27me3 and H3K27me2, whereas its metazoan counterparts, the KDM4 proteins, are H3K9 and H3K36 demethylases [9,10]. Plants overexpressing REF6 resembled mutants defective in H3K27me3-mediated gene silencing. Genetic interaction tests indicated that REF6 acts downstream of H3K27me3 methyltransferases. Mutations in REF6 caused ectopic and increased H3K27me3 level and decreased mRNA expression of hundreds of genes involved in regulating developmental patterning and responses to various stimuli. Our work shows that plants and metazoans use conserved mechanisms to regulate H3K27me3 dynamics but use distinct subfamilies of enzymes.

328 citations

Journal Article•DOI•

RNAi-based treatment of chronically infected patients and chimpanzees reveals that integrated hepatitis B virus DNA is a source of HBsAg

Christine I. Wooddell, Man-Fung Yuen¹, Henry Lik-Yuen Chan², Robert G. Gish³, Stephen Locarnini⁴, Deborah Chavez⁵, Carl Ferrari⁶, Bruce D. Given, James Hamilton, Steven Kanner, Ching-Lung Lai¹, Johnson Y.N. Lau⁷, T. Schluep, Zhao Xu, Robert E. Lanford⁵, David L. Lewis

University of Hong Kong¹, The Chinese University of Hong Kong², Stanford University³, University of Melbourne⁴, Texas Biomedical Research Institute⁵, University of Parma⁶, Hong Kong Polytechnic University⁷

27 Sep 2017-Science Translational Medicine

TL;DR: A previously unappreciated source of viral antigen is uncovered that may represent a strategy adopted by HBV to maintain chronicity in the presence of host immunosurveillance and could inform disease pathogenesis and help guide development of future HBV treatments.

Abstract: Chronic hepatitis B virus (HBV) infection is a major health concern worldwide, frequently leading to liver cirrhosis, liver failure, and hepatocellular carcinoma. Evidence suggests that high viral antigen load may play a role in chronicity. Production of viral proteins is thought to depend on transcription of viral covalently closed circular DNA (cccDNA). In a human clinical trial with an RNA interference (RNAi)-based therapeutic targeting HBV transcripts, ARC-520, HBV S antigen (HBsAg) was strongly reduced in treatment-naive patients positive for HBV e antigen (HBeAg) but was reduced significantly less in patients who were HBeAg-negative or had received long-term therapy with nucleos(t)ide viral replication inhibitors (NUCs). HBeAg positivity is associated with greater disease risk that may be moderately reduced upon HBeAg loss. The molecular basis for this unexpected differential response was investigated in chimpanzees chronically infected with HBV. Several lines of evidence demonstrated that HBsAg was expressed not only from the episomal cccDNA minichromosome but also from transcripts arising from HBV DNA integrated into the host genome, which was the dominant source in HBeAg-negative chimpanzees. Many of the integrants detected in chimpanzees lacked target sites for the small interfering RNAs in ARC-520, explaining the reduced response in HBeAg-negative chimpanzees and, by extension, in HBeAg-negative patients. Our results uncover a heretofore underrecognized source of HBsAg that may represent a strategy adopted by HBV to maintain chronicity in the presence of host immunosurveillance. These results could alter trial design and endpoint expectations of new therapies for chronic HBV.

327 citations

Journal Article•DOI•

Normalization, testing, and false discovery rate estimation for RNA-sequencing data

Jun Li¹, Daniela Witten², Iain M. Johnstone¹, Robert Tibshirani¹

Stanford University¹, University of Washington²

01 Jul 2012-Biostatistics

TL;DR: This work uses a log-linear model with a new approach to normalization to derive a novel procedure to estimate the false discovery rate (FDR), and demonstrates that the method has potential advantages over existing methods that are based on a Poisson or negative binomial model.

Abstract: We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.

325 citations

References

Journal Article•DOI•

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.

The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

Patent•DOI•

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in an mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations