RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article

Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes.

Elie Maza¹, Pierre Frasse², Pavel Senin, Mondher Bouzayen, Mohamed Zouine³

University of Toulouse¹, Entertainments National Service Association², Institut national de la recherche agronomique³

30 Jul 2013-Communicative & Integrative Biology

TL;DR: This study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size, named “Median Ratio Normalization” (MRN).

Abstract: In recent years, RNA-Seq technologies became a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to a great variability in results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Real RNA-Seq data sets analyses, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Real and simulated data sets analyses give similar results showing 3 different groups of procedures having the same behavior. The group including the novel method named "Median Ratio Normalization" (MRN) gives the lower number of false discoveries. Within this group the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with intrinsic bias resulting from relative size of studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods.

87 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Of particular interest, the notion of relative size is also considered for high throughput metabonomics and proteomics data analyses where count data are commonly used.(23,24) Based on poor benchmark performance, it clearly emerges that the FPKM normalization method should be largely avoided in DE analysis....
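For orientation, FPKM is conventionally defined as fragments per kilobase of transcript per million mapped fragments. The sketch below is not taken from the cited study. The function name `fpkm` and the toy numbers are hypothetical, chosen only to show how a change in total library composition (the "relative size" issue) shifts the FPKM of a gene whose fragment count has not changed.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length_bp and total_mapped must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped)


# Hypothetical toy data: gene A yields the same 500 fragments in both samples,
# but sample 2 carries 10 million extra fragments from other, highly
# expressed genes, which doubles the total number of mapped fragments.
gene_a_sample1 = fpkm(500, 2_000, 10_000_000)
gene_a_sample2 = fpkm(500, 2_000, 20_000_000)
print(gene_a_sample1, gene_a_sample2)
```

In this toy case the FPKM of gene A halves between the two samples although its raw count is identical, which illustrates why scaling by the total number of mapped fragments alone can mislead a differential expression comparison.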

Journal Article

TomExpress, a unified tomato RNA-Seq platform for visualization of expression data, clustering and correlation networks.

Mohammed Zouine¹,², Elie Maza¹,², Anis Djari¹,², Mattieu Lauvernier¹,², Pierre Frasse¹,², Abdelaziz Smouni, Julien Pirrello¹,², Mondher Bouzayen¹,²

Institut national de la recherche agronomique¹, University of Toulouse²

25 Oct 2017-Plant Journal

TL;DR: TomExpress is a web platform for tomato RNA-Seq data visualization and data mining that allows for the identification of co-expressed genes and the visualization of correlation networks of co-regulated gene groups.

Abstract: The TomExpress platform was developed to provide the tomato research community with a browser and integrated web tools for public RNA-Seq data visualization and data mining. To avoid major biases that can result from the use of different mapping and statistical processing methods, RNA-Seq raw sequence data available in public databases were mapped de novo on a unique tomato reference genome sequence and post-processed using the same pipeline with accurate parameters. Following the calculation of the number of counts per gene in each RNA-Seq sample, a communal global normalization method was applied to all expression values. This unifies the whole set of expression data and makes them comparable. A database was designed where each expression value is associated with corresponding experimental annotations. Sample details were manually curated to be easily understandable by biologists. To make the data easily searchable, a user-friendly web interface was developed that provides versatile data mining web tools via on-the-fly generation of output graphics, such as expression bar plots, comprehensive in planta representations and heatmaps of hierarchically clustered expression data. In addition, it allows for the identification of co-expressed genes and the visualization of correlation networks of co-regulated gene groups. TomExpress provides one of the most complete free resources of publicly available tomato RNA-Seq data, and allows for the immediate interrogation of transcriptional programs that regulate vegetative and reproductive development in tomato under diverse conditions. The design of the pipeline developed in this project enables easy updating of the database with newly published RNA-Seq data, thereby allowing for continuous enrichment of the resource.

87 citations

Journal Article

A comparative transcriptomic study of an allotetraploid and its diploid progenitors illustrates the unique advantages and challenges of RNA-seq in plant species.

Daniel C. Ilut¹, Jeremy E. Coate¹, Amelia K. Luciano², Thomas G. Owens¹, Gregory D. May³, Andrew Farmer³, Jeff J. Doyle¹

Cornell University¹, Yale University², National Center for Genome Resources³

01 Feb 2012-American Journal of Botany

TL;DR: A novel metric developed to address issues arising from high degrees of gene space duplication and a method for dissecting a gene's measured transcript level in a polyploid species into the relative contribution of its homoeologues are presented.

Abstract: Premise of the study: RNA-seq analysis of plant transcriptomes poses unique challenges due to the highly duplicated nature of plant genomes. We address these challenges in the context of recently formed polyploid species and detail an RNA-seq experiment comparing the leaf transcriptome profile of an allopolyploid relative of soybean with the diploid species that contributed its homoeologous genomes. Methods: RNA-seq reads were obtained from the three species and were aligned against the genome sequence of Glycine max. Transcript levels were estimated for each gene, relative contributions of polyploidy-duplicated loci (homoeologues) in the tetraploid were identified, and comparisons of transcript profiles and individual genes were used to analyze the regulation of transcript levels. Key results: We present a novel metric developed to address issues arising from high degrees of gene space duplication and a method for dissecting a gene’s measured transcript level in a polyploid species into the relative contribution of its homoeologues. We identify the gene family likely contributing to differences in photosynthetic rate between the allotetraploid and its progenitors and show that the tetraploid appears to be using the “redundant” gene copies in novel ways. Conclusions: Given the prevalence of polyploidy events in plants, we believe many of the approaches developed here to be applicable, and often necessary, in most plant RNA-seq experiments. The deep sampling provided by RNA-seq allows us to dissect the genetic underpinnings of specific phenotypes as well as examine complex interactions within polyploid genomes.

87 citations

Journal Article

Space, the final frontier: A critical review of recent experiments performed in microgravity

Joshua P. Vandenbrink¹, John Z. Kiss¹

University of Mississippi¹

01 Feb 2016-Plant Science

TL;DR: A critical review of recent spaceflight experiments is provided and ways in which future experiments can be designed to improve the value and applicability of the results generated are suggested.

87 citations

Journal Article

Genome-wide analytical approaches for reverse metabolic engineering of industrially relevant phenotypes in yeast.

Bart Oud¹, Antonius J. A. van Maris¹, Jean-Marc Daran¹, Jack T. Pronk¹

Delft University of Technology¹

01 Mar 2012-Fems Yeast Research

TL;DR: It is concluded that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis.

Abstract: Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages.

87 citations

References

Journal Article

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.

The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts (1–3), dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other …

12,293 citations

Patent

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
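The abstract above does not spell out a formula, but mapping quality is conventionally reported on a Phred-like scale, Q = -10 · log10(p), where p is the estimated probability that the read is aligned to the wrong position. The sketch below assumes that convention. The helper names are hypothetical and are not part of MAQ.

```python
import math


def error_prob_to_mapq(p_wrong: float) -> float:
    """Phred-scale an estimated probability that an alignment is wrong."""
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("p_wrong must be in the interval (0, 1]")
    return -10.0 * math.log10(p_wrong)


def mapq_to_error_prob(mapq: float) -> float:
    """Invert the Phred scale: recover the error probability from Q."""
    return 10.0 ** (-mapq / 10.0)


# A 1-in-1000 chance of a wrong placement corresponds to Q of about 30,
# and Q = 20 corresponds to a 1-in-100 chance.
print(error_prob_to_mapq(0.001))
print(mapq_to_error_prob(20))
```

Under this assumed scale, each increase of 10 in Q corresponds to a tenfold lower estimated chance that the read is misplaced, which is what makes the score convenient as a filtering threshold.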

2,927 citations

Journal Article

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations