RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article•DOI•

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: RNA-Seq, an approach to transcriptome profiling that uses deep-sequencing technologies, provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article•DOI•

Differential expression of miRNAs in colorectal cancer: comparison of paired tumor tissue and adjacent normal mucosa using high-throughput sequencing.

Julian Hamfjord¹, Astrid M. Stangeland¹, Timothy P. Hughes¹, Martina Skrede¹, Kjell Magne Tveit¹,², Tone Ikdahl¹, Elin H. Kure•Institutions (2)

Oslo University Hospital¹, University of Oslo²

17 Apr 2012-PLOS ONE

TL;DR: The results would serve as a robust training set for validation of potential biomarkers in a larger cohort study and the hypothesis that there are differences in miRNA expression between adenocarcinomas and neuroendocrine tumors of the colon is supported.

Abstract: We present the results of a global study of dysregulated miRNAs in paired samples of normal mucosa and tumor from eight patients with colorectal cancer. Although there is existing data of miRNA contribution to colorectal tumorigenesis, these studies are typically small to medium scale studies of cell lines or non-paired tumor samples. The present study is to our knowledge unique in two respects. Firstly, the normal and adjacent tumor tissue samples are paired, thus taking into account the baseline differences between individuals when testing for differential expression. Secondly, we use high-throughput sequencing, thus enabling a comprehensive survey of all miRNAs expressed in the tissues. We use Illumina sequencing technology to perform sequencing and two different tools to statistically test for differences in read counts per gene between samples: edgeR when using the pair information and DESeq when ignoring this information, i.e., treating tumor and normal samples as independent groups. We identify 37 miRNAs that are significantly dysregulated in both statistical approaches, 19 down-regulated and 18 up-regulated. Some of these miRNAs are previously published as potential regulators in colorectal adenocarcinomas such as miR-1, miR-96 and miR-145. Our comprehensive survey of differentially expressed miRNAs thus confirms some existing findings. We have also discovered 16 dysregulated miRNAs, which to our knowledge have not previously been associated with colorectal carcinogenesis: the following significantly down-regulated miR-490-3p, -628-3p/-5p, -1297, -3151, -3163, -3622a-5p, -3656 and the up-regulated miR-105, -549, -1269, -1827, -3144-3p, -3177, -3180-3p, -4326. Although the study is preliminary with only eight patients included, we believe the results add to the present knowledge on miRNA dysregulation in colorectal carcinogenesis. 
As such the results would serve as a robust training set for validation of potential biomarkers in a larger cohort study. Finally, we also present data supporting the hypothesis that there are differences in miRNA expression between adenocarcinomas and neuroendocrine tumors of the colon.

151 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Examples of this include expression of unknown target sequences, RNA editing events and other RNA sequence variations such as polymorphisms [21,22,23]....
[...]
...This is partly due to the greatly increased dynamic range for quantification of gene expression provided by the high-throughput sequencing method [22]....
[...]

Journal Article•DOI•

RNA-seq and microarray complement each other in transcriptome profiling

Sunitha Kogenaru¹, Qing Yan¹, Yinping Guo¹, Nian Wang¹•Institutions (1)

University of Florida¹

15 Nov 2012-BMC Genomics

TL;DR: This study demonstrated that RNA-seq and microarray complement each other in transcriptome profiling and significantly advanced the understanding of the regulome of the critical transcriptional factor HrpX.

Abstract: Background: RNA-seq and microarray are the two popular methods employed for genome-wide transcriptome profiling. Current comparison studies have shown that transcriptome quantified by these two methods correlated well. However, none of them have addressed if they complement each other, considering the strengths and the limitations inherent with them. The pivotal requirement to address this question is the knowledge of a well known data set. In this regard, HrpX regulome from pathogenic bacteria serves as an ideal choice as the target genes of HrpX transcription factor are well studied due to their central role in pathogenicity. Results: We compared the performance of RNA-seq and microarray in their ability to detect known HrpX target genes by profiling the transcriptome from the wild-type and the hrpX mutant strains of γ-Proteobacterium Xanthomonas citri subsp. citri. Our comparative analysis indicated that gene expression levels quantified by RNA-seq and microarray well-correlated both at absolute as well as relative levels (Spearman correlation-coefficient, rs > 0.76). Further, the expression levels quantified by RNA-seq and microarray for the significantly differentially expressed genes (DEGs) also well-correlated with qRT-PCR based quantification (rs= 0.58 to 0.94). Finally, in addition to the 55 newly identified DEGs, 72% of the already known HrpX target genes were detected by both RNA-seq and microarray, while, the remaining 28% could only be detected by either one of the methods. Conclusions: This study has significantly advanced our understanding of the regulome of the critical transcriptional factor HrpX. RNA-seq and microarray together provide a more comprehensive picture of HrpX regulome by uniquely identifying new DEGs. Our study demonstrated that RNA-seq and microarray complement each other in transcriptome profiling.
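The Spearman coefficient (rs) quoted in this abstract is the Pearson correlation computed on ranks rather than on raw expression values, which is why it suits comparisons between platforms with different measurement scales. The following is a minimal sketch of that calculation; the function names and the expression values are hypothetical illustrations, not data or code from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene values on two platforms (not from the paper).
rnaseq_counts = [5.0, 120.0, 33.0, 870.0, 12.0]
array_intensity = [7.1, 9.8, 8.9, 12.4, 7.9]
print(spearman(rnaseq_counts, array_intensity))
```

Because only the ordering of genes matters, the two platforms can agree perfectly in rank even when their raw values are on unrelated scales.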

151 citations

Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

...Even though, initially microarray has been instrumental in whole transcriptome analysis, currently RNA-seq is becoming a preferred method of choice, since it is considered to effectively surmount the limitations of microarray [1,21-23]....
[...]
...Currently several studies have been conducted to compare the performance of RNA-seq and microarray in quantifying the expression level of genes, by focusing on various aspects like reproducibility, accuracy, statistical issues, technical and biological variabilities [1,15,21,27-30]....
[...]
...The sheer ability to simultaneously quantify the expression levels for a vast number of genes has revolutionized the biomedical research, facilitating the analysis of global gene expression patterns at the genome-wide scale [1]....
[...]
...profiling methods, RNA-seq and DNA microarray stand out as the two widely used genome-wide gene expression quantification methods [1-17]....
[...]
...Further, RNA-seq data contains very low background signal, a higher dynamic range of expression levels, and also relatively small amount of total RNA required for quantification, when compared to microarray [1,23]....
[...]

Journal Article•DOI•

SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data

Mark F. Rogers¹, Julie Thomas¹, Anireddy S. N. Reddy¹, Asa Ben-Hur¹•Institutions (1)

Colorado State University¹

31 Jan 2012-Genome Biology

TL;DR: Analysis of plant and human data indicates that the machine learning approach used by SpliceGrapher is useful for discriminating between real and spurious splice sites, and can improve the reliability of detection of alternative splicing.

Abstract: We propose a method for predicting splice graphs that enhances curated gene models using evidence from RNA-Seq and EST alignments. Results obtained using RNA-Seq experiments in Arabidopsis thaliana show that predictions made by our SpliceGrapher method are more consistent with current gene models than predictions made by TAU and Cufflinks. Furthermore, analysis of plant and human data indicates that the machine learning approach used by SpliceGrapher is useful for discriminating between real and spurious splice sites, and can improve the reliability of detection of alternative splicing. SpliceGrapher is available for download at http://SpliceGrapher.sf.net.

151 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...Background: Deep transcriptome sequencing (RNA-Seq) with next-generation sequencing (NGS) technologies is providing unprecedented opportunities for researchers to probe the transcriptomes of many species [1-5]....
[...]

Journal Article•DOI•

Unifying immunology with informatics and multiscale biology

Brian A. Kidd¹, Lauren A. Peters¹, Eric E. Schadt¹, Joel T. Dudley¹•Institutions (1)

Icahn School of Medicine at Mount Sinai¹

01 Feb 2014-Nature Immunology

TL;DR: Some of the computational analysis tools for high-dimensional data and how they can be applied to immunology are reviewed.

Abstract: Dudley and colleagues review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.

151 citations

Journal Article•DOI•

Full-length de novo assembly of RNA-seq data in pea (Pisum sativum L.) provides a gene expression atlas and gives insights into root nodulation in this species.

Susete Alves-Carvalho¹, Grégoire Aubert¹, Sébastien Carrère¹, Corinne Cruaud, Anne-Lise Brochot¹, Françoise Jacquin¹, Anthony Klein¹, Chantal Martin¹, Karen Boucherot¹, Jonathan Kreplak¹, Corinne Da Silva, Sandra Moreau¹, Pascal Gamas¹, Patrick Wincker, Jérôme Gouzy¹, Judith Burstin¹•Institutions (1)

Institut national de la recherche agronomique¹

01 Oct 2015-Plant Journal

TL;DR: This resource has allowed identification of the pea orthologs of major nodulation genes characterized in recent years in model species, as a major step towards deciphering unresolved pea nodulation phenotypes.

Abstract: Next-generation sequencing technologies allow an almost exhaustive survey of the transcriptome, even in species with no available genome sequence. To produce a Unigene set representing most of the expressed genes of pea, 20 cDNA libraries produced from various plant tissues harvested at various developmental stages from plants grown under contrasting nitrogen conditions were sequenced. Around one billion reads and 100 Gb of sequence were de novo assembled. Following several steps of redundancy reduction, 46 099 contigs with N50 length of 1667 nt were identified. These constitute the 'Cameor' Unigene set. The high depth of sequencing allowed identification of rare transcripts and detected expression for approximately 80% of contigs in each library. The Unigene set is now available online (http://bios.dijon.inra.fr/FATAL/cgi/pscam.cgi), allowing (i) searches for pea orthologs of candidate genes based on gene sequences from other species, or based on annotation, (ii) determination of transcript expression patterns using various metrics, (iii) identification of uncharacterized genes with interesting patterns of expression, and (iv) comparison of gene ontology pathways between tissues. This resource has allowed identification of the pea orthologs of major nodulation genes characterized in recent years in model species, as a major step towards deciphering unresolved pea nodulation phenotypes. In addition to a remarkable conservation of the early transcriptome nodulation apparatus between pea and Medicago truncatula, some specific features were highlighted. The resource provides a reference for the pea exome, and will facilitate transcriptome and proteome approaches as well as SNP discovery in pea.
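The N50 length reported in this abstract is a standard assembly summary statistic: the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name and the contig lengths are hypothetical and are not taken from the Cameor assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in nucleotides (illustration only).
contigs = [500, 1200, 1667, 2400, 3100]
print(n50(contigs))
```

N50 is weighted by bases rather than by contig count, so a handful of long contigs can raise it even when most contigs are short.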

151 citations

Cites methods or result from "RNA-Seq: a revolutionary tool for t..."

...Probably due to this very high depth of sequencing, expression of most PsCam_LowCopy contigs was detected in almost all plant tissues (Figure 3), as previously reported in other experiments (Wang et al., 2009), and more than 80% of contigs appeared to be expressed in each library (Figure 3)....
[...]
...RNA-seq has been described as a very robust and sensitive tool for transcriptomics (’t Hoen et al., 2008; Wang et al., 2009; Garg and Jain, 2013)....
[...]

References

Journal Article•DOI•

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹•Institutions (1)

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other
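The "digital measure" of transcript prevalence described in this abstract depends on normalizing raw read counts, because longer transcripts and deeper sequencing runs both yield more reads. The abstract text here does not name the unit, so the following is a sketch under the assumption that the widely used RPKM normalization (reads per kilobase of exon model per million mapped reads) is meant; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    Dividing by length (in kb) corrects for transcript size; dividing by
    library size (in millions of reads) corrects for sequencing depth.
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 1,000 reads on a 2 kb exon model in a library of
# 40 million mapped reads (illustration only, not data from the paper).
print(rpkm(1000, 2000, 40_000_000))
```

With this normalization, a gene twice as long at the same RPKM is expected to collect roughly twice as many reads, so values are comparable across genes and across libraries of different depth.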

12,293 citations

Patent•DOI•

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²•Institutions (2)

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
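Mapping quality, as introduced in this abstract, expresses the probability that a read is placed at the wrong position. It is conventionally reported on the Phred scale, Q = -10 log10(P_wrong), so that Q = 30 means a 1 in 1,000 chance of a wrong placement. The sketch below shows only this scale conversion; it does not reproduce how MAQ estimates the error probability itself, and the function names and the cap of 255 are assumptions made for illustration.

```python
import math


def phred_from_error(p_wrong, cap=255):
    """Convert a probability of wrong placement to a Phred-scaled integer.

    `cap` is a hypothetical upper bound used when the probability is zero
    or vanishingly small; it is not a value taken from the MAQ paper.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_wrong == 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_from_phred(quality):
    """Convert a Phred-scaled quality back to an error probability."""
    return 10 ** (-quality / 10)


print(phred_from_error(0.001))
print(error_from_phred(20))
```

The logarithmic scale keeps very small error probabilities readable as small integers, which is convenient for filtering alignments by a quality threshold.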

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad•Institutions (1)

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²•Institutions (2)

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations