RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article

Methods for comprehensive experimental identification of RNA-protein interactions

Colleen A. McHugh¹, Pamela Russell¹, Mitchell Guttman¹

California Institute of Technology¹

27 Jan 2014-Genome Biology

TL;DR: A variety of methods exist to comprehensively define RNA-protein interactions and the considerations required for designing and interpreting these experiments are described.

Abstract: The importance of RNA-protein interactions in controlling mRNA regulation and non-coding RNA function is increasingly appreciated. A variety of methods exist to comprehensively define RNA-protein interactions. We describe these methods and the considerations required for designing and interpreting these experiments.

148 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...The explosion in sequencing technologies has enabled exploration of the transcriptome at unprecedented depth [3]....
[...]

Journal Article

RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development

Meng How Tan¹, Kin Fai Au¹, Arielle L. Yablonovitch¹, Andrea E. Wills¹, Jason Chuang¹, Julie C. Baker¹, Wing Hung Wong¹, Jin Billy Li¹

Stanford University¹

01 Jan 2013-Genome Research

TL;DR: The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking, and paired end RNA sequencing is used to explore the transcriptome in 23 distinct developmental stages.

Abstract: Xenopus is one of the major model systems for the study of vertebrate embryogenesis and basic cell biological processes. There are multiple advantages to the use of Xenopus as an experimental system, such as the availability of large, abundant eggs that are easily manipulated, ready access to any developmental stage, and conservation of cellular pathways between Xenopus and mammals. In the past 50 years, landmark studies on Xenopus have been critical to our understanding of nuclear reprogramming (Gurdon et al. 1958), embryonic patterning (Harland and Gerhart 1997; De Robertis 2006), membrane channels and receptors (Kusano et al. 1977), and cell cycle control (Murray and Kirschner 1989; Murray et al. 1989; Glotzer et al. 1991). Genomics resources for Xenopus research have emerged in the past 10–15 years. During the early days of the genomics era, several cDNA sequencing efforts, such as EST (expressed sequence tag) projects, allowed the construction of full-length cDNA clones and identification of Xenopus open reading frames (ORFs) (Gilchrist et al. 2004; Morin et al. 2006; Fierro et al. 2007). Microarrays have also been used to investigate the expression levels of annotated genes and have given some insights into transcriptome changes over development as well as expression differences between two closely related frog species, Xenopus laevis and Xenopus tropicalis (Yanai et al. 2011). In addition, forward and reverse genetic screens have uncovered mutations that affect a myriad of organogenesis and differentiation processes in Xenopus (Goda et al. 2006), while a genetic map based on simple sequence length polymorphism (SSLP) markers, which can be used to clone genes identified by mutation, has recently been generated (Wells et al. 2011). Notably, while early developmental and molecular studies were performed on Xenopus laevis, its close relative Xenopus tropicalis has been more widely used for genetic and genomic research.
This is mainly because Xenopus laevis has a more complex pseudotetraploid genome, while Xenopus tropicalis has a smaller and more amenable diploid genome. Hence, the initial genome sequencing effort has been directed mostly at Xenopus tropicalis, whose genome has recently been published (Hellsten et al. 2010). Strikingly, the frog genome is highly syntenic with the human genome, with regions of synteny frequently spanning more than a hundred genes. Nevertheless, although it is largely assembled into multiple scaffolds, the Xenopus tropicalis genome has yet to be sequenced to the same depth and annotated with the same level of detail and accuracy as the human and mouse genomes. Importantly, annotations of protein-coding and noncoding genes, including the widely used RefSeq and Ensembl annotations, are strikingly incomplete. The advent of high-throughput sequencing technologies has had an enormous impact on genomics. In particular, such technologies have revolutionized studies of the transcriptome in many species from yeast to humans and have revealed tremendous amounts of complexities and gaps in our understanding of any transcriptome (Wang et al. 2009). Not only does RNA sequencing (RNA-seq) provide a more accurate measurement of expression levels, but it also provides single-nucleotide resolution and can reveal novel splice junctions, unannotated transcripts, and allele-specific expression. Here, we present the first comprehensive study of the transcriptome of Xenopus tropicalis using RNA-seq over development, from a two-cell fertilized embryo to a feeding tadpole. We report evidence for transcription of more than a hundred genes prior to the midblastula transition, when the embryonic genome is generally believed to be transcriptionally silent. We also discovered thousands of novel splicing events, including exon skipping in annotated genes, as well as thousands of unannotated, potentially noncoding transcripts.
Hence, our data serve as a valuable resource for developmental biologists and the general genomics community. Furthermore, to extend the reach of our work, we have created an interactive website (http://hci.stanford.edu/~jcchuang/frog-genes/latest/) that allows users not only to browse the heatmaps in this manuscript but also to query the expression profile of any RefSeq or Ensembl annotated gene with ease.

147 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...In particular, such technologies have revolutionized studies of the transcriptome in many species from yeast to humans and have revealed tremendous amounts of complexities and gaps in our understanding of any transcriptome (Wang et al. 2009)....
[...]

Journal Article

De novo origin of human protein-coding genes.

Dong-Dong Wu¹, David M. Irwin¹,², Ya-Ping Zhang¹,³

Kunming Institute of Zoology¹, University of Toronto², Yunnan University³

10 Nov 2011-PLOS Genetics

TL;DR: RNA–seq data indicate that 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability.

Abstract: The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare; thus, there should be greater appreciation of the importance of the de novo origination of genes.

147 citations

Cites methods from "RNA-Seq: a revolutionary tool for t..."

...using previously described RNA-seq align data [22,23] from 11 human tissues: adipose, whole brain, cerebral cortex, breast, colon, heart, liver, lymph node, skeletal muscle, lung and testes....
[...]
...The recently developed RNA-seq technique has proven to be a powerful approach to detect the expression of genes [23]....
[...]
...RNA-Seq is a recently developed approach for transcriptome profiling using high-throughput sequencing technologies, and is powerful for detecting the expression of genes [23]....
[...]

Journal Article

Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi

Jens Keilwagen¹, Frank Hartung¹, Michael Paulini², Sven Twardziok, Jan Grau³

Julius Kühn-Institut¹, European Bioinformatics Institute², Martin Luther University of Halle-Wittenberg³

30 May 2018-BMC Bioinformatics

TL;DR: An extension of the gene prediction program GeMoMa utilizes amino acid sequence conservation, intron position conservation and, optionally, RNA-seq data for homology-based gene prediction, and might be of great utility not only for annotating newly sequenced genomes but also for finding homologs of a specific gene or gene family.

Abstract: Genome annotation is of key importance in many research questions. The identification of protein-coding genes is often based on transcriptome sequencing data, ab-initio or homology-based prediction. Recently, it was demonstrated that intron position conservation improves homology-based gene prediction, and that experimental data improves ab-initio gene prediction. Here, we present an extension of the gene prediction program GeMoMa that utilizes amino acid sequence conservation, intron position conservation and optionally RNA-seq data for homology-based gene prediction. We show on published benchmark data for plants, animals and fungi that GeMoMa performs better than the gene prediction programs BRAKER1, MAKER2, and CodingQuarry, and purely RNA-seq-based pipelines for transcript identification. In addition, we demonstrate that using multiple reference organisms may help to further improve the performance of GeMoMa. Finally, we apply GeMoMa to four nematode species and to the recently published barley reference genome indicating that current annotations of protein-coding genes may be refined using GeMoMa predictions. GeMoMa might be of great utility for annotating newly sequenced genomes but also for finding homologs of a specific gene or gene family. GeMoMa has been published under GNU GPL3 and is freely available at http://www.jstacs.de/index.php/GeMoMa .

147 citations

Journal Article

De novo transcriptome sequencing in Anopheles funestus using illumina RNA-seq technology.

Jacob E. Crawford¹, Wamdaogo M. Guelbeogo, Antoine Sanou, Alphonse Traoré, Kenneth D. Vernick²,³, N’Fale Sagnon, Brian P. Lazzaro¹

Cornell University¹, University of Minnesota², Centre national de la recherche scientifique³

02 Dec 2010-PLOS ONE

TL;DR: This work sequenced the adult female transcriptome of An. funestus.

Abstract: Background: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform. Methodology/Principal Findings: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and “target-based” contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that among these 1:1 orthologues, the protein sequences of those with putative immune function were significantly more divergent than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression. Conclusions/Significance: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case).
We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.
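The abstract says the short Illumina reads were assembled de novo, but it does not describe the assembly algorithm, and the authors' own pipeline (iterative assemblies plus “target-based” contig clustering) is not reproduced here. The Python sketch below illustrates the general idea behind many short-read assemblers. It builds a de Bruijn graph of (k-1)-mers and spells out the maximal non-branching paths (unitigs). It is a simplification under stated assumptions: it ignores sequencing errors, reverse complements and isolated cycles, and the function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it within some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell every maximal non-branching path in the graph as a contig."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node):
        return indegree.get(node, 0) == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: it is walked through from a path start
        for nxt in graph.get(node, ()):
            contig = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)


reads = ["ATGGCGT", "GGCGTGCA"]  # hypothetical overlapping reads
print(unitigs(build_de_bruijn(reads, k=4)))  # ['ATGGCGTGCA']
```

The two overlapping reads collapse into their merged sequence. A branch in the graph, such as an allelic or isoform difference, splits the output into separate unitigs. Such fragmentation is one motivation for a downstream contig-clustering step.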

147 citations

Cites background from "RNA-Seq: a revolutionary tool for t..."

...RNAseq provides a powerful means of measuring gene-expression because the depth of sequence coverage of a transcript should be proportional to its expression level [4]....
[...]
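The quoted rationale holds that coverage is proportional to expression. In practice that only works after correcting for two biases: longer transcripts yield more reads, and deeper libraries yield more reads overall. A common correction is RPKM (reads per kilobase of transcript per million mapped reads), introduced in the Mortazavi et al. paper listed under References. The minimal sketch below assumes that per-gene read counts and transcript lengths in base pairs are already available. It also takes the library total to be the sum of the listed counts, which is an assumption (normally it is all mapped reads). The gene names are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    counts: mapping gene -> number of reads assigned to that gene
    lengths_bp: mapping gene -> transcript length in base pairs
    Assumption: the library size is the sum of the given counts.
    """
    total_reads = sum(counts.values())
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million reads)
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads)
        for gene in counts
    }


counts = {"geneA": 100, "geneB": 300}          # hypothetical counts
lengths_bp = {"geneA": 1_000, "geneB": 3_000}  # hypothetical lengths
print(rpkm(counts, lengths_bp))  # {'geneA': 250000.0, 'geneB': 250000.0}
```

geneB receives three times as many reads only because it is three times longer, so both genes end up with the same RPKM.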

References

Journal Article

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist.
These include hybridization and cross-hybridization artifacts (refs. 1–3), dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations
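The Mortazavi et al. abstract notes that splice events were detected directly by mapping splice-crossing reads. One widely used way for aligners to report such reads is the SAM format. There, an `N` operation in the CIGAR string marks a skipped reference region, typically an intron. The Python sketch below is not the paper's own pipeline. It converts a spliced alignment's 1-based start position and CIGAR string into intron coordinates, then counts how many reads support each junction. The reads and helper names are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
CONSUMES_REFERENCE = set("MDN=X")


def junctions_from_cigar(ref_start, cigar):
    """Return (intron_start, intron_end) pairs, 1-based and inclusive."""
    pos = ref_start
    junctions = []
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            junctions.append((pos, pos + length - 1))
        if op in CONSUMES_REFERENCE:
            pos += length
    return junctions


def count_junctions(alignments):
    """Count supporting reads per junction from (start, cigar) pairs."""
    support = Counter()
    for ref_start, cigar in alignments:
        support.update(junctions_from_cigar(ref_start, cigar))
    return support


reads = [(100, "20M500N30M"), (105, "15M500N35M"), (90, "50M")]  # hypothetical
print(count_junctions(reads))  # Counter({(120, 619): 2})
```

Soft clips (`S`) and insertions (`I`) do not move along the reference, so they do not shift the junction coordinates.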

Patent

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) that is derived from a defined position in an mRNA or cDNA and is useful for identifying a cDNA oligonucleotide. SOLUTION: The method prepares a tag for identifying a cDNA oligonucleotide. It comprises preparing a cDNA oligonucleotide having 5' and 3' ends; cleaving the cDNA oligonucleotide with a first restriction endonuclease at its recognition site to produce cDNA fragments; isolating the cDNA fragment that carries the 5' or 3' end of the cDNA oligonucleotide; and ligating an oligonucleotide linker to the isolated fragment. The linker contains the recognition site of a second restriction endonuclease, which cleaves the isolated cDNA fragment at a position separated from its recognition site, releasing the tag that identifies the cDNA oligonucleotide.

4,437 citations
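The patent abstract describes SAGE tag generation with two enzymes. A first restriction enzyme fixes the tag position, and a second enzyme cleaves at a distance from its own recognition site. The Python sketch below mimics the outcome in silico. It finds the 3′-most recognition site of the first enzyme in a cDNA sequence, takes the fixed-length stretch immediately downstream as the tag, and counts tags across transcripts. Several details are illustrative assumptions, not values stated in the abstract: the site `CATG`, the 10-base tag length, and the choice to keep the fragment carrying the 3′ end. The sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed recognition site of the first (anchoring) enzyme
TAG_LENGTH = 10       # assumed tag length in bases


def sage_tag(cdna):
    """Return the tag just 3' of the 3'-most anchoring site, or None.

    Assumes `cdna` is written 5'->3' and that the fragment carrying the
    3' end is the one retained (an assumption in this sketch).
    """
    site = cdna.rfind(ANCHOR_SITE)
    if site == -1:
        return None
    start = site + len(ANCHOR_SITE)
    tag = cdna[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def count_tags(cdnas):
    """Tally tags across transcripts; transcripts without a usable tag are skipped."""
    return Counter(tag for tag in map(sage_tag, cdnas) if tag is not None)


transcripts = ["GGCATGAAAAAAAAAATTCATGCCCCCCCCCCGGAAAAA", "TTTT"]  # hypothetical
print(count_tags(transcripts))  # Counter({'CCCCCCCCCC': 1})
```

Each tag's count then serves as a digital estimate of how abundant its transcript is.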

Journal Article

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations
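The MAQ abstract defines mapping quality as the confidence that a read actually comes from the position it is aligned to. On the usual Phred scale this is Q = -10 log10 P(aligned position is wrong). The Python sketch below is a deliberate simplification, not MAQ's actual model. It scores each candidate locus by the probability that all of its mismatches are sequencing errors, treats matching bases as contributing probability of about 1, and assumes a uniform prior over candidate loci. It then converts these scores into the posterior probability that the best hit is wrong. The cap of 60 is an arbitrary assumption.

```python
import math


def mapping_quality(candidate_mismatch_quals, cap=60):
    """Phred-scaled probability that the best-scoring candidate locus is wrong.

    candidate_mismatch_quals: one list per candidate locus, holding the base
    qualities (Phred) of read positions that mismatch the reference there.
    """
    # P(read | locus) ~ product of per-mismatch error probabilities 10^(-q/10).
    likelihoods = [10 ** (-sum(quals) / 10) for quals in candidate_mismatch_quals]
    best = max(likelihoods)
    # Posterior that the best locus is not the true origin (uniform prior).
    p_wrong = 1 - best / sum(likelihoods)
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


print(mapping_quality([[]]))        # 60: a single perfect hit
print(mapping_quality([[], [30]]))  # 30: an alternative locus with one Q30 mismatch
print(mapping_quality([[], []]))    # 3: two equally good loci
```

Two equally good placements give a mapping quality near 3, because the best hit is wrong about half the time.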

Journal Article

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations
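The Marioni et al. abstract reports low technical variation and proposes a statistical framework for identifying differentially expressed genes, but it does not spell that framework out. A common model for purely technical variation in read counts is the Poisson distribution. Assuming it, the Python sketch below compares one gene's counts between two libraries with a likelihood-ratio test. This is an illustrative sketch, not the authors' published framework. It also does not model biological variability between replicates, which is generally overdispersed relative to a Poisson.

```python
import math


def poisson_lrt(x1, n1, x2, n2):
    """Likelihood-ratio test for one gene's read counts in two libraries.

    x1, x2: reads assigned to the gene in library 1 and 2
    n1, n2: total mapped reads in library 1 and 2 (library sizes)
    Null hypothesis: the gene takes the same fraction of reads in both.
    Returns (test statistic, p-value from a chi-square with 1 degree of freedom).
    """
    def log_lik(x, mu):
        # Poisson log-likelihood without the log(x!) term, which cancels in the ratio.
        return x * math.log(mu) - mu if x > 0 else -mu

    p_shared = (x1 + x2) / (n1 + n2)
    ll_null = log_lik(x1, n1 * p_shared) + log_lik(x2, n2 * p_shared)
    ll_alt = log_lik(x1, x1) + log_lik(x2, x2)  # separate rates; the MLE of mu is x
    stat = max(0.0, 2 * (ll_alt - ll_null))
    # Survival function of chi-square(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


print(poisson_lrt(10, 1_000_000, 20, 2_000_000))   # same proportion: statistic ~0, p ~1
print(poisson_lrt(100, 1_000_000, 10, 1_000_000))  # large difference: very small p
```

Dividing by library size through `p_shared` is what lets libraries of different depth be compared fairly.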

Journal Article

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing with the new-generation Illumina-Solexa technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program, SOAP, for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program that supports multi-threaded parallel computing and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations
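The SOAP abstract covers gapped and ungapped alignment of short reads to reference sequences. The acceptance criterion for ungapped alignment can be sketched by brute force. Slide the read, and its reverse complement, along the reference and keep every placement with at most a chosen number of mismatches. Practical aligners use precomputed index structures to avoid this exhaustive scan, and the sketch does not reproduce SOAP's internals. The mismatch limit of 2 is an illustrative default, not a documented SOAP setting, and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def ungapped_hits(read, reference, max_mismatches=2):
    """Brute-force ungapped alignment on both strands.

    Returns (0-based start, strand, mismatches) for every placement of the
    read, or of its reverse complement, with at most `max_mismatches` mismatches.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(reference) - len(query) + 1):
            mismatches = 0
            for read_base, ref_base in zip(query, reference[start:start + len(query)]):
                if read_base != ref_base:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # stop early once the placement cannot qualify
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


reference = "ACGTACGTTTGCA"  # hypothetical reference
print(ungapped_hits("GTAT", reference, max_mismatches=1))
# [(2, '+', 1), (6, '+', 1), (2, '-', 1)]
```

A read with several equally good hits, like the one above, is exactly the ambiguity that mapping-quality scores are meant to express.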