RNA-Seq: a revolutionary tool for transcriptomics

doi:10.1038/NRG2484

Journal Article•DOI•

Zhong Wang¹, Mark Gerstein¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63

TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.

Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations

Journal Article•DOI•

RNA sequencing atopic dermatitis transcriptome profiling provides insights into novel disease mechanisms with potential therapeutic implications

Mayte Suárez-Fariñas¹, Benjamin Ungar¹,², Joel Correa da Rosa¹, David Adrian Ewald¹,³, Mariya Rozenblit¹,², Juana Gonzalez¹, Hui Xu¹,², Xiuzhong Zheng¹, Xiangyu Peng¹,², Yeriel Estrada¹,², Stacey R. Dillon⁴, James G. Krueger¹, Emma Guttman-Yassky¹,²•Institutions (4)

Rockefeller University¹, Icahn School of Medicine at Mount Sinai², Technical University of Denmark³, Bristol-Myers Squibb⁴

01 May 2015-The Journal of Allergy and Clinical Immunology

TL;DR: Using RNA-seq, this study uncovers novel disease pathology in patients with AD, including increased expression of the TREM-1 pathway and the IL-36 cytokine. It is the first report of a lesional AD phenotype using RNA-seq and the first direct comparison between platforms in this disease.

Abstract: Background: Genomic profiling of lesional and nonlesional skin of patients with atopic dermatitis (AD) using microarrays has led to increased understanding of AD and identification of novel therapeutic targets. However, the limitations of microarrays might decrease detection of AD genes. These limitations might be lessened with next-generation RNA sequencing (RNA-seq). Objective: We sought to define the lesional AD transcriptome using RNA-seq and compare it using microarrays performed on the same cohort. Methods: RNA-seq and microarrays were performed to identify differentially expressed genes (criteria: fold change ≥2.0; false discovery rate ≤0.05) in lesional versus nonlesional skin from 18 patients with moderate-to-severe AD, with real-time PCR (RT-PCR) and immunohistochemistry used for validation. Results: Both platforms showed robust disease transcriptomes and correlated well with RT-PCR. The common AD transcriptome identified by using both techniques contained 217 genes, including inflammatory (S100A8/A9/A12, CXCL1, and 2′-5′-oligoadenylate synthetase-like [OASL]) and barrier (MKi67, keratin 16 [K16], and claudin 8 [CLDN8]) AD-related genes. Although fold change estimates determined by using RNA-seq showed somewhat better agreement with RT-PCR (intraclass correlation coefficient, 0.57 and 0.70 for microarrays and RNA-seq vs RT-PCR, respectively), bias was not eliminated. Among genes uniquely identified by using RNA-seq were triggering receptor expressed on myeloid cells 1 (TREM-1) signaling (eg, CCL2, CCL3, and single immunoglobulin domain IL1R1 related [SIGIRR]) and IL-36 isoform genes. TREM-1 is a surface receptor implicated in innate and adaptive immunity that amplifies infection-related inflammation. Conclusions: This is the first report of a lesional AD phenotype using RNA-seq and the first direct comparison between platforms in this disease. Both platforms robustly characterize the AD transcriptome. Through RNA-seq, we unraveled novel disease pathology, including increased expression of the novel TREM-1 pathway and the IL-36 cytokine in patients with AD.
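
The selection criteria quoted in the Methods (fold change ≥2.0, false discovery rate ≤0.05) can be illustrated with a minimal Python sketch. The record type `GeneResult`, its field names, the helper `passes_criteria` and the gene labels and values are all hypothetical and are not taken from the study. The sketch also assumes that "fold change ≥2.0" is applied symmetrically, so that a two-fold decrease qualifies as well as a two-fold increase.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str           # placeholder label, not a real gene from the study
    fold_change: float  # lesional / nonlesional expression ratio, must be > 0
    fdr: float          # false discovery rate (adjusted p-value)


def passes_criteria(result, min_fold_change=2.0, max_fdr=0.05):
    """Return True when a gene meets both thresholds quoted in the abstract."""
    # Assumption: the threshold is symmetric, so a ratio of 0.4
    # counts as a 2.5-fold change in the downward direction.
    magnitude = max(result.fold_change, 1.0 / result.fold_change)
    return magnitude >= min_fold_change and result.fdr <= max_fdr


# Made-up values, for illustration only.
results = [
    GeneResult("GENE_A", 3.1, 0.01),   # up, significant
    GeneResult("GENE_B", 1.4, 0.001),  # significant, but change too small
    GeneResult("GENE_C", 0.4, 0.03),   # down 2.5-fold, significant
    GeneResult("GENE_D", 5.0, 0.20),   # large change, fails the FDR cut-off
]

selected = [r.gene for r in results if passes_criteria(r)]
print(selected)
```

Only the two records that satisfy both thresholds are kept; a large fold change alone is not enough when the FDR exceeds the cut-off.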

216 citations

Journal Article•DOI•

Maintenance of duplicate genes and their functional redundancy by reduced expression

Wenfeng Qian¹, Ben-Yang Liao², Andrew Ying-Fei Chang², Jianzhi Zhang¹•Institutions (2)

University of Michigan¹, National Health Research Institutes²

01 Oct 2010-Trends in Genetics

TL;DR: It is proposed that expression reduction, a special type of subfunctionalization, facilitates the retention of duplicates and the conservation of their ancestral functions, and gene expression data from both yeasts and mammals show a substantial decrease in the level of gene expression after duplication.

216 citations

Journal Article•DOI•

A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data

Daniel A. Skelly¹, Marnie Johansson¹, Jennifer Madeoy¹, Jon Wakefield¹, Joshua M. Akey¹•Institutions (1)

University of Washington¹

01 Oct 2011-Genome Research

TL;DR: This work developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE) and provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.

Abstract: Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.

215 citations

Journal Article•DOI•

Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets

Michael K. K. Leung¹, Andrew Delong¹, Babak Alipanahi¹, Brendan J. Frey²•Institutions (2)

University of Toronto¹, Canadian Institute for Advanced Research²

01 Jan 2016

TL;DR: In this paper, the authors focus on how machine learning can help to model the relationship between DNA and the quantities of key molecules in the cell, with the premise that these quantities, which they refer to as cell variables, may be associated with disease risks.

Abstract: In this paper, we provide an introduction to machine learning tasks that address important problems in genomic medicine. One of the goals of genomic medicine is to determine how variations in the DNA of individuals can affect the risk of different diseases, and to find causal explanations so that targeted therapies can be designed. Here we focus on how machine learning can help to model the relationship between DNA and the quantities of key molecules in the cell, with the premise that these quantities, which we refer to as cell variables, may be associated with disease risks. Modern biology allows high-throughput measurement of many such cell variables, including gene expression, splicing, and proteins binding to nucleic acids, which can all be treated as training targets for predictive models. With the growing availability of large-scale data sets and advanced computational techniques such as deep learning, researchers can help to usher in a new era of effective genomic medicine.

214 citations

Journal Article•DOI•

Cassava Genome From a Wild Ancestor to Cultivated Varieties

Wenquan Wang¹, Binxiao Feng¹, Jingfa Xiao², Zhiqiang Xia¹, Xincheng Zhou¹, Pinghua Li¹, Weixiong Zhang³, Ying Wang, Birger Lindberg Møller⁴, Peng Zhang, Ming-Cheng Luo⁵, Gong Xiao, Jingxing Liu², Jun Yang, Songbi Chen, Pablo D. Rabinowicz⁶, Xin Chen¹, Hong-Bin Zhang⁷, Henan Ceballos⁸, Qunfeng Lou⁹, Meiling Zou¹, Luiz Joaquim Castelo Branco Carvalho¹⁰, Changying Zeng¹, Jing Xia³, Shixiang Sun², Fu Yuhua¹, Haiyan Wang¹, Cheng Lu¹, Mengbin Ruan¹, Shuigeng Zhou¹¹, Zhicheng Wu¹¹, Hui Liu¹¹, Rubini Kannangara⁴, Kirsten Jørgensen⁴, Rebecca Louise Neale⁴, Maya Bonde⁴, Nanna Heinz⁴, Wenli Zhu, Shujuan Wang¹, Yang Zhang¹, Kun Pan¹, Mingfu Wen¹, Ping-An Ma¹, Zhengxu Li¹, Meizhen Hu¹, Wenbin Liao¹, Wenbin Hu¹, Shengkui Zhang¹, Jinli Pei¹, Anping Guo¹, Jianchun Guo¹, Jiaming Zhang¹, Zhengwen Zhang, Jianqiu Ye, Wenjun Ou, Yaqin Ma⁵, Xinyue Liu⁶, Luke J. Tallon⁶, Kevin Galens⁶, Sandra Ott⁶, Jie Huang, Jingjing Xue, Feifei An, Qingqun Yao, Xiaojing Lu, Martin A. Fregene⁸, L. Augusto Becerra Lopez-Lavalle⁸, Jiajie Wu⁵, Frank M. You⁵, Meili Chen², Songnian Hu², Guojiang Wu, Silin Zhong¹², Peng Ling¹³, Chen Yeyuan, Qinghuang Wang¹, Guodao Liu, Bin Liu, Kaimian Li, Ming Peng¹•Institutions (13)

Chinese Academy of Tropical Agricultural Sciences¹, Beijing Institute of Genomics², Washington University in St. Louis³, University of Copenhagen⁴, University of California, Davis⁵, University of Maryland, Baltimore⁶, Texas A&M University⁷, International Center for Tropical Agriculture⁸, Nanjing Agricultural University⁹, Empresa Brasileira de Pesquisa Agropecuária¹⁰, Fudan University¹¹, The Chinese University of Hong Kong¹², University of Florida¹³

10 Oct 2014-Nature Communications

TL;DR: The analyses reveal that genes involved in photosynthesis, starch accumulation and abiotic stresses have been positively selected, whereas those involved in cell wall biosynthesis and secondary metabolism have been negatively selected in the cultivated varieties, reflecting the result of natural selection and domestication.

Abstract: Cassava is a major tropical food crop in the Euphorbiaceae family that has high carbohydrate production potential and adaptability to diverse environments. Here we present the draft genome sequences of a wild ancestor and a domesticated variety of cassava and comparative analyses with a partial inbred line. We identify 1,584 and 1,678 gene models specific to the wild and domesticated varieties, respectively, and discover high heterozygosity and millions of single-nucleotide variations. Our analyses reveal that genes involved in photosynthesis, starch accumulation and abiotic stresses have been positively selected, whereas those involved in cell wall biosynthesis and secondary metabolism, including cyanogenic glucoside formation, have been negatively selected in the cultivated varieties, reflecting the result of natural selection and domestication. Differences in microRNA genes and retrotransposon regulation could partly explain an increased carbon flux towards starch accumulation and reduced cyanogenic glucoside accumulation in domesticated cassava. These results may contribute to genetic improvement of cassava through better understanding of its biology.

213 citations

References

Journal Article•DOI•

Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Ali Mortazavi¹, Brian A. Williams¹, Kenneth McCue¹, Lorian Schaeffer¹, Barbara J. Wold¹•Institutions (1)

California Institute of Technology¹

29 Jun 2008-Nature Methods

TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.

Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other
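
The "digital measure" of transcript prevalence described above rests on counting reads per gene. The excerpt does not spell out a formula, so the following is only a sketch of one widely used normalization, reads per kilobase of exon model per million mapped reads (RPKM), which is the unit generally associated with this paper. The function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(reads_in_gene, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    Dividing by length (in kb) corrects for longer transcripts collecting
    more reads; dividing by depth (in millions) makes samples sequenced
    to different depths comparable.
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return reads_in_gene * 1e9 / (exon_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2,000 bp transcript in a library of
# 40 million mapped reads (within the 41-52 million range quoted above
# only approximately; the numbers are illustrative).
value = rpkm(500, 2000, 40_000_000)
print(value)
```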

12,293 citations

Patent•DOI•

Serial analysis of gene expression

Kenneth W. Kinzler¹, Victor Velculescu², Bert Vogelstein², Lin Zhang²•Institutions (2)

Johns Hopkins University¹, Howard Hughes Medical Institute²

04 Oct 2000-Science

TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.

Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) that is useful for identifying a cDNA oligonucleotide and is derived from a restricted position in an mRNA or a cDNA. SOLUTION: The method prepares a tag for identifying the cDNA oligonucleotide. It comprises preparing a cDNA oligonucleotide having 5' and 3' termini, cutting it with a restriction enzyme at a first restriction endonuclease site to obtain cDNA fragments, isolating a cDNA fragment bearing the 5' or 3' terminus, and ligating an oligonucleotide linker to the isolated fragment. The linker contains the recognition site of a second restriction endonuclease, which cuts the isolated cDNA fragment at a position away from its recognition site to yield the tag that identifies the cDNA oligonucleotide.

4,437 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
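
Mapping quality, as introduced above, expresses confidence in a read's placement. Such scores are conventionally reported on the Phred scale, Q = -10 · log10(P), where P is the probability that the alignment is wrong. The sketch below shows only that scale conversion; it does not reproduce how MAQ estimates P, and the function names are hypothetical.

```python
import math


def error_prob_to_phred(p_wrong):
    """Convert the probability that an alignment is wrong to a Phred-scaled score."""
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(p_wrong)


def phred_to_error_prob(quality):
    """Convert a Phred-scaled score back to an error probability."""
    return 10.0 ** (-quality / 10.0)


# A 1-in-100 chance of a wrong placement corresponds to a score of about 20;
# a score of 30 corresponds to a 1-in-1000 chance.
print(error_prob_to_phred(0.01))
print(phred_to_error_prob(30))
```

On this scale each additional 10 points means a ten-fold lower chance that the read is misplaced, which is why a simple threshold on the score is a common way to discard ambiguous alignments.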

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad•Institutions (1)

University of Chicago¹

01 Sep 2008-Genome Research

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

SOAP: short oligonucleotide alignment program

Ruiqiang Li¹, Yingrui Li², Karsten Kristiansen², Jun Wang²•Institutions (2)

Beijing Genomics Institute¹, University of Southern Denmark²

01 Mar 2008-Bioinformatics

TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.

Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations