Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
25 Apr 2014-eLife
TL;DR: This work identifies a candidate set of 508 variance-associated SNPs from lymphoblastoid cell lines and shows that GxE plays a role in ∼70% of these associations, and investigates 57 epistatic interactions that replicated in a smaller dataset, explaining on average 4.3% of phenotypic variance.
Abstract: Every person has two copies of each gene: one is inherited from their mother and the other from their father. These two copies are often not identical because there can be many different variants of the same gene in the human population. Traits (such as height, body mass and risk of disease) vary from one person to the next—and for many traits this variation depends in part on the different gene variants that each person has inherited. Studies seeking to find the differences in DNA that can predict this variation have often assumed that the changes in DNA act on traits independently of the effect of environment and of other genetic variants. In contrast, studies with animals have shown that some genetic variants can interact to produce a bigger (or smaller) effect than would be expected from simply ‘adding together’ their individual effects—a phenomenon called epistasis. But how much does epistasis contribute to variation in human traits, if at all? This question has been much disputed, and is difficult to test, not least because of the sheer number of interactions to assess: tens of millions of changes in DNA have been observed in the human genome, and so there are many more than billions of possible combinations of these changes to investigate. Here, Brown et al. have examined the sequences of all the genes that were expressed in cells taken from a cohort of twins and searched for genetic variants that show these epistatic interactions. By studying gene expression, which can be greatly affected by small changes in the DNA code, Brown et al. were able to identify 508 variants that had a bigger than expected effect on the level of gene expression. This may be a sign that these variants act in combinations: if within one genome a variant increased expression and in another it decreased expression, then this would cause greater variation in gene expression. 
Further investigation of these 508 variants led to the discovery of 256 examples of epistasis, and 57 of these were replicated in samples from another cohort. Brown et al. calculated that these epistatic interactions explained up to 16% of the variation in gene expression. Furthermore, as well as being involved in epistatic interactions, about 70% of the genetic variants that had an effect on the variation in gene expression were also involved in interactions between genes and the environment. In addition to showing that epistasis contributes to variation in human traits, the work of Brown et al. could help to uncover interactions behind complex traits—beyond the expression level of a gene—that could not previously be investigated.

144 citations


Additional excerpts

  • ...We note that microarray data are also less suitable than RNA-seq for the purpose of detecting v-eQTL, because saturation of signal limits discrimination at extremes (Wang et al., 2009)....


Journal ArticleDOI
TL;DR: Identifying and characterizing the proximate mechanisms involved in phenotypic plasticity and genetic assimilation promises to help advance the basic understanding of evolutionary innovation and diversification.

144 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...…at multiple stages of development, can be used to quantify the abundances of all transcripts in an organism’s genome, even for non-model species (Wang et al., 2009), providing a foundation for defining modules of co-expressed genes and examining the regulatory relationships among these modules…...


Journal ArticleDOI
TL;DR: The described extraction protocol provides a simple and straightforward method for the efficient extraction of lipids, metabolites and proteins from minute amounts of a single sample, enabling the targeted but also untargeted high-throughput analyses of diverse biological tissues and samples.
Abstract: The elucidation of complex biological systems requires integration of multiple molecular parameters. Accordingly, high throughput methods like transcriptomics, proteomics, metabolomics and lipidomics have emerged to provide the tools for successful system-wide investigations. Unfortunately, optimized analysis of different compounds requires specific extraction procedures in combination with specific analytical instrumentation. However, the most efficient extraction protocols often only cover a restricted number of compounds due to the different physico-chemical properties of these biological compounds. Consequently, comprehensive analysis of several molecular components like polar primary metabolites next to lipids or proteins require multiple aliquots to enable the specific extraction procedures required to cover these diverse compound classes. This multi-parallel sample handling of different sample aliquots is therefore not only more sample intensive, it also requires more time and effort to obtain the required extracts. To circumvent large sample amounts, distributed into several aliquots for the comprehensive extraction of most relevant biological compounds, we developed a simple, robust and reproducible two-phase liquid–liquid extraction protocol. This one-step extraction protocol allows for the analysis of polar-, semi-polar and hydrophobic metabolites, next to insoluble or precipitated compounds, including proteins, starch and plant cell wall components, from a single sample. The method is scalable regarding the used sample amounts but also the employed volumes and can be performed in microcentrifuge tubes, enabling high throughput analysis. The obtained fractions are fully compatible with common analytical methods, including spectroscopic, chromatographic and mass spectrometry-based techniques. 
To document the utility of the described protocol, we used 25 mg of Arabidopsis thaliana rosette leaves for the generation of multi-omics data sets, covering lipidomics, metabolomics and proteomics. The obtained data allowed us to measure and annotate more than 200 lipid compounds, 100 primary metabolites, 50 secondary metabolites and 2000 proteins. The described extraction protocol provides a simple and straightforward method for the efficient extraction of lipids, metabolites and proteins from minute amounts of a single sample, enabling the targeted but also untargeted high-throughput analyses of diverse biological tissues and samples.

144 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...To obtain the analytical data for the diverse molecular constituents, ‘-omic’ platforms, including transcriptomics [2], metabolomics [3, 4], lipidomics [5, 6] and proteomics [7, 8] have emerged to provide the ever growing tool-box for successful systems biology investigations [9]....


Journal ArticleDOI
TL;DR: It is shown here that the large majority of protein-coding genes express normal levels of mRNA in PABPN1–deficient cells, arguing that PABPN1 may not be required for the bulk of mRNA expression, and a novel function for PABPN1 in lncRNA turnover is identified.
Abstract: The poly(A)-binding protein nuclear 1 (PABPN1) is a ubiquitously expressed protein that is thought to function during mRNA poly(A) tail synthesis in the nucleus. Despite the predicted role of PABPN1 in mRNA polyadenylation, little is known about the impact of PABPN1 deficiency on human gene expression. Specifically, it remains unclear whether PABPN1 is required for general mRNA expression or for the regulation of specific transcripts. Using RNA sequencing (RNA–seq), we show here that the large majority of protein-coding genes express normal levels of mRNA in PABPN1–deficient cells, arguing that PABPN1 may not be required for the bulk of mRNA expression. Unexpectedly, and contrary to the view that PABPN1 functions exclusively at protein-coding genes, we identified a class of PABPN1–sensitive long noncoding RNAs (lncRNAs), the majority of which accumulated in conditions of PABPN1 deficiency. Using the spliced transcript produced from a snoRNA host gene as a model lncRNA, we show that PABPN1 promotes lncRNA turnover via a polyadenylation-dependent mechanism. PABPN1–sensitive lncRNAs are targeted by the exosome and the RNA helicase MTR4/SKIV2L2; yet, the polyadenylation activity of TRF4-2, a putative human TRAMP subunit, appears to be dispensable for PABPN1–dependent regulation. In addition to identifying a novel function for PABPN1 in lncRNA turnover, our results provide new insights into the post-transcriptional regulation of human lncRNAs.

144 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...accurately measure changes in gene expression [46]....


  • ...of transcripts, which are not well suited to measure overall mRNA expression [46]....


Journal ArticleDOI
TL;DR: Widespread AS changes in NSCLC that impact cell signaling in a manner that likely contributes to tumorigenesis are revealed.
Abstract: Alternative splicing (AS) is a widespread mechanism underlying the generation of proteomic and regulatory complexity. However, which of the myriad of human AS events play important roles in disease is largely unknown. To identify frequently occurring AS events in lung cancer, we used AS microarray profiling and reverse transcription-PCR (RT-PCR) assays to survey patient-matched normal and adenocarcinoma tumor tissues from the lungs of 29 individuals diagnosed with non-small cell lung cancer (NSCLC). Of 5,183 profiled alternative exons, four displayed tumor-associated changes in the majority of the patients. These events affected transcripts from the VEGFA, MACF1, APP, and NUMB genes. Similar AS changes were detected in NUMB and APP transcripts in primary breast and colon tumors. Tumor-associated increases in NUMB exon 9 inclusion correlated with reduced levels of NUMB protein expression and activation of the Notch signaling pathway, an event that has been linked to tumorigenesis. Moreover, short hairpin RNA (shRNA) knockdown of NUMB followed by isoform-specific rescue revealed that expression of the exon 9-skipped (nontumor) isoform represses Notch target gene activation whereas expression of the exon 9-included (tumor) isoform lacks this activity and is capable of promoting cell proliferation. The results thus reveal widespread AS changes in NSCLC that impact cell signaling in a manner that likely contributes to tumorigenesis.

144 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Future cancer and other disease profiling studies will benefit from the use of high-throughput RNA sequencing, since this technology can yield data sets with greater coverage of the transcriptome as well as greater quantitative accuracy (4, 55)....


References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist.
These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations
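
The abstract above describes quantifying transcripts by "recording how frequently each gene is represented in the sequence sample". One widely used way to make such read counts comparable across genes and samples is reads per kilobase of transcript per million mapped reads (RPKM). The abstract itself does not spell out a normalization, so the formula, the function name `rpkm` and the example numbers below are illustrative assumptions rather than the authors' stated procedure.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads),
    i.e. the count is divided by length in kilobases and by library
    size in millions of reads.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2 kb transcript in a
# library of 40 million mapped reads.
example_rpkm = rpkm(read_count=500, transcript_length_bp=2000,
                    total_mapped_reads=40_000_000)
print(f"RPKM = {example_rpkm:.2f}")
```

Dividing by transcript length matters because a longer transcript yields more fragments at the same molar abundance. Dividing by library size makes samples sequenced to different depths comparable.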

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in an mRNA or a cDNA. SOLUTION: This is a method of preparing a tag for identifying the cDNA oligonucleotide. The method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations
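
The MAQ abstract above introduces mapping quality as "a measure of the confidence that a read actually comes from the position it is aligned to". Such a confidence is conventionally reported on a Phred scale, Q = -10 · log10(P(alignment is wrong)). The sketch below shows only that conversion. It is not MAQ's actual estimator of the error probability, and the function names and the cap of 99 are hypothetical choices made for illustration.

```python
import math


def mapping_quality(p_wrong: float, cap: int = 99) -> int:
    """Phred-scaled mapping quality from the probability that the
    reported alignment position is wrong. `cap` is a hypothetical
    upper bound used here to keep the score finite when p_wrong is 0."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_probability(quality: float) -> float:
    """Inverse conversion: Phred-scaled quality back to a probability."""
    return 10 ** (-quality / 10)


# A read with a 1-in-1000 chance of being misplaced gets quality 30;
# a read that maps equally well to many places (p_wrong near 1) gets ~0.
for p in (0.5, 0.01, 0.001):
    print(p, mapping_quality(p))
```

On this scale each additional 10 points corresponds to a tenfold drop in the chance that the read is misplaced, which is why downstream genotype callers can use the score directly as a weight.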

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...
