Journal Article

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: RNA-Seq, an approach to transcriptome profiling that uses deep-sequencing technologies, provides a far more precise measurement of the levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal Article
TL;DR: By analyzing more than 1700 sequences, this work provides an updated and comprehensive phylogeny of MIPs and shows that while bacteria and archaea generally function with one copy of each a water channel, recurrent independent expansions have greatly diversified the structures and functions of the different members of both MIP paralog subfamilies throughout eukaryote evolution.

196 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Moreover, available transcriptomes allow investigating expression profiles of protein family members in an evolutionary context [28,29]....


Journal Article
TL;DR: The results significantly improved soybean gene annotation, and also provide valuable resources for functional genomics and studies of the evolution of duplicated genes from WGDs in soybean.
Abstract: Soybean is one of the most important crops, providing large amounts of dietary proteins and edible oil, and is also an excellent model for studying evolution of duplicated genes. However, relative to the model plants Arabidopsis and rice, the present knowledge about soybean transcriptome is quite limited. In this study, we employed RNA-seq to investigate transcriptomes of 11 soybean tissues, for genome-wide discovery of truly expressed genes, and novel and alternative transcripts, as well as analyses of conservation and divergence of duplicated genes and their functional implications. We detected a total of 54,132 high-confidence expressed genes, and identified 6,718 novel transcriptional regions with a mean length of 372 bp. We also provided strong evidence for alternative splicing (AS) events for ~15.9% of the genes with two or more exons. Among them, 1,834 genes exhibited stage-dependent AS, and 202 genes had tissue-biased exon-skipping events. We further defined the conservation and divergence in expression patterns between duplicated gene pairs from recent whole genome duplications (WGDs); differentially expressed genes, tissue preferentially expressed genes, transcription factors and specific gene family members were identified for shoot apical meristem and flower development. Our results significantly improved soybean gene annotation, and also provide valuable resources for functional genomics and studies of the evolution of duplicated genes from WGDs in soybean.

195 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...The recent development of high-throughput RNA sequencing (RNA-seq) technologies has greatly improved sensitivity of transcriptomics and allowed detection of transcripts without a priori gene models [10-12]....


Journal Article
TL;DR: Recommendations include the following: (1) animal model selection, with commentary on the fidelity of mimicking facets of the human disease; (2) experimental design and its impact on the interpretation of data; and (3) standard methods to enhance accuracy of measurements and characterization of atherosclerotic lesions.
Abstract: Animal studies are a foundation for defining mechanisms of atherosclerosis and potential targets of drugs to prevent lesion development or reverse the disease. In the current literature, it is common to see contradictions of outcomes in animal studies from different research groups, leading to the paucity of extrapolations of experimental findings into understanding the human disease. The purpose of this statement is to provide guidelines for development and execution of experimental design and interpretation in animal studies. Recommendations include the following: (1) animal model selection, with commentary on the fidelity of mimicking facets of the human disease; (2) experimental design and its impact on the interpretation of data; and (3) standard methods to enhance accuracy of measurements and characterization of atherosclerotic lesions.

194 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...information about long noncoding RNAs, RNA splicing, and allele-specific expression.(246) RNA sequencing is consider-...


Journal Article
TL;DR: The results indicate that P. tricornutum continued carbon dioxide reduction when population growth was arrested and different carbon-concentrating mechanisms were used dependent upon exogenous DIC levels, and suggest that the build-up of precursors to the acetyl-CoA carboxylases may play a more significant role in TAG synthesis rather than the actual enzyme levels of acetyl-CoA carboxylases per se.
Abstract: Phaeodactylum tricornutum is a unicellular diatom in the class Bacillariophyceae. The full genome has been sequenced (<30 Mb), and approximately 20 to 30% triacylglyceride (TAG) accumulation on a dry cell basis has been reported under different growth conditions. To elucidate P. tricornutum gene expression profiles during nutrient-deprivation and lipid-accumulation, cell cultures were grown with a nitrate to phosphate ratio of 20:1 (N:P) and whole-genome transcripts were monitored over time via RNA-sequence determination. The specific Nile Red (NR) fluorescence (NR fluorescence per cell) increased over time; however, the increase in NR fluorescence was initiated before external nitrate was completely exhausted. Exogenous phosphate was depleted before nitrate, and these results indicated that the depletion of exogenous phosphate might be an early trigger for lipid accumulation that is magnified upon nitrate depletion. As expected, many of the genes associated with nitrate and phosphate utilization were up-expressed. The diatom-specific cyclins cyc 7 and cyc 10 were down-expressed during the nutrient-deplete state, and cyclin B1 was up-expressed during lipid-accumulation after growth cessation. While many of the genes associated with the C3 pathway for photosynthetic carbon reduction were not significantly altered, genes involved in a putative C4 pathway for photosynthetic carbon assimilation were up-expressed as the cells depleted nitrate, phosphate, and exogenous dissolved inorganic carbon (DIC) levels. P. tricornutum has multiple, putative carbonic anhydrases, but only two were significantly up-expressed (2-fold and 4-fold) at the last time point when exogenous DIC levels had increased after the cessation of growth. Alternative pathways that could utilize HCO3- were also suggested by the gene expression profiles (e.g., putative propionyl-CoA and methylmalonyl-CoA decarboxylases). The results indicate that P. tricornutum continued carbon dioxide reduction when population growth was arrested and different carbon-concentrating mechanisms were used dependent upon exogenous DIC levels. Based upon overall low gene expression levels for fatty acid synthesis, the results also suggest that the build-up of precursors to the acetyl-CoA carboxylases may play a more significant role in TAG synthesis rather than the actual enzyme levels of acetyl-CoA carboxylases per se. The presented insights into the types and timing of cellular responses to inorganic carbon will help maximize photoautotrophic carbon flow to lipid accumulation.

194 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...Similar pipelines have been validated as well as the accuracy of RNA-seq with spike in experiments and qPCR comparisons [28,29,63-66]....


Journal Article
TL;DR: This review focuses on the inverse Ising problem and closely related problems, namely how to infer the coupling strengths between spins given observed spin correlations, magnetizations, or other data.
Abstract: Inverse problems in statistical physics are motivated by the challenges of ‘big data’ in different fields, in particular high-throughput experiments in biology. In inverse problems, the usual procedure of statistical physics needs to be reversed: Instead of calculating observables on the basis of model parameters, we seek to infer parameters of a model based on observations. In this review, we focus on the inverse Ising problem and closely related problems, namely how to infer the coupling strengths between spins given observed spin correlations, magnetizations, or other data. We review applications of the inverse Ising problem, including the reconstruction of neural connections, protein structure determination, and the inference of gene regulatory networks. For the inverse Ising problem in equilibrium, a number of controlled and uncontrolled approximate solutions have been developed in the statistical mechanics community. A particularly strong method, pseudolikelihood, stems from statistics. We also revi...

193 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Then the relative mRNA levels follow directly from counts of sequence reads [237]....


References
Journal Article
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10⁵ distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

Patent
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations

Journal Article
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...
