Journal Article

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009 - Nature Reviews Genetics (Nature Publishing Group) - Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal Article
TL;DR: This study provides a basis for further analysis and characterisation of genes shown to be highly induced in the presence of a lignocellulosic substrate and suggests a conserved strategy towards lignocellulose degradation in both saprobic fungi.
Abstract: Background: A major part of second generation biofuel production is the enzymatic saccharification of lignocellulosic biomass into fermentable sugars. Many fungi produce enzymes that can saccharify lignocellulose, and cocktails from several fungi, including well-studied species such as Trichoderma reesei and Aspergillus niger, are available commercially for this process. Such commercially-available enzyme cocktails are not necessarily representative of the array of enzymes used by the fungi themselves when faced with a complex lignocellulosic material. The global induction of genes in response to exposure of T. reesei to wheat straw was explored using RNA-seq and compared to published RNA-seq data and model of how A. niger senses and responds to wheat straw. Results: In T. reesei, levels of transcript that encode known and predicted cell-wall degrading enzymes were very high after 24 h exposure to straw (approximately 13% of the total mRNA) but were less than recorded in A. niger (approximately 19% of the total mRNA). Closer analysis revealed that enzymes from the same glycoside hydrolase families but different carbohydrate esterase and polysaccharide lyase families were up-regulated in both organisms. Accessory proteins which have been hypothesised to possibly have a role in enhancing carbohydrate deconstruction in A. niger were also uncovered in T. reesei and categories of enzymes induced were in general similar to those in A. niger. Similarly to A. niger, antisense transcripts are present in T. reesei and their expression is regulated by the growth condition. Conclusions: T. reesei uses a similar array of enzymes, for the deconstruction of a solid lignocellulosic substrate, to A. niger. This suggests a conserved strategy towards lignocellulose degradation in both saprobic fungi. This study provides a basis for further analysis and characterisation of genes shown to be highly induced in the presence of a lignocellulosic substrate. The data will help to elucidate the mechanism of solid substrate recognition and subsequent degradation by T. reesei and provide information which could prove useful for efficient production of second generation biofuels.

92 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...As a consequence, RNA-seq allows for a large dynamic range of expression levels over which transcripts can be detected and has increased sensitivity for genes expressed at either very high or very low levels when compared to microarrays [36]....

    [...]

  • ...An advantage of RNA-seq, when compared to microarrays, is that it is not limited to detecting transcripts that correspond to known genomic sequences, background signals are low and it does not have an upper limit for quantification [36]....

    [...]

Journal Article
TL;DR: Genome-scale metabolic models, new tools for controlling expression, and integrated -omics analysis are described as key contributors in moving the field toward Design-based Engineering.

92 citations

Journal Article
TL;DR: This mini-review discusses the strategy for quantitative study of alternative splicing in cancers with RNA-Seq, the bioinformatics methods available and existing questions, and summarizes the current RNA-Seq studies on cancer transcriptomes.

92 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Comparing with microarrays, RNA-Seq provides much richer information about the transcriptome, and also provides better resolution and higher accuracy when it is used for the conventional gene expression study [33]....

    [...]

Journal Article
TL;DR: The applications on DPINs will be discussed, including protein complexes/functional modules and network organization analysis, biomarkers detection in the progression or prognosis of the disease, and network medicine.
Abstract: With more dynamic information available, researchers' attention has recently shifted from static properties to dynamic properties of protein-protein interaction networks. To compensate the limited ability of technologies of detecting dynamic protein-protein interactions, dynamic protein interaction networks (DPINs) can be constructed by involving proteomic, genomic, and transcriptome analyses. Two groups of DPIN construction methods are classified based on the different focuses on dynamic information extracted from gene expression data. The dynamics of one kind of DPINs is reflected by the changes in protein presence varying with time, while that of the other kind of DPINs is reflected by the differences of coexpression under different conditions. In this review, the applications on DPINs will be discussed, including protein complexes/functional modules and network organization analysis, biomarkers detection in the progression or prognosis of the disease, and network medicine. We also point out the challenges in DPINs construction and future directions in the research of DPINs at the end of this review.

92 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...Generally speaking, to exploit the dynamic information from RNA-seq data, a number of experimental and computational challenges need to be addressed, including the alignment of RNA-Seq reads, identification of mRNA isoforms, and estimation of mRNA isoform expression levels....

    [...]

  • ...The identification of transcribed mRNA isoforms from the RNA-seq reads alignment and estimation of mRNA isoforms expression levels are basic problems to extract the dynamic information....

    [...]

  • ...[87] Wang, K., Singh, D., Zeng, Z., Coleman, S. J. et al., MapSplice: accurate mapping of RNA-seq reads for splice junction discovery....

    [...]

  • ...With the quick development of the next-generation sequencing technology, RNA-seq [84] is an increasingly popular method to study gene expression....

    [...]

  • ...However, the mRNA isoform expression levels can neither directly be obtained from RNA-seq data, nor be measured by the quantity of reads directly [88]....

    [...]

Journal Article
TL;DR: The understanding of pulmonary diseases has increased as a result of applying high-throughput omics approaches to characterize patients, uncover mechanisms underlying drug responsiveness, and identify effects of environmental exposures and interventions.
Abstract: Omics approaches are high-throughput unbiased technologies that provide snapshots of various aspects of biological systems and include: 1) genomics, the measure of DNA variation; 2) transcriptomics, the measure of RNA expression; 3) epigenomics, the measure of DNA alterations not involving sequence variation that influence RNA expression; 4) proteomics, the measure of protein expression or its chemical modifications; and 5) metabolomics, the measure of metabolite levels. Our understanding of pulmonary diseases has increased as a result of applying these omics approaches to characterize patients, uncover mechanisms underlying drug responsiveness, and identify effects of environmental exposures and interventions. As more tissue- and cell-specific omics data is analyzed and integrated for diverse patients under various conditions, there will be increased identification of key mechanisms that underlie pulmonary biological processes, disease endotypes, and novel therapeutics that are efficacious in select individuals. We provide a synopsis of how omics approaches have advanced our understanding of asthma, chronic obstructive pulmonary disease (COPD), acute respiratory distress syndrome (ARDS), idiopathic pulmonary fibrosis (IPF), and pulmonary arterial hypertension (PAH), and we highlight ongoing work that will facilitate pulmonary disease precision medicine.

91 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...RNA-Seq allows for sequencing and quantification of transcripts in a cell or tissue at unprecedented depth [77]....

    [...]

References
Journal Article
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

Patent
04 Oct 2000 - Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal Article
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations

Journal Article
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]