Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
TL;DR: This literature review provides a framework for better understanding fish skeletal muscle ontogeny and its impact on larval and juvenile quality as broadly defined, focusing on fundamental biological knowledge relevant to larval phenotype and quality.
Abstract: Enhanced production of high quality and healthy fry is a key target for a successful and competitive expansion of the aquaculture industry. Although large quantities of fish larvae are produced, survival rates are often low or highly variable and growth potential is in most cases not fully exploited, indicating significant gaps in our knowledge concerning optimal nutritional and culture conditions. Understanding the mechanisms that control early development and muscle growth is critical for identifying time windows in development that introduce growth variation, and for improving the viability and quality of juveniles. This literature review of the current state of knowledge aims to provide a framework for a better understanding of fish skeletal muscle ontogeny, and its impact on larval and juvenile quality as broadly defined. It focuses on fundamental biological knowledge relevant to larval phenotype and quality and, in particular, on the factors affecting the development of skeletal muscle. It also discusses the available methodologies to assess growth and larvae/juvenile quality, identifies gaps in knowledge and suggests future research directions. The focus is primarily on the major farmed non-salmonid fish species in Europe that include gilthead sea bream, European sea bass, turbot, Atlantic cod, Senegalese sole and Atlantic halibut.

154 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...RNA-Seq is a revolutionary technology for the accurate comparison of transcriptome profiles based on next-generation sequencing (Wang et al. 2009)....

    [...]

Journal ArticleDOI
TL;DR: Next generation sequencing with the Roche/454 platform resulted in a high-quality dataset which serves well as a first comprehensive reference set for the model legume pea and suggests future deep sequencing transcriptome projects of species lacking a genomics backbone will need to concentrate mainly on resolving the issues of redundancy and paralogy during transcriptome assembly.
Abstract: The garden pea, Pisum sativum, is among the best-investigated legume plants and of significant agro-commercial relevance. Pisum sativum has a large and complex genome and accordingly few comprehensive genomic resources exist. We analyzed the pea transcriptome at the highest possible amount of accuracy by current technology. We used next generation sequencing with the Roche/454 platform and evaluated and compared a variety of approaches, including diverse tissue libraries, normalization, alternative sequencing technologies, saturation estimation and diverse assembly strategies. We generated libraries from flowers, leaves, cotyledons, epi- and hypocotyl, and etiolated and light treated etiolated seedlings, comprising a total of 450 megabases. Libraries were assembled into 324,428 unigenes in a first pass assembly. A second pass assembly reduced the amount to 81,449 unigenes but caused a significant number of chimeras. Analyses of the assemblies identified the assembly step as a major possibility for improvement. By recording frequencies of Arabidopsis orthologs hit by randomly drawn reads and fitting parameters of the saturation curve we concluded that sequencing was exhaustive. For leaf libraries we found normalization allows partial recovery of expression strength aside the desired effect of increased coverage. Based on theoretical and biological considerations we concluded that the sequence reads in the database tagged the vast majority of transcripts in the aerial tissues. A pathway representation analysis showed the merits of sampling multiple aerial tissues to increase the number of tagged genes. All results have been made available as a fully annotated database in fasta format. We conclude that the approach taken resulted in a high-quality dataset which serves well as a first comprehensive reference set for the model legume pea. We suggest future deep sequencing transcriptome projects of species lacking a genomics backbone will need to concentrate mainly on resolving the issues of redundancy and paralogy during transcriptome assembly.

154 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...For this RNAseq approach either fragmented mRNA or fragmented cDNA [22] can be used as input and read lengths ranging from 100 nucleotides (nts), 250 nts and 500 nts modal length can be received depending on the sequencer and sequencing kit employed, GS 20, GS FLX Standard Series and GS FLX Titanium Series, respectively (reviewed in [21,23])....

    [...]

Journal ArticleDOI
TL;DR: In the Cds2 gene, evidence is found for RNA editing acting to preserve the ancestral transcript sequence despite genomic sequence divergence, and despite over two million years of evolutionary divergence, the sites edited and the level of editing at each site are remarkably consistent across the 15 strains.
Abstract: Background Adenosine-to-inosine (A-to-I) editing is a site-selective post-transcriptional alteration of double-stranded RNA by ADAR deaminases that is crucial for homeostasis and development. Recently the Mouse Genomes Project generated genome sequences for 17 laboratory mouse strains and rich catalogues of variants. We also generated RNA-seq data from whole brain RNA from 15 of the sequenced strains.

154 citations


Additional excerpts

  • ...More recently, the advent of second-generation sequencing technologies, and the development of the RNA-seq method, has made it possible to sequence the entire transcriptome [10]....

    [...]

Journal ArticleDOI
TL;DR: The central goals were to improve the genetic diagnosis of LGMD, investigate whether the WES platform provides adequate coverage of known LGMD-related genes, and identify new LGMD-related genes.
Abstract: Importance To our knowledge, the efficacy of transferring next-generation sequencing from a research setting to neuromuscular clinics has never been evaluated. Objective To translate whole-exome sequencing (WES) to clinical practice for the genetic diagnosis of a large cohort of patients with limb-girdle muscular dystrophy (LGMD) for whom protein-based analyses and targeted Sanger sequencing failed to identify the genetic cause of their disorder. Design, Setting, and Participants We performed WES on 60 families with LGMDs (100 exomes). Data analysis was performed between January 6 and December 19, 2014, using the xBrowse bioinformatics interface (Broad Institute). Patients with LGMD were ascertained retrospectively through the Institute for Neuroscience and Muscle Research Biospecimen Bank between 2006 and 2014. Enrolled patients had been extensively investigated via protein studies and candidate gene sequencing and remained undiagnosed. Patients presented with more than 2 years of muscle weakness and with dystrophic or myopathic changes present in muscle biopsy specimens. Main Outcomes and Measures The diagnostic rate of LGMD in Australia and the relative frequencies of the different LGMD subtypes. Our central goals were to improve the genetic diagnosis of LGMD, investigate whether the WES platform provides adequate coverage of known LGMD-related genes, and identify new LGMD-related genes. Results With WES, we identified likely pathogenic mutations in known myopathy genes for 27 of 60 families. Twelve families had mutations in known LGMD-related genes. However, 15 families had variants in disease-related genes not typically associated with LGMD, highlighting the clinical overlap between LGMD and other myopathies. Common causes of phenotypic overlap were due to mutations in congenital muscular dystrophy–related genes (4 families) and collagen myopathy–related genes (4 families). 
Less common myopathies included metabolic myopathy (2 families), congenital myasthenic syndrome (DOK7), congenital myopathy (ACTA1), tubular aggregate myopathy (STIM1), myofibrillar myopathy (FLNC), and mutation of CHD7, usually associated with the CHARGE syndrome. Inclusion of family members increased the diagnostic efficacy of WES, with a diagnostic rate of 60% for “trios” (an affected proband with both parents) vs 40% for single probands. A follow-up screening of patients whose conditions were undiagnosed on a targeted neuromuscular disease–related gene panel did not improve our diagnostic yield. Conclusions and Relevance With WES, we achieved a diagnostic success rate of 45.0% in our difficult-to-diagnose cohort of patients with LGMD. We expand the clinical phenotypes associated with known myopathy genes, and we stress the importance of accurate clinical examination and histopathological results for interpretation of WES, with many diagnoses requiring follow-up review and ancillary investigations of biopsy specimens or serum samples.

153 citations

Journal ArticleDOI
27 Apr 2012-PLOS ONE
TL;DR: Characterization of sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes including development of leaves and storage roots, tissue-specific gene expression, potential biotic and abiotic stress response in sweet potato.
Abstract: Background Sweet potato (Ipomoea batatas L. [Lam.]) ranks among the top six most important food crops in the world. It is widely grown throughout the world with high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism due to lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to get a comprehensive and integrated genomic resource and better understanding of gene expression patterns in different tissues and at various developmental stages. Methodology/Principal Findings Illumina paired-end (PE) RNA-Sequencing was performed, and generated 48.7 million of 75 bp PE reads. These reads were de novo assembled into 128,052 transcripts (≥100 bp), which correspond to 41.1 million base pairs, by using a combined assembly strategy. Transcripts were annotated by Blast2GO and 51,763 transcripts got BLASTX hits, in which 39,677 transcripts have GO terms and 14,117 have ECs that are associated with 147 KEGG pathways. Furthermore, transcriptome differences of seven tissues were analyzed by using Illumina digital gene expression (DGE) tag profiling and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism and potential stress tolerance and insect resistance were also identified. Conclusions/Significance The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. 
Characterization of sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes including development of leaves and storage roots, tissue-specific gene expression, potential biotic and abiotic stress response in sweet potato.

153 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...High-throughput transcriptome sequencing and digital gene expression (DGE) tag profiling are efficient and economic choice for characterizing non-model organisms without a reference genome [9,10]....

    [...]

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. 
These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
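The mapping-quality idea in this abstract can be illustrated with a small sketch. It is not MAQ's actual algorithm: it assumes a deliberately simplified model in which each candidate alignment position is scored only by the sum of base qualities at its mismatched bases, and the result is reported on the usual Phred scale, Q = -10 log10(P(wrong position)). The function names and the cap of 60 are hypothetical choices made for this illustration.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)

def mapping_quality(mismatch_quality_sums, cap=60):
    """Toy Phred-scaled mapping quality (illustrative, not MAQ's formula).

    mismatch_quality_sums: one number per candidate position, the sum of
    base qualities at the mismatched bases of the read at that position.
    The reported alignment is assumed to be the candidate with the
    smallest sum, i.e. the highest likelihood.
    """
    # Likelihood of each candidate: product of per-base error probabilities
    # at mismatches, which equals 10^(-sum of qualities / 10).
    likelihoods = [phred_to_error_prob(s) for s in mismatch_quality_sums]
    total = sum(likelihoods)
    best = max(likelihoods)
    # Probability that the read does NOT come from the best position.
    p_wrong = (total - best) / total
    if p_wrong <= 0:
        return cap  # a unique hit gets the maximum reported value
    return min(cap, int(round(-10 * math.log10(p_wrong))))

# A perfect hit plus one alternative whose mismatches sum to Q30.
print(mapping_quality([0, 30]))
# Two equally good hits: the read is ambiguous.
print(mapping_quality([20, 20]))
```

Under this toy model, a read with two equally good placements receives a very low quality, while a read whose only alternative needs several high-quality mismatches receives a high one, which is the intuition behind using mapping quality to down-weight ambiguous alignments.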

2,927 citations

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]