Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling, which uses deep-sequencing technologies, provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Citations
Journal ArticleDOI
23 Dec 2013-PLOS ONE
TL;DR: Nine different trimming algorithms are evaluated in four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly); trimming is shown to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.
Abstract: Next Generation Sequencing is having an extremely strong impact in biological and medical research and diagnostics, with applications ranging from gene expression quantification to genotyping and genome reconstruction. Sequencing data is often provided as raw reads which are processed prior to analysis. One of the most used preprocessing procedures is read trimming, which aims at removing low quality portions while preserving the longest high quality part of an NGS read. In the current work, we evaluate nine different trimming algorithms in four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly). Trimming is shown to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.
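
The trimming goal described in this abstract (discard low quality portions, keep the longest high quality part of a read) can be sketched roughly as follows. This is a minimal illustration only, not one of the nine algorithms evaluated in the paper; the function name `trim_longest_high_quality` and the threshold `min_q=20` are assumptions made for the example.

```python
def trim_longest_high_quality(seq, quals, min_q=20):
    """Return the longest contiguous stretch of `seq` whose per-base
    quality is at least `min_q` (hypothetical threshold)."""
    best_start, best_len = 0, 0
    run_start = None  # start index of the current high-quality run
    for i, q in enumerate(quals):
        if q >= min_q:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low-quality base ends the current run
    return seq[best_start:best_start + best_len]


# Made-up read and Phred-style quality scores, for illustration only.
read = "ACGTACGTAC"
quals = [5, 30, 32, 35, 31, 8, 30, 30, 4, 2]
trimmed = trim_longest_high_quality(read, quals)
print(trimmed)
```

Real trimmers typically use sliding windows or running-sum criteria rather than a hard per-base cutoff, so this sketch should be read as the simplest possible instance of the idea.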

376 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...While de novo assembly is mainly used to blindly reconstruct an unknown genome or transcriptome, read alignment has several purposes: when the original material is mRNA (RNAseq), it allows to precisely measure levels of transcripts and to identify splicing isoforms [8]....

    [...]

01 Nov 2011
TL;DR: There are immediate opportunities to implement NGS for clinical use and the sensitivity, speed and reduced cost per sample make it a highly attractive platform compared to other sequencing modalities.
Abstract: Next-generation sequencing (NGS) is arguably one of the most significant technological advances in the biological sciences of the last 30 years. The second generation sequencing platforms have advanced rapidly to the point that several genomes can now be sequenced simultaneously in a single instrument run in under two weeks. Targeted DNA enrichment methods allow even higher genome throughput at a reduced cost per sample. Medical research has embraced the technology and the cancer field is at the forefront of these efforts given the genetic aspects of the disease. World-wide efforts to catalogue mutations in multiple cancer types are underway and this is likely to lead to new discoveries that will be translated to new diagnostic, prognostic and therapeutic targets. NGS is now maturing to the point where it is being considered by many laboratories for routine diagnostic use. The sensitivity, speed and reduced cost per sample make it a highly attractive platform compared to other sequencing modalities. Moreover, as we identify more genetic determinants of cancer there is a greater need to adopt multi-gene assays that can quickly and reliably sequence complete genes from individual patient samples. Whilst widespread and routine use of whole genome sequencing is likely to be a few years away, there are immediate opportunities to implement NGS for clinical use. Here we review the technology, methods and applications that can be immediately considered and some of the challenges that lie ahead.

372 citations

Journal ArticleDOI
TL;DR: This study used Illumina‐based massively parallel sequencing to gain new insight into the transcriptome (RNA‐Seq) of the human malaria parasite, Plasmodium falciparum, and greatly improves existing annotation of the P. falciparum genome.
Abstract: Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5′ and 3′ untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

370 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...RNA-Seq can reliably be used to correct gene annotations, confirm new and existing splice forms, analyse UTR regions, define non-coding RNAs or find new transcripts (Wang et al., 2009)....

    [...]

Journal ArticleDOI
TL;DR: The utility of RNA-Seq for profiling gene expression during the cotton defence response was demonstrated, and lignin is believed to play a critical role in the resistance of cotton to disease.
Abstract: The incompatible pathosystem between resistant cotton (Gossypium barbadense cv. 7124) and Verticillium dahliae strain V991 was used to study the cotton transcriptome changes after pathogen inoculation by RNA-Seq. Of 32,774 genes detected by mapping the tags to assembly cotton contigs, 3442 defence-responsive genes were identified. Gene cluster analyses and functional assignments of differentially expressed genes indicated a significant transcriptional complexity. Quantitative real-time PCR (qPCR) was performed on selected genes with different expression levels and functional assignments to demonstrate the utility of RNA-Seq for gene expression profiles during the cotton defence response. Detailed elucidation of responses of leucine-rich repeat receptor-like kinases (LRR-RLKs), phytohormone signalling-related genes, and transcription factors described the interplay of signals that allowed the plant to fine-tune defence responses. On the basis of global gene regulation of phenylpropanoid metabolism-related genes, phenylpropanoid metabolism was deduced to be involved in the cotton defence response. A closer look at the expression of these genes, enzyme activity, and lignin levels revealed differences between resistant and susceptible cotton plants. Both types of plants showed an increased level of expression of lignin synthesis-related genes and increased phenylalanine-ammonia lyase (PAL) and peroxidase (POD) enzyme activity after inoculation with V. dahliae, but the increase was greater and faster in the resistant line. Histochemical analysis of lignin revealed that the resistant cotton not only retains its vascular structure, but also accumulates high levels of lignin. Furthermore, quantitative analysis demonstrated increased lignification and cross-linking of lignin in resistant cotton stems. Overall, a critical role for lignin was believed to contribute to the resistance of cotton to disease.

370 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Different from the traditional hybridization-based approaches, highthroughput sequencing technologies, referring to as RNASeq, have much greater power to distinguish between paralogous genes, detect low or high abundance transcripts, and allow replicate quantification based on the number of sequences obtained (Wang et al., 2009)....

    [...]

Journal ArticleDOI
TL;DR: Criteria that must be met during design of ligand-targeted drugs (LTDs) to achieve the required therapeutic potency with minimal toxicity are summarized.
Abstract: Safety and efficacy constitute the major criteria governing regulatory approval of any new drug. The best method to maximize safety and efficacy is to deliver a proven therapeutic agent with a targeting ligand that exhibits little affinity for healthy cells but high affinity for pathologic cells. The probability of regulatory approval can conceivably be further enhanced by exploiting the same targeting ligand, conjugated to an imaging agent, to select patients whose diseased tissues display sufficient targeted receptors for therapeutic efficacy. The focus of this Review is to summarize criteria that must be met during design of ligand-targeted drugs (LTDs) to achieve the required therapeutic potency with minimal toxicity. Because most LTDs are composed of a targeting ligand (e.g., organic molecule, aptamer, protein scaffold, or antibody), spacer, cleavable linker, and therapeutic warhead, criteria for successful design of each component will be described. Moreover, because obstacles to successful drug design can differ among human pathologies, limitations to drug delivery imposed by the unique characteristics of different diseases will be considered. With the explosion of genomic and transcriptomic data providing an ever-expanding selection of disease-specific targets, and with tools for high-throughput chemistry offering an escalating diversity of warheads, opportunities for innovating safe and effective LTDs have never been greater.

367 citations

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other...
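
Counting how often each gene is represented in the sequence sample only becomes comparable across genes after normalizing for gene length and sequencing depth. The sketch below assumes one widely used normalization, reads per kilobase of exon model per million mapped reads (RPKM); the gene names, read counts and lengths are made-up values for illustration and do not come from the paper's data.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    Equivalent to: read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: two genes with the same raw count but different lengths.
counts = {"geneA": 500, "geneB": 500}
lengths_bp = {"geneA": 1000, "geneB": 5000}
total_mapped = 10_000_000

values = {gene: rpkm(counts[gene], lengths_bp[gene], total_mapped) for gene in counts}
for gene, value in values.items():
    print(gene, value)
```

With equal raw counts, the shorter gene ends up with the higher normalized value, which is the effect the length correction is meant to capture.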

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
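
The mapping quality concept introduced in this abstract can be illustrated with a short sketch. It assumes the Phred-scaled convention, Q = -10 log10(P), where P is the probability that the read is placed at the wrong position; the function names and the cap of 255 are hypothetical choices for this example and are not taken from the MAQ source code.

```python
import math


def mapping_quality(p_wrong, cap=255):
    """Convert the probability that an alignment is wrong into a
    Phred-scaled integer score. `cap` is an assumed upper bound."""
    if p_wrong <= 0:
        return cap  # a zero error probability would give an infinite score
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_probability(mapq):
    """Inverse transform: Phred-scaled score back to an error probability."""
    return 10 ** (-mapq / 10)


print(mapping_quality(0.001))   # 1-in-1000 chance the placement is wrong
print(error_probability(20))    # score 20 corresponds to a 1% error chance
```

On this scale, every 10 points correspond to a tenfold drop in the estimated chance that the read was mapped to the wrong place.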

2,927 citations

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]