Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009 - Nature Reviews Genetics (Nature Publishing Group) - Vol. 10, Iss. 1, pp. 57-63
TL;DR: RNA-Seq, an approach to transcriptome profiling that uses deep-sequencing technologies, provides a far more precise measurement of the levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
TL;DR: The authors examine normalization methods from the perspective of their assumptions, since understanding those assumptions is necessary for choosing methods appropriate for the data at hand, and discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis.
Abstract: RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.

193 citations
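
The point made in the abstract above, that a normalization method is only as good as its assumptions, can be illustrated with a minimal sketch. The example uses counts-per-million (total-count) scaling, which implicitly assumes that total RNA output is comparable across samples. The gene names and read counts are hypothetical toy values, and the premise that genes A and B are truly unchanged is an assumption of the example, not a result from the cited article.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


# Hypothetical toy data: genes A and B are assumed to be truly unchanged,
# while gene C is strongly induced in sample 2.
sample_1 = {"A": 100, "B": 100, "C": 100}
sample_2 = {"A": 100, "B": 100, "C": 800}

cpm_1 = counts_per_million(sample_1)
cpm_2 = counts_per_million(sample_2)

# Because gene C inflates the total in sample 2, total-count scaling
# pushes the normalized values of A and B down.
fold_change_A = cpm_2["A"] / cpm_1["A"]
print(f"Apparent fold change of gene A after CPM scaling: {fold_change_A:.2f}")
```

In this toy case gene A appears to drop even though its raw count is identical in both samples, which is the kind of artifact that can surface as a false positive in differential expression analysis when the equal-total-output assumption does not hold.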

Journal ArticleDOI
TL;DR: The complete skin transcriptome from paired itchy, lesional and nonitchy, nonlesional biopsies is analyzed to improve understanding of the molecular mechanisms of chronic pruritus and to provide targets for itch treatment irrespective of disease state.

193 citations

Journal ArticleDOI
TL;DR: This article gives a broad overview and provides practical guidance for the many steps involved in a typical RNA-seq workflow, from sampling to RNA extraction, library preparation and data analysis.
Abstract: Genome-wide analyses and high-throughput screening were long reserved for biomedical applications and genetic model organisms. With the rapid development of massively parallel sequencing nanotechnology (or next-generation sequencing) and simultaneous maturation of bioinformatic tools, this situation has dramatically changed. Genome-wide thinking is forging its way into disciplines like evolutionary biology or molecular ecology that were historically confined to small-scale genetic approaches. Accessibility to genome-scale information is transforming these fields, as it allows us to answer long-standing questions, like the genetic basis of local adaptation and speciation or the evolution of gene expression profiles, that until recently were out of reach. Many in the eco-evolutionary sciences will be working with large-scale genomic data sets, and a basic understanding of the concepts and underlying methods is necessary to judge the work of others. Here, I briefly introduce next-generation sequencing and then focus on transcriptome shotgun sequencing (RNA-seq). This article gives a broad overview and provides practical guidance for the many steps involved in a typical RNA-seq workflow, from sampling to RNA extraction, library preparation and data analysis. I focus on principles, present useful tools where appropriate and point out where caution is needed or progress is to be expected. This tutorial is mostly targeted at beginners, but also contains potentially useful reflections for the more experienced.

193 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Box 1 Quick links to useful entry points to the field

    Overview on high-throughput sequencing and RNA-seq
    1 Principles of high throughput sequencing technology: (Metzker 2010)
    2 Principles of RNA-seq: (Wang et al. 2009; Oshlack et al. 2010; Ozsolak & Milos 2011)
    3 Principles of transcriptome assembly (Martin & Wang 2011) with particular reference to plants (Jain 2012)

    Applications in ecology and evolutionary biology
    1 General next-generation sequencing applications including RNA-seq: (Ekblom & Galindo 2011)
    2 Special issues on next-generation sequencing including RNA-seq: (Stapley et al. 2010; Tautz et al. 2010; Orsini et al. 2013)

    Practical guidance and examples for useful tools
    1 Review on computational methods and tools: (Pepke et al. 2009; Magi et al. 2010; Bao et al. 2011; Garber et al. 2011; Lee et al. 2012)
    2 Guidance in the design and analysis of RNA-seq experiments: (De Wit et al. 2012; Vijay et al. 2013)
    3 Statistical consideration for RNA-seq data: (Bullard et al. 2010; Kvam et al. 2012)
    4 Preprocessing and quality control tools: NGSQCtoolkit (Patel & Jain 2012), fastQCtoolkit (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
    5 Mapping tools: (Trapnell & Salzberg 2009; Bao et al. 2011)
    6 Gene name assignment: e.g. BLAST2GO, SATSUMA, SPINES (for details and references see Vijay et al. 2013)
    7 Data visualization tools: e.g. MapView (Bao et al. 2009), IGV (Thorvaldsdóttir et al. 2013), Tablet (Milne et al. 2010)
    8 Utility suites saving own effort: BEDtools (Quinlan & Hall 2010), SAMtools (Li et al. 2009)
    9 Variant calling and genotyping: review (Nielsen et al. 2011), GATK (DePristo et al. 2011), freebayes (http://bioinformatics.bc.edu/marthlab/FreeBayes)
    10 Gene function: Gene ontology (Gene Ontology Consortium 2004), gene ontology tools http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools
    11 Gene interaction pathways: e.g. KeGG pathway (Ogata et al. 1999; http://www.genome.jp/kegg/pathway.html), STRING database (Szklarczyk et al. 2011; http://string-db.org/)
    12 Galaxy: a useful online platform to analyse RNA-seq data: (Goecks et al. 2010)
    13 The Bioconductor package: (Gentleman et al. 2004) www.bioconductor.org
    14 Differential expression software: e.g. DESeq (Anders & Huber 2010), edgeR (Robinson et al. 2010), baySeq (Hardcastle & Kelly 2010), NOIseq (Tarazona et al. 2011)
    15 Alternative splicing software: e.g. Cufflinks (Trapnell et al. 2012), DEXSeq (Anders et al. 2012), EBSeq (Leng N et al. 2013), MISO (Katz et al. 2010)

    Where to find help?...

    [...]

  • ...RNA-seq: High-throughput shotgun transcriptome sequencing....

    [...]


Journal ArticleDOI
TL;DR: This review, focused on the study of differential gene expression with RNA-seq, goes through the main steps of data processing and discusses open challenges and possible solutions.
Abstract: RNA-seq is a methodology for RNA profiling based on next-generation sequencing that makes it possible to measure and compare gene expression patterns at unprecedented resolution. Although the appealing features of this technique have promoted its application to a wide panel of transcriptomics studies, the fast-evolving nature of experimental protocols and computational tools challenges the definition of a unified RNA-seq analysis pipeline. In this review, focused on the study of differential gene expression with RNA-seq, we go through the main steps of data processing and discuss open challenges and possible solutions.

192 citations


Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

  • ...preferred methodology for the study of gene expression [3, 5]....

    [...]

  • ...COUNT BIAS AND NORMALIZATION After the first optimistic expectation of a relative ease of analysis of RNA-seq data [3], many works have highlighted the need for a careful normalization of count data before assessing differential gene expression [9, 52–56] to correct for different sources of bias....

    [...]

  • ...Moreover, reverse-transcription can either over- or under-represent the 3′ end of transcripts if performed with poly-dT oligomers or random hexamers, respectively [1, 3, 19]....

    [...]

  • ...high throughput and relatively low costs [3]....

    [...]

Journal ArticleDOI
TL;DR: Bulked sample analysis (BSA), which works with selected and pooled individuals rather than all individuals collected from sample populations, will facilitate plant breeding through development of diagnostic and constitutive markers, agronomic genomics, marker-assisted selection and selective phenotyping.
Abstract: Biological assay has been based on analysis of all individuals collected from sample populations. Bulked sample analysis (BSA), which works with selected and pooled individuals, has been extensively used in gene mapping through bulked segregant analysis with biparental populations, mapping by sequencing with major gene mutants and pooled genomewide association study using extreme variants. Compared to conventional entire population analysis, BSA significantly reduces the scale and cost by simplifying the procedure. The bulks can be built by selection of extremes or representative samples from any populations and all types of segregants and variants that represent wide ranges of phenotypic variation for the target trait. Methods and procedures for sampling, bulking and multiplexing are described. The samples can be analysed using individual markers, microarrays and high-throughput sequencing at all levels of DNA, RNA and protein. The power of BSA is affected by population size, selection of extreme individuals, sequencing strategies, genetic architecture of the trait and marker density. BSA will facilitate plant breeding through development of diagnostic and constitutive markers, agronomic genomics, marker-assisted selection and selective phenotyping. Applications of BSA in genetics, genomics and crop improvement are discussed with their future perspectives.

192 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Compared to other technologies such as microarrays, RNA-seq technology offers the following benefits (Ozsolak and Milos, 2011; Wang et al., 2009)....

    [...]

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.
The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations
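
The mapping quality concept introduced in the abstract above can be sketched numerically. The snippet assumes the conventional Phred-style scaling, Q = -10 log10(P), where P is the probability that a read is placed at the wrong position. The function names and the `cap` ceiling are hypothetical choices made for illustration and are not taken from the MAQ software itself.

```python
import math


def mapping_quality(p_wrong, cap=99):
    """Convert the probability that an alignment is wrong into a Phred-scaled score.

    `cap` is an arbitrary illustrative ceiling, not a value defined by MAQ.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def error_probability(quality):
    """Invert the Phred scale: recover the probability of a wrong alignment."""
    return 10 ** (-quality / 10)


# A read with a 1-in-1000 chance of being misplaced versus an ambiguous read
confident = mapping_quality(0.001)
ambiguous = mapping_quality(0.5)
print(confident, ambiguous, error_probability(20))
```

On this scale each additional 10 points corresponds to a tenfold lower chance that the read is misplaced, so reads that align equally well to several locations receive scores near zero.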

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]