scispace - formally typeset
Search or ask a question
Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
07 Feb 2012-PLOS ONE
TL;DR: RNA-seq data significantly improve the genome annotation and provide a fully comprehensive view of the global transcriptome of T. thermophila, which becomes an excellent unicellular model eukaryote in which to investigate mechanisms of alternative splicing.
Abstract: Background: The ciliated protozoan Tetrahymena thermophila is a well-studied single-celled eukaryote model organism for cellular and molecular biology. However, the lack of extensive T. thermophila cDNA libraries or a large expressed sequence tag (EST) database limited the quality of the original genome annotation. Methodology/Principal Findings: This RNA-seq study describes the first deep sequencing analysis of the T. thermophila transcriptome during the three major stages of the life cycle: growth, starvation and conjugation. Uniquely mapped reads covered more than 96% of the 24,725 predicted gene models in the somatic genome. More than 1,000 new transcribed regions were identified. The great dynamic range of RNA-seq allowed detection of a nearly six order-of-magnitude range of measurable gene expression orchestrated by this cell. RNA-seq also allowed the first prediction of transcript untranslated regions (UTRs) and an updated (larger) size estimate of the T. thermophila transcriptome: 57 Mb, or about 55% of the somatic genome. Our study identified nearly 1,500 alternative splicing (AS) events distributed over 5.2% of T. thermophila genes. This percentage represents a two order-of-magnitude increase over previous EST-based estimates in Tetrahymena. Evidence of stage-specific regulation of alternative splicing was also obtained. Finally, our study allowed us to completely confirm about 26.8% of the genes originally predicted by the gene finder, to correct coding sequence boundaries and intron-exon junctions for about a third, and to reassign microarray probes and correct earlier microarray data. Conclusions/Significance: RNA-seq data significantly improve the genome annotation and provide a fully comprehensive view of the global transcriptome of T. thermophila. To our knowledge, 5.2% of T. thermophila genes with AS is the highest percentage of genes showing AS reported in a unicellular eukaryote. Tetrahymena thus becomes an excellent unicellular model eukaryote in which to investigate mechanisms of alternative splicing.

107 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...transcriptome of an organism [12,13], and is more sensitive than...

    [...]

Journal ArticleDOI
TL;DR: Fourteen new candidate gene sequences, discovered for the first time in this study, will greatly enable future research - using the sea cucumber model - into the molecular mechanisms associated with intestine and body wall regeneration.

107 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...This approach therefore has great potential for expanding sequencing databases of invertebrate animals capable of regeneration (by combining next-generation pyrosequencing) (Wang et al., 2009)....

    [...]

Journal ArticleDOI
Yan Guo1, Yulin Dai1, Hui Yu1, Shilin Zhao1, David C. Samuels1, Yu Shyr1 
01 Mar 2017-Genomics
TL;DR: It is shown that GRCh38 offers overall more accurate analysis of human sequencing data, and produced fewer false positive structural variants, and yields more reliable genomic analysis results.

106 citations

Journal ArticleDOI
TL;DR: This work sampled the transcriptomes of 10 mosquito species by assembling 36-bp sequence reads into phylogenomic data matrices containing hundreds of thousands of orthologous nucleotides from hundreds of genes, and found this approach is more efficient, data-rich, and economical than traditional PCR-based and EST-based methods.
Abstract: Assembling the tree of life is a major goal of biology, but progress has been hindered by the difficulty and expense of obtaining the orthologous DNA required for accurate and fully resolved phylogenies. Next-generation DNA sequencing technologies promise to accelerate progress, but sequencing the genomes of hundreds of thousands of eukaryotic species remains impractical. Eukaryotic transcriptomes, which are smaller than genomes and biased toward highly expressed genes that tend to be conserved, could potentially provide a rich set of phylogenetic characters. We sampled the transcriptomes of 10 mosquito species by assembling 36-bp sequence reads into phylogenomic data matrices containing hundreds of thousands of orthologous nucleotides from hundreds of genes. Analysis of these data matrices yielded robust phylogenetic inferences, even with data matrices constructed from surprisingly few sequence reads. This approach is more efficient, data-rich, and economical than traditional PCR-based and EST-based methods and provides a scalable strategy for generating phylogenomic data matrices to infer the branches and twigs of the tree of life.

106 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...Regardless of the specific future advances, leveraging skewed transcript abundance by RNA-Seq provides a robust and efficient way to increase the genomic depth and phylogenetic resolution of the vast diversity of life on earth....

    [...]

  • ...Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: A revolutionary tool for transcriptomics....

    [...]

  • ...Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life...

    [...]

  • ...We tested the idea that RNA-Seq (31) of non-normalized transcriptomes could be a data-rich, accurate, and cost-effective source of orthologous sequence data for phylogenomic analysis....

    [...]

  • ...Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq....

    [...]

Book ChapterDOI
01 Jan 2015
TL;DR: The present understanding of the cellular and molecular biology of mammary development and milk secretion is provided in the context of the whole body mechanisms that ensure adequate flux of nutrients to the gland to provide sufficient milk to meet the needs of the neonate.
Abstract: The mammary gland undergoes its final development in the adult animal, filling the mammary fat pad with a ductal tree and rudimentary alveoli during puberty, and expanding and fully differentiating during pregnancy. The hormones of pregnancy, prolactin (PRL), placental lactogen, growth hormone, and progesterone (P4) cooperate to produce a gland that is fully developed but nonfunctional. Secretory activity commences around parturition, with the withdrawal of progesterone and maintained levels of PRL and glucocorticoid. During lactation PRL provides a comprehensive signal that fosters synthesis and secretion of milk components and the survival of the alveolar cell. The lactating gland produces milk of a composition defined for the species using several specialized pathways including: (1) exocytosis for the secretion of milk proteins, lactose and divalent ions; (2) a unique lipid secretion pathway that produces membrane-bound milk fat globules; (3) transport systems for monovalent ions, glucose, and amino acids; and (d) transcytosis for the secretion of immunoglobulins and other milk components. Tight junctions form a gasket around the apical surface of the epithelial cells that is open to traffic of large and small molecules in pregnancy but tightly closed in lactation. The volume of milk produced is determined by milk removal from the gland, a function dependent on oxytocin secretion by the posterior pituitary and contraction of myoepithelial cells to force milk out of the alveoli. With the termination of milk removal an orderly involution process involving interactions with immune cells returns the gland to its resting state. In short, we provide a summary of our present understanding of the cellular and molecular biology of mammary development and milk secretion in the context of the whole body mechanisms that ensure adequate flux of nutrients to the gland to provide sufficient milk to meet the needs of the neonate.

106 citations

References
More filters
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10 5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts 1–3 , dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes the software MAQ, software that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]