Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
TL;DR: The primary finding of extensive ribosome association shows that a necessary precondition for selective purging is met, making de novo gene evolution more plausible, and is proof of principle of the utility of ribosomal profiling data for the purpose of gene annotation.
Abstract: There have been recent surprising reports that whole genes can evolve de novo from noncoding sequences. This would be extraordinary if the noncoding sequences were random with respect to amino acid identity. However, if the noncoding sequences were previously translated at low rates, with the most strongly deleterious cryptic polypeptides purged by selection, then de novo gene origination would be more plausible. Here we analyze Saccharomyces cerevisiae data on noncoding transcripts found in association with ribosomes. We find many such transcripts. Although their average ribosomal densities are lower than those of protein-coding genes, a significant proportion of noncoding transcripts nevertheless have ribosomal densities comparable to those of coding genes. Most show increased ribosomal association in response to starvation, as has been previously reported for other noncoding sequences such as untranslated regions and introns. In rich media, ribosomal association is correlated with start codons but is not usually consistent and contiguous beyond that, suggesting that translation occurs only at low rates. One transcript contains a 28-codon open reading frame, which we name RDT1, which shows evidence of translation, and may be a new protein-coding gene that originated de novo from noncoding sequence. But the bulk of the ribosomal association cannot be attributed to unannotated protein-coding genes. Our primary finding of extensive ribosome association shows that a necessary precondition for selective purging is met, making de novo gene evolution more plausible. Our analysis is also proof of principle of the utility of ribosomal profiling data for the purpose of gene annotation.

129 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...At each stage, molecular errors in the present can provide a preview of mutations in the future (Whitehead et al. 2008; Masel and Trotter 2010; Rajon and Masel 2011)....

    [...]

Journal ArticleDOI
TL;DR: The applicability of network inference to predict interactions between host–pathogen pairs is shown, demonstrating the usefulness of this systems biology approach to decipher mechanisms of microbial pathogenesis.
Abstract: The ability to adapt to diverse micro-environmental challenges encountered within a host is of pivotal importance to the opportunistic fungal pathogen C. albicans. We have quantified C. albicans and M. musculus gene expression dynamics during phagocytosis by dendritic cells in a genome-wide, time-resolved analysis using simultaneous RNA-seq. A robust network inference map was generated from this dataset using NetGenerator, predicting novel interactions between the host and the pathogen. We experimentally verified predicted interdependent sub-networks comprising Hap3 in C. albicans, and Ptx3 and Mta2 in M. musculus. Remarkably, binding of recombinant Ptx3 to the C. albicans cell wall was found to regulate the expression of fungal Hap3 target genes as predicted by the network inference model. Pre-incubation of C. albicans with recombinant Ptx3 significantly altered the expression of Mta2 target cytokines such as IL-2 and IL-4 in a Hap3-dependent manner, further suggesting a role for Mta2 in host-pathogen interplay as predicted in the network inference model. We propose an integrated model for the functionality of these sub-networks during fungal invasion of immune cells, according to which binding of Ptx3 to the C. albicans cell wall induces remodelling via fungal Hap3 target genes, thereby altering the immune response to the pathogen. We show the applicability of network inference to predict interactions between host-pathogen pairs, demonstrating the usefulness of this systems biology approach to decipher mechanisms of microbial pathogenesis.

129 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...These technologies now allow for the parallel sequencing of millions of nucleotide sequences simultaneously (Wang et al., 2009; Zhang et al., 2011)....

    [...]

Journal ArticleDOI
TL;DR: There is good evidence to support the continued use of transcriptomics, especially emerging techniques such as RNA-Seq, as a screening tool for candidate gene discovery, although the consistently low correlation between transcript abundance and other measures of gene expression imposes an inherent limitation that cannot be ignored.
Abstract: More than 100 different studies of plant transcriptomic responses to salinity or drought-related stress have now been published. Most of these use microarrays or related high-throughput profiling technologies. This compels us to ask three questions in review: (1) what has transcriptomics contributed to our understanding of stress physiology; (2) what limits the ability of transcriptomics to contribute to increases in stress tolerance; and (3) given these limits, what are the most appropriate uses of transcriptomics? We conclude that although microarrays are now a mature technology that accurately describes the transcriptome, the consistently low correlation between transcript abundance and other measures of gene expression imposes an inherent limitation that cannot be ignored. Further limitations on the relevance of transcriptomics arise in some cases from experimental practices related to the treatment regimen and the selection of tissue or germplasm. Nevertheless, there is good evidence to support the continued use of transcriptomics, especially emerging techniques such as RNA-Seq, as a screening tool for candidate gene discovery. Microarrays can also be valuable in analysing the transcriptome per se (e.g. when describing the phenotype of a transcription factor mutant or discovering non-coding RNA species), and when integrated with other types of data including metabolomic analyses.

129 citations

Patent
28 Aug 2014
TL;DR: The disclosure provides methods, compositions, and kits for multiplex nucleic acid analysis of single cells, which may be used for massively parallel single-cell sequencing.
Abstract: The disclosure provides for methods, compositions, and kits for multiplex nucleic acid analysis of single cells. The methods, compositions and systems may be used for massively parallel single cell sequencing. The methods, compositions and systems may be used to analyze thousands of cells concurrently. The thousands of cells may comprise a mixed population of cells (e.g., cells of different types or subtypes, different sizes).

129 citations

Journal ArticleDOI
TL;DR: RNA-Seq profiles of PR strains combined with tissue-specific profiles and molecular docking revealed multi-level insecticide resistance in C. lectularius, and identified 109 putative defense genes involved in penetration resistance and metabolic resistance.
Abstract: Background: Bed bugs (Cimex lectularius) are hematophagous nocturnal parasites of humans that have attained high impact status due to their worldwide resurgence. The sudden and rampant resurgence of C. lectularius has been attributed to numerous factors including frequent international travel, narrower pest management practices, and insecticide resistance.

129 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...The next-generation sequencing (NGS) methods via different platforms (Illumina Genome Analyzer, Applied Biosystems SOLiD, Helicos Biosciences Heliscope, Roche 454 Life Sciences) have revolutionized functional genomics research in non-model organisms [21-24]....

    [...]

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist.
These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in an mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
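The mapping-quality idea described in the abstract above can be illustrated with a minimal sketch. It assumes a simplified model in which each candidate alignment position for a read already has a likelihood, and it reports the Phred-scaled probability that the best candidate is wrong. This is not MAQ's actual algorithm: the function name `mapping_quality`, the input format, and the cap of 60 are all hypothetical choices made for illustration.

```python
import math


def mapping_quality(candidate_likelihoods, cap=60):
    """Phred-scaled confidence that the best-scoring alignment is correct.

    candidate_likelihoods: positive likelihoods, one per candidate position
    (hypothetical input; a real mapper derives these from base qualities).
    cap: assumed upper bound, used when no competing candidate exists.
    """
    total = sum(candidate_likelihoods)
    best = max(candidate_likelihoods)
    # Posterior probability that the read does NOT come from the best position.
    p_wrong = (total - best) / total
    if p_wrong <= 0.0:
        return cap
    # Phred scale: Q = -10 * log10(P(error)).
    return min(cap, round(-10 * math.log10(p_wrong)))


if __name__ == "__main__":
    print(mapping_quality([0.99, 0.01]))  # one strong candidate, one weak
    print(mapping_quality([0.5, 0.5]))    # two equally good candidates
    print(mapping_quality([1.0]))         # a unique candidate
```

Under this toy model, a read with two equally likely positions receives a quality near zero, while a read with a single dominant position receives a high quality, which is the behaviour a confidence measure for read placement should have.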

2,927 citations

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]