Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009 - Nature Reviews Genetics (Nature Publishing Group) - Vol. 10, Iss. 1, pp. 57-63
TL;DR: RNA-Seq, an approach to transcriptome profiling that uses deep-sequencing technologies, provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
TL;DR: The complete genome sequence of this pathogen is now available and has been extremely useful for identifying its repertoire of genes; understanding its complete lifestyle will undoubtedly be useful for developing potential antifungal drugs and tackling Candida infections.
Abstract: Candida albicans is an opportunistic human fungal pathogen that causes candidiasis. As healthcare has improved worldwide, the number of immunocompromised patients has increased greatly; these patients are highly susceptible to various pathogenic microbes, and C. albicans is prominent among the fungal pathogens. The complete genome sequence of this pathogen is now available and has been extremely useful for identifying the repertoire of genes present in this pathogen. The major challenge now is to assign functions to these genes, of which 13% are specific to C. albicans. Owing to its close relationship with the yeast Saccharomyces cerevisiae, C. albicans has an edge over other fungal pathogens, because most technologies can be transferred directly from S. cerevisiae, and it is amenable to mutation, gene disruption, and transformation. The last two decades have witnessed an enormous amount of research on this pathogen, leading to an understanding of host-parasite interaction, infection, and disease propagation. Clearly, C. albicans has emerged as a model organism for studying fungal pathogens, along with two other fungi, Aspergillus fumigatus and Cryptococcus neoformans. Understanding the complete lifestyle of C. albicans will undoubtedly be useful for developing potential antifungal drugs and tackling Candida infections. This will also shed light on the functioning of other fungal pathogens.

138 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...Subsequently, each cDNA molecule is sequenced by high-throughput sequencing technology to obtain short sequences of sizes ranging from 30 to 400 bp depending upon the sequencing technology used [21]....

    [...]

Journal ArticleDOI
TL;DR: This article shows that eQTL mapping of a single gene using the association‐based method, by directly modeling TReC using discrete distributions, has higher statistical power than the two‐step approach: data normalization followed by linear regression.
Abstract: RNA-seq may replace gene expression microarrays in the near future. Using RNA-seq, the expression of a gene can be estimated using the total number of sequence reads mapped to that gene, known as the total read count (TReC). Traditional expression quantitative trait locus (eQTL) mapping methods, such as linear regression, can be applied to TReC measurements after they are properly normalized. In this article, we show that eQTL mapping, by directly modeling TReC using discrete distributions, has higher statistical power than the two-step approach: data normalization followed by linear regression. In addition, RNA-seq provides information on allele-specific expression (ASE) that is not available from microarrays. By combining the information from TReC and ASE, we can computationally distinguish cis- and trans-eQTL and further improve the power of cis-eQTL mapping. Both simulation and real data studies confirm the improved power of our new methods. We also discuss the design issues of RNA-seq experiments. Specifically, we show that by combining TReC and ASE measurements, it is possible to minimize cost and retain the statistical power of cis-eQTL mapping by reducing sample size while increasing the number of sequence reads per sample. In addition to RNA-seq data, our method can also be employed to study the genetic basis of other types of sequencing data, such as chromatin immunoprecipitation followed by DNA sequencing data. In this article, we focus on eQTL mapping of a single gene using the association-based method. However, our method establishes a statistical framework for future developments of eQTL mapping methods using RNA-seq data (e.g., linkage-based eQTL mapping), and the joint study of multiple genetic markers and/or multiple genes.

138 citations
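
The total read count (TReC) described in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the function name `total_read_counts`, the gene intervals, and the read positions are all hypothetical, and a read is assigned to a gene simply when its mapped start position falls inside the gene's interval.

```python
def total_read_counts(read_positions, genes):
    """Count mapped reads per gene (a toy TReC).

    read_positions: iterable of (chromosome, position) for mapped reads.
    genes: dict mapping gene name -> (chromosome, start, end), inclusive.
    Both inputs are hypothetical illustrations, not a real file format.
    """
    counts = {name: 0 for name in genes}
    for chrom, pos in read_positions:
        for name, (g_chrom, start, end) in genes.items():
            # A read contributes to a gene if it maps inside the interval.
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Made-up example data: two genes on chr1 and four mapped reads.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
reads = [("chr1", 150), ("chr1", 450), ("chr1", 900), ("chr2", 150)]
trec = total_read_counts(reads, genes)
print(trec)
```

Because the result is a non-negative integer per gene, it is natural to model it with a discrete distribution, which is the point the abstract makes against normalizing first and then applying linear regression.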


Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

  • ...High-throughput RNA sequencing, also known as RNA-seq, is becoming a popular technique to measure gene expression abundance (Mortazavi et al., 2008; Wang, Gerstein, and Snyder, 2009)....

    [...]

  • ...For example, RNA-seq is less noisy and has a much larger dynamic range than microarrays, and RNA-seq can identify new transcripts whereas microarray’s detection capability is limited by the probes on the array (Wang et al., 2009)....

    [...]

Journal ArticleDOI
TL;DR: The nature of, and types of mechanisms underlying, expression changes that occur upon intraspecific hybridization in natural populations are explored.
Abstract: Hybridization is a prominent process among natural plant populations that can result in phenotypic novelty, heterosis, and changes in gene expression. The effects of intraspecific hybridization on F1 hybrid gene expression were investigated using parents from divergent, natural populations of Cirsium arvense, an invasive Compositae weed. Using an RNA-seq approach, the expression of 68,746 unigenes was quantified in parents and hybrids. The expression levels of 51% of transcripts differed between parents, a majority of which had less than 1.25× fold-changes. More unigenes had higher expression in the invasive parent (P1) than the noninvasive parent (P2). Of those that were divergently expressed between parents, 10% showed additive and 81% showed nonadditive (transgressive or dominant) modes of gene action in the hybrids. A majority of the dominant cases had P2-like expression patterns in the hybrids. Comparisons of allele-specific expression also enabled a survey of cis- and trans-regulatory effects. Cis- and trans-regulatory divergence was found at 70% and 68% of 62,281 informative single-nucleotide polymorphism sites, respectively. Of the 17% of sites exhibiting both cis- and trans-effects, a majority (70%) had antagonistic regulatory interactions (cis x trans); trans-divergence tended to drive higher expression of the P1 allele, whereas cis-divergence tended to increase P2 transcript abundance. Trans-effects correlated more highly than cis with parental expression divergence and accounted for a greater proportion of the regulatory divergence at sites with additive compared with nonadditive inheritance patterns. This study explores the nature of, and types of mechanisms underlying, expression changes that occur upon intraspecific hybridization in natural populations.

138 citations

Journal ArticleDOI
TL;DR: This study provides a comprehensive examination of gene activities in bovine embryos and identified little-known potential master regulators of pre-implantation development, demonstrating that bovines are better models for human embryonic development.
Abstract: During mammalian pre-implantation embryonic development dramatic and orchestrated changes occur in gene transcription. The identification of the complete changes has not been possible until the development of the Next Generation Sequencing Technology. Here we report comprehensive transcriptome dynamics of single matured bovine oocytes and pre-implantation embryos developed in vivo. Surprisingly, more than half of the estimated 22,000 bovine genes, 11,488 to 12,729 involved in more than 100 pathways, are expressed in oocytes and early embryos. Despite the similarity in the total numbers of genes expressed across stages, the nature of the expressed genes is dramatically different. A total of 2,845 genes were differentially expressed among different stages, of which the largest change was observed between the 4- and 8-cell stages, demonstrating that the bovine embryonic genome is activated at this transition. Additionally, 774 genes were identified as only expressed/highly enriched in particular stages of development, suggesting their stage-specific roles in embryogenesis. Using weighted gene co-expression network analysis, we found 12 stage-specific modules of co-expressed genes that can be used to represent the corresponding stage of development. Furthermore, we identified conserved key members (or hub genes) of the bovine expressed gene networks. Their vast association with other embryonic genes suggests that they may have important regulatory roles in embryo development; yet, the majority of the hub genes are relatively unknown/under-studied in embryos. We also conducted the first comparison of embryonic expression profiles across three mammalian species, human, mouse and bovine, for which RNA-seq data are available. We found that the three species share more maternally deposited genes than embryonic genome activated genes. More importantly, there are more similarities in embryonic transcriptomes between bovine and humans than between humans and mice, demonstrating that bovine embryos are better models for human embryonic development. This study provides a comprehensive examination of gene activities in bovine embryos and identified little-known potential master regulators of pre-implantation development.

138 citations


Cites background or methods from "RNA-Seq: a revolutionary tool for t..."

  • ...In this study, we applied the RNA-seq technology and revealed the transcriptomes of bovine in vivo pre-implantation development in a very high-throughput and quantitative manner [14]....

    [...]

  • ...The RNA-seq technology provides unique benefits for studying gene expression with high resolutions and reproducibility, as well as for detecting novel transcripts and alternative splicing events [14,15]....

    [...]

Journal ArticleDOI
TL;DR: New proteomic research trends such as the study of posttranslational modifications and protein–protein interactions, as well as the combined use of the different ‐omics approaches, are discussed in relation to the development of a more functional and integrated perspective, needed for achieving a more comprehensive knowledge of evolutionary change.
Abstract: The study of the proteome (proteomics), which includes the dynamics of protein expression, regulation, interactions and its function, has played a less prominent role in evolutionary and ecological investigations in comparison with the study of the genome and transcriptome. There are, however, a number of arguments suggesting that this situation should change. First, the proteome is closer to the phenotype than the genome or the transcriptome, and as such may be more directly responsive to natural selection, and thus closely linked to adaptation. Second, there is evidence of a low correlation between protein and transcript expression levels across genes in many different organisms. Finally, there have been some recent important technological improvements in proteomics methods that make them feasible, practical and useful to address a wide range of evolutionary questions even in nonmodel organisms. The different proteomic methods, their limitations and problems when interpreting empirical data are described and discussed. In addition, the proteomic literature pertaining to evolutionary ecology is reviewed with examples, and potential applications of proteomics in a variety of evolutionary contexts are outlined. New proteomic research trends such as the study of posttranslational modifications and protein-protein interactions, as well as the combined use of the different -omics approaches, are discussed in relation to the development of a more functional and integrated perspective, needed for achieving a more comprehensive knowledge of evolutionary change.

138 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Actually, the newest methodological advance in this direction consists of the deep sequencing of RNA (RNA-Seq) that is allowing the identification and quantification of the transcriptome, including those of nonmodel organisms (Wang et al. 2009b and reference therein)....

    [...]

References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts [1–3], dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations
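
The mapping quality introduced in the MAQ abstract above is conventionally reported on a Phred scale, Q = -10 log10(P), where P is the estimated probability that the read is aligned to the wrong position. The sketch below only shows that conversion; it is not MAQ's code, the function names are hypothetical, and the cap of 99 is an assumption made here for illustration.

```python
import math


def phred_scaled(p_error, cap=99):
    """Convert an alignment error probability to a Phred-scaled quality.

    cap is an illustrative upper bound (an assumption, not MAQ's setting);
    it also guards against log10(0) when the error probability is zero.
    """
    if p_error <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_error)))


def error_probability(quality):
    """Invert the Phred scale: quality 20 corresponds to a 1% error chance."""
    return 10 ** (-quality / 10)


print(phred_scaled(0.001))
print(error_probability(20))
```

On this scale each step of 10 corresponds to a tenfold drop in the chance that the alignment is wrong, which makes it easy to filter out ambiguous alignments with a single threshold.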

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology, which supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...

    [...]