Journal ArticleDOI

RNA-Seq: a revolutionary tool for transcriptomics

01 Jan 2009-Nature Reviews Genetics (Nature Publishing Group)-Vol. 10, Iss: 1, pp 57-63
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.


Citations
Journal ArticleDOI
TL;DR: Results indicate that powerful basal defence mechanisms involved in the recognition of PAMPs or DAMPs and a high level of accumulation of defence-related gene products may contribute to BLP resistance in soybean.
Abstract: Bacterial leaf pustule (BLP) disease is caused by Xanthomonas axonopodis pv. glycines (Xag). To investigate the plant basal defence mechanisms induced in response to Xag, differential gene expression in near-isogenic lines (NILs) of BLP-susceptible and BLP-resistant soybean was analysed by RNA-Seq. Of a total of 46 367 genes that were mapped to soybean genome reference sequences, 1978 and 783 genes were found to be up- and down-regulated, respectively, in the BLP-resistant NIL relative to the BLP-susceptible NIL at 0, 6, and 12 h after inoculation (hai). Clustering analysis revealed that these genes could be grouped into 10 clusters with different expression patterns. Functional annotation based on gene ontology (GO) categories was carried out. Among the putative soybean defence response genes identified (GO:0006952), 134 exhibited significant differences in expression between the BLP-resistant and -susceptible NILs. In particular, pathogen-associated molecular pattern (PAMP) and damage-associated molecular pattern (DAMP) receptors and the genes induced by these receptors were highly expressed at 0 hai in the BLP-resistant NIL. Additionally, pathogenesis-related (PR)-1 and -14 were highly expressed at 0 hai, and PR-3, -6, and -12 were highly expressed at 12 hai. There were also significant differences in the expression of the core JA-signalling components MYC2 and JASMONATE ZIM-motif. These results indicate that powerful basal defence mechanisms involved in the recognition of PAMPs or DAMPs and a high level of accumulation of defence-related gene products may contribute to BLP resistance in soybean.

102 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...Using this approach, one can achieve unprecedented levels of accuracy and specificity in quantifying differentially expressed genes, identifying novel transcribed regions, and identifying alternative splice events.(29,30) Using a high-throughput gene expression profiling technique and available complete soybean genome sequences, genes involved in the soybean response to Xag infection were identified by differential expression profiling....


Journal ArticleDOI
TL;DR: In this article, a combination of boosting and stability selection is proposed to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios, and the results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting.
Abstract: Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n≪p). These data sets pose new challenges to statistical analysis: variable selection becomes one of the most important tasks in this setting. Similar challenges arise in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds a finite sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. The interpretation of the used error bounds is elaborated and insights for practical data analysis are given. Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. The dependence on various parameters such as the sample size, the number of truly influential variables or tuning parameters of the algorithm was investigated. The results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting. Stability selection identified five differentially expressed amino acid pathways. Stability selection is implemented in the freely available R package stabs ( http://CRAN.R-project.org/package=stabs ). It proved to work well in high-dimensional settings with more predictors than observations for both linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative, though this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to appropriately specify the error bound.

102 citations

Journal ArticleDOI
TL;DR: 'Steroid hormone biosynthesis', a process that normally occurs in reproductive tissue, was significantly associated with changes in gene expression in the liver of cows in severe negative energy balance (SNEB), while the major transcriptional changes in the liver of these cows were related to fat metabolism.
Abstract: The liver is central to most economically important metabolic processes in cattle. However, the changes in expression of genes that drive these processes remain incompletely characterised. RNA-seq is the new gold standard for whole transcriptome analysis but so far there are no reports of its application to analysis of differential gene expression in cattle liver. We used RNA-seq to study differences in expression profiles of hepatic genes and their associated pathways in individual cattle in either mild negative energy balance (MNEB) or severe negative energy balance (SNEB). NEB is an imbalance between energy intake and energy requirements for lactation and body maintenance. This aberrant metabolic state affects high-yielding dairy cows after calving and is of considerable economic importance because of its negative impact on fertility and health in dairy herds. Analysis of changes in hepatic gene expression in SNEB animals will increase our understanding of NEB and contribute to the development of strategies to circumvent it. RNA-seq analysis was carried out on total RNA from liver from early postpartum Holstein Friesian cows in MNEB (n = 5) and SNEB (n = 6). 12,833 genes were deemed to be expressed (>4 reads per gene per animal), 413 of which were shown to be statistically significantly differentially expressed (SDE) at a false discovery rate (FDR) of 0.1% and 200 of which were SDE (FDR of 0.1%) with a ≥2-fold change between MNEB and SNEB animals. GOseq/KEGG pathway analysis showed that SDE genes with ≥2-fold change were associated (P < 0.05) with 9 KEGG pathways. Seven of these pathways were related to fatty acid metabolism and unexpectedly included 'Steroid hormone biosynthesis', a process which mainly occurs in the reproductive organs rather than the liver. RNA-seq analysis showed that the major changes at the level of transcription in the liver of SNEB cows were related to fat metabolism. 'Steroid hormone biosynthesis', a process that normally occurs in reproductive tissue, was significantly associated with changes in gene expression in the liver of SNEB cows. Changes in gene expression were found in this pathway that have not previously been identified in SNEB cows.

102 citations


Cites background from "RNA-Seq: a revolutionary tool for t..."

  • ...RNA-seq is a relatively new technique that can be used to analyse changes in gene expression across the entire transcriptome [13,14], and is now being applied to a rapidly increasing number of organisms [12]....


Journal ArticleDOI
TL;DR: Progress in sequencing technologies and bioinformatics will improve the costs, sensitivity, and accuracy of detecting somatic mutations, while large-scale projects are underway to coordinate cancer genome sequencing at the global level to facilitate the generation and dissemination of high-quality uniform genetic data.
Abstract: Advances in next-generation sequencing technology are enabling the systematic analyses of whole cancer genomes, providing insights into the landscape of somatic mutations and the great genetic heterogeneity that defines the unique signature of an individual tumor. Moreover, integrated studies of the genome, epigenome, and transcriptome reveal mechanisms of tumorigenesis at multiple levels. Progress in sequencing technologies and bioinformatics will improve the costs, sensitivity, and accuracy of detecting somatic mutations, while large-scale projects are underway to coordinate cancer genome sequencing at the global level to facilitate the generation and dissemination of high-quality uniform genetic data. These developments will create opportunities for deeper studies of cancer genetics and the clinical application of genome sequencing, and will motivate further research in cancer pathogenesis.

102 citations

Journal ArticleDOI
TL;DR: The advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium.
Abstract: Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

102 citations


Cites methods from "RNA-Seq: a revolutionary tool for t..."

  • ...The advent of NGS technologies has spawned new approaches to exploring the transcriptome (eg RNA-Seq).(362,363) This method allows the study of the expression of mRNAs and non-coding RNAs, and is also able to detect and identify new transcripts (coding and non-coding) that have not been formally annotated....


References
Journal ArticleDOI
TL;DR: Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors.
Abstract: We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41–52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3′ untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 × 10^5 distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices. The mRNA population specifies a cell’s identity and helps to govern its present and future activities. This has made transcriptome analysis a general phenotyping method, with expression microarrays of many kinds in routine use. Here we explore the possibility that transcriptome analysis, transcript discovery and transcript refinement can be done effectively in large and complex mammalian genomes by ultra-high-throughput sequencing. Expression microarrays are currently the most widely used methodology for transcriptome analysis, although some limitations persist. These include hybridization and cross-hybridization artifacts (1–3), dye-based detection issues and design constraints that preclude or seriously limit the detection of RNA splice patterns and previously unmapped genes. These issues have made it difficult for standard array designs to provide full sequence comprehensiveness (coverage of all possible genes, including unknown ones, in large genomes) or transcriptome comprehensiveness (reliable detection of all RNAs of all prevalence classes, including the least abundant ones that are physiologically relevant). Other

12,293 citations

PatentDOI
04 Oct 2000-Science
TL;DR: Serial analysis of gene expression (SAGE) should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
Abstract: PROBLEM TO BE SOLVED: To provide a method for preparing a short nucleotide sequence (tag) which is useful to identify a cDNA oligonucleotide and is derived from a restricted position in a mRNA or a cDNA. SOLUTION: This is the method of preparing a tag for identifying the cDNA oligonucleotide. The above method comprises preparing the cDNA oligonucleotide bearing 5' and 3' terminals, collecting cDNA fragments by cutting the cDNA oligonucleotide with a restriction enzyme at the first restriction endonuclease site, separating a cDNA oligonucleotide bearing 5' or 3' terminal and connecting an oligonucleotide linker to the isolated cDNA fragment bearing the cDNA oligonucleotide 5' or 3' terminal. Here, the oligonucleotide linker contains the recognition site of the second restriction endonuclease enzyme and the isolated cDNA fragment is cut with the second restriction endonuclease enzyme which cuts the cDNA fragment in a section separated from the recognition site to obtain the tag for identifying the cDNA oligonucleotide.

4,437 citations

Journal ArticleDOI
TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.
Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.

2,927 citations

Journal ArticleDOI
TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).
Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal ArticleDOI
TL;DR: The program SOAP is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology; it supports multi-threaded parallel computing and has a batch module for multiple query sets.
Abstract: Summary: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn

2,729 citations


"RNA-Seq: a revolutionary tool for t..." refers methods in this paper

  • ...There are several programs for mapping reads to the genome, including ELAND, SOA...
