Journal Article

The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing

06 Jun 2008-Science (American Association for the Advancement of Science)-Vol. 320, Iss: 5881, pp 1344-1349
TL;DR: A quantitative sequencing-based method is developed for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome, and it is demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed.
Abstract: The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.


Citations
Journal Article
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).

20,335 citations


Cites methods from "The Transcriptional Landscape of th..."

  • ...source tool that has been reported and used in short-read projects [6,28]....

    [...]

28 Jul 2005
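TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and activating transcription of different var gene variants provides the molecular basis for antigenic variation.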

18,940 citations

Journal Article
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations


Cites methods from "The Transcriptional Landscape of th..."

  • ...A simple quantification method that was used in some initial RNA-Seq papers [13,14] and that is still used today is to count the number of reads that map uniquely to each gene, possibly correcting a gene’s count by the “mappability” of its sequence [15] and its length....

    [...]
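
The unique-read counting approach described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not code from RSEM or from the cited RNA-Seq papers: the function names, the gene identifiers, and the choice to fold length and mappability into a single "effective length" (length multiplied by the mappable fraction) are all assumptions made for the example.

```python
from collections import Counter


def unique_read_counts(alignments):
    """Count reads that map to exactly one gene.

    alignments: dict mapping a read id to the list of gene ids it aligns to
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for genes in alignments.values():
        distinct = set(genes)
        if len(distinct) == 1:  # discard reads that hit several genes
            counts[next(iter(distinct))] += 1
    return counts


def normalized_abundance(counts, length, mappability):
    """Correct each gene's count by its length and mappability.

    Assumption: the correction divides by length * mappable fraction;
    genes with zero effective length are skipped.
    """
    result = {}
    for gene, gene_length in length.items():
        effective = gene_length * mappability.get(gene, 1.0)
        if effective > 0:
            result[gene] = counts.get(gene, 0) / effective
    return result


# Hypothetical toy data: read r3 is ambiguous and is not counted.
alignments = {
    "r1": ["GENE_A"],
    "r2": ["GENE_A"],
    "r3": ["GENE_A", "GENE_B"],
    "r4": ["GENE_B"],
}
counts = unique_read_counts(alignments)
abundance = normalized_abundance(
    counts,
    length={"GENE_A": 1000, "GENE_B": 2000},
    mappability={"GENE_A": 0.5, "GENE_B": 1.0},
)
```

Discarding ambiguous reads keeps the method simple but throws away data, which is the limitation the RSEM abstract addresses by modelling ambiguously mapping reads instead.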

Journal Article
TL;DR: A method based on the negative binomial distribution, with variance and mean linked by local regression, is proposed and an implementation, DESeq, as an R/Bioconductor package is presented.
Abstract: High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.

13,356 citations


Cites background from "The Transcriptional Landscape of th..."

  • ...[1] performed RNA-Seq on replicates of Saccharomyces cerevisiae cultures....

    [...]

  • ...This is confirmed by comparison of technical with biological replicates [1]....

    [...]

  • ...A common feature between these assays is that they sequence large amounts of DNA fragments that reflect, for example, a biological system’s repertoire of RNA molecules (RNA-Seq [1,2]) or the DNA or RNA interaction regions of nucleotide binding molecules (ChIP-Seq [3], HITS-CLIP [4])....

    [...]

  • ...However, it has been noted [1,8] that the assumption of Poisson distribution is too restrictive: it predicts smaller variations than what is seen in the data....

    [...]
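
To make the overdispersion point in the excerpts above concrete, the sketch below compares the variance a Poisson model predicts with the variance observed across replicate counts. It assumes the common negative binomial parameterization var = mu + alpha * mu^2 with a single constant dispersion alpha per gene, which is a simplification: DESeq, as its abstract states, links variance and mean by local regression. The function names and the replicate counts are invented for illustration.

```python
from statistics import mean, variance


def poisson_variance(mu):
    """Under a Poisson model the variance equals the mean."""
    return mu


def nb_variance(mu, alpha):
    """Negative binomial variance with dispersion alpha (assumed parameterization)."""
    return mu + alpha * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments estimate of alpha from replicate counts.

    Solves var = mu + alpha * mu^2 for alpha, clipped at zero so that
    data no more variable than Poisson yields alpha = 0.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max(0.0, (variance(counts) - mu) / mu ** 2)


# Hypothetical counts for one gene across four biological replicates.
replicates = [90, 110, 150, 50]
mu = mean(replicates)
alpha = moment_dispersion(replicates)

print("mean:", mu)
print("sample variance:", variance(replicates))
print("Poisson-predicted variance:", poisson_variance(mu))
print("estimated dispersion:", alpha)
```

With these toy counts the sample variance is far larger than the mean, so a Poisson model would understate the variability between replicates, and the extra dispersion term absorbs the difference.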

Journal Article
TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Abstract: High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

13,337 citations


Additional excerpts

  • ...High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimatio...

    [...]

References
28 Jul 2005
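TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and activating transcription of different var gene variants provides the molecular basis for antigenic variation.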

18,940 citations

Patent
13 May 2002-Science
TL;DR: The invention covers methods for using proteome chips to systematically assay all protein interactions in a species in a high-throughput manner, as well as methods for making protein arrays by attaching double-tagged fusion proteins to a solid support.
Abstract: The present invention relates to proteome chips comprising arrays having a large proportion of all proteins expressed in a single species. The invention also relates to methods for making proteome chips. The invention also relates to methods for using proteome chips to systematically assay all protein interactions in a species in a high-throughput manner. The present invention also relates to methods for making and purifying eukaryotic proteins in a high-density array format. The invention also relates to methods for making protein arrays by attaching double-tagged fusion proteins to a solid support. The invention also relates to a method for identifying whether a signal is positive.

1,967 citations

Patent
27 May 2003-Cell
TL;DR: The results reveal an unanticipated level of regulation which is superimposed on that due to gene-specific transcription factors, a novel mechanism for coordinate regulation of specific sets of genes when cells encounter limiting nutrients, and evidence that the ultimate targets of signal transduction pathways can be identified within the initiation apparatus.

1,963 citations

Journal Article
15 May 2003-Nature
TL;DR: A comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species, which inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions.
Abstract: Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.

1,837 citations

Journal Article
TL;DR: Mammalian cells use the same strategy to downregulate protein synthesis while inducing transcriptional activators of stress-response genes under various stressful conditions, including amino acid starvation.
Abstract: Cells reprogram gene expression in response to environmental changes by mobilizing transcriptional activators. The activator protein Gcn4 of the yeast Saccharomyces cerevisiae is regulated by an intricate translational control mechanism, which is the primary focus of this review, and also by the modulation of its stability in response to nutrient availability. Translation of GCN4 mRNA is derepressed in amino acid-deprived cells, leading to transcriptional induction of nearly all genes encoding amino acid biosynthetic enzymes. The trans-acting proteins that control GCN4 translation have general functions in the initiation of protein synthesis, or regulate the activities of initiation factors, so that the molecular events that induce GCN4 translation also reduce the rate of general protein synthesis. This dual regulatory response enables cells to limit their consumption of amino acids while diverting resources into amino acid biosynthesis in nutrient-poor environments. Remarkably, mammalian cells use the same strategy to downregulate protein synthesis while inducing transcriptional activators of stress-response genes under various stressful conditions, including amino acid starvation.

1,200 citations