RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays
Reads0
Chats0
TLDR
It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).Abstract:
Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.read more
Citations
More filters
Journal ArticleDOI
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: EdgeR as mentioned in this paper is a Bioconductor software package for examining differential expression of replicated count data, which uses an overdispersed Poisson model to account for both biological and technical variability and empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal ArticleDOI
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.
Journal ArticleDOI
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
Bo Li,Colin N. Dewey +1 more
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired- end reads, depending on the number of possible splice forms for each gene.
Journal ArticleDOI
Differential expression analysis for sequence count data.
Simon Anders,Wolfgang Huber +1 more
TL;DR: A method based on the negative binomial distribution, with variance and mean linked by local regression, is proposed and an implementation, DESeq, as an R/Bioconductor package is presented.
Journal ArticleDOI
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Cole Trapnell,Cole Trapnell,Brian A. Williams,Geo Pertea,Ali Mortazavi,Gordon Kwan,Marijke J. van Baren,Steven L. Salzberg,Barbara J. Wold,Lior Pachter +9 more
TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
References
More filters
Book
Generalized Linear Models
Peter McCullagh,John A. Nelder +1 more
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log- likelihoods, illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
BookDOI
Modern Applied Statistics with S
W. N. Venables,Brian D. Ripley +1 more
TL;DR: A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods.
Journal ArticleDOI
Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments
TL;DR: The hierarchical model of Lonnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
Journal ArticleDOI
Generalized Linear Models
TL;DR: This is the rst book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.