Transcriptomics in the RNA-seq era

doi:10.1016/J.CBPA.2012.12.008

Home
/
Papers
/
Transcriptomics in the RNA-seq era

Journal Article•DOI•

Transcriptomics in the RNA-seq era

Paul A. McGettigan¹•Institutions (1)

University College Dublin¹

01 Feb 2013-Current Opinion in Chemical Biology (Elsevier Current Trends)-Vol. 17, Iss: 1, pp 4-11

TL;DR: The emerging picture of mammalian transcription is complex with further refinement expected with the integration of epigenomic data generated by projects such as ENCODE.

read less

About: This article is published in Current Opinion in Chemical Biology.The article was published on 2013-02-01. It has received 301 citations till now. The article focuses on the topics: RNA-Seq & DNA microarray.

...read moreread less

Citations

PDF

Open Access

More filters

Integrative Genomics Viewer

[...]

James T. Robinson¹, Helga Thorvaldsdottir¹, Wendy Winckler¹, Mitchell Guttman¹, Eric S. Lander¹, Eric S. Lander², Gad Getz¹, Jill P. Mesirov¹ - Show less +4 more•Institutions (2)

Massachusetts Institute of Technology¹, Harvard University²

01 Jan 2011

TL;DR: The sheer volume and scope of data posed by this flood of data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.

...read moreread less

Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

...read moreread less

2,187 citations

Journal Article•DOI•

Transcriptomics technologies

[...]

Rohan G. T. Lowe¹, Neil J. Shirley², Mark R. Bleackley¹, Stephen K. Dolan³, Thomas Shafee¹ - Show less +1 more•Institutions (3)

La Trobe University¹, University of Adelaide², University of Cambridge³

18 May 2017-PLOS Computational Biology

TL;DR: The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline as mentioned in this paper, which has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease.

...read moreread less

Abstract: Transcriptomics technologies are the techniques used to study an organism’s transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst noncoding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline. Transcriptomics has been defined by repeated technological innovations that transform the field. There are two key contemporary techniques in the field: microarrays, which quantify a set of predetermined sequences, and RNA sequencing (RNA-Seq), which uses high-throughput sequencing to capture all sequences. Measuring the expression of an organism’s genes in different tissues, conditions, or time points gives information on how genes are regulated and reveals details of an organism’s biology. It can also help to infer the functions of previously unannotated genes. Transcriptomic analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease. An analysis of gene expression in its entirety allows detection of broad coordinated trends which cannot be discerned by more targeted assays.

...read moreread less

525 citations

Journal Article•DOI•

A Comparative Study of Techniques for Differential Expression Analysis on RNA-Seq Data

[...]

Zong Hong Zhang¹, Dhanisha J. Jhaveri¹, Vikki M. Marshall¹, Denis C. Bauer¹, Janette Edson¹, Ramesh K. Narayanan¹, Gregory J. Robinson¹, Andreas E. Lundberg², Perry F. Bartlett¹, Naomi R. Wray¹, Qiongyi Zhao¹ - Show less +7 more•Institutions (2)

University of Queensland¹, Swedish University of Agricultural Sciences²

13 Aug 2014-PLOS ONE

TL;DR: It is observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives, and DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study.

...read moreread less

Abstract: Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.

...read moreread less

297 citations

Cites methods from "Transcriptomics in the RNA-seq era"

...Taking advantage of these tools, the power of the RNA-Seq approach to detect DEGs has been recently demonstrated [24-27]....
[...]

Journal Article•DOI•

Global transcriptome analyses of human and murine terminal erythroid differentiation

[...]

Xiuli An¹, Xiuli An², Vincent P. Schulz³, Jie Li¹, Kunlu Wu¹, Kunlu Wu⁴, Jing Liu⁴, Jing Liu¹, Fumin Xue, Jingping Hu, Narla Mohandas¹, Patrick G. Gallagher³ - Show less +8 more•Institutions (4)

New York Blood Center¹, Zhengzhou University², Yale University³, Central South University⁴

29 May 2014-Blood

TL;DR: Clustering and network analyses revealed that varying stage-specific patterns of expression observed across differentiation were enriched for genes of differing function, and tight clustering of transcriptomes from differing stages, even between biologically different replicates, validated the utility of the FACS-based assays.

...read moreread less

275 citations

Cites background or methods from "Transcriptomics in the RNA-seq era"

...Beyond providing unbiased detection of transcripts, it provides information on transcript composition and abundance, including detection of novel transcripts, isoforms, alternative splice sites, allele-specific expression, and rare transcripts.(11-13) RNA-seq has a low background signal and a large dynamic range, with high levels of reproducibility for both technical and biological replicates....
[...]
...RNA sequencing (RNA-seq) allows unbiased detection and quantification of transcriptomes using high-throughput sequencing.(10,11) Beyond providing unbiased detection of transcripts, it provides information on transcript composition and abundance, including detection of novel transcripts, isoforms, alternative splice sites, allele-specific expression, and rare transcripts....
[...]

Journal Article•DOI•

New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEX

[...]

Mohamed Mounir, Marta Lucchetta, Tiago C. Silva¹, Catharina Olsen², Gianluca Bontempi², Xi Chen³, Houtan Noushmehr⁴, Houtan Noushmehr¹, Antonio Colaprico², Antonio Colaprico³, Elena Papaleo⁵ - Show less +7 more•Institutions (5)

University of São Paulo¹, Université libre de Bruxelles², University of Miami³, Henry Ford Hospital⁴, University of Copenhagen⁵

05 Mar 2019-PLOS Computational Biology

TL;DR: New features and enhancements of TCGAbiolinks are introduced, including more accurate and flexible pipelines for differential expression analyses, different methods for tumor purity estimation and filtering, and integration of normal samples from other platforms iv) support for other genomics datasets, exemplified by the TARGET data.

...read moreread less

Abstract: The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Nowadays, the amount of genomic data is massive and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that contains different genomic studies including the ones from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, they have the primary focus on the data storage in a unique place, and they do not provide a comprehensive toolkit for analyses and interpretation of the data. To fulfill this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data that do not renounce a robust statistical and theoretical framework are required. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as these factors promote confounding behavior regarding differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, which is another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in the TCGAbiolinks version 2.8 and higher released in Bioconductor version 3.7.

...read moreread less

265 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

The Sequence Alignment/Map format and SAMtools

[...]

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹ - Show less +5 more•Institutions (4)

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

...read moreread less

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

...read moreread less

45,957 citations

"Transcriptomics in the RNA-seq era" refers methods in this paper

...The last two years has seen the standardization of the SAM/BAM format for short-read alignment data [44]....
[...]

Journal Article•DOI•

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

[...]

Mark D. Robinson¹, Davis J. McCarthy¹, Gordon K. Smyth¹•Institutions (1)

Walter and Eliza Hall Institute of Medical Research¹

01 Jan 2010-Bioinformatics

TL;DR: EdgeR as mentioned in this paper is a Bioconductor software package for examining differential expression of replicated count data, which uses an overdispersed Poisson model to account for both biological and technical variability and empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference.

...read moreread less

Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).

...read moreread less

29,413 citations

"Transcriptomics in the RNA-seq era" refers methods in this paper

...The most commonly used methods have been parametric methods utilizing variants of the negative binomial distribution such as edgeR [36], HTseq [37], bayseq [38], and NBPseq [39]....
[...]

Journal Article•DOI•

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

[...]

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹•Institutions (1)

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.

...read moreread less

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

...read moreread less

20,335 citations

"Transcriptomics in the RNA-seq era" refers methods in this paper

...This has led to the displacement of the original cohort of aligners by tools based on the Burrows Wheeler Transform such as Bowtie [24] and SOAP [25]....
[...]

Journal Article•DOI•

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

[...]

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹, Moran Yassour², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh⁴, Kerstin Lindblad-Toh¹, Nir Friedman², Aviv Regev¹ - Show less +19 more•Institutions (4)

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

TL;DR: The Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.

...read moreread less

Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

...read moreread less

15,665 citations

"Transcriptomics in the RNA-seq era" refers methods in this paper

...Popular assembly methods include Oases [58], Trinity [59], and trans-ABySS [60]....
[...]

Journal Article•DOI•

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

[...]

Bo Li¹, Colin N. Dewey¹•Institutions (1)

University of Wisconsin-Madison¹

04 Aug 2011-BMC Bioinformatics

TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired- end reads, depending on the number of possible splice forms for each gene.

...read moreread less

Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

...read moreread less

14,524 citations

"Transcriptomics in the RNA-seq era" refers methods in this paper

...Nonparametric methods such as NOISeq [40] and Samseq [41] and expectation–maximization methods such as RSEM [42] have also been applied to this problem....
[...]
...Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome....
[...]