Journal Article

limma powers differential expression analyses for RNA-sequencing and microarray studies

20 Apr 2015-Nucleic Acids Research (Oxford University Press)-Vol. 43, Iss: 7
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Topics: Microarray databases (61%), Bioconductor (51%)
Citations

Journal Article
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.

Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

4,666 citations


Journal Article
07 May 2016-The Lancet
TL;DR: Treatment with atezolizumab resulted in a significantly improved RECIST v1.1 response rate, compared with a historical control overall response rate of 10%, and exploratory analyses showed The Cancer Genome Atlas (TCGA) subtypes and mutation load to be independently predictive for response to atezolizumab.

Abstract: Background: Patients with metastatic urothelial carcinoma have few treatment options after failure of platinum-based chemotherapy. In this trial, we assessed treatment with atezolizumab, an engineered humanised immunoglobulin G1 monoclonal antibody that binds selectively to programmed death ligand 1 (PD-L1), in this patient population. Methods: For this multicentre, single-arm, two-cohort, phase 2 trial, patients (aged ≥18 years) with inoperable locally advanced or metastatic urothelial carcinoma whose disease had progressed after previous platinum-based chemotherapy were enrolled from 70 major academic medical centres and community oncology practices in Europe and North America. Key inclusion criteria for enrolment were Eastern Cooperative Oncology Group performance status of 0 or 1, measurable disease defined by Response Evaluation Criteria In Solid Tumors version 1.1 (RECIST v1.1), adequate haematological and end-organ function, and no autoimmune disease or active infections. Formalin-fixed paraffin-embedded tumour specimens with sufficient viable tumour content were needed from all patients before enrolment. Patients received treatment with intravenous atezolizumab (1200 mg, given every 3 weeks). PD-L1 expression on tumour-infiltrating immune cells (ICs) was assessed prospectively by immunohistochemistry. The co-primary endpoints were the independent review facility-assessed objective response rate according to RECIST v1.1 and the investigator-assessed objective response rate according to immune-modified RECIST, analysed by intention to treat. A hierarchical testing procedure was used to assess whether the objective response rate was significantly higher than the historical control rate of 10% at an α level of 0·05. This study is registered with ClinicalTrials.gov, number NCT02108652. Findings: Between May 13, 2014, and Nov 19, 2014, 486 patients were screened and 315 patients were enrolled into the study. Of these patients, 310 received atezolizumab treatment (five enrolled patients later did not meet eligibility criteria and were not dosed with study drug). The PD-L1 expression status on infiltrating immune cells (ICs) in the tumour microenvironment was defined by the percentage of PD-L1-positive immune cells: IC0 ( [...] Interpretation: Atezolizumab showed durable activity and good tolerability in this patient population. Increased levels of PD-L1 expression on immune cells were associated with increased response. This report is the first to show the association of TCGA subtypes with response to immune checkpoint inhibition and to show the importance of mutation load as a biomarker of response to this class of agents in advanced urothelial carcinoma. Funding: F Hoffmann-La Roche Ltd.

2,369 citations


Journal Article
Mihaela Pertea, Daehwan Kim, Geo Pertea, Jeffrey T. Leek, and 1 more author
01 Sep 2016-Nature Protocols
TL;DR: This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts.

Abstract: High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

2,234 citations


Journal Article
30 Dec 2015-F1000Research
TL;DR: It is illustrated that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets.

Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

1,587 citations


Journal Article
TL;DR: An R/Bioconductor package called TCGAbiolinks is developed to address bioinformatics challenges of the Cancer Genome Atlas data by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data.

Abstract: The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.

1,161 citations


Cites methods from "limma powers differential expressio..."

  • ...Specifically, the objects are summarized in a ‘SummarizedExperiment’ object (3) to allow easy integration with other Bioconductor packages, such as GRanges (25), IRanges (25), limma (26) and edgeR (27)....


References

Journal Article
01 Jan 2014-MSOR connections
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.

Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

229,202 citations


Journal Article
Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

71,936 citations


"limma powers differential expressio..." refers background in this paper

  • ...Users can control either the family-wise type I error rate or the false discovery rate (46)....
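
The "simple sequential Bonferroni-type procedure" in the abstract above is commonly known as the Benjamini-Hochberg step-up procedure. The following is a minimal Python sketch of that procedure, assuming a plain list of p-values. The function name `benjamini_hochberg` and the parameter `q` are illustrative choices and are not taken from limma or from the cited paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Sketch of the step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank whose p-value falls under its threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule looks for the largest qualifying rank, a hypothesis can be rejected even when its own p-value misses its own threshold, as long as a larger p-value meets a later threshold. For example, with the p-values 0.03, 0.02, 0.045 and 0.04 and `q=0.05`, only the largest p-value passes its threshold, yet all four hypotheses are rejected.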


Journal Article
08 Feb 1986-The Lancet
TL;DR: An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

Abstract: In clinical measurement comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

41,576 citations


"limma powers differential expressio..." refers methods in this paper

  • ...Such a plot is called a Bland-Altman plot [36] or a Tukey mean-difference plot [10]....
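
The "simple calculations" behind such a mean-difference plot can be sketched as follows. This is a minimal Python illustration, assuming paired measurements held in two equal-length lists. The function name `mean_difference` is hypothetical, and the 1.96 multiplier for the limits of agreement is a common convention assumed here, not something stated in the text above.

```python
from statistics import mean, stdev


def mean_difference(a, b):
    """Compute the quantities plotted in a mean-difference plot.

    Returns (means, diffs, bias, limits): per-pair means (x axis),
    per-pair differences (y axis), the average difference, and the
    assumed limits of agreement, bias +/- 1.96 standard deviations.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if len(a) < 2:
        raise ValueError("at least two pairs are needed")

    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]

    bias = mean(diffs)   # systematic offset between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits
```

Plotting `diffs` against `means` gives the plot itself. Unlike a correlation coefficient, the bias and the limits show directly how far the two methods disagree.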


Journal Article
01 May 2000-Nature Genetics
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

30,473 citations


Journal Article
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

26,320 citations


Performance
Metrics
No. of citations received by the Paper in previous years
Year    Citations
2022    20
2021    4,044
2020    3,381
2019    2,393
2018    1,680
2017    1,367