Posted Content

An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation

20 Jun 2016-bioRxiv (Cold Spring Harbor Laboratory)-pp 060012
TL;DR: It is shown that hundreds of thousands of permutations can be made in a few minutes, which yields very accurate p-values and allows standard FDR correction procedures to be applied; these are more accurate than the ones currently used.
Abstract: Gene set enrichment analysis is a widely used tool for analyzing gene expression data. However, current implementations are slow because a large number of samples is required for the analysis to have good statistical power. In this paper we present a novel algorithm that efficiently reuses one sample multiple times and thus speeds up the analysis. We show that it is possible to make hundreds of thousands of permutations in a few minutes, which leads to very accurate p-values. This, in turn, allows applying standard FDR correction procedures, which are more accurate than the ones currently used. The method is implemented in the form of an R package and is freely available at https://github.com/ctlab/fgsea.
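To make the cost of permutation-based p-values concrete, the following is a minimal Python sketch of the naive approach, in which every random gene set requires a full pass over the ranked list. It is not the paper's cumulative-statistic algorithm and not the fgsea API: the function names `enrichment_score` and `permutation_pvalue`, the `weight` parameter, and the one-sided p-value estimate are assumptions made for illustration only.

```python
import random


def enrichment_score(ranked_stats, members, weight=1.0):
    """Signed maximum deviation from zero of the GSEA running sum.

    ranked_stats: gene-level statistics sorted in decreasing order.
    members: set of positions in ranked_stats that belong to the gene set
             (assumed non-empty, smaller than the list, with nonzero statistics).
    """
    n = len(ranked_stats)
    hit_total = sum(abs(ranked_stats[i]) ** weight for i in members)
    miss_step = 1.0 / (n - len(members))
    running, best = 0.0, 0.0
    for i, stat in enumerate(ranked_stats):
        if i in members:
            running += abs(stat) ** weight / hit_total  # step up on a hit
        else:
            running -= miss_step                        # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_stats, members, n_perm=1000, seed=0):
    """Naive empirical p-value: one full pass over the list per random set."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_stats, members)
    n, k = len(ranked_stats), len(members)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(range(n), k))  # random set of the same size
        null = enrichment_score(ranked_stats, random_set)
        # Count null scores at least as extreme in the observed direction.
        if (observed >= 0 and null >= observed) or (observed < 0 and null <= observed):
            extreme += 1
    # Add-one estimate (an assumption of this sketch) keeps the p-value above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy input: 50 ranked statistics, gene set at the top of the list.
stats = [3.0 - 6.0 * i / 49 for i in range(50)]
es, p = permutation_pvalue(stats, {0, 1, 2, 3, 4}, n_perm=200)
```

With this estimate the smallest attainable p-value is 1 / (n_perm + 1), which is why many permutations are needed before multiple-testing correction becomes meaningful.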
Citations
Journal Article
TL;DR: An updated version of the popular Bioconductor package, clusterProfiler 4.0, which provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases.
Abstract: Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms.

2,448 citations

Journal Article
05 Apr 2018-Cell
TL;DR: Novel stemness indices for assessing the degree of oncogenic dedifferentiation are provided and it is found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors.

1,099 citations


Cites methods from "An algorithm for fast preranked gen..."

  • ...2.1) Sergushichev, 2016 http://bioconductor.org/packages/release/bioc/html/fgsea.html Methylumi (v2....


  • ...We used the fgsea R/Bioconductor package to compute the enrichment scores (Sergushichev, 2016)....


Journal Article
TL;DR: The growing application of gene expression profiling demands powerful yet user-friendly bioinformatics tools to support systems-level data understanding. NetworkAnalyst was first released in 2014 to address the key need for interpreting gene expression data within the context of protein-protein interaction networks.
Abstract: The growing application of gene expression profiling demands powerful yet user-friendly bioinformatics tools to support systems-level data understanding. NetworkAnalyst was first released in 2014 to address the key need for interpreting gene expression data within the context of protein-protein interaction (PPI) networks. It was soon updated for gene expression meta-analysis with improved workflow and performance. Over the years, NetworkAnalyst has been continuously updated based on community feedback and technology progresses. Users can now perform gene expression profiling for 17 different species. In addition to generic PPI networks, users can now create cell-type or tissue specific PPI networks, gene regulatory networks, gene co-expression networks as well as networks for toxicogenomics and pharmacogenomics studies. The resulting networks can be customized and explored in 2D, 3D as well as Virtual Reality (VR) space. For meta-analysis, users can now visually compare multiple gene lists through interactive heatmaps, enrichment networks, Venn diagrams or chord diagrams. In addition, users have the option to create their own data analysis projects, which can be saved and resumed at a later time. These new features are released together as NetworkAnalyst 3.0, freely available at https://www.networkanalyst.ca.

968 citations

Journal Article
TL;DR: iDEP seamlessly connects 63 R/Bioconductor packages, 2 web services, and comprehensive annotation and pathway databases for 220 plant and animal species, and helps unveil the multifaceted functions of p53 and the possible involvement of several microRNAs such as miR-92a, miR-504, and miR-30a.
Abstract: RNA-seq is widely used for transcriptomic profiling, but the bioinformatics analysis of resultant data can be time-consuming and challenging, especially for biologists. We aim to streamline the bioinformatic analyses of gene-level data by developing a user-friendly, interactive web application for exploratory data analysis, differential expression, and pathway analysis. iDEP (integrated Differential Expression and Pathway analysis) seamlessly connects 63 R/Bioconductor packages, 2 web services, and comprehensive annotation and pathway databases for 220 plant and animal species. The workflow can be reproduced by downloading customized R code and related pathway files. As an example, we analyzed an RNA-Seq dataset of lung fibroblasts with Hoxa1 knockdown and revealed the possible roles of SP1 and E2F1 and their target genes, including microRNAs, in blocking G1/S transition. In another example, our analysis shows that in mouse B cells without functional p53, ionizing radiation activates the MYC pathway and its downstream genes involved in cell proliferation, ribosome biogenesis, and non-coding RNA metabolism. In wildtype B cells, radiation induces p53-mediated apoptosis and DNA repair while suppressing the target genes of MYC and E2F1, and leads to growth and cell cycle arrest. iDEP helps unveil the multifaceted functions of p53 and the possible involvement of several microRNAs such as miR-92a, miR-504, and miR-30a. In both examples, we validated known molecular pathways and generated novel, testable hypotheses. Combining comprehensive analytic functionalities with massive annotation databases, iDEP ( http://ge-lab.org/idep/ ) enables biologists to easily translate transcriptomic and proteomic data into actionable insights.

618 citations

Journal Article
Francine E. Garrett-Bakelman, Manjula Darshi, Stefan J. Green, Ruben C. Gur, Ling Lin, Brandon R. Macias, Miles J. McKenna, Cem Meydan, Tejaswini Mishra, Jad Nasrini, Brian D. Piening, Lindsay F. Rizzardi, Kumar Sharma, Jamila H. Siamwala, Lynn Taylor, Martha Hotz Vitaterna, Maryam Afkarian, Ebrahim Afshinnekoo, Sara Ahadi, Aditya Ambati, Maneesh Arya, Daniela Bezdan, Colin M. Callahan, Songjie Chen, Augustine M.K. Choi, George E. Chlipala, Kévin Contrepois, Marisa Covington, Brian Crucian, Immaculata De Vivo, David F. Dinges, Douglas J. Ebert, Jason I. Feinberg, Jorge Gandara, Kerry George, John Goutsias, George Grills, Alan R. Hargens, Martina Heer, Ryan P. Hillary, Andrew N. Hoofnagle, Vivian Hook, Garrett Jenkinson, Peng Jiang, Ali Keshavarzian, Steven S. Laurie, Brittany Lee-McMullen, Sarah B. Lumpkins, Matthew MacKay, Mark Maienschein-Cline, Ari Melnick, Tyler M. Moore, Kiichi Nakahira, Hemal H. Patel, Robert Pietrzyk, Varsha Rao, Rintaro Saito, Denis Salins, Jan M. Schilling, Dorothy D. Sears, Caroline Sheridan, Michael B. Stenger, Rakel Tryggvadottir, Alexander E. Urban, Tomas Vaisar, Benjamin Van Espen, Jing Zhang, Michael G. Ziegler, Sara R. Zwart, John B. Charles, Craig E. Kundrot, Graham B. I. Scott, Susan M. Bailey, Mathias Basner, Andrew P. Feinberg, Stuart M. C. Lee, Christopher E. Mason, Emmanuel Mignot, Brinda K. Rana, Scott M. Smith, Michael Snyder, Fred W. Turek
12 Apr 2019-Science
TL;DR: Given that the majority of the biological and human health variables remained stable, or returned to baseline, after a 340-day space mission, these data suggest that human health can be mostly sustained over this duration of spaceflight.
Abstract: INTRODUCTION: To date, 559 humans have been flown into space, but long-duration (>300 days) missions are rare (n = 8 total). Long-duration missions that will take humans to Mars and beyond are planned by public and private entities for the 2020s and 2030s; therefore, comprehensive studies are needed now to assess the impact of long-duration spaceflight on the human body, brain, and overall physiology. The space environment is made harsh and challenging by multiple factors, including confinement, isolation, and exposure to environmental stressors such as microgravity, radiation, and noise. The selection of one of a pair of monozygotic (identical) twin astronauts for NASA’s first 1-year mission enabled us to compare the impact of the spaceflight environment on one twin to the simultaneous impact of the Earth environment on a genetically matched subject. RATIONALE: The known impacts of the spaceflight environment on human health and performance, physiology, and cellular and molecular processes are numerous and include bone density loss, effects on cognitive performance, microbial shifts, and alterations in gene regulation. However, previous studies collected very limited data, did not integrate simultaneous effects on multiple systems and data types in the same subject, or were restricted to 6-month missions. Measurement of the same variables in an astronaut on a year-long mission and in his Earth-bound twin indicated the biological measures that might be used to determine the effects of spaceflight. Presented here is an integrated longitudinal, multidimensional description of the effects of a 340-day mission onboard the International Space Station. RESULTS: Physiological, telomeric, transcriptomic, epigenetic, proteomic, metabolomic, immune, microbiomic, cardiovascular, vision-related, and cognitive data were collected over 25 months. Some biological functions were not significantly affected by spaceflight, including the immune response (T cell receptor repertoire) to the first test of a vaccination in flight. However, significant changes in multiple data types were observed in association with the spaceflight period; the majority of these eventually returned to a preflight state within the time period of the study. These included changes in telomere length, gene regulation measured in both epigenetic and transcriptional data, gut microbiome composition, body weight, carotid artery dimensions, subfoveal choroidal thickness and peripapillary total retinal thickness, and serum metabolites. In addition, some factors were significantly affected by the stress of returning to Earth, including inflammation cytokines and immune response gene networks, as well as cognitive performance. For a few measures, persistent changes were observed even after 6 months on Earth, including some genes’ expression levels, increased DNA damage from chromosomal inversions, increased numbers of short telomeres, and attenuated cognitive function. CONCLUSION: Given that the majority of the biological and human health variables remained stable, or returned to baseline, after a 340-day space mission, these data suggest that human health can be mostly sustained over this duration of spaceflight. The persistence of the molecular changes (e.g., gene expression) and the extrapolation of the identified risk factors for longer missions (>1 year) remain estimates and should be demonstrated with these measures in future astronauts. Finally, changes described in this study highlight pathways and mechanisms that may be vulnerable to spaceflight and may require safeguards for longer space missions; thus, they serve as a guide for targeted countermeasures or monitoring during future missions.

538 citations

References
Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal Article
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal Article
TL;DR: The Reactome data model allows us to represent many diverse processes in the human system, including the pathways of intermediary metabolism, regulatory pathways, and signal transduction, and high-level processes, such as the cell cycle.
Abstract: Reactome, located at http://www.reactome.org is a curated, peer-reviewed resource of human biological processes. Given the genetic makeup of an organism, the complete set of possible reactions constitutes its reactome. The basic unit of the Reactome database is a reaction; reactions are then grouped into causal chains to form pathways. The Reactome data model allows us to represent many diverse processes in the human system, including the pathways of intermediary metabolism, regulatory pathways, and signal transduction, and high-level processes, such as the cell cycle. Reactome provides a qualitative framework, on which quantitative data can be superimposed. Tools have been developed to facilitate custom data entry and annotation by expert biologists, and to allow visualization and exploration of the finished dataset as an interactive process map. Although our primary curational domain is pathways from Homo sapiens, we regularly create electronic projections of human pathways onto other organisms via putative orthologs, thus making Reactome relevant to model organism research communities. The database is publicly available under open source terms, which allows both its content and its software infrastructure to be freely used and redistributed.

1,246 citations

Journal Article
16 Jan 2009-Immunity
TL;DR: Although modifications of signature-cytokine genes (Ifng, Il4, and Il17) partially conform to the expectation of lineage commitment, genes encoding transcription factors like Tbx21 exhibit a broad spectrum of epigenetic states, consistent with the demonstrated T-bet and interferon-gamma induction in nTreg cells.

1,098 citations


"An algorithm for fast preranked gen..." refers methods in this paper

  • ...To assess the algorithm performance we ran the algorithm on a T-cells differentiation dataset [4]....


Journal Article
TL;DR: DOSE is an R package providing semantic similarity computations among DO terms and genes which allows biologists to explore the similarities of diseases and of gene functions in disease perspective and to verify disease relevance in a biological experiment and identify unexpected disease associations.
Abstract: Summary: Disease ontology (DO) annotates human genes in the context of disease. DO is important annotation in translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes which allows biologists to explore the similarities of diseases and of gene functions in disease perspective. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented to support discovering disease associations of high-throughput biological data. This allows biologists to verify disease relevance in a biological experiment and identify unexpected disease associations. Comparison among gene clusters is also supported. Availability and implementation: DOSE is released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/DOSE.html). Supplementary information: Supplementary data are available at Bioinformatics online. Contact: gcyu@connect.hku.hk or tqyhe@jnu.edu.cn.

642 citations