Journal Article

GeneTrail—advanced gene set enrichment analysis

01 Jul 2007-Nucleic Acids Research (Oxford University Press)-Vol. 35, pp 186-192
TL;DR: GeneTrail's statistics module includes a novel dynamic-programming algorithm that considerably improves the P-value computation of GSEA methods. The tool is freely accessible at http://genetrail.bioinf.uni-sb.de.
Abstract: We present a comprehensive and efficient gene set analysis tool, called 'GeneTrail' that offers a rich functionality and is easy to use. Our web-based application facilitates the statistical evaluation of high-throughput genomic or proteomic data sets with respect to enrichment of functional categories. GeneTrail covers a wide variety of biological categories and pathways, among others KEGG, TRANSPATH, TRANSFAC, and GO. Our web server provides two common statistical approaches, 'Over-Representation Analysis' (ORA) comparing a reference set of genes to a test set, and 'Gene Set Enrichment Analysis' (GSEA) scoring sorted lists of genes. Besides other newly developed features, GeneTrail's statistics module includes a novel dynamic-programming algorithm that improves the P-value computation of GSEA methods considerably. GeneTrail is freely accessible at http://genetrail.bioinf.uni-sb.de.
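
As a rough illustration of the ORA idea described in the abstract, the sketch below computes a one-sided hypergeometric P-value for over-representation of a category in a test set. It assumes the test set is a subset of the reference set; the function name, parameter names and toy numbers are hypothetical and are not taken from GeneTrail itself.

```python
from math import comb


def ora_p_value(reference_size, category_in_reference, test_size, category_in_test):
    """Upper-tail hypergeometric P-value P(X >= category_in_test).

    Assumption: the test set is drawn from (is a subset of) the reference set.
    """
    if not 0 <= test_size <= reference_size:
        raise ValueError("test set must fit inside the reference set")
    if not 0 <= category_in_reference <= reference_size:
        raise ValueError("category count exceeds reference size")

    total = comb(reference_size, test_size)
    outside_category = reference_size - category_in_reference
    largest_overlap = min(category_in_reference, test_size)

    # Sum the probabilities of seeing the observed overlap or a larger one.
    tail = sum(
        comb(category_in_reference, x) * comb(outside_category, test_size - x)
        for x in range(category_in_test, largest_overlap + 1)
    )
    return tail / total


# Toy example: 10 reference genes, 3 of them in the category,
# 4 genes in the test set, 2 of which fall in the category.
print(ora_p_value(10, 3, 4, 2))  # about 0.333
```

A small P-value suggests the category occurs in the test set more often than expected by chance; in practice such values would still need multiple-testing adjustment across all categories tested.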


Citations
Journal Article
TL;DR: The survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.
Abstract: Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

13,102 citations


Cites background from "GeneTrail—advanced gene set enrichm..."

  • ...Moreover, a number of tools, such as Onto-Express (62), easyGO (66), GoMiner (10), eGOn (42), GoSurfer (25), GOFFA (50) and GeneTrail (57), are able to display the enrichment analysis results on the DAG or a tree structure so that users may easily explore the enrichment results in neighboring nodes....

    [...]

  • ...However, some recent tools or new releases of early-generation tools, such as Onto-Express (62), DAVID (61), WebGestalt (40), Fatigo+ (56), FACT (30), g:Profiler (64), GAzer (63) and GeneTrail (57), etc., extended their backend bio-databases by integrating wide-range heterogeneous data content (e.g....

    [...]


Journal Article
TL;DR: A significant update to one of the tools in this domain called Enrichr, a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries is presented.
Abstract: Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

6,201 citations


Cites methods from "GeneTrail—advanced gene set enrichm..."

  • ...To benchmark the performance of the various enrichment analysis methods implemented within Enrichr, namely, the proportion test, the Z-score and the combined score, as well as other similar published methods, for example, the over representation analysis (ORA) method (11), as well as simple methods such as the Jaccard distance or the number of overlapping genes, we processed 489 experiments that genetically perturbed (knockdown, knockout or overexpression) transcription factors (TFs) from 293 studies available from GEO....

    [...]

Journal Article
TL;DR: A web server, KOBAS 2.0, is reported, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations, which allows for both ID mapping and cross-species sequence similarity mapping.
Abstract: High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.

3,293 citations


Cites background from "GeneTrail—advanced gene set enrichm..."

  • ...A growing number of tools have been developed for pathway and disease identification, including, but not limited to, MAPPFinder (13), EASE (14), DAVID (15,16), ArrayXPath (17), WebGestalt (18), FuncCluster (19), PageMan (20), GENECODIS (21,22), GeneTrail (23), g:Profiler (24), FunNet (25) and PaLS (26)....

    [...]

Journal Article
TL;DR: GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets, and its unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.
Abstract: Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms in order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. GOrilla is an efficient GO analysis tool with unique features that make a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold-free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il

3,157 citations


Cites methods from "GeneTrail—advanced gene set enrichm..."

  • ...A few tools have been developed that use a threshold free approach including GSEA [12], FatiScan [13], GO-stat [14], GeneTrail [15] and iGA [16]....

    [...]

Journal Article
TL;DR: The evolution of knowledge base–driven pathway analysis over its first decade is discussed, distinctly divided into three generations, and a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods are identified.
Abstract: Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.

1,357 citations


Additional excerpts

  • ...de) [87]...

    [...]

References
Journal Article
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses (the false discovery rate), which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

83,420 citations


"GeneTrail—advanced gene set enrichm..." refers methods in this paper

  • ...Therefore, GeneTrail offers two adjustment methods, the conservative Bonferroni adjustment and the control of the false discovery rate (FDR) according to Benjamini and Hochberg (24)....

    [...]
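
The two adjustment methods named in the excerpt above can be sketched in a few lines. This is a minimal illustration of the standard Bonferroni and Benjamini-Hochberg step-up adjustments, not GeneTrail's own code; the function names and the toy P-values are assumptions made for the example.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    # Indices of the P-values, sorted from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

For these four toy values the FDR-adjusted P-values are never larger than the Bonferroni-adjusted ones, which reflects the gain in power described in the abstract.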

Journal Article
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations


"GeneTrail—advanced gene set enrichm..." refers methods in this paper

  • ...Additionally, GeneTrail uses a local copy of the GO database (1) that includes electronically inferred annotations (IEAs) and manually curated annotations....

    [...]

  • ...Some of the developed tools focus on the analysis of only one type of functional categories for example various Gene Ontology (GO) (1) based tools, among them FatiGO (2), BiNGO (3), and GOstat (4)....

    [...]

Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations
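
GSEA-style methods score a sorted gene list with a running-sum statistic. The sketch below uses a simple unweighted, Kolmogorov-Smirnov-like running sum (not the weighted statistic of the GSEA software package) and computes its exact P-value by dynamic programming over orderings instead of by permutation sampling. It is only an illustration in the spirit of the dynamic-programming idea mentioned in the GeneTrail abstract; the step sizes, function names and toy input are assumptions, and it should not be read as either paper's actual implementation.

```python
from fractions import Fraction
from math import comb


def running_sum_max(in_set):
    """Maximal absolute deviation of the running sum from zero.

    in_set: sequence of booleans in ranked order, True if the gene at that
    rank belongs to the gene set. Members add (n - l), non-members subtract l,
    so the running sum always ends at zero.
    """
    n, l = len(in_set), sum(in_set)
    running, best = 0, 0
    for member in in_set:
        running += (n - l) if member else -l
        best = max(best, abs(running))
    return best


def exact_p_value(n, l, observed):
    """P(max |running sum| >= observed) over all C(n, l) equally likely orderings."""
    if observed <= 0:
        return Fraction(1)
    m = n - l
    # paths[i][j]: number of prefixes with i members and j non-members whose
    # running sum stayed strictly below `observed` in absolute value.
    paths = [[0] * (m + 1) for _ in range(l + 1)]
    paths[0][0] = 1
    for i in range(l + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if abs(i * m - j * l) >= observed:
                continue  # this prefix already reaches the observed score
            from_member = paths[i - 1][j] if i > 0 else 0
            from_non_member = paths[i][j - 1] if j > 0 else 0
            paths[i][j] = from_member + from_non_member
    return 1 - Fraction(paths[l][m], comb(n, l))


ranked = [True, True, False, False]  # toy list: both set members ranked first
score = running_sum_max(ranked)
print(score, exact_p_value(len(ranked), sum(ranked), score))  # 4 1/3
```

Because the running-sum value is fully determined by how many members and non-members have been seen so far, the table has only (l + 1) × (n - l + 1) cells, so the exact P-value is obtained without enumerating or sampling permutations.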

Book
01 Mar 1973
TL;DR: An ideal text for an upper-level undergraduate or first-year graduate course, Nonparametric Statistical Methods, Second Edition is also an invaluable source for professionals who want to keep abreast of the latest developments within this dynamic branch of modern statistics.
Abstract: This Second Edition of Myles Hollander and Douglas A. Wolfe's successful Nonparametric Statistical Methods meets the needs of a new generation of users, with completely up-to-date coverage of this important statistical area. Like its predecessor, the revised edition, along with its companion ftp site, aims to equip readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for a given situation. An extensive array of examples drawn from actual experiments illustrates clearly how to use nonparametric approaches to handle one- or two-sample location and dispersion problems, dichotomous data, and one-way and two-way layout problems. An ideal text for an upper-level undergraduate or first-year graduate course, Nonparametric Statistical Methods, Second Edition is also an invaluable source for professionals who want to keep abreast of the latest developments within this dynamic branch of modern statistics.

7,240 citations

Journal Article
TL;DR: The National Center for Biotechnology Information Reference Sequence (RefSeq) database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins that pragmatically includes sequence data that are currently publicly available in the archival databases.
Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

4,229 citations


"GeneTrail—advanced gene set enrichm..." refers background in this paper

  • ...The current version of BN++ integrates for example the following biological data sources: RefSeq (12), KEGG (13), TRANSPATH (14), TRANSFAC (15), DIP (16), MINT (17), HPRD (18), IntAct (19)....

    [...]