GeneTrail—advanced gene set enrichment analysis

doi:10.1093/NAR/GKM323

Journal Article•DOI•

Christina Backes¹, Andreas Keller¹, Jan Kuentzer¹, Benny Kneissl¹, Nicole Comtesse¹, Yasser A. Elnakady¹, Rolf Müller¹, Eckart Meese¹, Hans-Peter Lenhof¹

Saarland University¹

01 Jul 2007-Nucleic Acids Research (Oxford University Press)-Vol. 35, pp W186-W192

TL;DR: GeneTrail's statistics module includes a novel dynamic-programming algorithm that considerably improves the P-value computation of GSEA methods; the tool is freely accessible at http://genetrail.bioinf.uni-sb.de.

Abstract: We present a comprehensive and efficient gene set analysis tool, called 'GeneTrail' that offers a rich functionality and is easy to use. Our web-based application facilitates the statistical evaluation of high-throughput genomic or proteomic data sets with respect to enrichment of functional categories. GeneTrail covers a wide variety of biological categories and pathways, among others KEGG, TRANSPATH, TRANSFAC, and GO. Our web server provides two common statistical approaches, 'Over-Representation Analysis' (ORA) comparing a reference set of genes to a test set, and 'Gene Set Enrichment Analysis' (GSEA) scoring sorted lists of genes. Besides other newly developed features, GeneTrail's statistics module includes a novel dynamic-programming algorithm that improves the P-value computation of GSEA methods considerably. GeneTrail is freely accessible at http://genetrail.bioinf.uni-sb.de.
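
The abstract names two statistical approaches without spelling out their formulas, so the following Python sketch is illustrative only. It assumes that ORA is scored with an upper-tail hypergeometric test and that the GSEA score is the maximum absolute deviation of an unweighted running sum; it then shows one way a dynamic program can give an exact P-value for that score. The function names (`ora_p_value`, `max_running_sum`, `exact_gsea_p_value`) and the scoring scheme are assumptions for illustration, not GeneTrail's published implementation.

```python
from math import comb


def ora_p_value(n_reference, n_category, n_test, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    Assumption: the test set is a subset of the reference set, and
    n_category of the reference genes belong to the category.
    """
    upper = min(n_category, n_test)
    tail = sum(
        comb(n_category, k) * comb(n_reference - n_category, n_test - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_reference, n_test)


def max_running_sum(in_category):
    """Maximum absolute deviation of an unweighted running sum.

    in_category: booleans in sorted-list order (True = gene is in the set).
    A hit adds (n - l), a miss subtracts l, so the sum ends at zero.
    """
    n = len(in_category)
    l = sum(in_category)
    running, best = 0, 0
    for hit in in_category:
        running += (n - l) if hit else -l
        best = max(best, abs(running))
    return best


def exact_gsea_p_value(n, l, observed):
    """P(max |running sum| >= observed) when the l category genes are
    placed uniformly at random among n sorted positions.

    Dynamic program over (position i, hits so far j): count the
    arrangements whose running sum always stays below the observed score.
    """
    if observed <= 0:
        return 1.0
    counts = [0] * (l + 1)  # counts[j]: safe prefixes with j hits
    counts[0] = 1
    for i in range(1, n + 1):
        new = [0] * (l + 1)
        for j in range(max(0, i - (n - l)), min(i, l) + 1):
            running = j * (n - l) - (i - j) * l
            if abs(running) >= observed:
                continue  # this prefix already reaches the observed score
            ways = counts[j]  # position i is a miss
            if j > 0:
                ways += counts[j - 1]  # position i is a hit
            new[j] = ways
        counts = new
    return 1.0 - counts[l] / comb(n, l)
```

For a list of four genes with two category members, `exact_gsea_p_value(4, 2, 4)` evaluates to 1/3, because two of the six possible arrangements reach a deviation of 4. The dynamic program visits on the order of n·l states, which is what makes an exact computation feasible where enumerating all arrangements or sampling permutations would be costly or imprecise.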

Citations

Journal Article•DOI•

Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists

Da-Wei Huang¹, Brad T. Sherman¹, Richard A. Lempicki¹

Science Applications International Corporation¹

01 Jan 2009-Nucleic Acids Research

TL;DR: The survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

Abstract: Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

13,102 citations

Cites background from "GeneTrail—advanced gene set enrichm..."

...Moreover, a number of tools, such as Onto-Express (62), easyGO (66), GoMiner (10), eGOn (42), GoSurfer (25), GOFFA (50) and GeneTrail (57), are able to display the enrichment analysis results on the DAG or a tree structure so that users may easily explore the enrichment results in neighboring nodes....
[...]
...However, some recent tools or new releases of early-generation tools, such as Onto-Express (62), DAVID (61), WebGestalt (40), Fatigo+ (56), FACT (30), g:Profiler (64), GAzer (63) and GeneTrail (57), etc., extended their backend bio-databases by integrating wide-range heterogeneous data content (e.g....
[...]

Journal Article•DOI•

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Maxim V. Kuleshov¹, Matthew R. Jones¹, Andrew D. Rouillard¹, Nicolas F. Fernandez¹, Qiaonan Duan¹, Zichen Wang¹, Simon Koplev¹, Sherry L. Jenkins¹, Kathleen M. Jagodnik², Alexander Lachmann¹, Michael G. McDermott¹, Caroline D. Monteiro¹, Gregory W. Gundersen¹, Avi Ma'ayan¹

Icahn School of Medicine at Mount Sinai¹, Glenn Research Center²

08 Jul 2016-Nucleic Acids Research

TL;DR: A significant update to one of the tools in this domain called Enrichr, a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries is presented.

Abstract: Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

6,201 citations

Cites methods from "GeneTrail—advanced gene set enrichm..."

...To benchmark the performance of the various enrichment analysis methods implemented within Enrichr, namely, the proportion test, the Z-score and the combined score, as well as other similar published methods, for example, the over representation analysis (ORA) method (11), as well as simple methods such as the Jaccard distance or the number of overlapping genes, we processed 489 experiments that genetically perturbed (knockdown, knockout or overexpression) transcript factors (TFs) from 293 studies available from GEO....
[...]

Journal Article•DOI•

KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases

Chen Xie¹, Xizeng Mao², Jiaju Huang², Yang Ding², Jianmin Wu², Shan Dong², Lei Kong², Ge Gao², Chuan-Yun Li², Liping Wei²

Peking University¹, Garvan Institute of Medical Research²

01 Jul 2011-Nucleic Acids Research

TL;DR: A web server, KOBAS 2.0, is reported that annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations, using both ID mapping and cross-species sequence similarity mapping.

Abstract: High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.

3,293 citations

Cites background from "GeneTrail—advanced gene set enrichm..."

...A growing number of tools have been developed for pathway and disease identification, including, but not limited to, MAPPFinder (13), EASE (14), DAVID (15,16), ArrayXPath (17), WebGestalt (18), FuncCluster (19), PageMan (20), GENECODIS (21,22), GeneTrail (23), g:Profiler (24), FunNet (25) and PaLS (26)....
[...]

Journal Article•DOI•

GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

Eran Eden¹, Roy Navon²,³, Israel Steinfeld³,⁴, Doron Lipson⁴, Zohar Yakhini³,⁴

Weizmann Institute of Science¹, Tel Aviv University², Agilent Technologies³, Technion – Israel Institute of Technology⁴

03 Feb 2009-BMC Bioinformatics

TL;DR: GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets, and its unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.

Abstract: Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms in order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. GOrilla is an efficient GO analysis tool with unique features that make a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
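
To make the "flexible threshold" idea concrete, here is a minimal Python sketch of a minimum-hypergeometric style score: for every cutoff in the ranked list it computes a hypergeometric tail probability for the hits above the cutoff and keeps the smallest. The function names and the input encoding (a 0/1 label per ranked gene) are assumptions for illustration. The sketch returns only the raw minimum, not the exact p-value with the multiple-threshold correction that the abstract attributes to GOrilla.

```python
from math import comb


def hypergeom_tail(n_total, n_labeled, n_top, n_hits):
    """P(X >= n_hits) when n_top genes are drawn from n_total genes,
    of which n_labeled carry the GO term."""
    upper = min(n_top, n_labeled)
    tail = sum(
        comb(n_labeled, k) * comb(n_total - n_labeled, n_top - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / comb(n_total, n_top)


def min_hypergeom_score(ranked_labels):
    """Smallest hypergeometric tail over all cutoffs of a ranked list.

    ranked_labels: 0/1 per gene in rank order (1 = annotated with the term).
    The result is a raw score and is NOT itself a valid p-value, because
    the cutoff was chosen after looking at the data.
    """
    n_total = len(ranked_labels)
    n_labeled = sum(ranked_labels)
    best, hits = 1.0, 0
    for n_top, label in enumerate(ranked_labels, start=1):
        hits += label
        best = min(best, hypergeom_tail(n_total, n_labeled, n_top, hits))
    return best
```

For the ranked labels `[1, 1, 0, 0]` the best cutoff is after the second gene, giving a score of 1/6. Because the cutoff is selected to minimize the tail, a correction such as the exact mHG p-value mentioned in the abstract is needed before interpreting the score as significance.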

3,157 citations

Cites methods from "GeneTrail—advanced gene set enrichm..."

...A few tools have been developed that use a threshold free approach including GSEA [12], FatiScan [13], GO-stat [14], GeneTrail [15] and iGA [16]....
[...]

Journal Article•DOI•

Ten years of pathway analysis: current approaches and outstanding challenges.

Purvesh Khatri¹, Marina Sirota¹,², Atul J. Butte¹,²

Stanford University¹, Lucile Packard Children's Hospital²

23 Feb 2012-PLOS Computational Biology

TL;DR: The evolution of knowledge base–driven pathway analysis over its first decade is discussed, distinctly divided into three generations, and a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods are identified.

Abstract: Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.

1,357 citations

Additional excerpts

...de) [87]...
[...]

References

Journal Article•DOI•

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Yoav Benjamini, Yosef Hochberg

01 Jan 1995-Journal of the Royal Statistical Society, Series B (Methodological)

TL;DR: A different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate; this error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise.

Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
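
As a rough illustration of the sequential procedure the abstract refers to, the Python sketch below computes Benjamini-Hochberg adjusted P-values in their commonly used step-up form, next to a plain Bonferroni adjustment for comparison. The function names are hypothetical, and the code is a minimal sketch of the standard formulation rather than a reproduction of any particular tool's implementation.

```python
def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg_adjust(p_values):
    """Step-up FDR adjustment.

    The i-th smallest P-value (1-based rank i) is scaled by m / i, and a
    running minimum taken from the largest rank downwards keeps the
    adjusted values monotone in the original ordering of P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A hypothesis is then rejected at FDR level q when its adjusted value is at most q. For the same input, the Bonferroni values are never smaller than the Benjamini-Hochberg values, which reflects the gain in power described in the abstract.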

83,420 citations

"GeneTrail—advanced gene set enrichm..." refers methods in this paper

...Therefore, GeneTrail offers two adjustment methods, the conservative Bonferroni adjustment and the control of the false discovery rate (FDR) according to Benjamini and Hochberg (24)....
[...]

Journal Article•DOI•

Gene Ontology: tool for the unification of biology

M Ashburner¹, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock

Stanford University¹

01 May 2000-Nature Genetics

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations

"GeneTrail—advanced gene set enrichm..." refers methods in this paper

...Additionally, GeneTrail uses a local copy of the GO database (1) that includes electronically inferred annotations (IEAs) and manually curated annotations....
[...]
...Some of the developed tools focus on the analysis of only one type of functional categories for example various Gene Ontology (GO) (1) based tools, among them FatiGO (2), BiNGO (3), and GOstat (4)....
[...]

Journal Article•DOI•

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

Aravind Subramanian¹, Pablo Tamayo¹, Vamsi K. Mootha², Sayan Mukherjee³, Benjamin L. Ebert², Michael A. Gillette², Amanda G. Paulovich⁴, Scott L. Pomeroy², Todd R. Golub², Eric S. Lander¹, Jill P. Mesirov¹

Massachusetts Institute of Technology¹, Harvard University², Duke University³, Fred Hutchinson Cancer Research Center⁴

25 Oct 2005-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Gene Set Enrichment Analysis (GSEA) is an analytical method for interpreting gene expression data that derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.

Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Book•

Nonparametric Statistical Methods

Myles Hollander, Douglas A. Wolfe

01 Mar 1973

TL;DR: An ideal text for an upper-level undergraduate or first-year graduate course, Nonparametric Statistical Methods, Second Edition is also an invaluable source for professionals who want to keep abreast of the latest developments within this dynamic branch of modern statistics.

Abstract: This Second Edition of Myles Hollander and Douglas A. Wolfe's successful Nonparametric Statistical Methods meets the needs of a new generation of users, with completely up-to-date coverage of this important statistical area. Like its predecessor, the revised edition, along with its companion ftp site, aims to equip readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for a given situation. An extensive array of examples drawn from actual experiments illustrates clearly how to use nonparametric approaches to handle one- or two-sample location and dispersion problems, dichotomous data, and one-way and two-way layout problems. An ideal text for an upper-level undergraduate or first-year graduate course, Nonparametric Statistical Methods, Second Edition is also an invaluable source for professionals who want to keep abreast of the latest developments within this dynamic branch of modern statistics.

7,240 citations

Journal Article•DOI•

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Kim D. Pruitt¹, Tatiana Tatusova¹, Donna Maglott¹

National Institutes of Health¹

17 Dec 2004-Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information Reference Sequence (RefSeq) database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins that pragmatically includes sequence data that are currently publicly available in the archival databases.

Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

4,229 citations

"GeneTrail—advanced gene set enrichm..." refers background in this paper

...The current version of BN++ integrates for example the following biological data sources: RefSeq (12), KEGG (13), TRANSPATH (14), TRANSFAC (15), DIP (16), MINT (17), HPRD (18), IntAct (19)....
[...]