Journal ArticleDOI

GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

TL;DR: GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. Its advantages over other threshold-free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.
Abstract: Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms on the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. GOrilla is an efficient GO analysis tool with unique features that make it a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold-free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
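The flexible-threshold idea in the abstract can be illustrated with a minimal sketch of a minimum hypergeometric (mHG) score. This is not GOrilla's code: the function names `hypergeom_tail` and `mhg_score` are hypothetical, and the sketch assumes the ranked list has already been reduced to 0/1 labels (1 meaning the gene is annotated with the GO term under test). It only computes the raw minimum over all cutoffs; the exact p-value that corrects for trying every cutoff, which the abstract describes, is deliberately left out.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(population N, B annotated genes, n drawn)."""
    total = comb(N, n)
    upper = min(n, B)
    # comb() returns 0 when n - k exceeds N - B, so impossible terms drop out.
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return hits / total


def mhg_score(labels):
    """Scan every cutoff n of a ranked 0/1 list and keep the smallest tail probability.

    Returns (minimum tail probability, cutoff n where it is attained).
    The returned value is NOT a p-value: it is uncorrected for the many cutoffs tried.
    """
    N = len(labels)
    B = sum(labels)
    best, best_n = 1.0, 0
    b = 0  # annotated genes seen among the top n
    for n, label in enumerate(labels, start=1):
        b += label
        p = hypergeom_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n


if __name__ == "__main__":
    # Toy ranked list: both annotated genes sit at the top.
    print(mhg_score([1, 1, 0, 0]))
```

Because the minimum is taken over every possible cutoff, the raw score is optimistically small; turning it into a valid significance level is the step the abstract attributes to the exact characterization of the mHG distribution.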
Citations
Journal ArticleDOI
18 Jul 2011-PLOS ONE
TL;DR: REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures.
Abstract: Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.

4,919 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types is presented, revealing novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns.
Abstract: DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation. An extensive map of human DNase I hypersensitive sites, markers of regulatory DNA, in 125 diverse cell and tissue types is described; integration of this information with other ENCODE-generated data sets identifies new relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. This paper describes the first extensive map of human DNaseI hypersensitive sites — markers of regulatory DNA — in 125 diverse cell and tissue types. 
Integration of this information with other data sets generated by ENCODE (Encyclopedia of DNA Elements) identified new relationships between chromatin accessibility, transcription, DNA methylation and regulatory-factor occupancy patterns. Evolutionary-conservation analysis revealed signatures of recent functional constraint within DNaseI hypersensitive sites.

2,628 citations

Journal ArticleDOI
15 Jan 2015-Cell
TL;DR: The genetic findings provide evidence for immunoediting in tumors and uncover mechanisms of tumor-intrinsic resistance to cytolytic activity, suggesting immune-mediated elimination.

2,600 citations

Journal ArticleDOI
TL;DR: It is determined that short-chain fatty acids (SCFA), microbiota-derived bacterial fermentation products, regulated microglia homeostasis, and that mice deficient for the SCFA receptor FFAR2 mirrored microglia defects found under GF conditions, suggesting that host bacteria vitally regulate microglia maturation and function.
Abstract: As the tissue macrophages of the CNS, microglia are critically involved in diseases of the CNS. However, it remains unknown what controls their maturation and activation under homeostatic conditions. We observed substantial contributions of the host microbiota to microglia homeostasis, as germ-free (GF) mice displayed global defects in microglia with altered cell proportions and an immature phenotype, leading to impaired innate immune responses. Temporal eradication of host microbiota severely changed microglia properties. Limited microbiota complexity also resulted in defective microglia. In contrast, recolonization with a complex microbiota partially restored microglia features. We determined that short-chain fatty acids (SCFA), microbiota-derived bacterial fermentation products, regulated microglia homeostasis. Accordingly, mice deficient for the SCFA receptor FFAR2 mirrored microglia defects found under GF conditions. These findings suggest that host bacteria vitally regulate microglia maturation and function, whereas microglia impairment can be rectified to some extent by complex microbiota.

2,096 citations

References
Journal ArticleDOI
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations

Journal ArticleDOI
TL;DR: Gene Set Enrichment Analysis (GSEA) is an analytical method for interpreting gene expression data that derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations

Journal ArticleDOI
31 Jan 2002-Nature
TL;DR: DNA microarray analysis on primary breast tumours of 117 young patients is used and supervised classification is applied to identify a gene expression signature strongly predictive of a short interval to distant metastases (‘poor prognosis’ signature) in patients without tumour cells in local lymph nodes at diagnosis, providing a strategy to select patients who would benefit from adjuvant therapy.
Abstract: Breast cancer patients with the same stage of disease can have markedly different treatment responses and overall outcome. The strongest predictors for metastases (for example, lymph node status and histological grade) fail to classify accurately breast tumours according to their clinical behaviour. Chemotherapy or hormonal therapy reduces the risk of distant metastases by approximately one-third; however, 70-80% of patients receiving this treatment would have survived without it. None of the signatures of breast cancer gene expression reported to date allow for patient-tailored therapy strategies. Here we used DNA microarray analysis on primary breast tumours of 117 young patients, and applied supervised classification to identify a gene expression signature strongly predictive of a short interval to distant metastases ('poor prognosis' signature) in patients without tumour cells in local lymph nodes at diagnosis (lymph node negative). In addition, we established a signature that identifies tumours of BRCA1 carriers. The poor prognosis signature consists of genes regulating cell cycle, invasion, metastasis and angiogenesis. This gene expression profile will outperform all currently used clinical parameters in predicting disease outcome. Our findings provide a strategy to select patients who would benefit from adjuvant therapy.

9,664 citations


"GOrilla: a tool for discovery and v..." refers background in this paper

  • ...breast cancer dataset [21], which is a landmark study in clinical use of gene expression data....

    [...]

Journal ArticleDOI
TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries and assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Abstract: The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

8,849 citations

Journal ArticleDOI
TL;DR: The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology terms are significantly overrepresented in a set of genes.
Abstract: Summary: The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. BiNGO can be used either on a list of genes, pasted as text, or interactively on subgraphs of biological networks visualized in Cytoscape. BiNGO maps the predominant functional themes of the tested gene set on the GO hierarchy, and takes advantage of Cytoscape's versatile visualization environment to produce an intuitive and customizable visual representation of the results. Availability: http://www.psb.ugent.be/cbd/papers/BiNGO/ Contact: martin.kuiper@psb.ugent.be

3,884 citations


"GOrilla: a tool for discovery and v..." refers methods in this paper

  • ...A large repertoire of tools for enrichment analysis has been developed in recent years, including GoMiner [3], FatiGO [4], BiNGO [5], GOAT [6], DAVID [7] and others....

    [...]

  • ...A few tools visualize the results of enrichment analysis in the DAG structure, including the downloadable version of GoMiner [3], the CytoScape plug-in BiNGO [5], GOLEM [8], GOEAST [9] and GOTM [10]....

    [...]