Open Access, Journal Article

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

TLDR
eggNOG-mapper is a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. Its predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark.
Abstract
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. eggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.
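
The abstract's central metric, the ratio of experimentally validated terms recovered over all terms assigned per protein, can be illustrated with a minimal sketch. This is not the authors' benchmark code: the function names, the `is_ortholog` flag, the protein and target labels, and the toy data are all hypothetical, and the GO identifiers serve only as placeholders. The sketch compares transferring terms from every homologous hit against transferring them only from hits flagged as orthologs.

```python
from statistics import mean

def transfer_terms(hits, orthologs_only=False):
    """Collect GO terms from hits; optionally keep only hits flagged as orthologs."""
    terms = set()
    for hit in hits:
        if orthologs_only and not hit["is_ortholog"]:
            continue
        terms.update(hit["go_terms"])
    return terms

def validated_ratio(assigned, validated):
    """Fraction of assigned terms that are experimentally validated (0.0 if none assigned)."""
    if not assigned:
        return 0.0
    return len(assigned & validated) / len(assigned)

# Toy data, invented for illustration only (not taken from the paper).
hits_by_protein = {
    "protA": [
        {"target": "t1", "is_ortholog": True, "go_terms": {"GO:0000001", "GO:0000002"}},
        {"target": "t2", "is_ortholog": False, "go_terms": {"GO:0000003", "GO:0000004"}},
    ],
    "protB": [
        {"target": "t3", "is_ortholog": True, "go_terms": {"GO:0000005"}},
        {"target": "t4", "is_ortholog": False, "go_terms": {"GO:0000005", "GO:0000006"}},
    ],
}
validated_by_protein = {
    "protA": {"GO:0000001", "GO:0000002"},
    "protB": {"GO:0000005"},
}

def mean_ratio(orthologs_only):
    """Average the per-protein validated ratio over all proteins in the toy set."""
    return mean(
        validated_ratio(transfer_terms(hits, orthologs_only), validated_by_protein[p])
        for p, hits in hits_by_protein.items()
    )

homology_ratio = mean_ratio(orthologs_only=False)
orthology_ratio = mean_ratio(orthologs_only=True)
print(f"all hits: {homology_ratio:.2f}, orthologs only: {orthology_ratio:.2f}")
```

In this toy set the orthology filter discards terms that come only from non-orthologous hits, so the per-protein ratio rises. The size of the effect here is an artifact of the invented data and says nothing about the 15% and 35% figures reported in the abstract.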


Citations
Journal Article

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.

TL;DR: eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Version 5.0 is a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes.
Posted Content

eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale

TL;DR: eggNOG-mapper v2 is a tool for functional annotation based on precomputed orthology assignments, optimized for vast (meta)genomic data sets.
Journal Article

eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale.

TL;DR: eggNOG-mapper v2 is a tool for functional annotation based on precomputed orthology assignments, optimized for vast (meta)genomic data sets.
References
Journal Article

Pfam: the protein families database.

TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0, and has been updated several times since 2012.
Journal Article

Fast and sensitive protein alignment using DIAMOND

TL;DR: DIAMOND is an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
Journal Article

InterProScan 5: genome-scale protein function classification

TL;DR: A new Java-based architecture for the widely used protein function prediction software package InterProScan is described, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis.
Journal Article

Accelerated Profile HMM Searches

TL;DR: The paper describes an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Journal Article

A genomic perspective on protein families

TL;DR: Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs), which comprise a framework for functional and evolutionary genome analysis.