Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper
Jaime Huerta-Cepas, Kristoffer Forslund, Luis Pedro Coelho, Damian Szklarczyk, Lars Juhl Jensen, Christian von Mering, Peer Bork
TL;DR: eggNOG-mapper is a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database; its predictions scored within the top-5 methods in all three GO categories of the CAFA2 NK-partial benchmark.
Abstract:
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments are only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11% and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in all three GO categories of the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, where it performed better than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.
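The orthology filter on BLAST results can be illustrated with a small toy sketch. It contrasts plain homology-based transfer (the union of GO terms from every significant hit) with a filtered variant that keeps only the best hit, treated here as the seed ortholog, plus hits that an orthology table lists as its orthologs. Both are then scored with the per-protein ratio of experimentally validated terms recovered over all terms assigned. This is an illustration under stated assumptions, not eggNOG-mapper's implementation: the `Hit` record, the `orthologs_of` dictionary (a stand-in for orthology inferred from precomputed eggNOG phylogenies), the E-value cutoff, and all gene identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified homology-search hit (e.g. one BLAST row)."""
    target: str
    evalue: float
    bitscore: float


def homology_transfer(hits, annotations, max_evalue=1e-3):
    """Plain homology transfer: union of GO terms from every significant hit."""
    terms = set()
    for hit in hits:
        if hit.evalue <= max_evalue:
            terms |= annotations.get(hit.target, set())
    return terms


def orthology_filtered_transfer(hits, orthologs_of, annotations, max_evalue=1e-3):
    """Keep only the best hit (assumed seed ortholog) and hits listed as its orthologs."""
    significant = [h for h in hits if h.evalue <= max_evalue]
    if not significant:
        return set()
    seed = max(significant, key=lambda h: h.bitscore).target
    allowed = {seed} | orthologs_of.get(seed, set())
    terms = set()
    for hit in significant:
        if hit.target in allowed:
            terms |= annotations.get(hit.target, set())
    return terms


def validated_ratio(assigned, experimental):
    """Experimentally validated terms recovered over all terms assigned to one protein."""
    if not assigned:
        return 0.0
    return len(assigned & experimental) / len(assigned)


# Toy data: hypothetical genes; geneB is an ortholog of geneA, paralogC is not.
annotations = {
    "geneA": {"GO:0003677", "GO:0006355"},
    "geneB": {"GO:0003677"},
    "paralogC": {"GO:0016301"},
}
orthologs_of = {"geneA": {"geneB"}}
hits = [
    Hit("geneA", 1e-50, 200.0),
    Hit("geneB", 1e-40, 180.0),
    Hit("paralogC", 1e-20, 90.0),
]
experimental = {"GO:0003677", "GO:0006355"}

plain = homology_transfer(hits, annotations)
ortho = orthology_filtered_transfer(hits, orthologs_of, annotations)
print(sorted(plain), validated_ratio(plain, experimental))
print(sorted(ortho), validated_ratio(ortho, experimental))
```

In this toy case the paralog contributes a term only under plain homology transfer, so the filter raises the validated-term ratio from about 0.67 to 1.0. The real tool assigns orthologs from precomputed eggNOG trees rather than from a hand-written dictionary.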
Citations
Journal Article
eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.
Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K. Forslund, Helen Cook, Daniel R. Mende, Ivica Letunic, Thomas Rattei, Lars Juhl Jensen, Christian von Mering, Peer Bork
TL;DR: eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. This release is a major update of the underlying genome sets, which have been expanded to 4,445 representative bacteria and 168 archaea derived from 25,038 genomes.
Journal Article
Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle.
Edoardo Pasolli, Francesco Asnicar, Serena Manara, Moreno Zolfo, Nicolai Karcher, Federica Armanini, Francesco Beghini, Paolo Manghi, Adrian Tett, Paolo Ghensi, Maria Carmen Collado, Benjamin L. Rice, Casey DuLong, Xochitl C. Morgan, Christopher D. Golden, Christopher Quince, Curtis Huttenhower, Nicola Segata
TL;DR: Thousands of microbial genomes from yet-to-be-named species are identified, the pangenomes of human-associated microbes are expanded, and metagenomic technologies can be exploited more effectively.
Posted Content
eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale
Carlos Pérez Cantalapiedra, Ana Hernández-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas
TL;DR: eggNOG-mapper v2 is a tool for functional annotation based on precomputed orthology assignments, optimized for vast (meta)genomic data sets.
Journal Article
eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale.
Journal Article
A unified catalog of 204,938 reference genomes from the human gut microbiome.
Alexandre Almeida, Stephen Nayfach, Miguel Boland, Francesco Strozzi, Martin Beracochea, Zhou Jason Shi, Katherine S. Pollard, Ekaterina A. Sakharova, Donovan H. Parks, Philip Hugenholtz, Nicola Segata, Nikos C. Kyrpides, Robert D. Finn
TL;DR: The Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes, is presented, providing comprehensive resources for microbiome researchers.
References
Journal Article
Pfam: the protein families database.
Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate, Marco Punta
TL;DR: Pfam is a widely used database of protein families. The current release, version 27.0, contains 14,831 manually curated entries, and the database has been updated several times since 2012.
Journal Article
Fast and sensitive protein alignment using DIAMOND
TL;DR: DIAMOND is an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
Journal Article
InterProScan 5: genome-scale protein function classification
Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex L. Mitchell, Gift Nuka, Sebastien Pesseat, Antony F. Quinn, Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, Sarah Hunter
TL;DR: A new Java-based architecture for the widely used protein function prediction package InterProScan is described. The result is a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis.
Book
Accelerated Profile HMM Searches
TL;DR: Introduces an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Journal Article
A genomic perspective on protein families
TL;DR: Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs), which comprise a framework for functional and evolutionary genome analysis.