scispace - formally typeset
Open AccessJournal ArticleDOI

Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life

TLDR
The recovery of 7,903 bacterial and archaeal metagenome-assembled genomes increases the phylogenetic diversity represented by public genome repositories and provides the first representatives from 20 candidate phyla.
Abstract
Challenges in cultivating microorganisms have limited the phylogenetic diversity of currently available microbial genomes. This is being addressed by advances in sequencing throughput and computational techniques that allow for the cultivation-independent recovery of genomes from metagenomes. Here, we report the reconstruction of 7,903 bacterial and archaeal genomes from >1,500 public metagenomes. All genomes are estimated to be ≥50% complete and nearly half are ≥90% complete with ≤5% contamination. These genomes increase the phylogenetic diversity of bacterial and archaeal genome trees by >30% and provide the first representatives of 17 bacterial and three archaeal candidate phyla. We also recovered 245 genomes from the Patescibacteria superphylum (also known as the Candidate Phyla Radiation) and find that the relative diversity of this group varies substantially with different protein marker sets. The scale and quality of this data set demonstrate that recovering genomes from metagenomes provides an expedient path forward to exploring microbial dark matter.

read more

Citations
More filters
Journal Article

Fast Tree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

TL;DR: FastTree as mentioned in this paper uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.
Journal ArticleDOI

High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries.

TL;DR: FastANI is developed, a method to compute ANI using alignment-free approximate sequence mapping, and it is shown 95% ANI is an accurate threshold for demarcating prokaryotic species by analyzing about 90,000 proKaryotic genomes.
Journal ArticleDOI

A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life

TL;DR: This work used a concatenated protein phylogeny as the basis for a bacterial taxonomy that conservatively removes polyphyletic groups and normalizes taxonomic ranks on the basis of relative evolutionary divergence.
Journal ArticleDOI

GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database

TL;DR: The accuracy of the GTDB-Tk taxonomic assignments is demonstrated by evaluating its performance on a phylogenetically diverse set of 10 156 bacterial and archaeal metagenome-assembled genomes.
Journal ArticleDOI

MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.

TL;DR: Comparing MetaBAT 2 to alternative software tools on over 100 real world metagenome assemblies shows superior accuracy and computing speed, and recommends the community adopts Meta BAT 2 for their meetagenome binning experiments.
References
More filters
Journal ArticleDOI

Pfam: the protein families database.

TL;DR: Pfam as discussed by the authors is a widely used database of protein families, containing 14 831 manually curated entries in the current version, version 27.0, and has been updated several times since 2012.
Journal ArticleDOI

Prodigal: prokaryotic gene recognition and translation initiation site identification

TL;DR: This work developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm), which achieved good results compared to existing methods, and it is believed it will be a valuable asset to automated microbial annotation pipelines.
Journal ArticleDOI

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

TL;DR: An objective measure of genome quality is proposed that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities and is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches.
Book

Accelerated Profile HMM Searches

TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Related Papers (5)