Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life
Donovan H. Parks,Christian Rinke,Maria Chuvochina,Pierre-Alain Chaumeil,Ben J. Woodcroft,Paul N. Evans,Philip Hugenholtz,Gene W. Tyson +7 more
TLDR
The recovery of 7,903 bacterial and archaeal metagenome-assembled genomes increases the phylogenetic diversity represented by public genome repositories and provides the first representatives from 20 candidate phyla.
Abstract
Challenges in cultivating microorganisms have limited the phylogenetic diversity of currently available microbial genomes. This is being addressed by advances in sequencing throughput and computational techniques that allow for the cultivation-independent recovery of genomes from metagenomes. Here, we report the reconstruction of 7,903 bacterial and archaeal genomes from >1,500 public metagenomes. All genomes are estimated to be ≥50% complete and nearly half are ≥90% complete with ≤5% contamination. These genomes increase the phylogenetic diversity of bacterial and archaeal genome trees by >30% and provide the first representatives of 17 bacterial and three archaeal candidate phyla. We also recovered 245 genomes from the Patescibacteria superphylum (also known as the Candidate Phyla Radiation) and find that the relative diversity of this group varies substantially with different protein marker sets. The scale and quality of this data set demonstrate that recovering genomes from metagenomes provides an expedient path forward to exploring microbial dark matter.
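The completeness and contamination thresholds quoted in the abstract can be illustrated with a minimal sketch. It assumes per-genome estimates are already available as percentages (for example from a tool such as CheckM); the tier names `near_complete`, `retained`, and `below_threshold`, the `GenomeQuality` type, and the sample values are hypothetical and not taken from the paper.

```python
from typing import NamedTuple


class GenomeQuality(NamedTuple):
    """Hypothetical record holding quality estimates for one genome."""
    name: str
    completeness: float   # estimated completeness, in percent
    contamination: float  # estimated contamination, in percent


def quality_tier(genome: GenomeQuality) -> str:
    """Label a genome using only the thresholds stated in the abstract.

    The tier names are illustrative assumptions, not the authors' terms.
    """
    # ">=90% complete with <=5% contamination"
    if genome.completeness >= 90.0 and genome.contamination <= 5.0:
        return "near_complete"
    # "All genomes are estimated to be >=50% complete"
    if genome.completeness >= 50.0:
        return "retained"
    return "below_threshold"


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    examples = [
        GenomeQuality("bin_001", 96.4, 1.2),
        GenomeQuality("bin_002", 72.0, 3.5),
        GenomeQuality("bin_003", 41.8, 0.9),
    ]
    for g in examples:
        print(g.name, quality_tier(g))
```

The sketch applies only the two cut-offs given in the abstract; any additional filtering criteria used in the full study are not represented here.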
Citations
Journal Article
FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.
Journal ArticleDOI
High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries.
Chirag Jain,Luis M. Rodriguez-R,Adam M. Phillippy,Konstantinos T. Konstantinidis,Srinivas Aluru +4 more
TL;DR: FastANI, a method to compute ANI using alignment-free approximate sequence mapping, is developed, and an analysis of about 90,000 prokaryotic genomes shows that 95% ANI is an accurate threshold for demarcating prokaryotic species.
Journal ArticleDOI
A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life
Donovan H. Parks,Maria Chuvochina,David W. Waite,Christian Rinke,Adam Skarshewski,Pierre-Alain Chaumeil,Philip Hugenholtz +6 more
TL;DR: This work used a concatenated protein phylogeny as the basis for a bacterial taxonomy that conservatively removes polyphyletic groups and normalizes taxonomic ranks on the basis of relative evolutionary divergence.
Journal ArticleDOI
GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database
TL;DR: The accuracy of the GTDB-Tk taxonomic assignments is demonstrated by evaluating its performance on a phylogenetically diverse set of 10 156 bacterial and archaeal metagenome-assembled genomes.
Journal ArticleDOI
MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.
Dongwan D. Kang,Feng Li,Edward Kirton,Ashleigh Thomas,Rob Egan,Hong An,Zhong Wang +8 more
TL;DR: Comparing MetaBAT 2 to alternative software tools on over 100 real-world metagenome assemblies shows superior accuracy and computing speed, and the authors recommend that the community adopt MetaBAT 2 for their metagenome binning experiments.
References
Journal ArticleDOI
Pfam: the protein families database.
Robert D. Finn,Alex Bateman,Jody Clements,Penelope Coggill,Ruth Y. Eberhardt,Sean R. Eddy,Andreas Heger,Kirstie Hetherington,Liisa Holm,Jaina Mistry,Erik L. L. Sonnhammer,John Tate,Marco Punta +12 more
TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current version, version 27.0, and has been updated several times since 2012.
Journal ArticleDOI
Prodigal: prokaryotic gene recognition and translation initiation site identification
Doug Hyatt,Gwo Liang Chen,Philip F. LoCascio,Miriam Land,Frank W. Larimer,Loren Hauser +7 more
TL;DR: This work developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm), which achieved good results compared to existing methods, and it is believed it will be a valuable asset to automated microbial annotation pipelines.
Journal ArticleDOI
ARB: a software environment for sequence data
Wolfgang Ludwig,Oliver Strunk,Ralf Westram,Lothar Richter,Harald Meier,Yadhukumar,Arno Buchner,Tina Lai,Susanne Steppi,Gangolf Jobb,Wolfram Förster,Igor Brettske,Stefan Gerber,Anton W. Ginhart,Oliver Gross,Silke Grumann,Stefan Hermann,Ralf Jost,Andreas König,Thomas Liss,Ralph Lüßmann,Michael May,Björn Nonhoff,Boris Reichel,Robert Strehlow,Alexandros Stamatakis,Norbert Stuckmann,Alexander Vilbig,Michael Lenke,Thomas Ludwig,Arndt Bode,Karl-Heinz Schleifer +31 more
TL;DR: The ARB program package comprises a variety of directly interacting software tools for sequence database maintenance and analysis which are controlled by a common graphical user interface.
Journal ArticleDOI
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
TL;DR: An objective measure of genome quality is proposed that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities and is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches.
Book
Accelerated Profile HMM Searches
TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.