Journal Article

Metagenome-assembled genome binning methods with short reads disproportionately fail for plasmids and genomic Islands

01 Oct 2020-Vol. 6, Iss: 10
TL;DR: Short-read MAG approaches are largely ineffective for the analysis of mobile genes, including those of public-health importance, such as AMR and VF genes, and it is proposed that researchers should explore developing methods that optimize for this issue.
Abstract: Metagenomic methods enable the simultaneous characterization of microbial communities without time-consuming and bias-inducing culturing. Metagenome-assembled genome (MAG) binning methods aim to reassemble individual genomes from this data. However, the recovery of mobile genetic elements (MGEs), such as plasmids and genomic islands (GIs), by binning has not been well characterized. Given the association of antimicrobial resistance (AMR) genes and virulence factor (VF) genes with MGEs, studying their transmission is a public-health priority. The variable copy number and sequence composition of MGEs makes them potentially problematic for MAG binning methods. To systematically investigate this issue, we simulated a low-complexity metagenome comprising 30 GI-rich and plasmid-containing bacterial genomes. MAGs were then recovered using 12 current prediction pipelines and evaluated. While 82-94 % of chromosomes could be correctly recovered and binned, only 38-44 % of GIs and 1-29 % of plasmid sequences were found. Strikingly, no plasmid-borne VF nor AMR genes were recovered, and only 0-45 % of AMR or VF genes within GIs. We conclude that short-read MAG approaches, without further optimization, are largely ineffective for the analysis of mobile genes, including those of public-health importance, such as AMR and VF genes. We propose that researchers should explore developing methods that optimize for this issue and consider also using unassembled short reads and/or long-read approaches to more fully characterize metagenomic data.
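The recovery percentages above can be read as the share of a reference element's bases that end up in a bin. The following is a minimal sketch, not the authors' evaluation pipeline. It assumes, as an illustration only, that recovery is measured by mapping binned contigs back to a known plasmid or GI and counting the bases covered at least once. The function names `merged_length` and `percent_recovered` and the example coordinates are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the reference element


def merged_length(intervals: Iterable[Interval]) -> int:
    """Count bases covered at least once by a set of possibly overlapping intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_recovered(element_length: int, binned_intervals: Iterable[Interval]) -> float:
    """Percentage of a reference element covered by binned contigs.

    Assumes every interval lies within [0, element_length).
    """
    if element_length <= 0:
        raise ValueError("element_length must be positive")
    return 100.0 * merged_length(binned_intervals) / element_length


# Hypothetical 10,000 bp plasmid: two overlapping alignments plus one separate alignment.
example = percent_recovered(10_000, [(0, 1_500), (1_000, 2_000), (8_000, 8_900)])
print(f"{example:.1f} % of the plasmid recovered")
```

Merging overlapping intervals before summing avoids counting the same bases twice when several binned contigs align to the same region.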
Citations
Journal Article
TL;DR: This paper reviews the lichen symbiosis in general and especially the model species Lobaria pulmonaria L. Hoffm., a large foliose lichen that occurs worldwide on tree trunks in undisturbed forests with long ecological continuity.
Abstract: Lichens represent self-supporting symbioses, which occur in a wide range of terrestrial habitats and which contribute significantly to mineral cycling and energy flow at a global scale. Lichens usually grow much slower than higher plants. Nevertheless, lichens can contribute substantially to biomass production. This review focuses on the lichen symbiosis in general and especially on the model species Lobaria pulmonaria L. Hoffm., which is a large foliose lichen that occurs worldwide on tree trunks in undisturbed forests with long ecological continuity. In comparison to many other lichens, L. pulmonaria is less tolerant to desiccation and highly sensitive to air pollution. The name-giving mycobiont (belonging to the Ascomycota), provides a protective layer covering a layer of the green-algal photobiont (Dictyochloropsis reticulata) and interspersed cyanobacterial cell clusters (Nostoc spec.). Recently performed metaproteome analyses confirm the partition of functions in lichen partnerships. The ample functional diversity of the mycobiont contrasts the predominant function of the photobiont in production (and secretion) of energy-rich carbohydrates, and the cyanobiont’s contribution by nitrogen fixation. In addition, high throughput and state-of-the-art metagenomics and community fingerprinting, metatranscriptomics, and MS-based metaproteomics identify the bacterial community present on L. pulmonaria as a surprisingly abundant and structurally integrated element of the lichen symbiosis. Comparative metaproteome analyses of lichens from different sampling sites suggest the presence of a relatively stable core microbiome and a sampling site-specific portion of the microbiome. Moreover, these studies indicate how the microbiota may contribute to the symbiotic system, to improve its health, growth and fitness.

63 citations

Journal Article
03 Nov 2020
TL;DR: The gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish are studied to generate 5,596 metagenome-assembled genomes and gene data sets that greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut.
Abstract: Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5,596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome data sets. The MAGs comprised 1,522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family level, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1,986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene data sets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut. IMPORTANCE Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome data sets that had not been used to obtain microbial genomes. 
Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.

36 citations


Cites background from "Metagenome-assembled genome binning..."

  • ...Still, researchers who may utilize this set of MAGs should use caution when analyzing individual single nucleotide polymorphisms (SNPs), plasmids, genomic islands, or other potentially missing or misassembled genomic features (26)....


Posted Content
05 Jun 2020-bioRxiv
TL;DR: The gut microbiome of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish is studied to find that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates.
Abstract: Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 4374 metagenome assembled genomes (MAGs) from gut samples of 180 predominantly wild animal species representing 5 classes. Combined with existing datasets, we produced 5596 non-redundant, quality MAGs and 1522 species-level genome bins (SGBs). Most SGBs were novel at the species, genus, or family levels, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1986 diverse and largely novel biosynthetic gene clusters. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene datasets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to life within a living host.

33 citations

Posted Content
28 Oct 2021-bioRxiv
TL;DR: In this article, the authors evaluated the use of different combinations of short (Illumina) and long-read technologies (Nanopore R9.4, R10.3, and PacBio CCS) for recovering high-quality metagenome assembled genomes (HQ MAGs) from a complex microbial community (anaerobic digester).
Abstract: Short-read DNA sequencing has led to a massive growth of genome databases but mainly with highly fragmented metagenome assembled genomes from environmental systems. The fragmentation is a result of closely related species, strains, and genome repeats that cannot be resolved with short reads. To confidently explore the functional potential of a microbial community, high-quality reference genomes are needed. In this study, we evaluated the use of different combinations of short (Illumina) and long-read technologies (Nanopore R9.4, R10.3, and PacBio CCS) for recovering high-quality metagenome assembled genomes (HQ MAGs) from a complex microbial community (anaerobic digester). Depending on the sequencing approach, 33 to 86 HQ MAGs (encompassing up to 34 % of the assembly and 49 % of the reads) were recovered using long reads, with Nanopore R9 featuring the lowest sequencing costs per HQ MAG recovered. PacBio CCS was also found to be an effective platform for genome-centric metagenomics (74 HQ MAGs) and produced HQ MAGs with the lowest fragmentation (median of 9 contigs) as a stand-alone technology. Using PacBio CCS MAGs as reference, we show that, although a high number of high-quality MAGs can be generated using Nanopore R9, systematic indel errors are still present, which can lead to truncated gene calling. However, polishing the Nanopore MAGs with short-read Illumina data, enabled recovery of MAGs with similar quality as MAGs from PacBio CCS.

31 citations

Journal Article
TL;DR: The evidence for the role of commensal gut microbes in encoding antimicrobial resistance genes, the degree to which they are shared both with other commensals and with pathogens, and the host and environmental factors that can impact resistome dynamics are assessed.
Abstract: A global rise in antimicrobial resistance among pathogenic bacteria has proved to be a major public health threat, with the rate of multidrug-resistant bacterial infections increasing over time. The gut microbiome has been studied as a reservoir of antibiotic resistance genes (ARGs) that can be transferred to bacterial pathogens via horizontal gene transfer (HGT) of conjugative plasmids and mobile genetic elements (the gut resistome). Advances in metagenomic sequencing have facilitated the identification of resistome modulators, including live microbial therapeutics such as probiotics and fecal microbiome transplantation that can either expand or reduce the abundances of ARG-carrying bacteria in the gut. While many different gut microbes encode for ARGs, they are not uniformly distributed across, or transmitted by, various members of the microbiome, and not all are of equal clinical relevance. Both experimental and theoretical approaches in microbial ecology have been applied to understand differing frequencies of ARG horizontal transfer between commensal microbes as well as between commensals and pathogens. In this commentary, we assess the evidence for the role of commensal gut microbes in encoding antimicrobial resistance genes, the degree to which they are shared both with other commensals and with pathogens, and the host and environmental factors that can impact resistome dynamics. We further discuss novel sequencing-based approaches for identifying ARGs and predicting future transfer events of clinically relevant ARGs from commensals to pathogens.

26 citations

References
Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: It is shown that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented; given the same CPU time as RAxML and PhyML, IQ-TREE found higher likelihoods in 62.2% to 87.1% of the studied alignments, thus efficiently exploring the tree-space.
Abstract: Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3-97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.

13,668 citations

Journal Article
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations

Journal Article
TL;DR: The authors propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content, and implement the assessment procedure in open-source software, named BUSCO, with sets of Benchmarking Universal Single-Copy Orthologs.
Abstract: Motivation: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. Results: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. Availability and implementation: Software implemented in Python and datasets available for download from http://busco.ezlab.org. Contact: evgeny.zdobnov@unige.ch. Supplementary information: Supplementary data are available at Bioinformatics online.

7,747 citations

Journal Article
TL;DR: DIAMOND is introduced, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
Abstract: The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.

7,164 citations