Open Access · Posted Content · DOI

CLMB: deep contrastive learning for robust metagenomic binning.

TL;DR
Zhang et al. propose CLMB, a deep contrastive learning framework for metagenome binning that efficiently eliminates the disturbance of noise and produces more stable and robust results.
Abstract
The reconstruction of microbial genomes from large metagenomic datasets is a critical procedure for discovering uncultivated microbial populations and defining their functional roles. To achieve that, we need to perform metagenomic binning, clustering the assembled contigs into draft genomes. Although many computational tools exist, most of them neglect an important property of metagenomic data: noise. To further improve the metagenomic binning step and reconstruct better metagenomes, we propose a deep Contrastive Learning framework for Metagenome Binning (CLMB), which can efficiently eliminate the disturbance of noise and produce more stable and robust results. Essentially, instead of denoising the data explicitly, we add simulated noise to the training data and force the deep learning model to produce similar and stable representations for both the noise-free and the distorted data. Consequently, the trained model is robust to noise and handles it implicitly during usage. CLMB outperforms the previous state-of-the-art binning methods significantly, recovering the most near-complete genomes on almost all the benchmarking datasets (up to 17% more reconstructed genomes than the second-best method). It also improves the performance of bin refinement, reconstructing 8-22 more high-quality genomes and 15-32 more middle-quality genomes than the second-best result. Impressively, in addition to being compatible with the binning refiner, CLMB alone recovers on average 15 more high-quality genomes than the refiner of VAMB and MaxBin on the benchmarking datasets. On a real mother-infant microbiome dataset with 110 samples, CLMB scales well, recovering 365 high- and middle-quality genomes (including 21 new ones) and providing insights into microbiome transmission. CLMB is open-source and available at https://github.com/zpf0117b/CLMB/.
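The abstract describes the central trick, training on clean and noise-distorted copies of each contig's features and forcing their representations to agree, without showing it concretely. Below is a minimal, hypothetical PyTorch sketch of that idea in an NT-Xent (SimCLR-style) form; the encoder sizes, Gaussian noise model, and hyperparameters are illustrative assumptions, not CLMB's actual architecture or noise simulation (see the GitHub repository for those).

```python
# Minimal sketch of noise-augmented contrastive training (NT-Xent style).
# Assumptions: x holds per-contig feature vectors (e.g. TNF + abundance) already
# computed; the encoder, noise level, and temperature are illustrative, not CLMB's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_dim, hidden=512, latent=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)  # unit-norm embeddings

def nt_xent(z1, z2, tau=0.5):
    """Contrastive loss: each contig's clean view should match its noisy view."""
    z = torch.cat([z1, z2], dim=0)                      # (2N, d)
    sim = z @ z.t() / tau                               # cosine similarity (embeddings are normalized)
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))               # ignore self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def train_step(encoder, optimizer, x, noise_std=0.05):
    """One step: embed the clean batch and a noise-distorted copy, pull the pairs together."""
    x_noisy = x + noise_std * torch.randn_like(x)       # simulated noise (hypothetical Gaussian model)
    z1, z2 = encoder(x), encoder(x_noisy)
    loss = nt_xent(z1, z2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch each clean/noisy pair is a positive pair and all other contigs in the batch act as negatives, which is what pushes the learned embedding to stay stable under noise before the contigs are clustered into bins.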


Citations
Posted Content · DOI

HiC-LDNet: A general and robust deep learning framework for accurate chromatin loop detection in genome-wide contact maps

Siyuan Chen, +3 more
01 Feb 2022
TL;DR: A robust and general deep learning pipeline is proposed for genome-wide chromatin loop detection in both bulk Hi-C and scHi-C data, giving relatively more accurate predictions across multiple tissue types and contact technologies.
Posted Content · DOI

Environment and taxonomy shape the genomic signature of prokaryotic extremophiles

TL;DR: In this article, an alignment-free method was used in conjunction with both supervised and unsupervised machine learning to analyze genomic signatures extracted from a curated dataset of ∼700 extremophilic (temperature, pH) bacterial and archaeal genomes.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
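For reference, the update rule summarized in this TL;DR can be written out directly. The NumPy sketch below follows the published algorithm with its default hyperparameters; variable names are mine.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its square,
    with bias correction for the zero initialization (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```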
Journal Article · DOI

The Sequence Alignment/Map format and SAMtools

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments.
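In a binning workflow like the one described above, SAMtools typically sorts and indexes the read alignments and reports per-contig mapping counts that feed into abundance estimates. A minimal sketch via subprocess, assuming samtools is installed and on PATH and using placeholder file names (not the actual CLMB pipeline commands):

```python
import subprocess

# Hypothetical file names; assumes samtools is installed and on PATH.
subprocess.run(["samtools", "sort", "-o", "aln.sorted.bam", "aln.sam"], check=True)
subprocess.run(["samtools", "index", "aln.sorted.bam"], check=True)

# idxstats reports: contig name, contig length, mapped reads, unmapped reads.
out = subprocess.run(["samtools", "idxstats", "aln.sorted.bam"],
                     capture_output=True, text=True, check=True)
mapped_reads = {fields[0]: int(fields[2])
                for fields in (line.split("\t") for line in out.stdout.strip().split("\n"))
                if fields[0] != "*"}   # skip the unmapped-read summary line
```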
Journal Article · DOI

Fast and accurate short read alignment with Burrows–Wheeler transform

TL;DR: The Burrows–Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
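BWA produces the alignments consumed in the SAMtools sketch above. For brevity the snippet below uses the newer bwa mem mode rather than the original aln/sampe workflow described in the TL;DR; the file names and thread count are placeholders, and this is not the exact command CLMB's documentation prescribes.

```python
import subprocess

# Hypothetical file names; assumes bwa is installed and on PATH.
subprocess.run(["bwa", "index", "contigs.fasta"], check=True)   # build the FM-index once
with open("aln.sam", "w") as sam:
    # Paired-end alignment of the sample's reads back to the assembled contigs.
    subprocess.run(["bwa", "mem", "-t", "8", "contigs.fasta",
                    "reads_1.fastq", "reads_2.fastq"],
                   stdout=sam, check=True)
```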
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE is presented that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
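t-SNE is the usual way to eyeball whether learned contig embeddings separate into genome-like clusters. A minimal scikit-learn sketch, where Z stands in for a hypothetical matrix of latent contig representations (not data produced by CLMB itself):

```python
import numpy as np
from sklearn.manifold import TSNE

# Z: hypothetical (n_contigs, latent_dim) matrix of learned contig embeddings.
Z = np.random.rand(1000, 32).astype(np.float32)

# Project to 2D for visual inspection of cluster structure.
coords = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(Z)
```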
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
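Variational autoencoders underlie VAMB, one of the binners CLMB is compared with in the abstract; the key mechanics are the reparameterization trick and the KL regularizer. A minimal PyTorch sketch follows, with layer sizes and the MSE reconstruction term as illustrative assumptions rather than VAMB's or CLMB's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim, hidden=512, latent=32):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    """Negative ELBO: reconstruction error plus KL(q(z|x) || N(0, I))."""
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```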