Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3
Citations
Intestinal Akkermansia muciniphila predicts clinical response to PD-1 blockade in patients with advanced non-small-cell lung cancer
Nivolumab plus ipilimumab with or without live bacterial supplementation in metastatic renal cell carcinoma: a randomized phase 1 trial
Cross-cohort gut microbiome associations with immune checkpoint inhibitor response in advanced melanoma
Targeted suppression of human IBD-associated gut microbiota commensals by phage consortia for treatment of intestinal inflammation
References
Basic Local Alignment Search Tool
Random Forests
Trimmomatic: a flexible trimmer for Illumina sequence data
Fast gapped-read alignment with Bowtie 2
Gene Ontology: tool for the unification of biology
Frequently Asked Questions (13)
Q2. What future work do the authors describe in "Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3"?
bioBakery 3 begins to overcome this challenge by combining a greatly expanded set of reference sequences with ways of “falling back” gracefully when encountering new sequences, while also paving the way for further integration of assembly-based discovery in the future (discussed below). The authors thus anticipate improved integration of reference- and assembly-based meta-omic analyses to be one of the main areas of future development for the bioBakery, along with expanded methods for other types of multi-omics in addition to transcription. In addition to making a novel sub-species phylogenetic and biogeographic structure apparent, the combination of MetaPhlAn, HUMAnN, PanPhlAn, StrainPhlAn, and PhyloPhlAn confirmed that most R. bromii strains are “personal” (i.e. specific to and retained within individuals, like most microbiome members), rarely transmissible across hosts, and that genomic differences characterize each subspecies (suggesting a degree of functional adaptation and specialization). These components of metagenomes (and, for RNA viruses, metatranscriptomes) are often measured with surprising heterogeneity during the initial generation of sequencing data themselves (Zolfo et al., 2019), suggesting necessary improvements in analytical quality control and normalization as well.
Q3. How did the authors improve the quality of the read mapping?
To further improve the quality of the read mapping, the authors adopted quality controls before and after mapping by discarding low-quality sequences and alignments (reads shorter than 70 bp and alignments with a MAPQ value less than 5).
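The two thresholds above can be illustrated with a minimal sketch. It assumes the length check is applied to raw reads and the MAPQ check to SAM-format alignment records; the function names are hypothetical and this is not the bioBakery implementation.

```python
# Illustrative sketch only; thresholds come from the text, everything else is assumed.
MIN_READ_LENGTH = 70  # bp: reads shorter than this are discarded
MIN_MAPQ = 5          # alignments with MAPQ below this are discarded


def keep_read(sequence: str) -> bool:
    """Pre-mapping check: keep reads that are at least MIN_READ_LENGTH long."""
    return len(sequence) >= MIN_READ_LENGTH


def keep_alignment(sam_line: str) -> bool:
    """Post-mapping check on one tab-separated SAM record.

    SAM columns are QNAME, FLAG, RNAME, POS, MAPQ, ...; MAPQ is column 5.
    """
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])
    return mapq >= MIN_MAPQ
```

A pipeline would apply `keep_read` while streaming the FASTQ input and `keep_alignment` while streaming the aligner output, so neither filter needs the whole file in memory.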
Q4. What are the default filtering thresholds for consensus markers?
By default, consensus markers reconstructed with fewer than 8 reads or with a breadth of coverage (i.e. fraction of the marker covered by reads) lower than 80% are discarded (“--breadth_threshold” parameter).
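As a rough illustration of this default, the sketch below computes breadth of coverage from a per-base depth list and applies both cut-offs. The function names and the depth-list representation are assumptions made for the example, not the tool's internals.

```python
# Illustrative sketch only; the two default thresholds are taken from the text.
MIN_READS = 8        # markers built from fewer reads are discarded
MIN_BREADTH = 0.80   # markers with lower breadth of coverage are discarded


def breadth_of_coverage(per_base_depth):
    """Fraction of marker positions covered by at least one read."""
    if not per_base_depth:
        return 0.0
    covered = sum(1 for depth in per_base_depth if depth > 0)
    return covered / len(per_base_depth)


def keep_marker(n_reads, per_base_depth,
                min_reads=MIN_READS, min_breadth=MIN_BREADTH):
    """Return True when a consensus marker passes both default filters."""
    return (n_reads >= min_reads
            and breadth_of_coverage(per_base_depth) >= min_breadth)
```

Exposing the thresholds as keyword arguments mirrors the idea of a tunable breadth parameter without claiming anything about how the real option is parsed.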
Q5. What is the main area of improvement for the bioBakery?
A final area of improvement for the bioBakery, relatedly, is the increased integration between reference-based and assembly-based approaches - begun here via PhyloPhlAn 3 - in order to better leverage MAGs (Almeida et al., 2020), SGBs (Pasolli et al., 2019), and novel gene families.
Q6. What is the significance of the inclusion of DNA abundance in the above model?
The inclusion of DNA abundance as a covariate in the above model accounts for the strong dependence between a function’s gene (metagenomic) copy number and its metatranscriptomic abundance.
Q7. How did the authors construct additional synthetic metagenomes?
The authors constructed additional synthetic metagenomes by sampling sequencing reads from curated microbial genome sets using ART (Huang et al., 2012) with an Illumina HiSeq 2500 error model.
Q8. What is the main goal of the bioBakery?
Feedback on any aspect of the methods or their applications in diverse host-associated or environmental microbiome settings can be submitted at https://forum.biobakery.org, and the authors hope the bioBakery will continue to provide a flexible, convenient, reproducible, and accurate discovery platform for microbial community biology.
Q9. How did the authors identify the representative of each pan-proteome?
To obtain a nucleotide representation of each pan-proteome, the authors identified a representative of the cluster for each pan-protein by selecting a UniProtKB or UniParc entry taxonomically assigned to the desired species.
Q10. How did the RFs perform in the IBD dataset?
As in previous studies (Pasolli et al., 2016; Thomas et al., 2019), RFs using functional features performed similarly (0.69 Cross Validation and 0.71 LODO ROC AUC on pathways relative abundance), indicating a tight link between strain-specific taxonomy and gene carriage in this setting.
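The difference between the two evaluation settings lies in how samples are split. Assuming LODO here denotes leave-one-dataset-out, a minimal sketch of the splitting step follows; `lodo_splits` is a hypothetical helper, and model training and ROC AUC scoring are deliberately left out.

```python
from collections import defaultdict


def lodo_splits(dataset_labels):
    """Yield (held_out_dataset, train_indices, test_indices) per dataset.

    dataset_labels[i] names the cohort that sample i came from. Each cohort
    is held out once as the test set while all other cohorts form the
    training set.
    """
    by_dataset = defaultdict(list)
    for index, name in enumerate(dataset_labels):
        by_dataset[name].append(index)
    for held_out, test_indices in by_dataset.items():
        train_indices = [i for i, name in enumerate(dataset_labels)
                         if name != held_out]
        yield held_out, train_indices, test_indices
```

Unlike ordinary cross-validation, no sample from the held-out cohort is ever seen during training, so the resulting score reflects generalization across cohorts rather than within one.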
Q11. What is the reason for the absence of biomarkers for active UC?
The relative absence of biomarkers for active UC may result both from its generally more benign phenotype (Lloyd-Price et al., 2019) and from the smaller number of active UC samples (n=23) compared with active CD samples (n=76); as a result, the authors focused their subsequent analyses on expression differences within the CD subcohort.
Q12. What is the common way to filter low-quality taxonomic annotations?
The regular expressions used to filter low-quality taxonomic annotations are: “(C|c)andidat(e|us) | _sp(_.*|$) | (.*_|^)(b|B)acterium(_.*|) | .*(eury|)archaeo(n_|te|n$).* | .*(endo|)symbiont.* | .*genomosp_.* | .*unidentified.* | .*_bacteria_.* | .*_taxon_.* | .*_et_al_.* | .*_and_.* | .*(cyano|proteo|actino)bacterium_.*”
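A minimal sketch of how such patterns might be applied follows. It assumes each pattern is tried independently against an underscore-separated species name and that a match anywhere in the name flags it; the source does not state the matching mode, and `is_low_quality` is a hypothetical helper.

```python
import re

# Patterns copied from the list above, one per entry (separators dropped).
LOW_QUALITY_PATTERNS = [
    r"(C|c)andidat(e|us)",
    r"_sp(_.*|$)",
    r"(.*_|^)(b|B)acterium(_.*|)",
    r".*(eury|)archaeo(n_|te|n$).*",
    r".*(endo|)symbiont.*",
    r".*genomosp_.*",
    r".*unidentified.*",
    r".*_bacteria_.*",
    r".*_taxon_.*",
    r".*_et_al_.*",
    r".*_and_.*",
    r".*(cyano|proteo|actino)bacterium_.*",
]
_COMPILED = [re.compile(pattern) for pattern in LOW_QUALITY_PATTERNS]


def is_low_quality(species_name: str) -> bool:
    """Return True if any pattern matches somewhere in the name (assumed mode)."""
    return any(pattern.search(species_name) for pattern in _COMPILED)


names = ["Escherichia_coli", "Clostridium_sp_AT4",
         "Lachnospiraceae_bacterium_2_1_58FAA", "Eubacterium_rectale"]
kept = [name for name in names if not is_low_quality(name)]
```

Keeping the patterns in a list rather than one long alternation makes it easy to report which rule rejected a given annotation.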
Q13. What are the available synthetic metagenomes and gold standards?
Human and murine synthetic metagenomes and gold standards provided by the CAMI Challenge are available at https://data.cami-challenge.org/participate. Non-human synthetic metagenomes and gold standards are available at http://segatalab.cibio.unitn.it/tools/biobakery/.