Showing papers by "Liqing Zhang" published in 2020


Journal ArticleDOI
Min Oh1, Liqing Zhang1
TL;DR: This work proposes DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles that outperforms the current best approaches based on the strain-level marker profile in five different datasets in disease prediction.
Abstract: Human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of features, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning a better prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms on the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates the model training and hyperparameter optimization procedure with 8X-30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.

59 citations


Posted ContentDOI
Min Oh1, Liqing Zhang1
10 Mar 2020-bioRxiv
TL;DR: This work proposes DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles that outperforms the current best approaches based on the strain-level marker profile in five different datasets in disease prediction.
Abstract: Human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of features, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning a better prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms on the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates the model training and hyperparameter optimization procedure with 8X-30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.

36 citations


Journal ArticleDOI
TL;DR: A new web-based curation system that supports the annotation and inspection of several key attributes of potential ARGs, including gene name, antibiotic category, resistance mechanism, evidence for mobility and occurrence in clinically-important bacterial strains is proposed.
Abstract: Curation of antibiotic resistance gene (ARG) databases is a labor-intensive process that requires expert knowledge to manually collect, correct, and/or annotate individual genes. Correspondingly, updates to existing databases tend to be infrequent, commonly requiring years for completion and often containing inconsistencies. Further, because of limitations of manual curation, most existing ARG databases contain only a small proportion of known ARGs (~5k genes). A new approach is needed to achieve a truly comprehensive ARG database, while also maintaining a high level of accuracy. Here we propose a new web-based curation system, ARG-miner, which supports annotation of ARGs at multiple levels, including: gene name, antibiotic category, resistance mechanism, and evidence for mobility and occurrence in clinically-important bacterial strains. To overcome limitations of manual curation, we employ crowdsourcing as a novel strategy for expanding curation capacity towards achieving a truly comprehensive, up-to-date database. We develop and validate the approach by comparing performance of multiple cohorts of curators with varying levels of expertise, demonstrating that ARG-miner is more cost-effective and less time-consuming relative to traditional expert curation. We further demonstrate the reliability of a trust validation filter for rejecting confounding input generated by spammers. Crowdsourcing was found to be as accurate as expert annotation, with an accuracy >90% for the annotation of a diverse test set of ARGs. ARG-miner provides a public API and database available at http://bench.cs.vt.edu/argminer.

34 citations
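
The ARG-miner abstract above mentions a trust validation filter that rejects spammer input before crowdsourced annotations are combined. The abstract does not say how that filter works, so the following is only a minimal sketch of one common approach, under stated assumptions: curators are scored on control genes with known answers, those below an accuracy threshold are dropped, and the remaining votes are combined by majority. The function names, the 0.8 threshold, and the gene identifiers are all hypothetical and are not taken from ARG-miner.

```python
from collections import Counter
from typing import Dict, Optional, Set

# curator name -> {gene identifier: submitted antibiotic category}
Annotations = Dict[str, Dict[str, str]]


def trusted_curators(gold: Dict[str, str], submissions: Annotations,
                     min_accuracy: float = 0.8) -> Set[str]:
    """Return curators whose accuracy on control genes meets the threshold."""
    trusted = set()
    for curator, answers in submissions.items():
        scored = [gene for gene in gold if gene in answers]
        if not scored:
            continue  # no control answers, so no basis for trust
        correct = sum(answers[gene] == gold[gene] for gene in scored)
        if correct / len(scored) >= min_accuracy:
            trusted.add(curator)
    return trusted


def consensus_label(gene: str, submissions: Annotations,
                    trusted: Set[str]) -> Optional[str]:
    """Majority vote for one gene, counting only trusted curators."""
    votes = Counter(
        answers[gene]
        for curator, answers in submissions.items()
        if curator in trusted and gene in answers
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical control genes and submissions, for illustration only.
gold = {"control_1": "beta-lactam", "control_2": "aminoglycoside"}
honest = {"control_1": "beta-lactam", "control_2": "aminoglycoside",
          "target_gene": "quinolone"}
spam = {"control_1": "colistin", "control_2": "colistin",
        "target_gene": "colistin"}
submissions = {
    "alice": dict(honest), "bob": dict(honest),
    "spam_1": dict(spam), "spam_2": dict(spam), "spam_3": dict(spam),
}

trusted = trusted_curators(gold, submissions)
filtered = consensus_label("target_gene", submissions, trusted)
unfiltered = consensus_label("target_gene", submissions, set(submissions))
```

In this toy data the three spam accounts outvote the two honest curators when every vote counts, while the filtered vote follows the curators who passed the control genes.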


Journal ArticleDOI
25 Jun 2020-PLOS ONE
TL;DR: The results demonstrate that the human gut microbial community diversity and population size are significantly impacted by triclosan at a high dose in vitro, and that the community is recoverable within this system.
Abstract: The recent ban of the antimicrobial compound triclosan from use in consumer soaps followed research that showcased the risk it poses to the environment and to human health. Triclosan has been found in human plasma, urine and milk, demonstrating that it is present in human tissues. Previous work has also demonstrated that consumption of triclosan disrupts the gut microbial community of mice and zebrafish. Due to the widespread use of triclosan and its ubiquity in the environment, it is imperative to understand the impact this chemical has on the human body and its symbiotic resident microbes. To that end, this study is the first to explore how triclosan impacts the human gut microbial community in vitro both during and after treatment. Through our in vitro system simulating three regions of the human gut (the ascending colon, transverse colon, and descending colon regions), we found that treatment with triclosan significantly impacted the community structure in terms of reduced population, diversity, and metabolite production, most notably in the ascending colon region. Given a 2-week recovery period, most of the population levels, community structure, and diversity levels were recovered for all colon regions. Our results demonstrate that the human gut microbial community diversity and population size are significantly impacted by triclosan at a high dose in vitro, and that the community is recoverable within this system.

6 citations


Journal ArticleDOI
TL;DR: Enterobacteriaceae and ARG profiles support the hypothesized concerns that recreational waterways are a potential source of community-acquired MDR-Ent.
Abstract: Community-acquired multidrug resistant Enterobacteriaceae (MDR-Ent) infections continue to increase in the United States. In prior studies, we identified neighboring regions in Chicago, Illinois, where children have 5 to 6 times greater odds of MDR-Ent infections. To prevent community spread of MDR-Ent, we need to identify the MDR-Ent reservoirs. A pilot study of 4 Chicago waterways for MDR-Ent and associated antibiotic resistance genes (ARGs) was conducted. Three waterways (A1 to A3) are labeled safe for "incidental contact recreation" (e.g., kayaking), and A4 is a nonrecreational waterway that carries nondisinfected water. Surface water samples were collected and processed for standard bacterial culture and shotgun metagenomic sequencing. Generally, A3 and A4 (neighboring waterways which are not hydraulically connected) were strikingly similar in bacterial taxa, ARG profiles, and abundances of corresponding clades and genera within the Enterobacteriaceae. Additionally, total ARG abundances recovered from the full microbial community were strongly correlated between A3 and A4 (R² = 0.97). Escherichia coli numbers (per 100 ml water) were highest in A4 (783 most probable number [MPN]) and A3 (200 MPN) relative to A2 (84 MPN) and A1 (32 MPN). We found concerning ARGs in Enterobacteriaceae such as MCR-1 (colistin), Qnr and OqxA/B (quinolones), CTX-M, OXA and ACT/MIR (beta-lactams), and AAC (aminoglycosides). We found significant correlations in microbial community composition between nearby waterways that are not hydraulically connected, suggesting cross-seeding and the potential for mobility of ARGs. Enterobacteriaceae and ARG profiles support the hypothesized concerns that recreational waterways are a potential source of community-acquired MDR-Ent.

3 citations


Proceedings ArticleDOI
25 Jun 2020
TL;DR: In this article, the authors present a systematic scheme to transform applications written in Python into a set of functions that can then be automatically deployed atop platforms such as AWS Lambda, and they target a bioinformatics cyberinfrastructure pipeline, CIWARS, that provides waste-water analysis for the identification of antibiotic-resistant bacteria and viruses such as SARS-CoV-2.
Abstract: Function-as-a-Service (FaaS) and the serverless computing model offer a powerful abstraction for supporting large-scale applications in the cloud. A major hurdle in this context is that it is non-trivial to transform an application, even an already containerized one, to a FaaS implementation. In this paper, we take the first step towards supporting easier and more efficient application transformation to FaaS. We present a systematic scheme to transform applications written in Python into a set of functions that can then be automatically deployed atop platforms such as AWS Lambda. We target a bioinformatics cyberinfrastructure pipeline, CIWARS, that provides waste-water analysis for the identification of antibiotic-resistant bacteria and viruses such as SARS-CoV-2. Based on our experience with enabling FaaS-based CIWARS, we develop a methodology that would help the conversion of other similar applications to the FaaS model. Our evaluation shows that our approach can correctly transform CIWARS to FaaS, and the new FaaS-based CIWARS incurs only negligible (≤2%) overhead for representative workloads.
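
The abstract above does not describe the transformation scheme itself. As a rough illustration of the general idea only, the sketch below wraps a plain Python function so that it can be invoked with the `handler(event, context)` calling convention that AWS Lambda uses for Python functions. The wrapper `as_faas_handler`, the example step `count_long_reads`, and the `statusCode`/`body` response shape are assumptions made for illustration; none of them come from the paper or from CIWARS.

```python
import inspect
import json
from typing import Any, Callable, Dict


def as_faas_handler(func: Callable[..., Any]) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Wrap a plain function so it can be called as handler(event, context)."""
    signature = inspect.signature(func)

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        try:
            # Check that the event keys match the function's parameters.
            bound = signature.bind(**event)
        except TypeError as exc:
            return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
        result = func(*bound.args, **bound.kwargs)
        return {"statusCode": 200, "body": json.dumps({"result": result})}

    return handler


# Hypothetical pipeline step, not taken from CIWARS.
def count_long_reads(sequences, min_length=50):
    """Count sequences whose length is at least min_length."""
    return sum(1 for seq in sequences if len(seq) >= min_length)


handler = as_faas_handler(count_long_reads)
response = handler({"sequences": ["ACGT" * 20, "AC"], "min_length": 50}, None)
```

Binding the event against the function signature before the call keeps argument errors (reported as a 400 response) separate from errors raised inside the wrapped step, which propagate unchanged.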

Posted ContentDOI
13 Mar 2020
TL;DR: FVE-novel, a computational pipeline for reconstructing complete or near-complete viral draft genomes from metagenomic data, is developed and represents a powerful new approach for exploring viral diversity in metagenomics data.
Abstract: Background: Despite the recent surge of viral metagenomic studies, recovering complete virus/phage genomes from metagenomic data is still extremely difficult and most viral contigs generated from de novo assembly programs are highly fragmented, posing serious challenges to downstream analysis and inference. Results: Here we develop FVE-novel, a computational pipeline for reconstructing complete or near-complete viral draft genomes from metagenomic data. FVE-novel deploys FastViromeExplorer to efficiently map metagenomic reads to viral reference genomes or contigs, performs de novo assembly of the mapped reads to generate scaffolds, and extends the scaffolds via iterative assembly to produce final viral scaffolds. We applied FVE-novel to an ocean metagenomic sample and obtained 268 viral scaffolds that potentially come from novel viruses. Through manual examination and validation of the ten longest scaffolds, we successfully recovered four complete viral genomes: two are novel, as they cannot be found in the existing databases, and the other two are related to known phages. Conclusions: The hybrid reference-based and de novo assembly approach used by FVE-novel represents a powerful new approach for exploring viral diversity in metagenomic data. FVE-novel is freely available at https://github.com/saima-tithi/FVE-novel.
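
The iterative extension step named in the abstract above can be pictured with a toy example. The sketch below is not FVE-novel's method, which relies on read mapping and a de novo assembler. It is a deliberately simplified greedy stand-in that assumes exact matches: a scaffold grows to the right whenever a read's prefix matches the scaffold's suffix by at least `min_overlap` bases, and the loop repeats until no read extends it. The function name, parameters, and sequences are hypothetical.

```python
from typing import List


def extend_scaffold(scaffold: str, reads: List[str],
                    min_overlap: int = 4, max_rounds: int = 100) -> str:
    """Greedily extend a scaffold to the right using exact suffix/prefix overlaps."""
    remaining = list(reads)
    for _ in range(max_rounds):
        extended = False
        for read in remaining:
            # Longest overlap that still leaves at least one new base to add.
            longest = min(len(scaffold), len(read) - 1)
            for k in range(longest, min_overlap - 1, -1):
                if scaffold.endswith(read[:k]):
                    scaffold += read[k:]
                    remaining.remove(read)
                    extended = True
                    break
            if extended:
                break  # restart the scan against the longer scaffold
        if not extended:
            break  # no read extends the scaffold any further
    return scaffold


# Hypothetical sequences, for illustration only.
result = extend_scaffold("ACGTACGGA", ["CGGATTCA", "TTCAGGGT", "AAAA"])
print(result)
```

Each round consumes one read and strictly lengthens the scaffold, so the loop stops either when the reads run out or when none of them overlaps the current end.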