
Papers by Liqing Zhang published in 2018


Journal Article
TL;DR: The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice, and DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs.
Abstract: Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the “best hits” of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models’ performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. 
The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .

402 citations
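
The abstract contrasts DeepARG with the conventional "best hit" approach, in which a read inherits the category of its top database hit only when that hit passes a fixed cutoff. The sketch below illustrates why strict cutoffs produce false negatives for divergent homologs. The `Hit` record, the 80% identity cutoff, and `best_hit_annotation` are hypothetical illustrations of the baseline, not DeepARG's code.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    read_id: str
    arg_category: str
    identity: float  # percent identity of the alignment (hypothetical field)
    bitscore: float

def best_hit_annotation(hits, min_identity=80.0):
    """Assign each read the category of its highest-bitscore hit, then drop
    the read entirely if that best hit falls below the identity cutoff."""
    best = {}
    for h in hits:
        if h.read_id not in best or h.bitscore > best[h.read_id].bitscore:
            best[h.read_id] = h
    return {rid: h.arg_category for rid, h in best.items() if h.identity >= min_identity}

hits = [
    Hit("read1", "beta-lactam", 95.0, 180.0),
    Hit("read1", "aminoglycoside", 60.0, 90.0),
    Hit("read2", "tetracycline", 70.0, 150.0),  # divergent homolog of a known ARG
]
print(best_hit_annotation(hits))  # {'read1': 'beta-lactam'}
```

Here `read2` is discarded even though its only hit is to an ARG category; if it truly is an ARG, the cutoff has produced a false negative, which is the limitation DeepARG is designed to address.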


Journal Article
Min Oh, Amy Pruden, Chaoqi Chen, Lenwood S. Heath, Kang Xia, Liqing Zhang
TL;DR: MetaCompare is introduced, a publicly available tool for ranking 'resistome risk', which is defined as the potential for antibiotic resistance genes (ARGs) to be associated with mobile genetic elements (MGEs) and mobilize to pathogens based on metagenomic data.

76 citations


Journal Article
12 Jan 2018-PeerJ
TL;DR: Both simulated and real data from human microbiome and ocean environmental samples are used to validate FastViromeExplorer as a reliable tool to quickly and accurately identify viruses and their abundances in large datasets.
Abstract: With the increase in the availability of metagenomic data generated by next generation sequencing, there is an urgent need for fast and accurate tools for identifying viruses in host-associated and environmental samples. In this paper, we developed a stand-alone pipeline called FastViromeExplorer for the detection and abundance quantification of viruses and phages in large metagenomic datasets by performing rapid searches of virus and phage sequence databases. Both simulated and real data from human microbiome and ocean environmental samples are used to validate FastViromeExplorer as a reliable tool to quickly and accurately identify viruses and their abundances in large datasets.

66 citations


Journal Article
18 Sep 2018-mBio
TL;DR: In this article, the authors track the genome evolution of the globally abundant marine bacterial phylum Marinimicrobia across its diversification into modern marine environments and demonstrate that extant lineages are partitioned between epipelagic and mesopelagic habitats and that these habitat preferences are associated with fundamental differences in genomic organization, cellular bioenergetics, and metabolic modalities.
Abstract: Diverse bacterial and archaeal lineages drive biogeochemical cycles in the global ocean, but the evolutionary processes that have shaped their genomic properties and physiological capabilities remain obscure. Here we track the genome evolution of the globally abundant marine bacterial phylum Marinimicrobia across its diversification into modern marine environments and demonstrate that extant lineages are partitioned between epipelagic and mesopelagic habitats. Moreover, we show that these habitat preferences are associated with fundamental differences in genomic organization, cellular bioenergetics, and metabolic modalities. Multiple lineages present in epipelagic niches independently acquired genes necessary for phototrophy and environmental stress mitigation, and their genomes convergently evolved key features associated with genome streamlining. In contrast, lineages residing in mesopelagic waters independently acquired nitrate respiratory machinery and a variety of cytochromes, consistent with the use of alternative terminal electron acceptors in oxygen minimum zones (OMZs). Further, while epipelagic clades have retained an ancestral Na+-pumping respiratory complex, mesopelagic lineages have largely replaced this complex with canonical H+-pumping respiratory complex I, potentially due to the increased efficiency of the latter together with the presence of the more energy-limiting environments deep in the ocean's interior. 
These parallel evolutionary trends indicate that key features of genomic streamlining and cellular bioenergetics have occurred repeatedly and congruently in disparate clades and underscore the importance of environmental conditions and nutrient dynamics in driving the evolution of diverse bacterioplankton lineages in similar ways throughout the global ocean. IMPORTANCE: Understanding long-term patterns of microbial evolution is critical to advancing our knowledge of the past and present roles of microbial life in driving global biogeochemical cycles. Historically, it has been challenging to study the evolution of environmental microbes due to difficulties in obtaining genome sequences from lineages that could not be cultivated, but recent advances in metagenomics and single-cell genomics have begun to obviate many of these hurdles. Here we present an evolutionary genomic analysis of the Marinimicrobia, a diverse bacterial group that is abundant in the global ocean. We demonstrate that distantly related Marinimicrobia species that reside in similar habitats have converged to assume similar genome architectures and cellular bioenergetics, suggesting that common factors shape the evolution of a broad array of marine lineages. These findings broaden our understanding of the evolutionary forces that have given rise to microbial life in the contemporary ocean.

29 citations


Journal Article
TL;DR: A comprehensive analysis of the host transcriptome, including mRNA, lncRNA, and alternative splicing, was performed using human cell lines expressing dCas9-SAM and HIV-targeting msgRNAs, demonstrating that off-target effects of the HIV-specific dCas9-SAM system in human cells are rare.
Abstract: CRISPR/Cas9 (epi)genome editing has revolutionized the field of gene and cell therapy. Our previous study demonstrated that a rapid and robust reactivation of the HIV latent reservoir by a catalytically-deficient Cas9 (dCas9)-synergistic activation mediator (SAM) via HIV long terminal repeat (LTR)-specific MS2-mediated single guide RNAs (msgRNAs) directly induces cellular suicide without additional immunotherapy. However, potential off-target effects remain a concern for any clinical application of Cas9 genome editing and dCas9 epigenome editing. After dCas9 treatment, potential off-target responses have been analyzed through different strategies such as mRNA sequence analysis and functional screening. In this study, a comprehensive analysis of the host transcriptome, including mRNA, lncRNA, and alternative splicing, was performed using human cell lines expressing dCas9-SAM and HIV-targeting msgRNAs. The control scrambled msgRNA (LTR_Zero) group and the two LTR-specific msgRNA (LTR_L and LTR_O) groups show very similar expression profiles of the whole transcriptome. Among 839 identified lncRNAs, none exhibited significantly different expression in the LTR_L vs LTR_Zero comparison. In the LTR_O group, only the TERC and scaRNA2 lncRNAs were significantly decreased. Among 142,791 mRNAs, four genes were differentially expressed in the LTR_L vs LTR_Zero comparison. Twenty-one genes were significantly downregulated in the LTR_O group vs either the LTR_Zero or LTR_L group, and one third of them are histone related. The distributions of different types of alternative splicing were very similar both within and between groups. There were no apparent changes in the overall lncRNA and mRNA transcript profiles between the LTR_L and LTR_Zero groups. This is an extremely comprehensive study demonstrating that off-target effects of the HIV-specific dCas9-SAM system in human cells are rare. This finding is encouraging for the safe application of dCas9-SAM technology to induce target-specific reactivation of latent HIV for an effective "shock-and-kill" strategy.

18 citations


Journal Article
30 May 2018
TL;DR: For the first time, changes in growth and gene expression for commensal gut bacteria in response to naringenin were documented.
Abstract: In this study, the effect of the flavanone naringenin on the growth and gene expression of the commensal gut microbes Ruminococcus gauvreauii, Bifidobacterium catenulatum, and Enterococcus caccae was analyzed. Analysis of growth curves revealed that Ruminococcus gauvreauii was unaffected by naringenin, Bifidobacterium catenulatum was slightly enhanced by naringenin, and Enterococcus caccae was severely inhibited by naringenin. Changes in gene expression due to naringenin were determined using single-molecule RNA sequencing. Analysis revealed the following responses to naringenin: Ruminococcus gauvreauii upregulated genes involved in iron uptake; Bifidobacterium catenulatum upregulated genes involved in cellular metabolism, DNA repair, and molecular transport, and downregulated genes involved in thymidine biosynthesis and metabolism; Enterococcus caccae upregulated pathways involved in transcription and protein transport and downregulated genes responsible for sugar transport and purine synthesis. For the first time, changes in growth and gene expression for commensal gut bacteria in response to naringenin were documented.

14 citations


Journal Article
TL;DR: A large-scale RNA interference screen in K562 human chronic myeloid leukemia cells identified genes that regulate autophagy at different stages, which helps decode autophagy regulation in cancer and offers novel avenues to develop autophagy-related therapies for cancer.
Abstract: Dysregulated autophagy is central to the pathogenesis and therapeutic development of cancer. However, how autophagy is regulated in cancer is not well understood, and genes that modulate cancer autophagy are not fully defined. To gain more insight into autophagy regulation in cancer, we performed a large-scale RNA interference screen in K562 human chronic myeloid leukemia cells using monodansylcadaverine staining, an autophagy-detecting approach equivalent to immunoblotting of the autophagy marker LC3B or fluorescence microscopy of GFP-LC3B. By coupling monodansylcadaverine staining with fluorescence-activated cell sorting, we successfully isolated autophagic K562 cells, in which we identified 336 short hairpin RNAs. After candidate validation using Cyto-ID fluorescence spectrophotometry, LC3B immunoblotting, and quantitative RT-PCR, 82 genes were identified as autophagy-regulating genes. Twenty of these genes have been reported previously, and the remaining 62 candidates are novel autophagy mediators. Bioinformatic analyses revealed that most candidate genes were involved in molecular pathways regulating autophagy, rather than directly participating in the autophagy process. Further autophagy flux assays revealed that 57 autophagy-regulating genes suppressed autophagy initiation, whereas 21 candidates promoted autophagy maturation. Our RNA interference screen identified genes that regulate autophagy at different stages, which helps decode autophagy regulation in cancer and offers novel avenues to develop autophagy-related therapies for cancer.

12 citations


Posted Content
01 Mar 2018-bioRxiv
TL;DR: A new web-based curation system, ARG-miner, is proposed; it supports annotation of ARGs at multiple levels, including gene name, antibiotic category, resistance mechanism, and evidence for mobility and occurrence in clinically-important bacterial strains.
Abstract: Curation of antibiotic resistance gene (ARG) databases is a labor-intensive process that requires expert knowledge to manually collect, correct, and/or annotate individual genes. Correspondingly, updates to existing databases tend to be infrequent, commonly requiring years for completion and often containing inconsistencies. Further, because of the limitations of manual curation, most existing ARG databases contain only a small proportion of known ARGs (~5k genes). A new approach is needed to achieve a truly comprehensive ARG database while also maintaining a high level of accuracy. Here we propose a new web-based curation system, ARG-miner, which supports annotation of ARGs at multiple levels, including gene name, antibiotic category, resistance mechanism, and evidence for mobility and occurrence in clinically-important bacterial strains. To overcome the limitations of manual curation, we employ crowdsourcing as a novel strategy for expanding curation capacity towards achieving a truly comprehensive, up-to-date database. We develop and validate the approach by comparing the performance of multiple cohorts of curators with varying levels of expertise, demonstrating that ARG-miner is more cost-effective and less time-consuming than traditional expert curation. We further demonstrate the reliability of a trust validation filter for rejecting confounding input generated by spammers. Crowdsourcing was found to be as accurate as expert annotation, with an accuracy >90% for the annotation of a diverse test set of ARGs. ARG-miner provides a public API and database available at http://bench.cs.vt.edu/argminer.

9 citations
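
The abstract mentions a trust validation filter for rejecting spammer input but does not describe how it works. One common pattern, sketched below purely as an assumption, scores each curator on gold-standard items with known answers, discards curators below an accuracy threshold, and takes a majority vote among the rest. The function names, the 0.7 threshold, and the data layout are hypothetical and are not ARG-miner's actual filter.

```python
from collections import Counter

def trusted_curators(answers, gold, min_accuracy=0.7):
    """Keep curators whose accuracy on gold-standard items meets the threshold.
    answers: {curator: {item: label}}; gold: {item: known_label}."""
    trusted = set()
    for curator, labels in answers.items():
        scored = [item for item in gold if item in labels]
        if not scored:
            continue  # no evidence of reliability, so treat as untrusted
        correct = sum(labels[item] == gold[item] for item in scored)
        if correct / len(scored) >= min_accuracy:
            trusted.add(curator)
    return trusted

def consensus(answers, gold, min_accuracy=0.7):
    """Majority vote per non-gold item, counting only trusted curators.
    Ties are broken arbitrarily by Counter.most_common."""
    keep = trusted_curators(answers, gold, min_accuracy)
    votes = {}
    for curator in keep:
        for item, label in answers[curator].items():
            if item not in gold:
                votes.setdefault(item, Counter())[label] += 1
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}

gold = {"g1": "beta-lactam", "g2": "macrolide"}
answers = {
    "alice": {"g1": "beta-lactam", "g2": "macrolide", "geneX": "tetracycline"},
    "bob": {"g1": "beta-lactam", "g2": "macrolide", "geneX": "tetracycline"},
    "spammer": {"g1": "macrolide", "g2": "beta-lactam", "geneX": "glycopeptide"},
}
print(consensus(answers, gold))  # {'geneX': 'tetracycline'}
```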


Journal Article
TL;DR: A local discriminant gait recognition method is proposed by integrating weighted adaptive center symmetric local binary pattern (WACS-LBP) with local linear discriminate projection (LLDP); experimental results show that the proposed method is not only effective but also clearly interpretable.
Abstract: With the increasing demands of remote surveillance systems, gait-based personal identification has attracted more and more attention from biometric recognition researchers. Gait sequences are more easily affected by various factors than other biometric features. To achieve better performance in gait-based identification systems, this paper proposes a local discriminant gait recognition method that integrates weighted adaptive center symmetric local binary pattern (WACS-LBP) with local linear discriminate projection (LLDP). The proposed method consists of two stages. In the first stage, a robust locally weighted histogram feature vector is extracted from each gait image by WACS-LBP. In the second stage, the dimensionality of the extracted feature vector is reduced by LLDP. The highlights of the proposed method are: (1) the extracted feature is rotation invariant and tolerant to illumination and pose changes; (2) the low-dimensional feature vector produced by LLDP preserves discriminating ability; and (3) the small-sample-size (SSS) problem is avoided naturally. The proposed method is validated and compared with existing algorithms on a public gait database. The experimental results show that the proposed method is not only effective but also clearly interpretable.

8 citations
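
The first stage above builds a histogram of center-symmetric LBP codes. The sketch below shows only the plain CS-LBP operator: for each interior pixel, the four pairs of diametrically opposite neighbours in a 3x3 window are compared, giving a 4-bit code (16 possible values) that is accumulated into a histogram. The weighting and adaptive thresholding that distinguish WACS-LBP are not described in the abstract and are not reproduced; the fixed `threshold` parameter is an assumption.

```python
def cs_lbp_histogram(image, threshold=0.0):
    """16-bin CS-LBP histogram over the interior pixels of a 2-D grayscale
    image given as a list of equal-length rows. Border pixels are skipped."""
    # The 8 neighbours in circular order as (row offset, col offset);
    # offsets[i] and offsets[i + 4] are diametrically opposite.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
    hist = [0] * 16
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                dr1, dc1 = offsets[i]
                dr2, dc2 = offsets[i + 4]
                diff = image[r + dr1][c + dc1] - image[r + dr2][c + dc2]
                if diff > threshold:
                    code |= 1 << i  # set bit i when the pair differs enough
            hist[code] += 1
    return hist

# A horizontal intensity ramp: the single interior pixel gets code 0b0011.
print(cs_lbp_histogram([[0, 1, 2], [0, 1, 2], [0, 1, 2]]))
```

Compared with standard 8-neighbour LBP (256 codes), comparing opposite pairs halves the number of bits, which keeps the histogram short before any dimensionality reduction such as LLDP is applied.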


Posted Content
Dhoha Abid, Liqing Zhang
23 Nov 2018-bioRxiv
TL;DR: Two deep neural network models, trained with composition-based k-mer features to predict capsid and tail proteins respectively, are developed and outperform state-of-the-art methods with improved F1 scores.
Abstract: The capsid and tail proteins are considered the main structural proteins of phages and also their footprint, since they exist only in phage genomes. These proteins are known to lack sequence conservation, making them extremely diverse and thus posing a major challenge to identifying and annotating them in genomic sequences. In this study, we aim to overcome this challenge and predict these proteins by using deep neural networks with composition-based features. We develop two models trained with k-mer features to predict capsid and tail proteins, respectively. Evaluating the models on two different testing sets shows that they outperform state-of-the-art methods with improved F1 scores.

6 citations
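
Composition-based k-mer features turn a variable-length protein sequence into a fixed-length vector that a neural network can consume. The sketch below computes normalized amino-acid k-mer frequencies; the choice of k = 2, the 20-letter alphabet, and the decision to skip k-mers containing non-standard residues are assumptions, since the abstract does not state the exact feature settings.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(seq, k=2, alphabet=AMINO_ACIDS):
    """Return normalized k-mer frequencies as a vector of len(alphabet) ** k
    entries, ordered lexicographically by the alphabet string."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip k-mers with non-standard residues such as X
            counts[index[kmer]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]

v = kmer_composition("MKKM")  # 2-mers MK, KK, KM, each with frequency 1/3
print(len(v))  # 400
```

Because the vector length depends only on k and the alphabet, sequences of any length map to the same input dimension; larger k captures more local order at the cost of a much sparser vector (8,000 entries for k = 3).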


Journal Article
08 Feb 2018-Genes
TL;DR: Analysis of empirical Arabidopsis thaliana data under varying glyphosate dosages, and the analysis of monozygotic twins who have different pain sensitivities show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit.
Abstract: Deoxyribonucleic acid (DNA) methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single-nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, thus possibly inflating type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compared WFMM performance to that of a popular DMC detection tool, methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM results in higher sensitivity and specificity in detecting DMCs compared to methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analysis of empirical Arabidopsis thaliana data under varying glyphosate dosages, and analysis of monozygotic (MZ) twins who have different pain sensitivities (both datasets have weak methylation effects of <1%), show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples. 
DMRs and DMCs are essentially the same concepts, with the only difference being how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, then they are DMRs; if methylation levels are calculated based on single cytosines, they are DMCs.
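
The distinction drawn above (single cytosines vs. grouped neighbouring sites) can be made concrete with read counts from bisulfite sequencing, where a cytosine's methylation level is its methylated read count divided by its total read count. The sketch below computes per-site levels (the DMC view) and pooled levels over fixed, non-overlapping 100 bp windows (one simple DMR-style summary). The window size and the pooling rule are assumptions; real DMR callers define regions in various ways.

```python
from collections import defaultdict

def site_levels(calls):
    """Per-cytosine methylation level (the DMC view).
    calls: list of (position, methylated_reads, unmethylated_reads)."""
    return {pos: m / (m + u) for pos, m, u in calls if m + u > 0}

def window_levels(calls, window=100):
    """Methylation level of fixed, non-overlapping windows (a simple DMR view):
    pool read counts from all cytosines whose position falls in the same window."""
    pooled = defaultdict(lambda: [0, 0])
    for pos, m, u in calls:
        start = (pos // window) * window
        pooled[start][0] += m
        pooled[start][1] += u
    return {start: m / (m + u) for start, (m, u) in sorted(pooled.items()) if m + u > 0}

calls = [(10, 8, 2), (40, 5, 5), (150, 1, 9)]
print(site_levels(calls))    # {10: 0.8, 40: 0.5, 150: 0.1}
print(window_levels(calls))  # {0: 0.65, 100: 0.1}
```

The two cytosines at positions 10 and 40 keep distinct levels in the per-site view but collapse into one pooled value (13 of 20 reads methylated) in the windowed view, which is exactly the difference in summarization the text describes.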

Posted Content
29 Nov 2018-bioRxiv
TL;DR: NanoARG is an online computational resource that takes advantage of long reads produced by MinION nanopore sequencing to enable identification of ARGs in the context of relevant neighboring genes, providing relevant insight into mobility, co-selection, and pathogenicity.
Abstract: Direct selection pressures imposed by antibiotics, indirect pressures by co-selective agents, and horizontal gene transfer are fundamental drivers of the evolution and spread of antibiotic resistance. Therefore, effective environmental monitoring tools should ideally capture not only antibiotic resistance genes (ARGs), but also mobile genetic elements (MGEs) and indicators of co-selective forces, such as metal resistance genes (MRGs). Further, a major challenge towards characterizing potential human risk is the ability to identify bacterial host organisms, especially human pathogens. Historically, the short reads yielded by next-generation sequencing technology have hampered confidence in assemblies for achieving these purposes. Here we introduce NanoARG, an online computational resource that takes advantage of long reads produced by MinION nanopore sequencing. Specifically, long nanopore reads enable identification of ARGs in the context of relevant neighboring genes, providing relevant insight into mobility, co-selection, and pathogenicity. NanoARG allows users to upload sequence data online and provides various means to analyze and visualize the data, including quantitative and simultaneous profiling of ARGs, MRGs, MGEs, and pathogens. NanoARG is publicly available and freely accessible at http://bench.cs.vt.edu/nanoARG.
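
The key advantage claimed for long reads is that an ARG can be seen together with its neighbouring genes on the same molecule. A minimal sketch of that neighbourhood query is shown below: given gene annotations on one read, it lists the MGEs, MRGs, or other non-ARG genes lying within a distance cutoff of each ARG. The `Gene` record, the 5 kb cutoff, and the example annotations are hypothetical and are not NanoARG's implementation.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    kind: str   # e.g. "ARG", "MGE", "MRG" (hypothetical labels)
    start: int  # 0-based, half-open coordinates on the read
    end: int

def gap(a, b):
    """Distance in bp between two intervals on the same read (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))

def arg_neighbors(genes, max_gap=5000):
    """For each ARG on a read, list the non-ARG genes lying within max_gap bp."""
    args = [g for g in genes if g.kind == "ARG"]
    return {
        a.name: [g.name for g in genes if g.kind != "ARG" and gap(a, g) <= max_gap]
        for a in args
    }

read_genes = [
    Gene("sul1", "ARG", 1000, 1840),
    Gene("intI1", "MGE", 2000, 3000),    # 160 bp downstream of sul1
    Gene("merA", "MRG", 20000, 21700),   # about 18 kb away
]
print(arg_neighbors(read_genes))  # {'sul1': ['intI1']}
```

With short reads, the ARG and the integrase above would usually land on different fragments, so their co-location would depend on assembly; on a single long read the relationship is observed directly.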

Posted Content
07 Dec 2018-bioRxiv
TL;DR: Genesis-indel is a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed in the alignment procedure and is able to identify 72,997 small to large novel high-quality indels previously not found in the original alignments.
Abstract: In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic mutations. While most short reads can be mapped to the reference genome accurately by existing alignment tools, a significant number remain unmapped and are excluded from downstream analyses, thus potentially discarding important biological information hidden in the unmapped reads. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed in the alignment procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel is able to leverage the unmapped reads to identify 72,997 small to large novel high-quality indels previously not found in the original alignments, and among them, 16,141 have not been annotated in the widely used mutation database. Statistical analysis shows that these new indels mostly altered oncogenes and tumor suppressor genes. Functional annotation further reveals that these indels are strongly correlated with cancer pathways and can have high to moderate impact on protein functions. Additionally, these indels overlap with genes that are missed by the indels from the originally mapped reads and contribute to tumorigenesis in multiple carcinomas.
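
The starting point described above is the set of reads the aligner could not place. In SAM output these are the records whose FLAG field has bit 0x4 ("segment unmapped") set; `samtools view -f 4` selects them on the command line. The pure-Python sketch below does the same filtering on SAM text lines. It is only an illustration of that first extraction step, not Genesis-indel's code, and the example records are made up.

```python
def unmapped_reads(sam_lines):
    """Yield (read name, sequence) for SAM records whose FLAG has bit 0x4 set."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Mandatory SAM columns: QNAME is column 1, FLAG column 2, SEQ column 10.
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:  # 0x4 = segment unmapped
            yield qname, seq

sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
]
print(list(unmapped_reads(sam)))  # [('r2', 'TTGA')]
```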

Posted Content
19 Mar 2018-bioRxiv
TL;DR: This work presents and evaluates BisPin, a new multiprocess bisulfite-treated short DNA read mapper written in Python 2.7 that performs alignments using BFAST, leveraging its multithreading functionality and thorough hash-based indexing strategy, together with BFAST-Gap, a modified version of BFAST for Ion Torrent reads.
Abstract: Background: BisPin is a new multiprocess bisulfite-treated short DNA read mapper written in Python 2.7. It performs alignments using BFAST, leveraging its multithreading functionality and thorough hash-based indexing strategy. BisPin is feature-rich and supports directional, nondirectional, PBAT, and hairpin construction strategies. BisPin approaches read mapping by converting the Cs to Ts and the Gs to As in both the reads and the reference genome. BisPin uses fast rescoring to disambiguate ambiguously aligned reads, yielding more uniquely mapped reads than other mappers. The performance of BisPin was evaluated on both real and simulated data in comparison to other read mappers. BFAST-Gap is a modified version of BFAST meant for Ion Torrent reads. It uses a parameterized logistic function to determine the weights of the gap open and extension penalties based on the homopolymer run length of the DNA read, because the Ion Torrent sequencing technology can overcall and undercall homopolymer runs. BisPin works with both BFAST-Gap and BFAST, and BFAST-Gap is compatible with indexes built with BFAST. Few mappers specifically address Ion Torrent data. BFAST-Gap works with Illumina reads as well. Results: BisPin with BFAST consistently had a higher number of uniquely mapped reads than other mappers on real data using a variety of construction strategies. Using a hairpin validation strategy, BisPin was superior using the maximum score, and it mapped 73% of reads correctly. BisPin with BFAST-Gap on Ion Torrent reads with a logistic gap open penalty function improved mapping accuracy with real and simulated data. On simulated bisulfite Ion Torrent data, the area under the curve was improved by approximately seven, and on one real data set, the uniquely mapped percentage was improved by seven percent. BFAST-Gap performed better than TMAP on simulated regular Ion Torrent reads, even though TMAP is designed for Ion Torrent reads. 
Other read mappers had worse performance. Conclusions: BisPin and BFAST-Gap have consistently good accuracy with a variety of data. BisPin is feature-rich. This makes BisPin and BFAST-Gap useful additions to read mapping software.
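
The conversion step mentioned in the Background can be sketched in a few lines. Bisulfite treatment turns unmethylated C into T, so comparing fully C-to-T converted reads against a C-to-T converted reference lets a read align regardless of methylation state, while the G-to-A converted reference covers reads from the opposite strand. Methylation is then called by comparing the original read with the original reference at each reference C. The function names and the single forward-strand example are illustrative assumptions; BisPin's handling of nondirectional, PBAT, and hairpin libraries involves more than this.

```python
def c_to_t(seq):
    """Fully converted forward-strand view: every C becomes T."""
    return seq.upper().replace("C", "T")

def g_to_a(seq):
    """Fully converted opposite-strand view in forward coordinates: every G becomes A."""
    return seq.upper().replace("G", "A")

def converted_references(reference):
    """The two in-silico converted references a three-letter aligner would index."""
    return {"CT": c_to_t(reference), "GA": g_to_a(reference)}

def call_methylation(read, ref_window):
    """Compare an aligned forward-strand read to the original (unconverted)
    reference: at each reference C, read C = methylated, read T = unmethylated."""
    calls = []
    for i, (r, g) in enumerate(zip(read.upper(), ref_window.upper())):
        if g == "C" and r in "CT":
            calls.append((i, r == "C"))
    return calls

ref = "TACGTCAG"   # C at positions 2 (CpG) and 5 (non-CpG)
read = "TACGTTAG"  # CpG C protected by methylation, other C converted to T
print(c_to_t(read) == c_to_t(ref))   # True: the converted sequences align exactly
print(call_methylation(read, ref))   # [(2, True), (5, False)]
```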

Posted Content
30 Apr 2018-bioRxiv
TL;DR: Parallel evolutionary trends across disparate clades suggest that the evolution of key features of genomic organization and cellular bioenergetics in abundant marine lineages may in some ways be predictable and driven largely by environmental conditions and nutrient dynamics.
Abstract: Diverse bacterial and archaeal lineages drive biogeochemical cycles in the global ocean, but the evolutionary processes that have shaped their genomic properties and physiological capabilities remain obscure. Here we track the genome evolution of the globally-abundant marine bacterial phylum Marinimicrobia across its diversification into modern marine environments and demonstrate that extant lineages have repeatedly switched between epipelagic and mesopelagic habitats. Moreover, we show that these habitat transitions have been accompanied by repeated and fundamental shifts in genomic organization, cellular bioenergetics, and metabolic modalities. Lineages present in epipelagic niches independently acquired genes necessary for phototrophy and environmental stress mitigation, and their genomes convergently evolved key features associated with genome streamlining. Conversely, lineages residing in mesopelagic waters independently acquired nitrate respiratory machinery and a variety of cytochromes, consistent with the use of alternative terminal electron acceptors in oxygen minimum zones (OMZs). Further, while surface water clades have retained an ancestral Na+-pumping respiratory complex, deep water lineages have replaced this complex with a canonical H+-pumping respiratory complex I, potentially due to the increased efficiency of the latter together with more energy-limiting environments deep in the ocean's interior. These parallel evolutionary trends across disparate clades suggest that the evolution of key features of genomic organization and cellular bioenergetics in abundant marine lineages may in some ways be predictable and driven largely by environmental conditions and nutrient dynamics.