scispace - formally typeset

Showing papers by "Liqing Zhang" published in 2021


Journal Article
TL;DR: The emergence of Next Generation Sequencing (NGS) is revolutionizing the potential to address complex microbiological challenges in the water industry. NGS can provide holistic insight into microbial communities and their functional capacities in water and wastewater systems, eliminating the need to develop a new assay for each target organism or gene.

37 citations


Journal Article
TL;DR: In this paper, the authors evaluate the impact of assembly leveraging short reads, nanopore MinION long-reads, and a combination of the two (hybrid) on ARG contextualization for ten environmental metagenomes using seven prominent assemblers.
Abstract: In the fight to limit the global spread of antibiotic resistance, the assembly of environmental metagenomes has the potential to provide rich contextual information (e.g., taxonomic hosts, carriage on mobile genetic elements) about antibiotic resistance genes (ARGs) in the environment. However, computational challenges associated with assembly can impact the accuracy of downstream analyses. This work critically evaluates the impact of assembly leveraging short reads, nanopore MinION long reads, and a combination of the two (hybrid) on ARG contextualization for ten environmental metagenomes using seven prominent assemblers (IDBA-UD, MEGAHIT, Canu, Flye, Opera-MS, metaSpades and HybridSpades). While short-read and hybrid assemblies produced similar patterns of ARG contextualization, raw or assembled long nanopore reads produced distinct patterns. Based on an in-silico spike-in experiment using real and simulated reads, we show that low- to intermediate-coverage species are more likely to be incorporated into chimeric contigs across all assemblers and sequencing technologies, while more abundant species produce assemblies with a greater frequency of inversions and insertions/deletions (indels). In sum, our analyses support hybrid assembly as a valuable technique for boosting the reliability and accuracy of assembly-based analyses of ARGs and neighboring genes at environmentally relevant coverages, provided that sufficient short-read sequencing depth is achieved.
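The spike-in evaluation hinges on recognizing chimeric contigs, i.e., contigs assembled from reads of more than one source genome. A minimal sketch of that check follows, assuming each contig's segments have already been aligned to the known reference genomes and are supplied as (start, end, species) tuples. The input layout, the find_chimeric_contigs name, and the min_fraction cutoff are hypothetical illustrations, not the procedure used in the paper.

```python
from collections import defaultdict


def find_chimeric_contigs(alignments, min_fraction=0.1):
    """Flag contigs whose aligned bases come from more than one species.

    alignments: dict mapping contig id -> list of (start, end, species)
        tuples for contig segments aligned to reference genomes
        (hypothetical input format).
    min_fraction: a species must explain at least this fraction of the
        contig's aligned bases to count as a source (hypothetical cutoff).
    Returns a dict mapping each chimeric contig to its sorted source species.
    """
    chimeric = {}
    for contig, hits in alignments.items():
        bases_per_species = defaultdict(int)
        for start, end, species in hits:
            bases_per_species[species] += end - start
        total = sum(bases_per_species.values())
        if total == 0:
            continue  # nothing aligned, so nothing to judge
        sources = sorted(
            sp for sp, n in bases_per_species.items() if n / total >= min_fraction
        )
        if len(sources) > 1:
            chimeric[contig] = sources
    return chimeric


example = {
    "contig_1": [(0, 5000, "E. coli")],
    "contig_2": [(0, 3000, "E. coli"), (3000, 6000, "K. pneumoniae")],
    "contig_3": [(0, 9900, "P. aeruginosa"), (9900, 10000, "E. coli")],
}
print(find_chimeric_contigs(example))  # {'contig_2': ['E. coli', 'K. pneumoniae']}
```

The cutoff keeps a short spurious alignment (as in contig_3) from counting as a second source; lowering it would flag contig_3 as well.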

37 citations


Journal Article
TL;DR: In this article, the authors provide an overview of data analytics frameworks suitable for various Environmental Science and Engineering (ESE) research applications and present current applications of ML algorithms within the ESE domain using three representative case studies.
Abstract: The advent of new data acquisition and handling techniques has opened the door to alternative and more comprehensive approaches to environmental monitoring that will improve our capacity to understand and manage environmental systems. Researchers have recently begun using machine learning (ML) techniques to analyze complex environmental systems and their associated data. Herein, we provide an overview of data analytics frameworks suitable for various Environmental Science and Engineering (ESE) research applications. We present current applications of ML algorithms within the ESE domain using three representative case studies: (1) Metagenomic data analysis for characterizing and tracking antimicrobial resistance in the environment; (2) Nontarget analysis for environmental pollutant profiling; and (3) Detection of anomalies in continuous data generated by engineered water systems. We conclude by proposing a path to advance incorporation of data analytics approaches in ESE research and application.
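Case study (3) concerns detecting anomalies in continuous data from engineered water systems. As a hedged illustration of the task, the sketch below applies a simple statistical baseline, a rolling z-score, that flags readings deviating from a trailing window by more than a fixed number of standard deviations. This rule is swapped in for illustration and is not drawn from the article; the window, threshold, and turbidity readings are assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Indices whose value lies more than `threshold` sample standard
    deviations from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical turbidity readings (NTU) from a single sensor.
readings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 4.8, 1.02, 0.98]
print(rolling_zscore_anomalies(readings))  # [7]
```

Once the spike enters the trailing window, later readings are judged against an inflated spread; robust statistics such as the median and median absolute deviation are a common refinement.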

23 citations


Journal Article
TL;DR: In this paper, a protein-protein interaction (PPI) network of the most significant module and a TF-miRNA-lncRNA regulatory network of major depressive disorder (MDD) were constructed using bioinformatics analysis tools.
Abstract: Major depressive disorder (MDD) is a highly prevalent disease and one of the main causes of disability worldwide. Although many studies have partially revealed how MDD arises and develops, its pathogenesis and molecular mechanisms are not fully understood. Weighted gene coexpression network analysis (WGCNA) was used to explore the co-expression modules and hub genes in MDD. A protein-protein interaction (PPI) network of the most significant module and a TF-miRNA-lncRNA regulatory network of MDD were constructed using bioinformatics analysis tools. KEGG pathway and gene ontology (GO) functional enrichment analyses of the genes in the significant module were performed using DAVID. Five hub genes in the PPI network and 10 genes in the TF-miRNA-lncRNA regulatory network with high degree values were identified; these may provide new insights into the key pathways, diagnostic biomarkers, and therapeutic targets of MDD. This study brings a novel perspective and provides valuable information for exploring the molecular mechanism of MDD.
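In unsigned WGCNA, the co-expression network is built by soft-thresholding pairwise gene correlations, a_ij = |cor(x_i, x_j)|^β, and genes with the highest connectivity k_i = Σ_{j≠i} a_ij are candidate hubs. The sketch below computes only that step with NumPy on synthetic data; it omits the topological overlap matrix, module detection, and the DAVID enrichment. The power β = 6, the synthetic expression matrix, and the gene names are assumptions, not values from the study.

```python
import numpy as np


def wgcna_connectivity(expr, beta=6):
    """Unsigned WGCNA-style soft-thresholded adjacency and connectivity.

    expr: array of shape (n_samples, n_genes).
    Returns (adjacency, connectivity) with adjacency[i, j] = |cor(i, j)| ** beta
    (diagonal set to 0) and connectivity[i] = sum over j of adjacency[i, j].
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adjacency = np.abs(corr) ** beta        # soft thresholding
    np.fill_diagonal(adjacency, 0.0)        # drop self-connections
    return adjacency, adjacency.sum(axis=1)


def top_hubs(connectivity, gene_names, n=3):
    """Names of the n genes with the highest connectivity."""
    order = np.argsort(connectivity)[::-1][:n]
    return [gene_names[i] for i in order]


# Synthetic data: G1-G3 share a common driver signal; G4 and G5 are noise.
rng = np.random.default_rng(0)
driver = rng.normal(size=30)
expr = np.column_stack(
    [driver + 0.1 * rng.normal(size=30) for _ in range(3)]
    + [rng.normal(size=30) for _ in range(2)]
)
genes = ["G1", "G2", "G3", "G4", "G5"]
adjacency, k = wgcna_connectivity(expr)
print(top_hubs(k, genes))  # the three co-expressed genes, G1-G3, in some order
```

Raising correlations to a power suppresses weak, likely spurious correlations far more than strong ones, which is why the two noise genes end up with near-zero connectivity.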

7 citations


Posted Content
27 Aug 2021-bioRxiv
TL;DR: mobileOG-db is a comprehensive database of 6,140 manually curated protein families linked to the "life cycle" (integration, excision, replication/recombination/repair, transfer, and stability/defense) of all major classes of bacterial mobile genetic elements (MGEs).
Abstract: Currently available databases of bacterial mobile genetic elements (MGEs) contain both "core" and accessory MGE functional modules, the latter of which are often only transiently associated with the element. The presence of these accessory genes, which are often close homologs to primarily immobile genes, limits the usability of these databases for MGE annotation. To overcome this limitation, we analysed 10,776,212 protein sequences derived from seven MGE databases to compile a comprehensive database of 6,140 manually curated protein families that are linked to the "life cycle" (integration, excision, replication/recombination/repair, transfer, and stability/defense) of all major classes of bacterial MGEs. We overlay experimental information where available to create a tiered annotation scheme of high-quality annotations and annotations inferred exclusively through bioinformatic evidence. We additionally provide an MGE-class label for each entry (e.g., plasmid, integrative element) derived from the source database, and assign a list of keywords to each entry to delineate different MGE functional modules and to facilitate annotation. The resulting database, mobileOG-db (for mobile orthologous groups), provides a simple and readily interpretable foundation for an array of MGE-centred analyses. mobileOG-db can be accessed at mobileogdb.flsi.cloud.vt.edu/, where users can browse and design, refine, and analyse custom subsets of the dynamic mobilome.
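The abstract describes entries that carry a life-cycle category, an MGE-class label, keywords, and an evidence tier, plus a web interface for designing custom subsets. A minimal sketch of such subsetting follows, assuming a hypothetical record layout: the MobileOGEntry fields, the select_subset function, and the family IDs below are invented for illustration and are not mobileOG-db's actual schema or file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobileOGEntry:
    """Hypothetical record for one curated protein family (not the real schema)."""
    entry_id: str
    category: str            # life-cycle category, e.g. "transfer"
    mge_classes: frozenset   # MGE-class labels, e.g. {"plasmid"}
    keywords: frozenset      # functional-module keywords
    experimental: bool       # True if backed by experimental evidence


def select_subset(entries, mge_class=None, keyword=None, experimental_only=False):
    """Return the entries that pass every filter that was supplied."""
    subset = []
    for entry in entries:
        if mge_class is not None and mge_class not in entry.mge_classes:
            continue
        if keyword is not None and keyword not in entry.keywords:
            continue
        if experimental_only and not entry.experimental:
            continue
        subset.append(entry)
    return subset


entries = [
    MobileOGEntry("fam_A", "transfer", frozenset({"plasmid"}),
                  frozenset({"conjugation"}), True),
    MobileOGEntry("fam_B", "integration/excision",
                  frozenset({"integrative element", "phage"}),
                  frozenset({"integrase"}), False),
    MobileOGEntry("fam_C", "transfer",
                  frozenset({"plasmid", "integrative element"}),
                  frozenset({"conjugation", "relaxase"}), False),
]

plasmid_transfer = select_subset(entries, mge_class="plasmid", keyword="conjugation")
print([e.entry_id for e in plasmid_transfer])  # ['fam_A', 'fam_C']
```

Keeping the evidence tier as its own field lets a user restrict an analysis to experimentally supported families without losing the bioinformatically inferred ones from the full set.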

3 citations


Posted Content
Min Oh, Liqing Zhang
07 May 2021-bioRxiv
TL;DR: DeepGeni, a deep generalized interpretable autoencoder, is proposed to improve the generalizability and interpretability of microbiome profiles by augmenting data and by introducing interpretable links in the autoencoder.
Abstract: Recent studies revealed that the gut microbiota modulates the response to cancer immunotherapy and that fecal microbiota transplantation has clinical benefit in melanoma patients during treatment. Understanding how the microbiota affects individual responses is crucial to advancing precision oncology. However, identifying the key microbial taxa is challenging with limited data, as statistical and machine learning models often lose their generalizability. In this study, DeepGeni, a deep generalized interpretable autoencoder, is proposed to improve the generalizability and interpretability of microbiome profiles by augmenting data and by introducing interpretable links in the autoencoder. A DeepGeni-based machine learning classifier outperforms a state-of-the-art classifier in the microbiome-driven prediction of responsiveness of melanoma patients treated with immune checkpoint inhibitors. In addition, the interpretable links of DeepGeni elucidate the most informative microbiota associated with cancer immunotherapy response.

3 citations


Journal Article
TL;DR: In this paper, genome-wide SNP genotyping of the last 10 generations of the Banna minipig inbred (BMI) line revealed an average decrease in heterozygosity rate of 0.0078 per generation.
Abstract: Inbred pigs are promising animal models for biomedical research and xenotransplantation. Established in 1980, the Banna minipig inbred (BMI) line originated from a sow and its own male offspring. It was selected from a small, remote village of the Lahu minority, where records show that no other pig breed has ever been introduced. During the inbreeding process, we performed extreme inbreeding over 23 consecutive generations using full-sibling or parent-offspring mating. To investigate the effects of inbreeding in BMI pigs across generations over the past 40 years, we conducted genome-wide SNP genotyping of the last 10 generations, representing generations 14-23. In total, we genotyped 57,746 SNPs and observed an average decrease in heterozygosity rate of 0.0078 per generation. Furthermore, we identified only 18,216 polymorphic loci with a minor allele frequency (MAF) greater than 0.05, substantially fewer than in previous reports on other pig breeds. In addition, we sequenced the genome of the first pig in the twenty-third generation (inbreeding coefficient 99.28%) to an average coverage of 12.4× to evaluate the genome-level impact of advanced inbreeding. Runs of homozygosity (ROH) analysis indicates that BMI pigs have longer ROHs than Wuzhishan and Duroc pigs, and these long ROH regions in BMI pigs are enriched for functions distinct from those of the highly polymorphic regions. Our study reveals a genome-wide loss of allele diversity over the course of inbreeding in BMI pigs and characterizes the ROH and polymorphic regions that result from inbreeding. Overall, our results indicate the successful establishment of the BMI line, which paves the way for further in-depth studies.
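The polymorphism and heterozygosity figures above rest on simple per-SNP statistics. The sketch below assumes genotypes coded as 0/1/2 copies of one allele (rows are SNPs, columns are animals). It counts loci with MAF > 0.05, computes observed heterozygosity as the fraction of heterozygous calls, and fits a least-squares slope of heterozygosity against generation number. The toy genotypes and heterozygosity values are invented, and the study's exact heterozygosity definition may differ.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP; genotypes are 0/1/2 copies of the counted allele."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def count_polymorphic(genotype_matrix, maf_cutoff=0.05):
    """Number of SNPs (rows) whose MAF exceeds the cutoff."""
    return sum(1 for snp in genotype_matrix if minor_allele_frequency(snp) > maf_cutoff)


def observed_heterozygosity(genotype_matrix):
    """Fraction of all genotype calls that are heterozygous (coded 1)."""
    calls = [g for snp in genotype_matrix for g in snp]
    return sum(1 for g in calls if g == 1) / len(calls)


def per_generation_slope(generations, heterozygosity):
    """Ordinary least-squares slope of heterozygosity on generation number."""
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(heterozygosity) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, heterozygosity))
    den = sum((x - mean_x) ** 2 for x in generations)
    return num / den


# Toy data: 4 SNPs (rows) x 4 animals (columns); not from the study.
snps = [
    [0, 1, 2, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 2, 2],
]
print(count_polymorphic(snps))        # 2
print(observed_heterozygosity(snps))  # 0.1875
print(per_generation_slope([14, 15, 16], [0.20, 0.19, 0.18]))  # approximately -0.01
```

A negative slope of this kind is how a per-generation decrease in heterozygosity can be summarized from genotyping across successive generations.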

3 citations


Journal Article
TL;DR: MetaMLP is a machine learning method that represents sequences as numerical vectors (embeddings) and uses a simple neural network with one hidden layer to profile functional categories; it enables partial matching by using a reduced alphabet to build sequence embeddings from full and partial k-mers.
Abstract: The functional profile of metagenomic samples enables an improved understanding of microbial populations in the environment. Such analysis consists of assigning short sequencing reads to particular functional categories. Normally, manually curated databases, in which genes are arranged into different classes, are used for functional assignment. Sequence alignment has been widely used to profile metagenomic samples against curated databases; however, this method is time-consuming and computationally demanding. While several alignment-free methods based on k-mer composition have been developed in recent years, they still require large amounts of main memory. This article presents MetaMLP (Metagenomics Machine Learning Profiler), a machine learning method that represents sequences as numerical vectors (embeddings) and uses a simple neural network with one hidden layer to profile functional categories. Unlike other methods, MetaMLP enables partial matching by using a reduced alphabet to build sequence embeddings from full and partial k-mers. MetaMLP identifies slightly more reads than DIAMOND (one of the fastest sequence alignment methods) and makes accurate predictions, with 0.99 precision and 0.99 recall. MetaMLP can process 100M reads in ∼10 minutes on a laptop computer, 50 times faster than DIAMOND.
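A reduced alphabet lets k-mers match across conservative amino acid substitutions. The sketch below assumes a 10-group physicochemical grouping (an illustrative choice; MetaMLP's actual alphabet may differ), maps residues to group representatives, and compares k-mer sets by Jaccard similarity. It does not reproduce MetaMLP's partial k-mers, embeddings, or neural network, and the function names are hypothetical.

```python
# Hypothetical 10-group reduced alphabet (illustrative; not necessarily MetaMLP's).
REDUCED_GROUPS = {
    "LVIM": "L", "C": "C", "A": "A", "G": "G", "ST": "S",
    "P": "P", "FYW": "F", "EDNQ": "E", "KR": "K", "H": "H",
}
AA_TO_REDUCED = {aa: rep for group, rep in REDUCED_GROUPS.items() for aa in group}


def reduce_sequence(protein):
    """Map each residue to its group representative; unknown residues become 'X'."""
    return "".join(AA_TO_REDUCED.get(aa, "X") for aa in protein.upper())


def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_kmer_fraction(a, b, k=4, use_reduced=True):
    """Jaccard similarity of the k-mer sets of two protein sequences."""
    if use_reduced:
        a, b = reduce_sequence(a), reduce_sequence(b)
    set_a, set_b = set(kmers(a, k)), set(kmers(b, k))
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Two peptides that differ by a conservative V -> I substitution.
print(reduce_sequence("MKVLAETG"))                                      # LKLLAESG
print(shared_kmer_fraction("MKVLAETG", "MKILAETG"))                     # 1.0
print(shared_kmer_fraction("MKVLAETG", "MKILAETG", use_reduced=False))  # 0.25
```

Because V and I fall in the same group, the two peptides share every reduced 4-mer but only 2 of 8 distinct raw 4-mers.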

2 citations


Journal Article
TL;DR: AgroSeek is a web-based system that provides computational tools for analysis and comparison of metagenomic data sets, tailored specifically to researchers and other users in the agricultural sector who are interested in tracking and mitigating the spread of ARGs.
Abstract: Metagenomics is gaining attention as a powerful tool for identifying how agricultural management practices influence human and animal health, especially in terms of their potential to contribute to the spread of antibiotic resistance. However, comparing the distribution and prevalence of antibiotic resistance genes (ARGs) across multiple studies and environments is currently impossible without a complete re-analysis of published datasets. This challenge must be addressed for metagenomics to realize its potential for helping guide effective policy and practice measures relevant to agricultural ecosystems, for example, identifying critical control points for mitigating the spread of antibiotic resistance. Here we introduce AgroSeek, a centralized web-based system that provides computational tools for analysis and comparison of metagenomic data sets, tailored specifically to researchers and other users in the agricultural sector who are interested in tracking and mitigating the spread of ARGs. AgroSeek draws on rich, user-provided metagenomic data and metadata to facilitate analysis, comparison, and prediction in a user-friendly fashion. Further, AgroSeek draws on publicly contributed data sets to provide a point of comparison and context for data analysis. To incorporate metadata into our analysis and comparison procedures, we provide flexible metadata templates, including user-customized metadata attributes, that facilitate data sharing while keeping the metadata comparable for the broader user community and supporting large-scale comparative and predictive analysis. AgroSeek provides an easy-to-use tool for environmental metagenomic analysis and comparison, based on both gene annotations and associated metadata, with this initial demonstration focusing on control of antibiotic resistance in agricultural ecosystems.
AgroSeek creates a space for metagenomic data sharing and collaboration to assist policy makers, stakeholders, and the public in decision-making. AgroSeek is publicly available at https://agroseek.cs.vt.edu/ .
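The metadata templates described above pair a shared core with user-defined attributes. A minimal sketch of that split follows, assuming a hypothetical core template: the field names, allowed values, and normalize_metadata function are invented and are not AgroSeek's actual schema. Core fields are required, and controlled-vocabulary fields are normalized so they stay comparable across submissions, while any extra keys are kept as custom attributes.

```python
# Hypothetical core template: field -> allowed values (None means free text).
CORE_TEMPLATE = {
    "sample_type": {"manure", "soil", "water", "compost"},
    "host_animal": {"cattle", "swine", "poultry", "none"},
    "antibiotic_use": {"yes", "no", "unknown"},
    "country": None,
}


def normalize_metadata(record):
    """Split a submitted record into normalized core fields and custom attributes.

    Raises ValueError if a core field is missing or has a disallowed value.
    """
    core = {}
    for field, allowed in CORE_TEMPLATE.items():
        if field not in record:
            raise ValueError(f"missing core field: {field}")
        value = str(record[field]).strip()
        if allowed is not None:
            value = value.lower()  # controlled vocabulary is case-insensitive
            if value not in allowed:
                raise ValueError(f"invalid value for {field}: {record[field]!r}")
        core[field] = value
    # Everything outside the template is kept verbatim as a custom attribute.
    custom = {k: v for k, v in record.items() if k not in CORE_TEMPLATE}
    return core, custom


submission = {
    "sample_type": "Manure",
    "host_animal": "swine",
    "antibiotic_use": "yes",
    "country": "USA",
    "lagoon_depth_m": 2.5,
}
core, custom = normalize_metadata(submission)
print(core)    # {'sample_type': 'manure', 'host_animal': 'swine', 'antibiotic_use': 'yes', 'country': 'USA'}
print(custom)  # {'lagoon_depth_m': 2.5}
```

Separating the two dictionaries means cross-study queries can rely on the core fields alone, while study-specific detail is preserved rather than discarded.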

2 citations


Posted Content
Min Oh, Liqing Zhang
07 May 2021-bioRxiv
TL;DR: DeepBioGen is a sequencing profile augmentation procedure that characterizes the visual patterns of sequencing profiles, generates realistic profiles with a deep generative model that captures those patterns, and thereby improves the generalizability of downstream classifiers.
Abstract: Predictive models trained on sequencing profiles often fail to achieve the expected performance when externally validated on unseen profiles. While many factors, such as batch effects, small data sets, and technical errors, contribute to the gap between source and unseen data distributions, generalizing predictive models across studies without any prior knowledge of the unseen data distribution remains challenging. This study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in enhancing the generalizability of prediction models on unseen data. The generalized classifiers surpass state-of-the-art methods when evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and on WGS human gut microbiome profiles for type 2 diabetes diagnosis.

1 citation


Posted Content
04 May 2021-bioRxiv
TL;DR: A defining feature distinguishing the naive and primed pluripotent states is the ability of naive, but not primed, cells to erase gene occlusion, a mode of epigenetic inactivation that renders genes unresponsive to their cognate transcriptional activators.
Abstract: Pluripotent stem cells can exist in either the naive state representing a developmental blank slate or the downstream primed state poised for differentiation. Currently, known differences between these two states are mostly phenomenological, and none can adequately explain why the two states should differ in developmental priming. Gene occlusion is a mode of epigenetic inactivation that renders genes unresponsive to their cognate transcriptional activators. It plays a crucial role in lineage restriction. Here, we report that a defining feature distinguishing the two pluripotent states lies in the ability of naive but not primed cells to erase occlusion. This "deocclusion" capacity requires Esrrb, a gene expressed only in the naive but not primed state. Notably, Esrrb silencing in the primed state is itself due to occlusion. Collectively, our data argue that the Esrrb-dependent deocclusion capacity in naive cells is key for sustaining naive pluripotency, and the loss of this capacity in the primed state via the occlusion of Esrrb poises cells for differentiation.

Posted Content
18 May 2021-bioRxiv
TL;DR: The METTL5-TRMT112 complex is the methyltransferase responsible for installing m6A at position 1832 of human 18S rRNA. This site lies at the 3’ end of 18S rRNA, which may become surface-exposed under some circumstances, so the modification may play a regulatory role in translation of specific transcripts.
Abstract: Ribosomal RNAs (rRNAs) have long been known to carry modifications, including numerous sites of 2’-O-methylation and pseudouridylation, as well as N6-methyladenosine (m6A) and N6,6-dimethyladenosine. While the functions of many of these modifications are unclear, some are highly conserved and occur in regions of the ribosome critical for mRNA decoding. Both 28S rRNA and 18S rRNA carry m6A, and while ZCCHC4 has been identified as the methyltransferase responsible for the 28S rRNA m6A site, the methyltransferase responsible for the 18S rRNA m6A site had remained uncharacterized until recently. Here, we show that the METTL5-TRMT112 complex is the methyltransferase responsible for installing m6A at position 1832 of human 18S rRNA. TRMT112 is required for the metabolic stability of METTL5, and human METTL5 mutations associated with microcephaly and intellectual disability disrupt this interaction. Loss of METTL5 in human cancer cell lines alters the translation of transcripts associated with mitochondrial biogenesis and function. Mettl5 knockout mice display reduced body size and evidence of metabolic defects. This m6A site is located at the 3’ end of 18S rRNA, which may become surface-exposed under some circumstances and thus may play a regulatory role in translation of specific transcripts. While recent work has focused heavily on m6A modifications in mRNA and their roles in mRNA processing and translation, deorphanizing putative methyltransferase enzymes is revealing previously unappreciated regulatory roles for m6A in noncoding RNAs.