
Showing papers in "Computational and structural biotechnology journal in 2020"


Journal ArticleDOI
TL;DR: The result showed that atazanavir, an antiretroviral medication used to treat and prevent human immunodeficiency virus (HIV) infection, is the best chemical compound, showing an inhibitory potency with Kd of 94.94 nM against the SARS-CoV-2 3C-like proteinase.
Abstract: The infection of a novel coronavirus found in Wuhan of China (SARS-CoV-2) is rapidly spreading, and the incidence rate is increasing worldwide. Due to the lack of effective treatment options for SARS-CoV-2, various strategies are being tested in China, including drug repurposing. In this study, we used our pre-trained deep learning-based drug-target interaction model called Molecule Transformer-Drug Target Interaction (MT-DTI) to identify commercially available drugs that could act on viral proteins of SARS-CoV-2. The result showed that atazanavir, an antiretroviral medication used to treat and prevent human immunodeficiency virus (HIV) infection, is the best chemical compound, showing an inhibitory potency with Kd of 94.94 nM against the SARS-CoV-2 3C-like proteinase, followed by remdesivir (113.13 nM), efavirenz (199.17 nM), ritonavir (204.05 nM), and dolutegravir (336.91 nM). Interestingly, lopinavir, ritonavir, and darunavir are all designed to target viral proteinases. However, in our prediction, they may also bind to the replication complex components of SARS-CoV-2 with an inhibitory potency with Kd

573 citations


Journal ArticleDOI
TL;DR: In this article, a review of current knowledge about the genome distribution of oxidative DNA damage, repair intermediates, and mutations is presented, focusing on the various methodologies to measure the DNA damage distribution and discussing the mechanistic conclusions derived from different approaches.
Abstract: Reactive oxygen species are a constant threat to DNA as they modify bases with the risk of disrupting genome function, inducing genome instability and mutation. Such risks are due to primary oxidative DNA damage and also mediated by the repair process. This leads to a delicate decision process for the cell as to whether to repair a damaged base at a specific genomic location or better leave it unrepaired. Persistent DNA damage can disrupt genome function, but on the other hand it can also contribute to gene regulation by serving as an epigenetic mark. When such processes are out of balance, pathophysiological conditions could get accelerated, because oxidative DNA damage and resulting mutagenic processes are tightly linked to ageing, inflammation, and the development of multiple age-related diseases, such as cancer and neurodegenerative disorders. Recent technological advancements and novel data analysis strategies have revealed that oxidative DNA damage, its repair, and related mutations distribute heterogeneously over the genome at multiple levels of resolution. The involved mechanisms act in the context of genome sequence, in interaction with genome function and chromatin. This review addresses what we currently know about the genome distribution of oxidative DNA damage, repair intermediates, and mutations. It will specifically focus on the various methodologies to measure oxidative DNA damage distribution and discuss the mechanistic conclusions derived from the different approaches. It will also address the consequences of oxidative DNA damage, specifically how it gives rise to mutations, genome instability, and how it can act as an epigenetic mark.

175 citations


Journal ArticleDOI
TL;DR: This review discusses the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade.
Abstract: Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.

137 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide a comprehensive historical background of the improvements in DNA sequencing technologies that have accompanied the major milestones in genome sequencing and assembly, ranging from early sequencing methods to Next-Generation Sequencing platforms.
Abstract: Genomes represent the starting point of genetic studies. Since the discovery of DNA structure, scientists have devoted great efforts to determine their sequence in an exact way. In this review we provide a comprehensive historical background of the improvements in DNA sequencing technologies that have accompanied the major milestones in genome sequencing and assembly, ranging from early sequencing methods to Next-Generation Sequencing platforms. We then focus on the advantages and challenges of the current technologies and approaches, collectively known as Third Generation Sequencing. As these technical advancements have been accompanied by progress in analytical methods, we also review the bioinformatic tools currently employed in de novo genome assembly, as well as some applications of Third Generation Sequencing technologies and high-quality reference genomes.

132 citations


Journal ArticleDOI
TL;DR: This minireview provides the reader with an overview of the publicly available KRAS structural data, insights into conformational dynamics revealed by experiments, and what the authors have learned from MD simulations.
Abstract: One of the most common drivers in human cancer is the mutant KRAS protein. Not so long ago KRAS was considered as an undruggable oncoprotein. After a long struggle, however, we finally see some light at the end of the tunnel as promising KRAS targeted therapies are in or approaching clinical trials. In recent years, together with the promising progress in RAS drug discovery, our understanding of KRAS has increased tremendously. This progress has been accompanied by a resurgence of publicly available KRAS structures, which were limited to nine structures less than ten years ago. Furthermore, the ever-increasing computational capacity has made biologically relevant timescales accessible, enabling molecular dynamics (MD) simulations to study the dynamics of KRAS protein in more detail at the atomistic level. In this minireview, my aim is to provide the reader with an overview of the publicly available KRAS structural data, insights into conformational dynamics revealed by experiments and what we have learned from MD simulations. Also, I will discuss limitations of the current data and provide suggestions for future research related to KRAS, which would fill out the existing gaps in our knowledge and provide guidance in deciphering this enigmatic oncoprotein.

118 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed the use of the Next Generation Sequencing (NGS) platform for predicting cancer risk, early detection, diagnosis by sequencing and medical imaging, accurate prognosis, biomarker identification and identification of therapeutic targets for novel drug discovery.
Abstract: Artificial intelligence (AI) and machine learning have significantly influenced many facets of the healthcare sector. Advancement in technology has paved the way for analysis of big datasets in a cost- and time-effective manner. Clinical oncology and research are reaping the benefits of AI. The burden of cancer is a global phenomenon. Efforts to reduce mortality rates require early diagnosis for effective therapeutic interventions. However, metastatic and recurrent cancers evolve and acquire drug resistance. It is imperative to detect novel biomarkers that induce drug resistance and identify therapeutic targets to enhance treatment regimes. The introduction of next-generation sequencing (NGS) platforms addresses these demands and has revolutionised the future of precision oncology. NGS offers several clinical applications that are important for risk prediction, early detection of disease, diagnosis by sequencing and medical imaging, accurate prognosis, biomarker identification and identification of therapeutic targets for novel drug discovery. NGS generates large datasets that demand specialised bioinformatics resources to identify the data that are relevant and clinically significant. Through these applications of AI, cancer diagnostics and prognostic prediction are enhanced with NGS and medical imaging that delivers high-resolution images. Regardless of the improvements in technology, AI has some challenges and limitations, and the clinical application of NGS remains to be validated. By continuing to enhance the progression of innovation and technology, the future of AI and precision oncology shows great promise.

105 citations


Journal ArticleDOI
TL;DR: This review discusses how knowledge graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes, and notes potential avenues for future work with knowledge graphs that appear particularly promising.
Abstract: Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing a more robust use of automated systems. A number of techniques are used to represent knowledge graphs, but often machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph’s local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representational learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
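As the abstract notes, a knowledge graph represents biomedical concepts as nodes and typed relationships as edges. A minimal Python sketch stores the graph as (head, relation, tail) triples; all entities and relations below are invented for illustration, not drawn from any curated database:

```python
# A tiny knowledge graph as a list of (head, relation, tail) triples.
# Entities and relations are illustrative examples only.
TRIPLES = [
    ("TP53", "involved_in", "apoptosis"),
    ("doxorubicin", "treats", "breast cancer"),
    ("TP53", "associated_with", "breast cancer"),
]

def neighbors(node):
    """Concepts directly linked to `node`, in either edge direction."""
    out = {t for h, r, t in TRIPLES if h == node}
    out |= {h for h, r, t in TRIPLES if t == node}
    return out

linked = neighbors("breast cancer")
```

Representation-learning methods discussed in the review would then map each node and relation in such a triple store to a low-dimensional vector before making predictions.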

102 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on two classes of methods, sequential learning and recommender systems, which are active biomedical fields of research, and particularly on recommender systems for drug development.
Abstract: Due to the huge amount of biological and medical data available today, along with well-established machine learning algorithms, the design of largely automated drug development pipelines can now be envisioned. These pipelines may guide, or speed up, drug discovery; provide a better understanding of diseases and associated biological phenomena; help plan preclinical wet-lab experiments, and even future clinical trials. This automation of the drug development process might be key to addressing the low productivity rates that pharmaceutical companies currently face. In this survey, we will particularly focus on two classes of methods: sequential learning and recommender systems, which are active biomedical fields of research.

101 citations


Journal ArticleDOI
TL;DR: A bird’s-eye view of the past, present, and future developments of deep learning, starting from science at large, to biomedical imaging, and bioimage analysis in particular.
Abstract: Deep learning of artificial neural networks has become the de facto standard approach to solving data analysis problems in virtually all fields of science and engineering. Also in biology and medicine, deep learning technologies are fundamentally transforming how we acquire, process, analyze, and interpret data, with potentially far-reaching consequences for healthcare. In this mini-review, we take a bird's-eye view of the past, present, and future developments of deep learning, starting from science at large, to biomedical imaging, and bioimage analysis in particular.

99 citations


Journal ArticleDOI
TL;DR: A comprehensive and up-to-date overview of computational approaches for guide RNA design can be found in this paper, which can help users to select the optimal tools for their research.
Abstract: The Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)/ CRISPR-associated (Cas) system has emerged as the main technology for gene editing. Successful editing by CRISPR requires an appropriate Cas protein and guide RNA. However, low cleavage efficiency and off-target effects hamper the development and application of CRISPR/Cas systems. To predict cleavage efficiency and specificity, numerous computational approaches have been developed for scoring guide RNAs. Most scores are empirical or trained by experimental datasets, and scores are implemented using various computational methods. Herein, we discuss these approaches, focusing mainly on the features or computational methods they utilise. Furthermore, we summarise these tools and give some suggestions for their usage. We also recommend three versatile web-based tools with user-friendly interfaces and preferable functions. The review provides a comprehensive and up-to-date overview of computational approaches for guide RNA design that could help users to select the optimal tools for their research.
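To illustrate what an empirical guide-RNA score looks like, here is a deliberately toy example in Python. It is not any of the published scores the review covers; the features (preference for moderate GC content, a penalty for a poly-T stretch, which can terminate Pol III transcription) are common considerations, but the weights are arbitrary assumptions:

```python
# Toy guide-RNA efficiency score (illustrative only, not a published score).
def toy_guide_score(guide):
    # Fraction of G/C bases in the guide sequence.
    gc = sum(base in "GC" for base in guide) / len(guide)
    score = 1.0 - abs(gc - 0.5)        # best around 50% GC
    if "TTTT" in guide:                # poly-T acts as a Pol III terminator
        score -= 0.5
    return round(score, 3)

score = toy_guide_score("GACGTTTTAGCAGCTAGCAA")
```

Real tools combine many more features (position-specific nucleotides, thermodynamics, off-target counts) and fit the weights to experimental cleavage data, which is what distinguishes the trained scores discussed in the review.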

96 citations


Journal ArticleDOI
TL;DR: In this article, the authors summarize and explain the concepts for Boolean network modeling and present application examples and guidelines to work with and analyze Boolean network models, which can be applied to unravel the mechanisms regulating the properties of the system or to identify promising intervention targets.
Abstract: Boolean network models are one of the simplest models to study complex dynamic behavior in biological systems. They can be applied to unravel the mechanisms regulating the properties of the system or to identify promising intervention targets. Since Boolean networks were introduced by Stuart Kauffman in 1969 to describe gene regulatory networks, various biologically based networks and tools for their analysis have been developed. Here, we summarize and explain the concepts of Boolean network modeling. We also present application examples and guidelines for working with and analyzing Boolean network models.
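The Boolean modeling idea can be sketched in a few lines of Python: each node holds a truth value, logical rules update all nodes synchronously, and repeated iteration eventually revisits a state, revealing an attractor. The three-gene network and its rules below are invented for illustration:

```python
# Toy Boolean network (genes and rules are illustrative, not from the review).
def step(state):
    """One synchronous update of all nodes."""
    a, b, c = state["A"], state["B"], state["C"]
    return {
        "A": not c,          # A is repressed by C
        "B": a and not c,    # B requires A and absence of C
        "C": b,              # C is activated by B
    }

def find_attractor(state, max_steps=64):
    """Iterate updates and return the first state seen twice (on an attractor)."""
    seen = []
    for _ in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return state
        seen.append(key)
        state = step(state)
    return state

initial = {"A": True, "B": False, "C": False}
attractor = find_attractor(initial)
```

This toy network settles into a length-5 cycle; the tools surveyed in the paper automate exactly this kind of attractor analysis for much larger networks.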

Journal ArticleDOI
TL;DR: This review introduces the research background of predicting protein–ligand binding sites and classifies the methods into four categories, namely, 3D structure-based, template similarity-based, traditional machine learning-based and deep learning-based methods.
Abstract: Proteins participate in various essential processes in vivo via interactions with other molecules. Identifying the residues participating in these interactions not only provides biological insights for protein function studies but also has great significance for drug discoveries. Therefore, predicting protein-ligand binding sites has long been under intense research in the fields of bioinformatics and computer aided drug discovery. In this review, we first introduce the research background of predicting protein-ligand binding sites and then classify the methods into four categories, namely, 3D structure-based, template similarity-based, traditional machine learning-based and deep learning-based methods. We describe representative algorithms in each category and elaborate on machine learning and deep learning-based prediction methods in more detail. Finally, we discuss the trends and challenges of the current research such as molecular dynamics simulation based cryptic binding sites prediction, and highlight prospective directions for the near future.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper applied an unsupervised sequence embedding technique (doc2vec) to represent protein sequences as rich feature vectors of low dimensionality, and trained a Random Forest (RF) classifier through a training dataset that covers known PPIs between human and all viruses, obtaining excellent predictive accuracy outperforming various combinations of machine learning algorithms and commonly-used sequence encoding schemes.
Abstract: The identification of human-virus protein-protein interactions (PPIs) is an essential and challenging research topic, potentially providing a mechanistic understanding of viral infection. Given that the experimental determination of human-virus PPIs is time-consuming and labor-intensive, computational methods are playing an important role in providing testable hypotheses, complementing the determination of large-scale interactomes between species. In this work, we applied an unsupervised sequence embedding technique (doc2vec) to represent protein sequences as rich feature vectors of low dimensionality. Training a Random Forest (RF) classifier on a dataset that covers known PPIs between human and all viruses, we obtained excellent predictive accuracy, outperforming various combinations of machine learning algorithms and commonly used sequence encoding schemes. In a rigorous comparison with three existing human-virus PPI prediction methods, our proposed computational framework further provided very competitive and promising performance, suggesting that the doc2vec encoding scheme effectively captures context information of protein sequences pertaining to the corresponding protein-protein interactions. Our approach is freely accessible through our web server as part of our host-pathogen PPI prediction platform (http://zzdlab.com/InterSPPI/). Taken together, we hope the current work not only contributes a useful predictor to accelerate the exploration of human-virus PPIs, but also provides some meaningful insights into human-virus relationships.
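The doc2vec encoding treats each protein sequence as a "document" of k-mer "words". A common prerequisite, sketched below, is splitting the sequence into overlapping k-mers (the authors' exact tokenization may differ; this is an assumption for illustration). A full pipeline would pass these token lists to a doc2vec implementation such as gensim and train a Random Forest on the resulting vectors:

```python
# Tokenize a protein sequence into overlapping k-mer "words" for doc2vec.
def kmer_tokens(sequence, k=3):
    """Return all overlapping k-mers of `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# A short illustrative peptide, not a real viral protein.
tokens = kmer_tokens("MKTAYIA")
```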

Journal ArticleDOI
TL;DR: Polysaccharide intercellular adhesin (PIA) is a key part of the extracellular matrix that contributes to important mechanisms of bacterial pathogenicity, most notably biofilm formation and immune evasion as mentioned in this paper.
Abstract: Exopolysaccharide is a key part of the extracellular matrix that contributes to important mechanisms of bacterial pathogenicity, most notably biofilm formation and immune evasion. In the human pathogens Staphylococcus aureus and S. epidermidis, as well as in many other staphylococcal species, the only exopolysaccharide is polysaccharide intercellular adhesin (PIA), a cationic, partially deacetylated homopolymer of N-acetylglucosamine, whose biosynthetic machinery is encoded in the ica locus. PIA production is strongly dependent on environmental conditions and controlled by many regulatory systems. PIA contributes significantly to staphylococcal biofilm formation and immune evasion mechanisms, such as resistance to antimicrobial peptides and ingestion and killing by phagocytes, and presence of the ica genes is associated with infectivity. Due to its role in pathogenesis, PIA has raised considerable interest as a potential vaccine component or target.

Journal ArticleDOI
TL;DR: In this paper, a review of the current state-of-the-art approaches for predicting protein stability upon mutation is presented, highlighting new challenges required to improve current tools and to achieve more reliable predictions.
Abstract: Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
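The anti-symmetry issue in point 3 can be made concrete: for a mutation and its reverse, ΔΔG(A→B) = −ΔΔG(B→A), so the mean of forward-plus-reverse predictions measures a predictor's systematic bias. A small Python sketch with made-up values:

```python
# Anti-symmetry bias check for a stability predictor.
# For a perfectly anti-symmetric predictor, forward + reverse = 0 per pair.
def antisymmetry_bias(pairs):
    """Mean of ddG_forward + ddG_reverse over all mutation pairs."""
    return sum(fwd + rev for fwd, rev in pairs) / len(pairs)

# (forward, reverse) predictions in kcal/mol for three hypothetical mutations.
predictions = [(1.2, -1.1), (-0.5, 0.7), (2.0, -2.0)]
bias = antisymmetry_bias(predictions)
```

A positive bias of this kind indicates the predictor tends to call mutations destabilizing regardless of direction, which is one of the training artifacts the review warns about.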

Journal ArticleDOI
Ranen Aviner1
TL;DR: The current state of puromycin-based research is reviewed, including structure and mechanism of action, relevant derivatives, use in advanced methodologies and some of the major insights generated using such techniques both in the lab and the clinic are reviewed.
Abstract: Puromycin is a naturally occurring aminonucleoside antibiotic that inhibits protein synthesis by ribosome-catalyzed incorporation into the C-terminus of elongating nascent chains, blocking further extension and resulting in premature termination of translation. It is most commonly known as a selection marker for cell lines genetically engineered to express a resistance transgene, but its additional uses as a probe for protein synthesis have proven invaluable across a wide variety of model systems, ranging from purified ribosomes and cell-free translation to intact cultured cells and whole animals. Puromycin is comprised of a nucleoside covalently bound to an amino acid, mimicking the 3' end of aminoacylated tRNAs that participate in delivery of amino acids to elongating ribosomes. Both moieties can tolerate some chemical substitutions and modifications without significant loss of activity, generating a diverse toolbox of puromycin-based reagents with added functionality, such as biotin for affinity purification or fluorophores for fluorescent microscopy detection. These reagents, as well as anti-puromycin antibodies, have played a pivotal role in advancing our understanding of the regulation and dysregulation of protein synthesis in normal and pathological processes, including immune response and neurological function. This manuscript reviews the current state of puromycin-based research, including structure and mechanism of action, relevant derivatives, use in advanced methodologies and some of the major insights generated using such techniques both in the lab and the clinic.

Journal ArticleDOI
TL;DR: Recent progress in Streptomyces genome sequencing and the application of genome mining approaches to identify and characterize silent secondary metabolite biosynthetic gene clusters are discussed.
Abstract: Streptomyces are a large and valuable resource of bioactive and complex secondary metabolites, many of which have important clinical applications. With advances in high-throughput genome sequencing methods, various in silico genome mining strategies have been developed and applied to the mapping of the Streptomyces genome. These studies have revealed that Streptomyces possess an even more significant number of uncharacterized silent secondary metabolite biosynthetic gene clusters (smBGCs) than previously estimated. Linking smBGCs to their encoded products has played a critical role in the discovery of novel secondary metabolites, as well as knowledge-based engineering of smBGCs to produce altered products. In this mini review, we discuss recent progress in Streptomyces genome sequencing and the application of genome mining approaches to identify and characterize smBGCs. Furthermore, we discuss several challenges that need to be overcome to accelerate the genome mining process and ultimately support the discovery of novel bioactive compounds.

Journal ArticleDOI
TL;DR: Given the growing trend toward the application of deep learning architectures in genomics research, this mini review outlines the most prominent models, highlights possible pitfalls and discusses future directions.
Abstract: With the evolution of biotechnology and the introduction of high-throughput sequencing, researchers have the ability to produce and analyze vast amounts of genomics data. Since genomics produces big data, most bioinformatics algorithms are based on machine learning methodologies, and lately deep learning, to identify patterns, make predictions and model the progression or treatment of a disease. Advances in deep learning have created an unprecedented momentum in biomedical informatics and have given rise to new bioinformatics and computational biology research areas. It is evident that deep learning models can provide higher accuracies in specific tasks of genomics than the state-of-the-art methodologies. Given the growing trend toward the application of deep learning architectures in genomics research, in this mini review we outline the most prominent models, highlight possible pitfalls and discuss future directions. We foresee deep learning accelerating changes in the area of genomics, especially for multi-scale and multimodal data analysis for precision medicine.

Journal ArticleDOI
TL;DR: Per-residue energy decomposition analysis suggests that residues T57, H59, S105 and R107 are the key hotspots for drug discovery, and these residues may be useful as potential pharmacophores in drug designing.
Abstract: The emergence of SARS-CoV-2 has become a global health issue. This single-stranded positive-sense RNA virus is continuously spreading with increasing morbidities and mortalities. The proteome of this virus contains four structural and sixteen nonstructural proteins that ensure the replication of the virus in the host cell. However, the role of the nucleocapsid phosphoprotein (N) in RNA recognition, replicating and transcribing the viral genome, and modulating the host immune response is indispensable. Recently, the NMR structure of the N-terminal domain of the nucleocapsid phosphoprotein was reported, but the precise structural mechanism by which ssRNA interacts with it has not yet been reported. Therefore, here, we have used an integrated computational pipeline to identify the key residues that play an essential role in RNA recognition. We generated multiple variants by using an alanine scanning strategy and performed an extensive simulation for each system to signify the role of each interfacial residue. Our analyses suggest that the substitutions T57A, H59A, S105A, R107A, F171A, and Y172A significantly affected the dynamics and binding of RNA. Furthermore, per-residue energy decomposition analysis suggests that residues T57, H59, S105 and R107 are the key hotspots for drug discovery. Thus, these residues may be useful as potential pharmacophores in drug designing.
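The variant-generation step of alanine scanning can be sketched independently of any MD machinery: each selected residue is replaced by alanine to produce a labeled mutant sequence. A minimal Python illustration (the sequence is a placeholder, not the real nucleocapsid protein):

```python
# Generate alanine-scanning variants of a protein sequence.
def alanine_scan(sequence, positions):
    """Return {label: mutant_sequence} with each 1-based position set to 'A'."""
    variants = {}
    for pos in positions:
        original = sequence[pos - 1]
        if original == "A":
            continue  # already alanine; nothing to mutate
        label = f"{original}{pos}A"   # e.g. "T57A"
        variants[label] = sequence[:pos - 1] + "A" + sequence[pos:]
    return variants

# Placeholder 5-residue sequence; scan positions 2, 3 and 5.
variants = alanine_scan("MTHSR", [2, 3, 5])
```

In the paper's pipeline, each such variant would then be run through its own MD simulation and binding free-energy analysis.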

Journal ArticleDOI
TL;DR: A review of the state-of-the-art of MinION technology applied to microbiome studies, its current possible applications and the main challenges for its use in 16S rRNA metabarcoding.
Abstract: Assessment of bacterial diversity through sequencing of 16S ribosomal RNA (16S rRNA) genes has been an approach widely used in environmental microbiology, particularly since the advent of high-throughput sequencing technologies. An additional innovation introduced by these technologies was the need to develop new strategies to manage and investigate the massive amount of sequencing data generated. This situation stimulated the rapid expansion of the field of bioinformatics, with the release of new tools to be applied to the downstream analysis and interpretation of sequencing data mainly generated using Illumina technology. In recent years, a third generation of sequencing technologies has been developed and applied in parallel and complementarily to the former sequencing strategies. In particular, Oxford Nanopore Technologies (ONT) introduced nanopore sequencing, which has become very popular among molecular ecologists. Nanopore technology offers a low price, portability and fast sequencing throughput. This powerful technology has recently been tested for 16S rRNA analyses, showing promising results. However, compared with previous technologies, there is a scarcity of bioinformatic tools and protocols designed specifically for the analysis of Nanopore 16S sequences. Due to its notable characteristics, researchers have recently started assessing the suitability of MinION for 16S rRNA sequencing studies, and have obtained remarkable results. Here we present a review of the state-of-the-art of MinION technology applied to microbiome studies, its current possible applications and the main challenges for its use in 16S rRNA metabarcoding.

Journal ArticleDOI
TL;DR: In this article, deep learning has been successfully applied to various omics data; however, its applications in metabolomics remain relatively limited compared to other omics fields.
Abstract: In the past few years, deep learning has been successfully applied to various omics data. However, the applications of deep learning in metabolomics are still relatively limited compared to other omics fields. Currently, data pre-processing using convolutional neural network architectures appears to benefit the most from deep learning. Compound/structure identification and quantification using artificial neural networks/deep learning performed relatively better than traditional machine learning techniques, whereas only marginally better results are observed in biological interpretations. Before deep learning can be effectively applied to metabolomics, several challenges should be addressed, including metabolome-specific deep learning architectures, dimensionality problems, and model evaluation regimes.

Journal ArticleDOI
TL;DR: In this article, a conceptual framework outlining four types of processes that may give rise to zero values in sequence count data was developed and compared with different zero-handling models to identify the most differentially expressed sequences.
Abstract: Genomic studies feature multivariate count data from high-throughput DNA sequencing experiments, which often contain many zero values. These zeros can cause artifacts for statistical analyses, and multiple modeling approaches have been developed in response. Here, we apply different zero-handling models to gene-expression and microbiome datasets and show that models can disagree substantially in terms of identifying the most differentially expressed sequences. Next, to rationally examine how different zero-handling models behave, we developed a conceptual framework outlining four types of processes that may give rise to zero values in sequence count data. Last, we performed simulations to test how zero-handling models behave in the presence of these different zero-generating processes. Our simulations showed that simple count models are sufficient across multiple processes, even when the true underlying process is unknown. On the other hand, a common zero-handling technique known as "zero-inflation" was only suitable under a zero-generating process associated with an unlikely set of biological and experimental conditions. Taken together, our work suggests several specific guidelines for developing and choosing state-of-the-art models for analyzing sparse sequence count data.
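Two of the zero-generating processes the paper distinguishes can be mimicked with nothing but the standard library: "sampling zeros" arise when a truly present sequence is missed at shallow depth, whereas zero-inflation adds an extra point mass at zero on top of the count distribution. A hedged sketch (all parameters are arbitrary, chosen only to make both effects visible):

```python
import random

random.seed(0)  # deterministic for reproducibility

def sample_counts(true_abundance, depth, trials):
    """Binomial-style sampling: zeros appear purely from shallow sequencing."""
    return [sum(random.random() < true_abundance for _ in range(depth))
            for _ in range(trials)]

def zero_inflate(counts, pi):
    """Replace each count with 0 with probability pi (the inflation component)."""
    return [0 if random.random() < pi else c for c in counts]

# Shallow sequencing of a rare feature: many zeros, no inflation needed.
shallow = sample_counts(true_abundance=0.01, depth=50, trials=200)
# Deep sequencing plus an explicit zero-inflation component.
inflated = zero_inflate(sample_counts(0.01, 5000, 200), pi=0.3)
```

Under the shallow-depth process, a plain count model already explains the zeros; the paper's point is that reaching for zero-inflation is only justified when an extra zero-producing mechanism, like the second function, is actually plausible.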

Journal ArticleDOI
TL;DR: In this article, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on the data from high-throughput sequencing techniques.
Abstract: N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position; it is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical properties. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on an independent dataset showed that the proposed method has excellent generalization ability. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will provide more convenience to users studying m6A modification in different tissues.
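The two simplest encodings named in the abstract can be sketched directly. Mono-nucleotide binary encoding is one-hot encoding; for the nucleotide chemical properties, the triple below (ring structure, hydrogen bonding, functional group) is one common convention in the m6A-prediction literature, and the paper's exact matrices may differ:

```python
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}
# Nucleotide chemical property triple (one common convention, assumed here):
# (ring structure, hydrogen bonding, functional group).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}

def encode(seq):
    """Concatenate one-hot and chemical-property features per nucleotide."""
    feats = []
    for nt in seq:
        feats.extend(ONE_HOT[nt])
        feats.extend(NCP[nt])
    return feats

vec = encode("GAC")
print(len(vec))  # 3 nucleotides x (4 + 3) features = 21
```

Feature vectors like this one are what mRMR then filters before the SVM is trained.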

Journal ArticleDOI
TL;DR: The development of optical mapping in recent decades is reviewed to illustrate its importance in genomic research, and its applications and algorithms are detailed to show its specific advantages.
Abstract: Recent advances in optical mapping have allowed the construction of improved genome assemblies with greater contiguity. Optical mapping also enables genome comparison and the identification of large-scale structural variations. Associating these large-scale genomic features with biological functions is an important goal in plant and animal breeding and in medical research. Optical mapping has also been used in microbiology and still plays an important role in strain typing and epidemiological studies. Here, we review the development of optical mapping in recent decades to illustrate its importance in genomic research. We detail its applications and algorithms to show its specific advantages. Finally, we discuss the challenges that must be addressed to optimize optical mapping and to improve its future development and application.
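Conceptually, an optical map reduces a molecule to an ordered profile of fragment sizes between labeled recognition sites. This toy in-silico digestion (with a hypothetical 6-base motif and sequence, not data from the review) shows how such a profile arises:

```python
def fragment_sizes(sequence, motif):
    """Cut a sequence at every motif occurrence and return the ordered
    fragment lengths, a simplified stand-in for an optical map's
    ordered fragment-size profile."""
    cuts = []
    pos = sequence.find(motif)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(motif, pos + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [boundaries[i + 1] - boundaries[i]
            for i in range(len(boundaries) - 1)]

# Toy "genome" with a hypothetical recognition motif.
genome = "CCAAGCTTAAAAGCTTCCGG"
print(fragment_sizes(genome, "AAGCTT"))
```

Comparing two such ordered profiles, while allowing for missing or spurious cuts and sizing error, is the core alignment problem the reviewed optical-mapping algorithms solve.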

Journal ArticleDOI
TL;DR: In this article, the authors analyzed the available NM libraries for their suitability for integration with novel nanoinformatics approaches and for the development of NM specific Integrated Approaches to Testing and Assessment (IATA) for human and environmental risk assessment, all within the NanoSolveIT cloud-platform.
Abstract: Nanotechnology has enabled the discovery of a multitude of novel materials exhibiting unique physicochemical (PChem) properties compared to their bulk analogues. These properties have led to a rapidly increasing range of commercial applications; this, however, may come at a cost, if an association to long-term health and environmental risks is discovered or even just perceived. Many nanomaterials (NMs) have not yet had their potential adverse biological effects fully assessed, due to costs and time constraints associated with the experimental assessment, frequently involving animals. Here, the available NM libraries are analyzed for their suitability for integration with novel nanoinformatics approaches and for the development of NM specific Integrated Approaches to Testing and Assessment (IATA) for human and environmental risk assessment, all within the NanoSolveIT cloud-platform. These established and well-characterized NM libraries (e.g. NanoMILE, NanoSolutions, NANoREG, NanoFASE, caLIBRAte, NanoTEST and the Nanomaterial Registry (>2000 NMs)) contain physicochemical characterization data as well as data for several relevant biological endpoints, assessed in part using harmonized Organisation for Economic Co-operation and Development (OECD) methods and test guidelines. Integration of such extensive NM information sources with the latest nanoinformatics methods will allow NanoSolveIT to model the relationships between NM structure (morphology), properties and their adverse effects and to predict the effects of other NMs for which less data is available. The project specifically addresses the needs of regulatory agencies and industry to effectively and rapidly evaluate the exposure, NM hazard and risk from nanomaterials and nano-enabled products, enabling implementation of computational 'safe-by-design' approaches to facilitate NM commercialization.

Journal ArticleDOI
TL;DR: Results from this study provide insight into puerarin's mechanism of action and support its prompt evaluation in COVID-19 patients to assess clinical feasibility; they also support the view that quercetin is involved in host immunomodulation, which renders it a promising candidate against COVID-19.
Abstract: The outbreak of COVID-19 raises an urgent need for therapeutics to contain the emerging pandemic. However, no effective treatment has been found for SARS-CoV-2 infection to date. Here, we identified puerarin (PubChem CID: 5281807), quercetin (PubChem CID: 5280343) and kaempferol (PubChem CID: 5280863) as potential compounds with binding activity to ACE2 by using the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP). Molecular docking analysis showed that puerarin and quercetin exhibit good binding affinity to ACE2, which was validated by a surface plasmon resonance (SPR) assay. Furthermore, an SPR-based competition assay revealed that puerarin and quercetin could significantly affect the binding of the viral S-protein to the ACE2 receptor. Notably, quercetin could also bind to the RBD domain of the S-protein, suggesting not only a receptor-blocking but also a virus-neutralizing effect of quercetin on SARS-CoV-2. The results from network pharmacology and bioinformatics analysis support the view that quercetin is involved in host immunomodulation, which further renders it a promising candidate against COVID-19. Moreover, given that puerarin is already an existing drug, the results from this study not only provide insight into its mechanism of action but also support its prompt evaluation in COVID-19 patients to assess clinical feasibility.
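Docking and SPR affinities in studies like this are typically reported as dissociation constants (Kd), which relate to binding free energy via ΔG = RT ln(Kd). A small conversion sketch with an illustrative nanomolar value (not a figure taken from the paper):

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.15    # standard temperature, K

def kd_to_dg(kd_molar):
    """Binding free energy (kcal/mol) from a dissociation constant
    via dG = RT * ln(Kd); tighter binding gives a more negative dG."""
    return R * T * math.log(kd_molar)

# A hypothetical 100 nM binder, in the range SPR assays often report.
print(round(kd_to_dg(100e-9), 2))
```

The monotone relationship (smaller Kd, more negative ΔG) is what lets docking scores in energy units be compared against SPR-measured Kd values.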

Journal ArticleDOI
TL;DR: In this article, an integrated prediction tool, iStable 2.0, is proposed that integrates 11 sequence-based and structure-based prediction tools via machine learning and adds protein sequence information as features.
Abstract: Protein mutations can lead to structural changes that affect protein function and result in disease. In protein engineering and in the drug design and optimization industries, mutations are often used to improve protein stability or to change protein properties while maintaining stability. To provide possible candidates for novel protein design, several computational tools for predicting protein stability changes have been developed. Although many prediction tools are available, each tool employs different algorithms and features. This can produce conflicting prediction results that make it difficult for users to decide upon the correct protein design. Therefore, this study proposes an integrated prediction tool, iStable 2.0, which integrates 11 sequence-based and structure-based prediction tools via machine learning and adds protein sequence information as features. Three coding modules are designed for the system, an Online Server Module, a Stand-alone Module and a Sequence Coding Module, to improve the prediction performance of the previous version of the system. The final integrated structure-based classification model has a higher Matthews correlation coefficient than the single prediction tools (0.708 vs 0.547, respectively), and the Pearson correlation coefficient of the regression model likewise improves from 0.669 to 0.714. The sequence-based model not only successfully integrates off-the-shelf predictors but also improves the Matthews correlation coefficient of the best single prediction tool by at least 0.161, which is better than the individual structure-based prediction tools. In addition, both the Sequence Coding Module and the Stand-alone Module maintain performance, with only a 5% decrease in the Matthews correlation coefficient, when the integrated online tools are unavailable. iStable 2.0 is available at http://ncblab.nchu.edu.tw/iStable2.
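The Matthews correlation coefficient used to compare the integrated model (0.708) against single tools (0.547) is computed from the binary confusion matrix. A minimal implementation with hypothetical confusion-matrix counts (not the paper's actual test-set tallies):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts;
    ranges from -1 (total disagreement) to +1 (perfect prediction)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for two stability predictors on the same test set.
print(round(mcc(70, 80, 20, 30), 3))  # a weaker single tool
print(round(mcc(85, 90, 10, 15), 3))  # a stronger integrated model
```

Unlike accuracy, MCC stays informative on imbalanced stabilizing/destabilizing datasets, which is presumably why it is the headline metric here.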

Journal ArticleDOI
TL;DR: This review provides a guideline for successful data generation and analysis using appropriate software tools and databases for the study of chromatin accessibility at single-cell resolution, along with an up-to-date list of published studies that have applied this method.
Abstract: Most genetic variations associated with human complex traits are located in non-coding genomic regions. Therefore, understanding the genotype-to-phenotype axis requires a comprehensive catalog of functional non-coding genomic elements, most of which are involved in the epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements via their connections with trait-associated sequence variants. Currently, Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility. Single-cell ATAC-seq (scATAC-seq) technology has also been developed to study cell type-specific chromatin accessibility in tissue samples containing heterogeneous cellular populations. However, because scATAC-seq data are intrinsically highly noisy and sparse, accurately extracting biological signals and formulating effective biological hypotheses are difficult. To overcome these limitations, new methods and software tools for scATAC-seq data analysis have been developed over the past few years. Nevertheless, there is as yet no consensus on best practices for scATAC-seq data analysis. In this review, we discuss scATAC-seq technology and data analysis methods, ranging from preprocessing to downstream analysis, along with an up-to-date list of published studies that have applied this method. We expect this review to provide a guideline for successful data generation and analysis using appropriate software tools and databases for the study of chromatin accessibility at single-cell resolution.
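One widely used answer to the sparsity problem described above is TF-IDF normalization of the binary cell-by-peak matrix before latent semantic indexing (LSI); the exact weighting scheme varies by tool, and the variant below (depth-normalized term frequency times log(1 + N/df)) is just one common choice, shown on a toy matrix:

```python
import math

def tfidf(binary_matrix):
    """TF-IDF normalization of a cells x peaks binary accessibility matrix,
    a common first step before LSI in scATAC-seq pipelines (scheme varies
    by tool): down-weights ubiquitously open peaks, up-weights rare ones."""
    n_cells = len(binary_matrix)
    n_peaks = len(binary_matrix[0])
    peak_freq = [sum(row[j] for row in binary_matrix) for j in range(n_peaks)]
    idf = [math.log(1 + n_cells / f) if f else 0.0 for f in peak_freq]
    out = []
    for row in binary_matrix:
        depth = sum(row)  # per-cell sequencing-depth normalization
        out.append([(x / depth) * idf[j] if depth else 0.0
                    for j, x in enumerate(row)])
    return out

# Toy data: 3 cells x 3 peaks; peak 0 is open in every cell.
cells = [[1, 0, 1],
         [1, 1, 0],
         [1, 0, 0]]
print(tfidf(cells))
```

Note how the ubiquitous first peak receives a lower weight than the cell-specific peaks, which is exactly the signal-concentrating behavior that makes downstream dimensionality reduction workable on sparse data.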

Journal ArticleDOI
TL;DR: In this review, an overview of the CRISPR-Cas systems will be introduced, including the innovations, the applications in human disease research and gene therapy, as well as the challenges and opportunities that will be faced in the practical application.
Abstract: Genome editing is the modification of genomic DNA at a specific target site in a wide variety of cell types and organisms, including the insertion, deletion and replacement of DNA, resulting in the inactivation of target genes, the acquisition of novel genetic traits and the correction of pathogenic gene mutations. Owing to the advantages of simple design, low cost, high efficiency, good repeatability and short experimental cycles, CRISPR-Cas systems have become the most widely used genome editing technology in molecular biology laboratories around the world. In this review, an overview of CRISPR-Cas systems is presented, including the innovations, the applications in human disease research and gene therapy, and the challenges and opportunities that will be faced in the practical application of CRISPR-Cas systems.
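The "simple design" the abstract credits CRISPR-Cas with largely comes down to scanning for protospacer-adjacent motifs (PAMs). A minimal forward-strand scan for the SpCas9 NGG PAM, on a toy sequence (real guide design also checks the reverse strand, off-targets, and GC content):

```python
def find_pam_sites(seq, pam="GG", protospacer_len=20):
    """Return start positions of candidate SpCas9 target sites: a 20-nt
    protospacer immediately followed by an NGG PAM (forward strand only)."""
    sites = []
    for i in range(len(seq) - protospacer_len - 3 + 1):
        pam_region = seq[i + protospacer_len : i + protospacer_len + 3]
        if pam_region[1:] == pam:  # N may be any base
            sites.append(i)
    return sites

# Toy sequence: one valid NGG PAM directly after a 20-nt protospacer.
seq = "A" * 20 + "TGG" + "CCC"
print(find_pam_sites(seq))
```

Each returned position marks a 20-nt window that could serve as a guide RNA spacer, which is why candidate sites are so abundant across most genomes.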

Journal ArticleDOI
TL;DR: Here, a Val-to-Lys417 mutation in the receptor-binding domain of SARS-CoV-2 is observed, which establishes a Lys-Asp electrostatic interaction enhancing ACE2 binding, and unique, structurally conserved conformational epitopes on the RBD are identified as potential therapeutic targets.
Abstract: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), is a novel beta coronavirus. SARS-CoV-2 uses spike glycoprotein to interact with host angiotensin-converting enzyme 2 (ACE2) and ensure cell recognition. High infectivity of SARS-CoV-2 raises questions on spike-ACE2 binding affinity and its neutralization by anti-SARS-CoV monoclonal antibodies (mAbs). Here, we observed Val-to-Lys417 mutation in the receptor-binding domains (RBD) of SARS-CoV-2, which established a Lys-Asp electrostatic interaction enhancing its ACE2-binding. Pro-to-Ala475 substitution and Gly482 insertion in the AGSTPCNGV-loop of RBD possibly hinders neutralization of SARS-CoV-2 by anti-SARS-CoV mAbs. In addition, we identified unique and structurally conserved conformational-epitopes on RBDs, which can be potential therapeutic targets. Collectively, we provide new insights into the mechanisms underlying the high infectivity of SARS-CoV-2 and development of effective neutralizing agents.
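Substitutions such as Val-to-Lys417 are spotted by comparing aligned RBD sequences position by position. The sketch below does exactly that on short toy fragments (NOT the real SARS-CoV / SARS-CoV-2 sequences), with the numbering offset chosen only to echo position 417:

```python
def substitutions(ref, query, offset=1):
    """List (position, ref_residue, query_residue) for every mismatch
    between two pre-aligned, equal-length sequences, numbering positions
    starting at `offset`."""
    assert len(ref) == len(query), "sequences must be pre-aligned"
    return [(i + offset, a, b)
            for i, (a, b) in enumerate(zip(ref, query)) if a != b]

# Toy aligned fragments (hypothetical residues, for illustration only).
ref   = "NVYAD"
query = "NKYAD"
print(substitutions(ref, query, offset=416))
```

On real data this step follows a multiple sequence alignment, and each reported substitution would then be assessed structurally, as the paper does for the Lys-Asp contact at the ACE2 interface.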