scispace - formally typeset

Showing papers by "Alexander Tropsha" published in 2020


Journal ArticleDOI
TL;DR: This Perspective summarizes recent technological advances in QSAR modeling and highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics.
Abstract: Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure–activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.

383 citations


Journal ArticleDOI
TL;DR: This article describes the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA), which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).
Abstract: BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing ...

107 citations


Journal ArticleDOI
TL;DR: Quaternary ammonium compounds such as ammonium chloride, cetylpyridinium and miramistin represent widely accessible antiseptic molecules with well-known broad-spectrum antiviral activities and represent a repurposing opportunity as therapeutics against SARS-CoV-2.
Abstract: The COVID-19 pandemic has highlighted an important role for drug repurposing. Quaternary ammonium compounds such as ammonium chloride, cetylpyridinium and miramistin represent widely accessible antiseptic molecules with well-known broad-spectrum antiviral activities and represent a repurposing opportunity as therapeutics against SARS-CoV-2.

91 citations


Journal ArticleDOI
TL;DR: In this article, the authors analyzed the available NM libraries for their suitability for integration with novel nanoinformatics approaches and for the development of NM specific Integrated Approaches to Testing and Assessment (IATA) for human and environmental risk assessment, all within the NanoSolveIT cloud-platform.
Abstract: Nanotechnology has enabled the discovery of a multitude of novel materials exhibiting unique physicochemical (PChem) properties compared to their bulk analogues. These properties have led to a rapidly increasing range of commercial applications; this, however, may come at a cost, if an association to long-term health and environmental risks is discovered or even just perceived. Many nanomaterials (NMs) have not yet had their potential adverse biological effects fully assessed, due to costs and time constraints associated with the experimental assessment, frequently involving animals. Here, the available NM libraries are analyzed for their suitability for integration with novel nanoinformatics approaches and for the development of NM specific Integrated Approaches to Testing and Assessment (IATA) for human and environmental risk assessment, all within the NanoSolveIT cloud-platform. These established and well-characterized NM libraries (e.g. NanoMILE, NanoSolutions, NANoREG, NanoFASE, caLIBRAte, NanoTEST and the Nanomaterial Registry (>2000 NMs)) contain physicochemical characterization data as well as data for several relevant biological endpoints, assessed in part using harmonized Organisation for Economic Co-operation and Development (OECD) methods and test guidelines. Integration of such extensive NM information sources with the latest nanoinformatics methods will allow NanoSolveIT to model the relationships between NM structure (morphology), properties and their adverse effects and to predict the effects of other NMs for which less data is available. The project specifically addresses the needs of regulatory agencies and industry to effectively and rapidly evaluate the exposure, NM hazard and risk from nanomaterials and nano-enabled products, enabling implementation of computational 'safe-by-design' approaches to facilitate NM commercialization.

63 citations


Journal ArticleDOI
TL;DR: It is argued that, to develop effective treatments for COVID-19 and be prepared for future epidemics, long-term, consistent investment in antiviral research is needed.

23 citations


Journal ArticleDOI
TL;DR: The advances in computational approaches to drug discovery of small molecules with epigenetic modulation profiles are reviewed, the current chemogenomics data available for epigenetics targets are summarized, and a perspective on the greater utility of biomedical knowledge mining as a means to advance the epigenetic drug discovery is provided.

22 citations


Journal ArticleDOI
TL;DR: This study developed and rigorously validated Quantitative Structure-Interference Relationship (QSIR) models of detergent-sensitive aggregation in several HTS campaigns under various assay conditions and screening concentrations. The models increase the accuracy of aggregation prediction by ~53% in the β-lactamase assay and by ~46% in the cruzain assay compared to previously published methods.
Abstract: Small, colloidally aggregating molecules (SCAMs) are the most common source of false positives in high-throughput screening (HTS) campaigns. Although SCAMs can be experimentally detected and suppressed by the addition of detergent in the assay buffer, detergent sensitivity is not routinely monitored in HTS. Computational methods are thus needed to flag potential SCAMs during HTS triage. In this study, we have developed and rigorously validated quantitative structure-interference relationship (QSIR) models of detergent-sensitive aggregation in several HTS campaigns under various assay conditions and screening concentrations. In particular, we have modeled detergent-sensitive aggregation in an AmpC β-lactamase assay, the preferred HTS counter-screen for aggregation, as well as in another assay that measures cruzain inhibition. Our models increase the accuracy of aggregation prediction by ∼53% in the β-lactamase assay and by ∼46% in the cruzain assay compared to previously published methods. We also discuss the importance of both assay conditions and screening concentrations in the development of QSIR models for various interference mechanisms besides aggregation. The models developed in this study are publicly available for fast prediction within the SCAM detective web application (https://scamdetective.mml.unc.edu/).

19 citations


Journal ArticleDOI
TL;DR: This entry is a correction notice for the article "QSAR without borders" by Eugene N. Muratov et al. (Chem. Soc. Rev., 2020).
Abstract: Correction for ‘QSAR without borders’ by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.

18 citations


Journal ArticleDOI
TL;DR: The synthesis and structure-activity relationships of 5,11-dihydro-6H-benzo[e]pyrimido[5,4-b][1,4]diazepin-6-one DCLK1 inhibitors are described, resulting in the identification of DCLK1-IN-1.
Abstract: Doublecortin-like kinase 1 (DCLK1) is a serine/threonine kinase that is overexpressed in gastrointestinal cancers, including esophageal, gastric, colorectal, and pancreatic cancers. DCLK1 is also used as a marker of tuft cells, which regulate type II immunity in the gut. However, the substrates and functions of DCLK1 are understudied. We recently described the first selective DCLK1/2 inhibitor, DCLK1-IN-1, developed to aid the functional characterization of this important kinase. Here we describe the synthesis and structure-activity relationships of 5,11-dihydro-6H-benzo[e]pyrimido[5,4-b][1,4]diazepin-6-one DCLK1 inhibitors, resulting in the identification of DCLK1-IN-1.

15 citations


Journal ArticleDOI
TL;DR: COVID-KOP can be used effectively to generate new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19 by establishing respective confirmatory pathways of drug action.
Abstract: Summary: In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to generate new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19 by establishing respective confirmatory pathways of drug action. Availability and implementation: COVID-KOP is freely accessible at https://covidkop.renci.org/. For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop.

15 citations


Posted ContentDOI
25 Nov 2020-ChemRxiv
TL;DR: This work has compiled, curated, and integrated the largest publicly available dataset and developed an ensemble of QSAR models for all six endpoints of acute toxicity tests, and established a publicly accessible Systemic and Topical chemical Toxicity (STopTox) web portal.
Abstract: Since 2009, animal testing for cosmetic products has been prohibited in Europe, and in 2016, US EPA announced their intent to modernize the so-called "6-pack" of acute toxicity tests (acute oral toxicity, acute dermal toxicity, acute inhalation toxicity, skin irritation and corrosion, eye irritation and corrosion, and skin sensitization) and expand acceptance of alternative methods to reduce animal testing of pesticides. We have compiled, curated, and integrated the largest publicly available dataset and developed an ensemble of QSAR models for all six endpoints. All models were validated according to the OECD QSAR principles and tested using newly identified data on compounds not included in the training sets. We have established a publicly accessible Systemic and Topical chemical Toxicity (STopTox) web portal (https://stoptox.mml.unc.edu/) integrating all developed models for “6-pack” assays. This portal can be used by scientists and regulators to identify putative toxicants or non-toxicants in chemical libraries of interest.
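The abstract above mentions an ensemble of QSAR models but does not say how the individual predictions are combined. As a minimal sketch only, assuming each model returns a toxicity probability and that the ensemble simply averages them (the function name, the averaging rule, and the 0.5 threshold are all illustrative assumptions, not the STopTox method), a consensus call could look like this:

```python
from statistics import mean


def consensus_prediction(probabilities, threshold=0.5):
    """Average per-model probabilities and apply a cutoff.

    Hypothetical helper: simple averaging and the 0.5 threshold are
    assumptions for illustration, not taken from the STopTox paper.
    """
    if not probabilities:
        raise ValueError("at least one model prediction is required")
    score = mean(probabilities)
    return score, score >= threshold


# Toy probabilities from three hypothetical models for one compound.
score, is_toxic = consensus_prediction([0.9, 0.7, 0.2])
print(f"consensus score = {score:.2f}, flagged as toxic = {is_toxic}")
```

Other combination rules, such as majority voting or weighting models by their validation performance, would fit the same interface.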


Journal ArticleDOI
TL;DR: The results provide systems biology support for using BCG and small-molecule BCG mimics as putative vaccine and drug candidates against emergent viruses including SARS-CoV-2.
Abstract: Coronavirus disease 2019 (COVID-19) is expected to continue to cause worldwide fatalities until the world population develops ‘herd immunity’, or until a vaccine is developed and used as a prevention. Meanwhile, there is an urgent need to identify alternative means of antiviral defense. The Bacillus Calmette–Guerin (BCG) vaccine, which has been recognized for its off-target beneficial effects on the immune system, can be exploited to boost immunity and protect from emerging novel viruses. We developed and employed a systems biology workflow capable of identifying small-molecule antiviral drugs and vaccines that can boost immunity and affect a wide variety of viral disease pathways to protect from the fatal consequences of emerging viruses. Our analysis demonstrates that BCG vaccine affects the production and maturation of naive T cells resulting in enhanced, long-lasting trained innate immune responses that can provide protection against novel viruses. We have identified small-molecule BCG mimics, including antiviral drugs such as raltegravir and lopinavir as high confidence hits. Strikingly, our top hits emetine and lopinavir were independently validated by recent experimental findings that these compounds inhibit the growth of SARS-CoV-2 in vitro. Our results provide systems biology support for using BCG and small-molecule BCG mimics as putative vaccine and drug candidates against emergent viruses including SARS-CoV-2.

Journal ArticleDOI
TL;DR: It is proposed that XLG2, independent of guanine nucleotide binding, regulates the active state of the canonical G protein pathway directly by sequestering Gβγ and indirectly by promoting heterodimer formation.
Abstract: Plants uniquely have a family of proteins called extra-large G proteins (XLG) that share homology in their C-terminal half with the canonical Gα subunits; we carefully detail here that Arabidopsis ...

Journal ArticleDOI
TL;DR: TranQL can be used to ask questions of relevance to translational science, rapidly obtain answers that require assertions from a federation of knowledge sources, and provide valuable insights for translational research and clinical practice.
Abstract: Background: Efforts are underway to semantically integrate large biomedical knowledge graphs using common upper-level ontologies to federate graph-oriented application programming interfaces (APIs) to the data. However, federation poses several challenges, including query routing to appropriate knowledge sources, generation and evaluation of answer subsets, semantic merger of those answer subsets, and visualization and exploration of results. Objective: We aimed to develop an interactive environment for query, visualization, and deep exploration of federated knowledge graphs. Methods: We developed a biomedical query language and web application interface, termed Translator Query Language (TranQL), to query semantically federated knowledge graphs and explore query results. TranQL uses the Biolink data model as an upper-level biomedical ontology and an API standard that has been adopted by the Biomedical Data Translator Consortium to specify a protocol for expressing a query as a graph of Biolink data elements compiled from statements in the TranQL query language. Queries are mapped to federated knowledge sources, and answers are merged into a knowledge graph, with mappings between the knowledge graph and specific elements of the query. The TranQL interactive web application includes a user interface to support user exploration of the federated knowledge graph. Results: We developed 2 real-world use cases to validate TranQL and address biomedical questions of relevance to translational science. The use cases posed questions that traversed 2 federated Translator API endpoints: Integrated Clinical and Environmental Exposures Service (ICEES) and Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ICEES provides open access to observational clinical and environmental data, and ROBOKOP provides access to linked biomedical entities, such as “gene,” “chemical substance,” and “disease,” that are derived largely from curated public data sources. We successfully posed queries to TranQL that traversed these endpoints and retrieved answers that we visualized and evaluated. Conclusions: TranQL can be used to ask questions of relevance to translational science, rapidly obtain answers that require assertions from a federation of knowledge sources, and provide valuable insights for translational research and clinical practice.

Journal ArticleDOI
16 Jun 2020
TL;DR: In this paper, the authors developed quantitative structure-property relationship (QSPR) models that predict the stability of the complexes formed by a popular, poorly soluble antibiotic, cefuroxime axetil (CA), and different cyclodextrins (CDs).
Abstract: The poor aqueous solubility of active pharmaceutical ingredients (APIs) places a limit on their therapeutic potential. Cyclodextrins (CDs) have been shown to improve the solubility of APIs, but the magnitude of the improvement depends on the structure of both the CDs and APIs. We have developed quantitative structure–property relationship (QSPR) models that predict the stability of the complexes formed by a popular, poorly soluble antibiotic, cefuroxime axetil (CA), and different CDs. We applied this model to five CA–CD systems not included in the modeling set. Two out of three systems predicted to have poor stability and poor CA solubility, and both CA–CD systems predicted to have high stability and high CA solubility were confirmed experimentally. One of the CDs that significantly improved CA solubility, methyl-βCD, is described here for the first time, and we propose this CD as a novel promising excipient. Computational approaches and models developed and validated in this study could help accelerate the development of multifunctional CD-based formulations.


Posted ContentDOI
18 Jun 2020-ChemRxiv
TL;DR: COVID-KOP is a new knowledgebase integrating the existing ROBOKOP biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. It can be used effectively to test new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19.
Abstract: In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing ROBOKOP biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to test new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19. COVID-KOP is freely accessible at https://covidkop.renci.org/ . For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop.

Posted Content
TL;DR: A model that extracts drug-disease pairs of potential cures from unstructured text sources by simple reasoning over the structure of spoken text; for example, it successfully identified that Omeprazole can help treat heartburn.
Abstract: Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs of potential cures to diseases by a simple reasoning over the structure of spoken text. Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the text to ensure quality input to the core classification model, which feeds to a series of post-processing steps for obtaining filtered results. Our classification model itself uses a language model pre-trained on PubMed text. The modular nature of our pipeline allows for ease of future developments in this area by substituting higher quality components at each stage of the pipeline. As a validation measure, we use ROBOKOP, an engine over a medical knowledge graph with only validated pathways, as a ground truth source for checking the existence of the proposed pairs. For the proposed pairs not found in ROBOKOP, we provide further verification using Chemotext. Results: We found 30.4% of our proposed pairs in the ROBOKOP database. For example, our model successfully identified that Omeprazole can help treat heartburn. We discuss the significance of this result, showing some examples of the proposed pairs. Discussion and Conclusion: The agreement of our results with the existing knowledge source indicates a step in the right direction. Given the plug-and-play nature of our framework, it is easy to add, remove, or modify parts to improve the model as necessary. We discuss the results showing some examples, and note that this is a potentially new line of research that has further scope to be explored. Although our approach was originally oriented toward radio podcast transcripts, it is input-agnostic and could be applied to any source of textual data and to any problem of interest.
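The validation step in the abstract above (checking what share of proposed drug-disease pairs already exist in a reference knowledge source) can be illustrated with a small sketch. Everything here is an assumption for illustration: the function name is hypothetical, the reference source is modeled as a plain in-memory set rather than a ROBOKOP query, and all pairs other than the Omeprazole example quoted in the abstract are placeholders.

```python
def fraction_confirmed(proposed_pairs, known_pairs):
    """Return the share of proposed (drug, disease) pairs found in a reference set.

    Hypothetical helper: matching is by case-insensitive exact names,
    which is a simplification of any real knowledge-graph lookup.
    """
    def normalize(pair):
        drug, disease = pair
        return drug.strip().lower(), disease.strip().lower()

    proposed = {normalize(p) for p in proposed_pairs}
    known = {normalize(p) for p in known_pairs}
    if not proposed:
        return 0.0
    return len(proposed & known) / len(proposed)


# Toy data: only the first pair comes from the abstract; the rest are placeholders.
proposed = [("Omeprazole", "heartburn"), ("drug_x", "disease_y"), ("drug_z", "disease_w")]
reference = [("omeprazole", "Heartburn"), ("drug_q", "disease_r")]
print(f"{fraction_confirmed(proposed, reference):.1%} of proposed pairs confirmed")
```

A real check would also need to reconcile synonyms and identifiers before comparing names, which this sketch leaves out.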

Posted ContentDOI
26 Nov 2020-ChemRxiv
TL;DR: The COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19 to assist drug repurposing efforts, is built.
Abstract: Objective: The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19 to assist drug repurposing efforts. Materials and Methods: SciBiteAI ontological tagging of the COVID Open Research Dataset (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug-target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of protein and drug terms, and confidence scores were calculated for each entity pair.
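The abstract above does not describe the custom co-occurrence algorithm or how its confidence scores are computed. As a rough sketch of the general idea only, the following assumes sentence-level co-occurrence and a Jaccard-style score (sentences mentioning both terms divided by sentences mentioning either). The function name, the tokenization, the scoring rule, and the placeholder terms are all assumptions, not the COKE implementation.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence_scores(sentences, drugs, proteins):
    """Score (drug, protein) pairs by sentence-level co-occurrence.

    Assumed score: sentences containing both terms divided by sentences
    containing either term. `drugs` and `proteins` are sets of lowercase terms.
    """
    pair_counts = Counter()
    term_counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[\w-]+", sentence.lower()))
        found_drugs = tokens & drugs
        found_proteins = tokens & proteins
        term_counts.update(found_drugs | found_proteins)
        pair_counts.update(product(found_drugs, found_proteins))

    scores = {}
    for (drug, protein), both in pair_counts.items():
        either = term_counts[drug] + term_counts[protein] - both
        scores[(drug, protein)] = both / either
    return scores


# Placeholder sentences and terms, not real literature data.
sentences = [
    "drug_a binds protein_x",
    "drug_a inhibits protein_x and protein_y",
    "protein_y is unaffected by drug_b",
]
scores = cooccurrence_scores(sentences, {"drug_a", "drug_b"}, {"protein_x", "protein_y"})
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Note that plain co-occurrence cannot tell a positive statement from a negated one (the third toy sentence still produces a pair), which is one reason a curation step like the one the abstract mentions is useful.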

Journal ArticleDOI
TL;DR: Text mining of research papers in the PubMed literature based on Word2Vec analysis followed by a simple similarity comparison or kNN modeling affords excellent predictions of protein-protein interactions between P53 and kinases, and should have wide applications in translational biomedical studies such as repurposing of existing drugs, drug-drug interaction, and elucidation of mechanisms of action for drugs.
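The summary above mentions a similarity comparison or kNN modeling over Word2Vec embeddings. The sketch below illustrates that general idea with cosine similarity and a majority-vote k-nearest-neighbor classifier. The two-dimensional vectors are toy values standing in for learned embeddings, and the function names, labels, and choice of k are assumptions rather than details from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, labeled_vectors, k=3):
    """Majority label among the k vectors most similar to the query.

    `labeled_vectors` is a list of (vector, label) tuples. Use an odd k
    with two classes so that the vote cannot tie.
    """
    ranked = sorted(
        labeled_vectors,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy embeddings: True marks a hypothetical "interacts" label, False "does not".
labeled = [
    ((1.0, 0.0), True),
    ((0.9, 0.1), True),
    ((0.8, 0.3), True),
    ((0.0, 1.0), False),
    ((0.1, 0.9), False),
]
print(knn_predict((1.0, 0.2), labeled, k=3))
```

With real embeddings the vectors would typically have a few hundred dimensions and come from a model trained on the literature corpus; the classification logic stays the same.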