scispace - formally typeset

Showing papers by "Alexander Tropsha" published in 2018


Journal Article
TL;DR: The ReLeaSE method is used to design chemical libraries with a bias toward structural complexity or toward compounds with maximal, minimal, or a specific range of physical properties, such as melting point or hydrophobicity.
Abstract: We have devised and implemented a novel computational strategy, termed ReLeaSE (Reinforcement Learning for Structural Evolution), for de novo design of molecules with desired properties. Building on deep and reinforcement learning (RL) approaches, ReLeaSE integrates two deep neural networks, generative and predictive, that are trained separately but used jointly to generate novel targeted chemical libraries. ReLeaSE represents molecules only by their simplified molecular-input line-entry system (SMILES) strings. Generative models are trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo–generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the RL approach to bias the generation of new chemical structures toward those with the desired physical and/or biological properties. In the proof-of-concept study, we used the ReLeaSE method to design chemical libraries with a bias toward structural complexity; toward compounds with maximal, minimal, or a specific range of physical properties, such as melting point or hydrophobicity; or toward compounds with inhibitory activity against Janus protein kinase 2. The approach proposed herein can find general use in generating targeted chemical libraries of novel compounds optimized for either a single desired property or multiple properties.

792 citations
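The ReLeaSE abstract describes a two-phase scheme in which a generator is biased by rewards from a separately trained predictor. A deliberately tiny sketch can illustrate the idea. It is not the published ReLeaSE model. The stack-augmented recurrent generator is replaced by a single softmax distribution over a hypothetical four-token vocabulary. The predictive network is replaced by a hypothetical `toy_predictor` that simply rewards carbon tokens. What carries over is the reinforcement learning step, shown here as a REINFORCE-style policy-gradient update with a moving-average baseline.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for SMILES tokens.
VOCAB = ["C", "N", "O", "F"]


def toy_predictor(tokens):
    """Hypothetical stand-in for a predictive model: fraction of 'C' tokens."""
    return sum(t == "C" for t in tokens) / len(tokens)


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def reinforce(steps=200, length=8, lr=0.5, seed=0):
    """REINFORCE on an i.i.d. token policy, starting from uniform logits."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(VOCAB))  # stands in for the phase-1 pretrained generator
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        idx = rng.choice(len(VOCAB), size=length, p=probs)  # sample one "string"
        reward = toy_predictor([VOCAB[i] for i in idx])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # gradient of sum_t log softmax(logits)[idx_t] is counts - length * probs
        grad = np.bincount(idx, minlength=len(VOCAB)) - length * probs
        logits += lr * advantage * grad
    return logits
```

Running `softmax(reinforce())` should shift probability mass toward the rewarded token. Replacing `toy_predictor` with a trained property model is where a real second phase would plug in.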


Journal Article
TL;DR: A bibliometric review of drug repurposing by scanning >25 million papers in PubMed and using text-mining methods to gather, count and analyze chemical-disease therapeutic relationships finds that >60% of the ∼35,000 drugs or drug candidates identified in this study have been tried in more than one disease.

155 citations
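The headline statistic in the drug-repurposing review is a simple aggregation over mined chemical–disease pairs. Below is a minimal sketch. It assumes the text-mining step has already produced (drug, disease) tuples. The example pairs are hypothetical and are not drawn from the study.

```python
from collections import defaultdict


def multi_disease_fraction(pairs):
    """Fraction of drugs linked to more than one distinct disease."""
    diseases_by_drug = defaultdict(set)
    for drug, disease in pairs:
        diseases_by_drug[drug].add(disease)  # duplicates collapse in the set
    if not diseases_by_drug:
        return 0.0
    multi = sum(1 for ds in diseases_by_drug.values() if len(ds) > 1)
    return multi / len(diseases_by_drug)


pairs = [
    ("aspirin", "pain"),
    ("aspirin", "myocardial infarction"),
    ("metformin", "type 2 diabetes"),
    ("metformin", "type 2 diabetes"),
]
```

For these pairs, `multi_disease_fraction(pairs)` gives 0.5. Aspirin is linked to two diseases and metformin to one, because the repeated metformin pair counts once.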


Journal Article
TL;DR: AFLOW-ML is a RESTful API that provides access to continuously updated machine-learning algorithms; it can be transparently integrated into any workflow to retrieve predictions of electronic, thermal and mechanical properties.

72 citations


Journal Article
TL;DR: Toxicity values predicted from QSAR models developed in this study were more accurate and precise than those based on HTS assays or mean-based predictions and can fill a critical gap in the risk assessment and management of data-poor chemicals.
Abstract: Background: Human health assessments synthesize human, animal, and mechanistic data to produce toxicity values that are key inputs to risk-based decision making. Traditional assessments are data-, ...

46 citations


Journal Article
TL;DR: This work proposes a simple, fast, and reliable method termed Multi-Descriptor Read Across (MuDRA) for developing both accurate and interpretable models and finds that models built with MuDRA show consistently high external accuracy similar to that of conventional QSAR models.
Abstract: Multiple approaches to quantitative structure–activity relationship (QSAR) modeling using various statistical or machine learning techniques and different types of chemical descriptors have been developed over the years. Oftentimes models are used in consensus to make more accurate predictions at the expense of model interpretation. We propose a simple, fast, and reliable method termed Multi-Descriptor Read Across (MuDRA) for developing both accurate and interpretable models. The method is conceptually related to the well-known kNN approach but uses different types of chemical descriptors simultaneously for similarity assessment. To benchmark the new method, we have built MuDRA models for six different end points (Ames mutagenicity, aquatic toxicity, hepatotoxicity, hERG liability, skin sensitization, and endocrine disruption) and compared the results with those generated with conventional consensus QSAR modeling. We find that models built with MuDRA show consistently high external accuracy similar to tha...

33 citations
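The MuDRA abstract describes a kNN-like read-across that uses several descriptor types simultaneously for similarity assessment. The sketch below shows one hypothetical way to do that; it is not the published algorithm. It finds the k nearest training compounds separately in each z-scored descriptor space and pools them. It then averages their activities, weighting each neighbor by how many descriptor spaces selected it. The function name `mudra_like_predict` and the pooling rule are assumptions.

```python
import numpy as np


def mudra_like_predict(train_sets, y_train, query_sets, k=3):
    """Hypothetical multi-descriptor read-across for a single query compound.

    train_sets: list of (n_train, d_i) arrays, one per descriptor type.
    query_sets: list of (d_i,) vectors for the query, in the same order.
    """
    y_train = np.asarray(y_train, dtype=float)
    votes = np.zeros(len(y_train))
    for X, q in zip(train_sets, query_sets):
        X, q = np.asarray(X, dtype=float), np.asarray(q, dtype=float)
        # z-score with training statistics so each descriptor space is scale-free
        mu, sd = X.mean(axis=0), X.std(axis=0)
        sd[sd == 0] = 1.0
        dist = np.linalg.norm((X - mu) / sd - (q - mu) / sd, axis=1)
        votes[np.argsort(dist)[:k]] += 1  # k nearest neighbors in this space
    chosen = votes > 0
    # weight each pooled neighbor by how many descriptor spaces selected it
    return float(np.average(y_train[chosen], weights=votes[chosen]))
```

When the descriptor spaces agree, the prediction matches plain kNN. When they disagree, the pooled neighbors from each space contribute, which is where combining descriptor types changes the answer.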


Journal Article
TL;DR: This paper describes the development of Chemotext, a publicly available Web server that mines the entire compendium of published literature in PubMed annotated by Medical Subject Headings (MeSH) terms to identify all known drug–target–disease (DTD) relationships and infer missing links between vertices of the DTD triangle.
Abstract: Elucidation of the mechanistic relationships between drugs, their targets, and diseases is at the core of modern drug discovery research. Thousands of studies relevant to the drug–target–disease (DTD) triangle have been published and annotated in the Medline/PubMed database. Mining this database affords rapid identification of all published studies that confirm connections between vertices of this triangle or enable new inferences of such connections. To this end, we describe the development of Chemotext, a publicly available Web server that mines the entire compendium of published literature in PubMed annotated by Medical Subject Headings (MeSH) terms. The goal of Chemotext is to identify all known DTD relationships and infer missing links between vertices of the DTD triangle. As a proof-of-concept, we show that Chemotext could be instrumental in generating new drug repurposing hypotheses or annotating clinical outcomes pathways for known drugs. The Chemotext Web server is freely available at http://chemo...

33 citations


Journal Article
TL;DR: Compounds identified in this study are among the most potent and well-characterized anti-EBOV inhibitors reported to date.
Abstract: The Ebola virus (EBOV) causes severe human infection that lacks effective treatment. A recent screen identified a series of compounds that block EBOV-like particle entry into human cells. Using data from this screen, quantitative structure–activity relationship models were built and employed for virtual screening of a ∼17 million compound library. Experimental testing of 102 hits yielded 14 compounds with IC50 values under 10 μM, including several sub-micromolar inhibitors, and more than 10-fold selectivity against host cytotoxicity. These confirmed hits include FDA-approved drugs and clinical candidates with non-antiviral indications, as well as compounds with novel scaffolds and no previously known bioactivity. Five selected hits inhibited BSL-4 live-EBOV infection in a dose-dependent manner, including vindesine (0.34 μM). Additional studies of these novel anti-EBOV compounds revealed their mechanisms of action, including the inhibition of NPC1 protein, cathepsin B/L, and lysosomal function. Compounds i...

31 citations
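The Ebola abstract quotes two hit-confirmation criteria: IC50 under 10 μM and more than 10-fold selectivity against host cytotoxicity. Together they amount to a two-condition filter. The minimal sketch below assumes selectivity is expressed as the conventional selectivity index CC50/IC50; the abstract does not state the exact formula. The compound records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    ic50_um: float  # antiviral IC50, micromolar
    cc50_um: float  # host-cell cytotoxicity CC50, micromolar (assumed measure)

    @property
    def selectivity_index(self) -> float:
        return self.cc50_um / self.ic50_um


def confirmed_hits(hits, max_ic50_um=10.0, min_si=10.0):
    """Keep hits that are potent (IC50 below cutoff) and selective (SI above cutoff)."""
    return [h for h in hits if h.ic50_um < max_ic50_um and h.selectivity_index > min_si]


hits = [
    Hit("cpd-A", 0.5, 50.0),    # potent, SI = 100
    Hit("cpd-B", 12.0, 500.0),  # fails the IC50 cutoff
    Hit("cpd-C", 5.0, 30.0),    # potent but SI = 6
]
```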


Journal Article
TL;DR: A new comprehensive approach is proposed, which integrates multiple QSAR models developed with in vitro, in chemico, animal, and human data, and a Naive Bayes model for predicting human skin sensitization.
Abstract: Traditionally, the skin sensitization potential of chemicals has been assessed using animal models. Due to growing ethical, political, and financial concerns, sustainable alternatives to animal testing need to be developed. As publicly available skin sensitization data continues to grow, computational approaches, such as alert-based systems, read-across, and QSAR models, are expected to reduce or replace animal testing for the prediction of human skin sensitization potential. Herein, we discuss current computational approaches to predicting skin sensitization and provide future perspectives of the field. As a proof-of-concept study, we have compiled the largest skin sensitization data set in the public domain and benchmarked several methods for building skin sensitization models. We propose a new comprehensive approach, which integrates multiple QSAR models developed with in vitro, in chemico, animal, and human data, and a Naive Bayes model for predicting human skin sensitization. Both the data sets and t...

30 citations
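The skin sensitization paper integrates several QSAR models with a Naive Bayes model. One simple way to picture such an integration is a Bernoulli Naive Bayes classifier trained against human labels. This is an assumption, not the authors' exact setup. Its features are the binary calls (sensitizer or not) of upstream models built on in vitro, in chemico, and animal data. The training matrix below is hypothetical, and the class is written from scratch rather than taken from any library.

```python
import numpy as np


class BernoulliNaiveBayes:
    """Combines binary calls from several upstream models (columns of X)."""

    def fit(self, X, y, alpha=1.0):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        # Laplace-smoothed P(upstream model i outputs 1 | class c)
        self.p_ = np.array([
            (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
            for c in self.classes_
        ])
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        log_lik = X @ np.log(self.p_).T + (1 - X) @ np.log(1 - self.p_).T
        joint = log_lik + self.log_prior_
        joint -= joint.max(axis=1, keepdims=True)  # numerical stability
        prob = np.exp(joint)
        return prob / prob.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Hypothetical columns: [in vitro call, in chemico call, animal call]
X_train = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y_human = [1, 1, 1, 0, 0, 0]
```

With this data, `BernoulliNaiveBayes().fit(X_train, y_human)` assigns a compound flagged by all three upstream models a posterior of 0.9 for the sensitizer class.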


Book Chapter
22 Jun 2018

7 citations


Journal Article
TL;DR: The presented workflow, based on free-access databases and an association-based inference scheme, provided novel C–E relationships that have been validated post hoc in case reports and may provide an effective computational method for the early detection of potential drug candidate ADEs that can be followed by targeted experimental investigations.
Abstract: Given that adverse drug effects (ADEs) have led to post-market patient harm and subsequent drug withdrawal, failure of candidate agents in the drug development process, and other negative outcomes, it is essential to attempt to forecast ADEs and other relevant drug–target–effect relationships as early as possible. Current pharmacologic data sources, providing multiple complementary perspectives on the drug–target–effect paradigm, can be integrated to facilitate the inference of relationships between these entities. This study aims to identify both existing and unknown relationships between chemicals (C), protein targets (T), and ADEs (E) based on evidence in the literature. Cheminformatics and data mining approaches were employed to integrate and analyze publicly available clinical pharmacology data and literature assertions interrelating drugs, targets, and ADEs. Based on these assertions, a C–T–E relationship knowledge base was developed. Known pairwise relationships between chemicals, targets, and ADEs were collected from several pharmacological and biomedical data sources. These relationships were curated and integrated according to Swanson’s paradigm to form C–T–E triangles. Missing C–E edges were then inferred as C–E relationships. Unreported associations between drugs, targets, and ADEs were inferred, and inferences were prioritized as testable hypotheses. Several C–E inferences, including testosterone → myocardial infarction, were identified from literature published before the confirmatory case reports. Timestamping approaches confirmed the predictive ability of this inference strategy on a larger scale. The presented workflow, based on free-access databases and an association-based inference scheme, provided novel C–E relationships that have been validated post hoc in case reports. With refinement of prioritization schemes for the generated C–E inferences, this workflow may provide an effective computational method for the early detection of potential drug candidate ADEs that can be followed by targeted experimental investigations.

4 citations
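The C–T–E workflow closes Swanson-style triangles. If a chemical C is linked to target T and T is linked to effect E, an unreported C–E edge becomes a candidate hypothesis. A minimal sketch of that inference step follows. Ranking candidates by the number of supporting targets is our assumption, since the abstract does not specify the prioritization scheme. The drug, target, and effect names are placeholders.

```python
from collections import defaultdict


def infer_ce_links(ct_pairs, te_pairs, known_ce):
    """Swanson-style closure: C-T and T-E edges imply a candidate C-E edge.

    Returns (chemical, effect, supporting_targets) for pairs not in known_ce,
    ranked by the number of supporting targets (hypothetical priority rule).
    """
    effects_by_target = defaultdict(set)
    for target, effect in te_pairs:
        effects_by_target[target].add(effect)
    support = defaultdict(set)
    for chem, target in ct_pairs:
        for effect in effects_by_target.get(target, ()):
            if (chem, effect) not in known_ce:
                support[(chem, effect)].add(target)
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(c, e, sorted(ts)) for (c, e), ts in ranked]


ct = [("drugX", "T1"), ("drugX", "T2"), ("drugY", "T2")]
te = [("T1", "E1"), ("T2", "E1"), ("T2", "E2")]
known = {("drugY", "E2")}
```

Here `drugX → E1` ranks first because two targets support it. `drugY → E2` is excluded because it is already known.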


Journal Article
TL;DR: It is demonstrated that CWAS provides a new framework to interpret predictive QSAR models and derive refined structural alerts for more effective design and safety assessment of drugs and drug candidates.
Abstract: Quantitative structure–activity relationship (QSAR) models are often seen as a "black box" because they are considered difficult to interpret. Meanwhile, qualitative approaches, e.g., structural alerts (SAs) or read-across, provide mechanistic insight, which is preferred for regulatory purposes, but the predictive accuracy of such approaches is often low. Herein, we introduce the chemistry-wide association study (CWAS) approach, a novel framework that addresses these deficiencies and combines the advantages of statistical QSAR and alert-based approaches. The CWAS framework consists of the following steps: (i) QSAR model building for an end point of interest, (ii) identification of key chemical features, (iii) determination of communities of such features that co-occur more frequently in the active than in the inactive class, and (iv) assembly of these communities into larger (and not necessarily chemically connected) novel structural alerts with high specificity. As a proof-of-concept, we have applied CWAS to model Ames mutagenicity and Stevens-Johnson Syndrome (SJS). For the well-studied Ames mutagenicity data set, we identified 76 important individual fragments and assembled co-occurring fragments into SAs that both replicate known mutagenicity alerts and represent novel ones. For the SJS data set, we identified 29 important fragments and assembled co-occurring communities into SAs, including both known and novel alerts. In summary, we demonstrate that CWAS provides a new framework to interpret predictive QSAR models and derive refined structural alerts for more effective design and safety assessment of drugs and drug candidates.
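Steps (iii) and (iv) of CWAS find feature communities that co-occur more often in actives than in inactives and assemble them into larger alerts. Pair counting plus connected components can approximate these steps. This is a hypothetical simplification. The enrichment statistic (a pseudocount-smoothed rate ratio) and the use of plain connected components as "communities" are stand-ins, not the method published in the paper. The fragment names are placeholders.

```python
from collections import Counter
from itertools import combinations


def enriched_pairs(active_sets, inactive_sets, min_ratio=2.0, min_count=2):
    """Fragment pairs whose co-occurrence rate in actives exceeds the rate in
    inactives by at least min_ratio (pseudocount 1 avoids division by zero)."""
    def pair_counts(sets):
        counts = Counter()
        for s in sets:
            counts.update(combinations(sorted(s), 2))
        return counts

    act, inact = pair_counts(active_sets), pair_counts(inactive_sets)
    keep = []
    for pair, n in act.items():
        rate_act = n / len(active_sets)
        rate_inact = (inact[pair] + 1) / (len(inactive_sets) + 1)
        if n >= min_count and rate_act / rate_inact >= min_ratio:
            keep.append(pair)
    return sorted(keep)


def communities(pairs):
    """Connected components of the enriched-pair graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())


actives = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"D", "E"}, {"D", "E"}]
inactives = [{"A"}, {"B"}, {"C", "X"}, {"D"}]
```

With `min_ratio=1.5`, the enriched pairs A–B, B–C and D–E merge into two candidate alerts, {A, B, C} and {D, E}. Fragments A and C need not co-occur directly, mirroring the "not necessarily chemically connected" alerts in step (iv).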

Patent
20 Jul 2018
TL;DR: Two deep neural networks, generative and predictive, represent the general workflow for de novo drug discovery based on deep learning and reinforcement learning techniques.
Abstract: The subject matter described herein includes computational methods, systems and non-transitory computer readable media for de novo drug discovery, which is based on deep learning and reinforcement learning techniques. The subject matter described herein enables the generation of chemical compounds with desired properties. Two deep neural networks, generative and predictive, represent the general workflow. The training process consists of two stages. During the first stage, both models are trained separately with supervised learning algorithms, and during the second stage, the models are trained jointly with a reinforcement learning approach. In this study, we conduct a computational experiment that demonstrates the efficiency of the proposed strategy to maximize, minimize, or impose a desired range on a property. We also thoroughly evaluate our models with quantitative approaches and provide visualization and interpretation of internal representation vectors for both predictive and generative models.

Other
TL;DR: There is a novel growing source of absorption, distribution, metabolism, excretion, and toxicology (ADMET) data contributed by academic screening centers.