Showing papers by "Francesca Grisoni" published in 2019


Journal ArticleDOI
TL;DR: An ensemble machine learning model is presented to design potent ACPs and four counter-propagation artificial neural-networks were trained to identify peptides that kill breast and/or lung cancer cells.
Abstract: Membranolytic anticancer peptides (ACPs) are drawing increasing attention as potential future therapeutics against cancer, due to their ability to hinder the development of cellular resistance and their potential to overcome common hurdles of chemotherapy, e.g., side effects and cytotoxicity. In this work, we present an ensemble machine learning model to design potent ACPs. Four counter-propagation artificial neural networks were trained to identify peptides that kill breast and/or lung cancer cells. For prospective application of the ensemble model, we selected 14 peptides from a total of 1000 de novo designs, for synthesis and testing in vitro on breast cancer (MCF7) and lung cancer (A549) cell lines. Six de novo designs showed anticancer activity in vitro, five of which were active against both MCF7 and A549 cell lines. The novel active peptides populate uncharted regions of ACP sequence space.

36 citations


Journal ArticleDOI
TL;DR: Novel in silico models to identify organic AR modulators in the context of the Collaborative Modeling Project of Androgen Receptor Activity are described, based on a consensus of a multivariate Bernoulli Naive Bayes, a Random Forest, and N-Nearest Neighbor classification models.
Abstract: The nuclear androgen receptor (AR) is one of the most relevant biological targets of Endocrine Disrupting Chemicals (EDCs), which produce adverse effects by interfering with hormonal regulation and endocrine system functioning. This paper describes novel in silico models to identify organic AR modulators in the context of the Collaborative Modeling Project of Androgen Receptor Activity (CoMPARA), coordinated by the National Center of Computational Toxicology (U.S. Environmental Protection Agency). The collaborative project involved 35 international research groups to prioritize the experimental tests of approximately 40k compounds, based on the predictions provided by each participant. In this paper, we describe our machine learning approach to predict the binding to AR, which is based on a consensus of a multivariate Bernoulli Naive Bayes, a Random Forest, and N-Nearest Neighbor classification models. The approach was developed in compliance with the Organization of Economic Cooperation and Development...

33 citations


Journal ArticleDOI
TL;DR: New Quantitative Structure‐Activity Relationship (QSAR) models for the prediction of very toxic and nontoxic endpoints were developed and demonstrated to be robust and predictive, as determined by a blind validation on a set of external molecules provided in a later stage by the coordinators of the collaborative project.
Abstract: The ICCVAM Acute Toxicity Workgroup (U.S. Department of Health and Human Services), in collaboration with the U.S. Environmental Protection Agency (U.S. EPA, National Center for Computational Toxicology), coordinated the "Predictive Models for Acute Oral Systemic Toxicity" collaborative project to develop in silico models to predict acute oral systemic toxicity for filling regulatory needs. In this framework, new Quantitative Structure-Activity Relationship (QSAR) models for the prediction of very toxic (LD50 lower than 50 mg/kg) and nontoxic (LD50 greater than or equal to 2,000 mg/kg) endpoints were developed, as described in this study. Models were developed on a large set of chemicals (8992), provided by the project coordinators, considering the five OECD principles for QSAR applicability to regulatory endpoints. A Bayesian consensus approach integrating three different classification QSAR algorithms was applied as the modelling method. For both endpoints considered, the proposed approach proved to be robust and predictive, as determined by a blind validation on a set of external molecules provided at a later stage by the coordinators of the collaborative project. Finally, the integration of predictions obtained for the very toxic and nontoxic endpoints allowed the identification of compounds associated with medium toxicity, as well as the analysis of consistency between the predictions obtained for the two endpoints on the same molecules. Predictions of the proposed consensus approach will be integrated with those originating from models proposed by the participants of the collaborative project to facilitate the regulatory acceptance of in silico predictions and thus reduce or replace experimental tests for acute toxicity.
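The integration of the two binary endpoints described in this abstract can be illustrated with a minimal sketch. The LD50 thresholds come from the abstract; the function names and the decision rule itself are hypothetical and are not taken from the authors' implementation.

```python
# Thresholds quoted in the abstract (mg/kg).
VERY_TOXIC_BELOW = 50.0
NONTOXIC_FROM = 2000.0


def endpoint_labels(ld50_mg_per_kg: float) -> tuple[bool, bool]:
    """Return (very_toxic, nontoxic) labels for an LD50 value."""
    return ld50_mg_per_kg < VERY_TOXIC_BELOW, ld50_mg_per_kg >= NONTOXIC_FROM


def integrate_endpoints(very_toxic: bool, nontoxic: bool) -> str:
    """Combine two binary predictions (assumed rule, for illustration only)."""
    if very_toxic and nontoxic:
        # The two models contradict each other on the same molecule.
        return "inconsistent"
    if very_toxic:
        return "very toxic"
    if nontoxic:
        return "nontoxic"
    # Negative on both endpoints: LD50 falls between the two thresholds.
    return "medium toxicity"


print(integrate_endpoints(*endpoint_labels(500.0)))
```

A compound predicted negative on both endpoints falls into the medium-toxicity band, while a positive prediction on both flags a consistency problem between the two models.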

33 citations


Journal ArticleDOI
TL;DR: The intent of this work is to provide clarity on the correct and incorrect uses of $Q^2_{F3}$, discussing its behavior towards the training data distribution and illustrating some cases in which $Q^2_{F3}$ estimates may be misleading.
Abstract: Quantitative Structure-Activity Relationship (QSAR) models play a central role in medicinal chemistry, toxicology and computer-assisted molecular design, as well as a support for regulatory decisions and animal testing reduction. Thus, assessing their predictive ability becomes an essential step for any prospective application. Many metrics have been proposed to estimate the predictive ability of QSAR models, which has created confusion on how models should be evaluated and properly compared. Recently, we showed that the metric $Q^2_{F3}$ is particularly well-suited for comparing the external predictivity of different models developed on the same training dataset. However, when comparing models developed on different training data, this function becomes inadequate and only dispersion measures like the root-mean-square error (RMSE) should be used. The intent of this work is to provide clarity on the correct and incorrect uses of $Q^2_{F3}$, discussing its behavior towards the training data distribution and illustrating some cases in which $Q^2_{F3}$ estimates may be misleading. Hereby, we encourage the usage of measures of dispersion when models trained on different datasets have to be compared and evaluated.
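The dependence on the training data can be seen in a small numerical sketch. It assumes the usual definition of the metric, one minus the ratio of the mean squared external prediction error to the variance of the training responses. The toy data and function names are illustrative only and are not taken from the paper.

```python
from math import sqrt


def q2_f3(y_train, y_ext, y_ext_pred):
    """1 - (PRESS_ext / n_ext) / (TSS_train / n_train)."""
    mean_tr = sum(y_train) / len(y_train)
    tss_tr = sum((y - mean_tr) ** 2 for y in y_train)
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    return 1.0 - (press / len(y_ext)) / (tss_tr / len(y_train))


def rmse(y_ext, y_ext_pred):
    """Root-mean-square error on the external set."""
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    return sqrt(press / len(y_ext))


# Identical external errors, evaluated against two different training sets.
y_ext = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.5, 2.5]
narrow_train = [1.0, 2.0, 3.0]  # small response variance
wide_train = [0.0, 2.0, 4.0]    # larger response variance

print(rmse(y_ext, y_pred))                  # same for both models
print(q2_f3(narrow_train, y_ext, y_pred))   # lower value
print(q2_f3(wide_train, y_ext, y_pred))     # higher value, same errors
```

With identical external errors, the model trained on the wider response range receives the higher score, whereas the RMSE is the same in both cases.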

28 citations


Journal ArticleDOI
TL;DR: Two of the computer‐generated hits possess an expanded spectrum of bioactivity on targets relevant to the treatment of Alzheimer's disease and are suitable for hit‐to‐lead expansion.
Abstract: A virtual screening protocol based on machine learning models was used to identify mimetics of the natural product (-)-galantamine. This fully automated approach identified eight compounds with bioactivities on at least one of the macromolecular targets of (-)-galantamine, with different polypharmacological profiles. Two of the computer-generated hits possess an expanded spectrum of bioactivity on targets relevant to the treatment of Alzheimer's disease and are suitable for hit-to-lead expansion. These results advocate multitarget drug design by advanced virtual screening protocols based on chemically informed machine learning models.

25 citations


Journal ArticleDOI
18 Dec 2019-Chimia
TL;DR: This article reviews de novo molecular design approaches from the field of artificial intelligence, focusing on instances of deep generative models, and highlights the prospective application of long short-term memory models to hit and lead finding in medicinal chemistry.
Abstract: Drug discovery benefits from computational models aiding the identification of new chemical matter with bespoke properties. The field of de novo drug design has been particularly revitalized by adaptation of generative machine learning models from the field of natural language processing. These deep neural network models are trained on recognizing molecular structures and generate new molecular entities without relying on pre-determined sets of molecular building blocks and chemical transformations for virtual molecule construction. Implicit representation of chemical knowledge provides an alternative to formulating the molecular design task in terms of the established, explicit chemical vocabulary. Here, we review de novo molecular design approaches from the field of 'artificial intelligence', focusing on instances of deep generative models, and highlight the prospective application of long short-term memory models to hit and lead finding in medicinal chemistry.

17 citations


Journal ArticleDOI
TL;DR: In this article, a decision-support system based on structural alerts is proposed for the identification of substances with bioaccumulation potential, which can be integrated with other sources of information, such as experimental and in silico data, to reduce the uncertainty of the assessment, thereby supporting a weight-of-evidence approach.
Abstract: Legislators have included bioaccumulation in the evaluation of chemicals in the framework of the European Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) regulation. REACH requires information on the bioconcentration factor (BCF), which is a parameter for assessing bioaccumulation and encourages the use of a weight-of-evidence approach, including predictions from quantitative structure-activity relationships (QSARs). This study presents a novel approach, based on structural alerts, to be used as a decision-support system for the identification of substances with bioaccumulation potential. In a regulatory framework, these alerts can be integrated with other sources of information, such as experimental and in silico data, to reduce the uncertainty of the assessment, thereby supporting a weight-of-evidence approach. Moreover, the identified alerts have a direct connection with relevant structural features, thus fostering the applicability and interpretability of the approach. The structural alerts were identified on 779 chemicals annotated for their fish BCF, and the approach was then validated on 278 external molecules. The developed decision-support system allowed identification of 77% of bioaccumulative chemicals and was competitive with more complex QSAR models used in regulatory assessments. The approach is implemented in an easy-to-use workflow, provided free of charge. Integr Environ Assess Manag 2019;15:19-28. © 2018 SETAC.

10 citations


Journal ArticleDOI
TL;DR: The ensemble machine‐learning model identified six new FXR modulators from a library of 3 million compounds, and these computationally identified bioactive compounds possess four novel scaffolds and appreciably expand the chemical space of known FXR modulators.
Abstract: The Front Cover shows the application of machine‐learning methods to expand the chemical space of farnesoid X receptor (FXR)‐targeting small molecules, by employing an ensemble of three complementary machine‐learning approaches (counter‐propagation artificial neural network, k‐nearest neighbor learner, and three‐dimensional pharmacophore model). The ensemble machine‐learning model identified six new FXR modulators from a library of 3 million compounds. These computationally identified bioactive compounds possess four novel scaffolds and appreciably expand the chemical space of known FXR modulators. More information can be found in the Full Paper by D. Merk et al. on page 7 in Issue 1, 2019 (DOI: 10.1002/open.201800156).

9 citations


Journal ArticleDOI
TL;DR: This work presents the first-time QSAR approach to predict the laboratory-based fish biomagnification factor (BMF) of organic chemicals, to be used as a supporting tool for assessing bioaccumulation at the regulatory level.
Abstract: This work presents the first-time QSAR approach to predict the laboratory-based fish biomagnification factor (BMF) of organic chemicals, to be used as a supporting tool for assessing bioaccumulation at the regulatory level. The developed strategy is based on 2 levels of prediction, with a varying trade-off between interpretability and performance according to the user's needs. We designed our models to be intrinsically acceptable at the regulatory level (in what we defined as "acceptable-by-design" strategy), by (i) complying with OECD principles directly in the approach development phase, (ii) choosing easy-to-apply modeling techniques, (iii) preferring simple descriptors when possible, and (iv) striving to provide data-driven mechanistic insights. Our novel tool has an error comparable to the observed experimental inter- and intraspecies variability and is stable on borderline compounds (root mean square error [RMSE] ranging from RMSE = 0.45 to RMSE = 0.45 log units on test data). Additionally, the models' molecular descriptors are carefully described and interpreted, allowing us to gather additional mechanistic insights into the structural features controlling the dietary bioaccumulation of chemicals in fish. To improve the transparency and promote the application of the model, the data set and the standalone prediction tool are provided free of charge at https://github.com/grisoniFr/bmf_qsar. Integr Environ Assess Manag 2019;15:51-63. © 2018 SETAC.

8 citations


Journal ArticleDOI
TL;DR: The present paper summarizes the special series articles and highlights their contribution to the topic of increasing the regulatory applicability of effect models, and describes the main research needs for both TK-TD and QSAR approaches.
Abstract: This paper concludes a special series of 7 articles (4 on toxicokinetic-toxicodynamic [TK-TD] models and 3 on quantitative structure-activity relationship [QSAR] models) published in previous issues of Integrated Environmental Assessment and Management (IEAM). The present paper summarizes the special series articles and highlights their contribution to the topic of increasing the regulatory applicability of effect models. For both TK-TD and QSAR approaches, we then describe the main research needs. The use of TK-TD models for describing sublethal effects must be better developed, particularly through the improvement of the dynamic energy budget (DEBtox) approach. The potential of TK-TD models for moving from lower (molecular) to higher (population) hierarchical levels is highlighted as a promising research line. Some relevant issues to improve the acceptance of QSAR models at the regulatory level are also described, such as increased transparency of the performance assessment and of the modeling algorithms, model documentation, relevance of the chosen target for regulatory needs, and improved mechanistic interpretability. Integr Environ Assess Manag 2019;00:000-000. © 2019 SETAC.

6 citations


Journal ArticleDOI
TL;DR: A new ranking approach is presented, named Deep Ranking Analysis by Kendall Eigenvectors (DRAKE), which is based on the Power-Weakness Ratio analysis and provides a set of sequential rankings that yield deeper insights into the analysed dataset.

Posted ContentDOI
07 Nov 2019-ChemRxiv
TL;DR: A deep learning framework for customized compound library generation is presented, aiming to enrich and expand the pharmacologically relevant chemical space with new molecular entities ‘on demand’.
Abstract: Generative machine learning models sample drug-like molecules from chemical space without the need for explicit design rules. A deep learning framework for customized compound library generation is presented, aiming to enrich and expand the pharmacologically relevant chemical space with new molecular entities ‘on demand’. This de novo design approach was used to generate molecules that combine features from bioactive synthetic compounds and natural products, which are a primary source of inspiration for drug discovery. The results show that the data-driven machine intelligence acquires implicit chemical knowledge and generates novel molecules with bespoke properties and structural diversity. The method is available as an open-access tool for medicinal and bioorganic chemistry.

Journal ArticleDOI
TL;DR: With four novel FXR ligand scaffolds, these computationally identified bioactive compounds appreciably expand the chemical space of known FXR modulators and may serve as starting points for hit‐to‐lead expansion.
Abstract: The bile acid activated transcription factor farnesoid X receptor (FXR) has revealed therapeutic potential as a molecular drug target for the treatment of hepatic and metabolic disorders. Despite strong efforts in FXR ligand development, the structural diversity among the known FXR modulators is limited. Only four molecular frameworks account for more than 50 % of the FXR modulators annotated in ChEMBL. Here, we leverage machine learning methods to expand the chemical space of FXR-targeting small molecules by employing an ensemble of three complementary machine learning approaches. A counter-propagation artificial neural network, a k-nearest neighbor learner, and a three-dimensional pharmacophore descriptor were combined to retrieve novel FXR ligands from a collection of more than 3 million compounds. The ensemble machine learning model identified six new FXR modulators among ten top-ranked candidates. These active hits comprise both FXR activators and antagonists with micromolar potencies. With four novel FXR ligand scaffolds, these computationally identified bioactive compounds appreciably expand the chemical space of known FXR modulators and may serve as starting points for hit-to-lead expansion.