TL;DR: Support vector inductive logic programming (SVILP) is a general approach that extends the essentially qualitative ILP-based structure activity relationship (SAR) modeling to quantitative modeling: ILP is first used to learn rules, whose predictions are then used within a novel kernel to derive a support vector generalization model.
Abstract: There is a pressing need for accurate in silico methods to predict the toxicity of molecules that are being introduced into the environment or are being developed into new pharmaceuticals. Predictive toxicology is in the realm of structure activity relationships (SAR), and many approaches have been used to derive such SAR. Previous work has shown that inductive logic programming (ILP) is a powerful approach that circumvents several major difficulties, such as molecular superposition, faced by some other SAR methods. The ILP approach reasons with chemical substructures within a relational framework and yields chemically understandable rules. Here, we report a general new approach, support vector inductive logic programming (SVILP), which extends the essentially qualitative ILP-based SAR to quantitative modeling. First, ILP is used to learn rules, the predictions of which are then used within a novel kernel to derive a support-vector generalization model. For a highly heterogeneous dataset of 576 molecules ...
With more than 70 000 chemicals in use today and many more being synthesized, it is vital that there are effective methods to assess the effect of these compounds on the environment and on human health.
Using DSSTox [10], a recently available toxicity dataset that provides the toxicities of 576 chemicals for the fathead minnow, the authors show that SVILP yields significantly better accuracies than ILP, regression from chemical descriptors, and the industry-standard method TOPKAT.
Importantly, the learned logic rules are readily amenable to interpretation as chemical substructures related to activity and thereby provide extensive chemical insights.
METHODS
The SVILP approach [9] uses ILP to learn logic rules, followed by quantitative modeling based on support vector technology, as shown in Figure 1.
The logic relations identify the chemical fragments according to the atom and bond details of the MOL2 structures.
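To illustrate how logic relations over MOL2 atom and bond details can pick out fragments, here is a minimal sketch. It assumes a molecule is reduced to a map from atom id to Tripos atom type plus a list of (atom, atom, bond type) triples, and it hand-writes two hypothetical rules as Python predicates. Real ILP rules are learned, not hand-coded, and can also constrain distances between fragments; none of the rule names below come from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Molecule as parsed from a MOL2 file: atom id -> Tripos atom type, and bonds as (atom1, atom2, bond type)
Atoms = Dict[int, str]
Bonds = List[Tuple[int, int, str]]
Molecule = Tuple[Atoms, Bonds]


def bonded(mol: Molecule, type_a: str, type_b: str, bond_type: Optional[str] = None) -> bool:
    """True if some bond joins an atom of type_a to an atom of type_b (in either order)."""
    atoms, bonds = mol
    for a1, a2, bt in bonds:
        if bond_type is not None and bt != bond_type:
            continue
        if {atoms[a1], atoms[a2]} == {type_a, type_b}:
            return True
    return False


# HYPOTHETICAL rules standing in for learned ILP clauses
RULES: Dict[str, Callable[[Molecule], bool]] = {
    "aromatic_C_bonded_to_Cl": lambda m: bonded(m, "C.ar", "Cl"),
    "carbonyl_C_double_O": lambda m: bonded(m, "C.2", "O.2", "2"),
}


def coverage(mol: Molecule, rules: Dict[str, Callable[[Molecule], bool]]) -> List[int]:
    """0/1 vector recording which rules cover (fire on) the molecule, in rule order."""
    return [1 if rule(mol) else 0 for rule in rules.values()]
```

For example, a chlorobenzene fragment (two aromatic carbons and a chlorine) is covered only by the first rule, so its coverage vector is `[1, 0]`. These 0/1 vectors are what a downstream kernel can consume.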
These learned rules form the input for quantitative prediction using the newly developed method SVILP.
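One way to turn rule coverage into a quantitative model is sketched below, under stated assumptions. Each molecule's 0/1 rule-coverage vector is scaled by the square root of a per-rule weight (the weights are hypothetical, loosely in the spirit of a prior over rules), a Gaussian RBF kernel compares the scaled vectors, and a kernel regressor is fitted on the resulting Gram matrix. The excerpt does not give the paper's exact kernel, and this sketch swaps in kernel ridge regression for the support vector regression the method uses, because it fits in a few dependency-free lines.

```python
import math
from typing import List, Sequence


def weighted_features(cov: Sequence[int], weights: Sequence[float]) -> List[float]:
    # ASSUMPTION: each rule's 0/1 coverage is scaled by sqrt of a hypothetical rule weight
    return [math.sqrt(w) * c for c, w in zip(cov, weights)]


def rule_kernel(cov_i: Sequence[int], cov_j: Sequence[int], weights: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian RBF kernel on weighted rule-coverage vectors."""
    ti, tj = weighted_features(cov_i, weights), weighted_features(cov_j, weights)
    sq = sum((a - b) ** 2 for a, b in zip(ti, tj))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(covs: List[List[int]], y: Sequence[float], weights: Sequence[float],
        lam: float = 0.1, sigma: float = 1.0) -> List[float]:
    """Kernel ridge regression (stand-in for SVR): alpha = (K + lam*I)^-1 y."""
    n = len(covs)
    K = [[rule_kernel(covs[i], covs[j], weights, sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, list(y))


def predict(train_covs: List[List[int]], alpha: Sequence[float], cov: Sequence[int],
            weights: Sequence[float], sigma: float = 1.0) -> float:
    """Predicted activity (e.g. pLC50) for one coverage vector."""
    return sum(a * rule_kernel(c, cov, weights, sigma) for a, c in zip(alpha, train_covs))
```

If scikit-learn is available, the same Gram matrix (without the `lam` term on the diagonal) can instead be passed to `sklearn.svm.SVR(kernel="precomputed")`, which is closer to the support vector regression the excerpt describes.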
In each round of the five-fold cross-validation, one fold is used as the test set and the other four folds are used for training.
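The fold scheme can be sketched as follows. The excerpt does not say how molecules were assigned to folds, so the random shuffle with a fixed seed is an assumption; the point is that every molecule is tested exactly once and never appears in its own training set.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each item is tested exactly once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)       # ASSUMPTION: random fold assignment
    folds = [idx[f::k] for f in range(k)]  # round-robin slices give near-equal fold sizes
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits
```

With the 576 DSSTox molecules, this yields four test folds of 115 molecules and one of 116.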
All molecules whose toxicity is above the mean toxicity of the training set are labeled positive (more toxic), and the rest are labeled negative (less toxic).
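The labeling step amounts to a threshold computed from the training fold only and then applied unchanged to that fold's test molecules, so no test information leaks into the threshold. The excerpts phrase the boundary case differently ("above the mean" here, pLC50 ≥ mean in the Results), so the sketch exposes a hypothetical `ties_positive` flag instead of picking one.

```python
from statistics import mean
from typing import List, Sequence


def label_by_training_mean(train_tox: Sequence[float], tox: Sequence[float],
                           ties_positive: bool = True) -> List[int]:
    """1 = more toxic (positive), 0 = less toxic (negative), relative to the training mean."""
    threshold = mean(train_tox)  # computed on the training fold only
    if ties_positive:
        return [1 if t >= threshold else 0 for t in tox]
    return [1 if t > threshold else 0 for t in tox]
```

For real-valued pLC50 data, exact ties with the mean are rare, so the flag seldom changes any label.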
RESULTS
The average accuracies of predictions over five folds using the chemical descriptor method (CHEM), ILP rules in combination with PLS, and SVILP are given in Table 2a.
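Averaging over folds means scoring each test fold separately and then taking the mean of the per-fold scores. The excerpt does not say which accuracy measure Table 2a reports, so the sketch below assumes root mean squared error (RMSE) purely for illustration.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error for one fold."""
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))


def average_over_folds(fold_results: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """fold_results holds one (y_true, y_pred) pair per test fold; returns the mean per-fold RMSE."""
    return mean(rmse(t, p) for t, p in fold_results)
```

Averaging per-fold scores weights each fold equally. This can differ slightly from pooling all test predictions when folds differ in size.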
In the second part of this study, the molecules were classified into two groups based on their toxicities: toxic (pLC50 ≥ mean) and nontoxic (pLC50 < mean), where "mean" is the average toxicity of the molecules in the training set.
For the majority of rules, the distances between the chemical fragments are also defined, thereby identifying the relative location of the [...]
Table footnote a: C is the compression; p and n are the numbers of positive and negative examples covered by the rule, respectively.
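The footnote's p and n are counts over the labeled training molecules that a rule covers. The excerpt does not define how the compression C is computed. As an assumption, the sketch below uses the compression score of the Aleph ILP system, P - N - L + 1, where L is the number of literals in the clause. The paper's ILP system may define compression differently.

```python
from typing import Sequence, Tuple


def rule_counts(covered: Sequence[int], labels: Sequence[int]) -> Tuple[int, int]:
    """Count positives (p) and negatives (n) among the examples a rule covers."""
    p = sum(1 for c, y in zip(covered, labels) if c and y == 1)
    n = sum(1 for c, y in zip(covered, labels) if c and y == 0)
    return p, n


def compression(p: int, n: int, clause_length: int) -> int:
    # ASSUMPTION: Aleph-style compression, P - N - L + 1, with L = literals in the clause
    return p - n - clause_length + 1
```

Under this scoring, a rule is rewarded for covering many positives and few negatives and penalized for length. This matches the preference for general, chemically readable rules.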
Such chlorinated compounds are toxic, particularly aromatic ones.
In the previous sections, the authors compared SVILP with four methods: ILP, CHEM, PLS, and TOPKAT.
DISCUSSION
The authors introduced a new quantitative logic-based method, support vector inductive logic programming (SVILP), which uses logic-based technology to learn rules and then applies support vector regression to them.
The results of this study on a large, public, and diverse dataset show that SVILP predicts toxicities with higher accuracy than the other tested models.
One could interpret the higher accuracy of SVILP and PLS as a consequence of their using more features.
The rules are chemically understandable and describe the chemical alerts that cause activity/toxicity.
The program automatically and consistently detects chemical substructures and properties by constructing general rules.
TL;DR: SVMs are currently among the best-performing approaches for chemical and biological property prediction and the computational identification of active compounds and it is anticipated that their use in drug discovery will further increase.
Abstract: Introduction: Support vector machines (SVMs) are supervised machine learning algorithms for binary class label prediction and regression-based prediction of property values. In recent years, SVMs h...
96 citations
Cites methods from "A Novel Logic‐Based Approach for Qu..."
...[66] described the application of ‘support vector inductive logic programming’ (SVILP) that combined inductive logic programming (ILP) and SVMs in the context of compound toxicity prediction....
[...]
...On the basis of the resulting rules, SVR models were trained to facilitate quantitative toxicity predictions [66]....
TL;DR: The VirtualToxLab is an in silico technology for estimating the toxic potential--endocrine and metabolic disruption, some aspects of carcinogenicity and cardiotoxicity--of drugs, chemicals and natural products by interactively analyzing the binding mode of a compound with its target protein(s) in real-time 3D.
Abstract: The VirtualToxLab is an in silico technology for estimating the toxic potential--endocrine and metabolic disruption, some aspects of carcinogenicity and cardiotoxicity--of drugs, chemicals and natural products. The technology is based on an automated protocol that simulates and quantifies the binding of small molecules towards a series of currently 16 proteins, known or suspected to trigger adverse effects: 10 nuclear receptors (androgen, estrogen α, estrogen β, glucocorticoid, liver X, mineralocorticoid, peroxisome proliferator-activated receptor γ, progesterone, thyroid α, thyroid β), four members of the cytochrome P450 enzyme family (1A2, 2C9, 2D6, 3A4), a cytosolic transcription factor (aryl hydrocarbon receptor) and a potassium ion channel (hERG). The toxic potential of a compound--its ability to trigger adverse effects--is derived from its computed binding affinities toward these very proteins: the computationally demanding simulations are executed in a client-server model on a Linux cluster of the University of Basel. The graphical-user interface supports all computer platforms, allows building and uploading molecular structures, inspecting and downloading the results and, most importantly, rationalizing any prediction at the atomic level by interactively analyzing the binding mode of a compound with its target protein(s) in real-time 3D. Access to the VirtualToxLab is available free of charge for universities, governmental agencies, regulatory bodies and non-profit organizations.
90 citations
Cites background from "A Novel Logic‐Based Approach for Qu..."
...A large body of both review and research articles exists for these technologies (see, for example, Cronin et al., 2003; Veith, 2004; Helma, 2005; Piclin et al., 2006; Simon-Hettich et al., 2006; Amini et al., 2007; Aronov et al., 2007; Bender et al., 2007; Custer et al., 2007; Ecker and Chiba, 2007; Ekins, 2007; Serafimova et al., 2007; Enoch et al., 2008; Kavlock et al., 2008; Merlot, 2008; Pavan and Worth, 2008; Benfenati et al., 2009; Green and Naven, 2009; Nigsch et al., 2009; Spreafico et al., 2009; Valerio, 2009; Rossato et al., 2010; Cronin and Madden, 2010; Bars et al., 2011; Vuorinen et al., 2013; Gupta et al., 2013; Roncaglioni et al., 2013; Shah and Greene, 2014; Toropov et al., 2014; Schilter et al., 2014; Singh and Gupta, 2014; Ekins, 2014)....
TL;DR: The aim of this paper is to provide an insight into computational technologies that allow for the prediction of toxic effects triggered by pharmaceuticals, based on three-dimensional models of small molecules binding to such entities.
Abstract: Animal testing is still compulsory worldwide for the approval of drugs and chemicals produced in large quantities. Computer-assisted (in silico) technologies are considered to be efficient alterna...
TL;DR: In silico models were developed for the prediction of chemical aquatic toxicity in different fish species, and information gain and ChemoTyper methods were used to identify toxic substructures, which could significantly correlate with chemical aquatic toxicity.
Abstract: Aquatic toxicity is an important endpoint in the evaluation of chemically adverse effects on ecosystems. In this study, in silico models were developed for the prediction of chemical aquatic toxicity in different fish species. Firstly, a large data set containing 6422 data points on aquatic toxicity with 1906 diverse chemicals was constructed. Using molecular descriptors and fingerprints to represent the molecules, local and global models were then developed with five machine learning methods based on three fish species (rainbow trout, fathead minnow and bluegill sunfish). For the local models, both binary and ternary classification models were obtained for each of the three fish species. For the global models, data of all the three fish species were used together. The predictive accuracy of both the local and global models was around 0.8 for the test sets. Moreover, data of the sheepshead minnow were used as an external validation set. For the best local model (model 2), the predictive accuracy was 0.875 for the sheepshead minnow, while for the best global model (model 14), the predictive accuracy was 0.872 for the sheepshead minnow. The FN compounds in model 2 and model 14 were 18 and 10, respectively. Hence, model 14 was the best model, and thus could predict the toxicity of other fish species. Furthermore, information gain and ChemoTyper methods were used to identify toxic substructures, which could significantly correlate with chemical aquatic toxicity. This study provides critical tools for an early evaluation of chemical aquatic toxicity in an environmental hazard assessment.
TL;DR: This study evaluates the use of several Machine Learning algorithms to find useful rules for the elucidation and prediction of toxicity using 1D and 2D molecular descriptors and indicates that machine learning algorithms can effectively use 1D molecular descriptors to construct accurate and simple models.
Abstract: The rational development of new drugs is a complex and expensive process, comprising several steps. Typically, it starts by screening databases of small organic molecules for chemical structures with potential of binding to a target receptor and prioritizing the most promising ones. Only a few of these will be selected for biological evaluation and further refinement through chemical synthesis. Despite the accumulated knowledge by pharmaceutical companies that continually improve the process of finding new drugs, a myriad of factors affect the activity of putative candidate molecules in vivo and the propensity for causing adverse and toxic effects is recognized as the major hurdle behind the current "target-rich, lead-poor" scenario. In this study we evaluate the use of several Machine Learning algorithms to find useful rules for the elucidation and prediction of toxicity using 1D and 2D molecular descriptors. The results indicate that: i) Machine Learning algorithms can effectively use 1D molecular descriptors to construct accurate and simple models; ii) extending the set of descriptors to include 2D descriptors improves the accuracy of the models.
10 citations
Cites methods or result from "A Novel Logic‐Based Approach for Qu..."
...The problem of estimating the toxicity of drugs has been addressed mainly by three kinds of methods: (i) regression from physical-chemical properties; (ii) expert systems; and (iii) machine learning [10, 11]....
[...]
...In [10] the ILP (inductive logic programming) approach was used with support vector machines to extend the essentially qualitative ILP-based SAR to quantitative modelling....
[...]
...Although similar studies have been reported [3, 4, 10, 14], they did not assess the relevancy of molecular descriptors in terms of toxicity prediction....
[...]
...Besides the commercially available programs, other studies have been published using machine learning approaches [3, 13, 10, 6, 11]....
TL;DR: The SVILP approach has a major advantage in that it uses ILP automatically and consistently to derive rules, mostly novel, describing fragments that are toxicity alerts, and has the potential of tackling many problems relevant to chemoinformatics including in silico drug design.