Journal Article

Review on lazy learning regressors and their applications in QSAR.

01 May 2009-Combinatorial Chemistry & High Throughput Screening (Comb Chem High Throughput Screen)-Vol. 12, Iss: 4, pp 440-450
TL;DR: The present review deals with the second type of problem (regression) with specific attention to one of the most effective machine learning procedures, viz. lazy learning.
Abstract: Building accurate quantitative structure-activity relationships (QSAR) is important in drug design, environmental modeling, toxicology, and chemical property prediction. QSAR methods are used mainly for two types of problems: pattern recognition (classification), where the output is discrete class information (e.g., active or inactive molecule, binding or non-binding molecule), and function approximation (regression), where the output is continuous (e.g., a predicted activity value). The present review deals with the second type of problem, regression, with specific attention to one of the most effective machine learning procedures, lazy learning. The methodology of the algorithm and the relevant technical information are discussed in detail. We also present three real-life case studies to briefly outline the typical characteristics of the modeling formalism.
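As a rough illustration of what lazy learning regression means in this setting, the sketch below defers all computation to query time: it stores the training set and, for each query compound, averages the activities of the k nearest stored compounds using inverse-distance weights. This is a generic distance-weighted k-nearest-neighbour regressor, not necessarily the formulation covered in the review; the function name `knn_predict`, the descriptor vectors, and the activity values are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=3, eps=1e-12):
    """Distance-weighted k-nearest-neighbour regression (a lazy learner).

    No model is fitted in advance; all work happens when a query arrives.
    """
    # Euclidean distance from the query to every stored training example.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Keep the k closest examples.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Inverse-distance weights; eps avoids division by zero on exact matches.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical data: two descriptors per compound and a continuous activity.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
y = [1.0, 2.0, 3.0, 10.0]

# The query is equidistant from the first three compounds, so the
# prediction is the plain average of their activities.
prediction = knn_predict(X, y, query=(0.5, 0.5), k=3)
```

Because nothing is trained up front, adding a newly measured compound only requires appending it to `X` and `y`; the cost is paid at prediction time, when every stored compound is compared with the query.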
Citations
Journal Article
TL;DR: The present chapter includes a brief overview of currently used SAR methods in LBDD followed by a more detailed presentation of issues and limitations associated with empirical energy functions and conformational sampling methods.
Abstract: A significant number of drug discovery efforts are based on natural products or high throughput screens from which compounds showing potential therapeutic effects are identified without knowledge of the target molecule or its 3D structure. In such cases computational ligand-based drug design (LBDD) can accelerate the drug discovery process. LBDD is a general approach to elucidate the relationship of a compound's structure and physicochemical attributes to its biological activity. The resulting structure–activity relationship (SAR) may then act as the basis for the prediction of compounds with improved biological attributes. LBDD methods range from pharmacophore models, which identify the essential features of ligands responsible for their activity, through quantitative structure–activity relationships (QSAR), which yield quantitative estimates of activities based on physicochemical properties, to similarity searching, which explores compounds with similar properties, as well as various combinations of the above. A number of recent LBDD approaches involve the use of multiple conformations of the ligands being studied. One of the basic components used to generate multiple conformations in LBDD is molecular mechanics (MM), which applies an empirical energy function to relate conformations to energies and forces. The collection of conformations for the ligands is then combined with functional data using methods ranging from regression analysis to neural networks, from which the SAR is determined. Accordingly, for effective application of LBDD to SAR determination it is important that the compounds be accurately modelled such that the appropriate range of conformations accessible to the ligands is identified. Such accurate modelling depends largely on the use of an appropriate empirical force field for the molecules being investigated and on the approaches used to generate the conformations. The present chapter includes a brief overview of currently used SAR methods in LBDD followed by a more detailed presentation of issues and limitations associated with empirical energy functions and conformational sampling methods.

72 citations

Journal Article
TL;DR: The effects of DeepSnap parameters on DL prediction performance were evaluated using the validation loss as the indicator, with toxicity information from the Tox21 qHTP database; the results suggest that optimal thresholds exist to attain the best performance with these prediction models.
Abstract: Numerous chemical compounds are distributed around the world and may affect the homeostasis of the endocrine system by disrupting the normal functions of hormone receptors. Although the risks associated with these compounds have been evaluated by acute toxicity testing in mammalian models, the chronic toxicity of many chemicals remains unknown due to the high cost of the compounds and the testing. Computational approaches may be promising alternatives and could reduce the need for such evaluations. Deep learning (DL) has been shown to yield prediction models with high accuracy for recognition of images, speech, signals, and videos, since it greatly benefits from large datasets. Recently, a novel DL-based technique called DeepSnap was developed to conduct QSAR analysis using three-dimensional images of chemical structures. It can be used to predict the potential toxicity of many different chemicals to various receptors without extraction of descriptors. DeepSnap has been shown to have a very high capacity in tests using Tox21 qHTP datasets. Numerous parameters must be adjusted to use the DeepSnap method, but they have not been optimized. In this study, the effects of these parameters on the performance of the DL prediction model were evaluated, using the validation loss as the performance indicator and the toxicity information in the Tox21 qHTP database. The relationships between the validation loss and DeepSnap parameters such as (1) number of molecules per SDF split, (2) zoom factor percentage, (3) atom size for van der Waals percentage, (4) bond radius, (5) minimum bond distance, and (6) bond tolerance followed quadratic function curves, which suggests that optimal thresholds exist to attain the best performance with these prediction models. Using the parameter values that gave the best performance, a prediction model for CAR agonists was built using 64 images, at a 105° angle, with an AUC of 0.791. Thus, based on these parameters, the proposed DeepSnap-DL approach will be highly reliable and beneficial for establishing models to assess the risk associated with various chemicals.
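The link between a quadratic loss curve and an optimal threshold can be made concrete with a small sketch. Assuming the validation loss has been measured at several values of one parameter, a least-squares quadratic fit gives loss ≈ a·x² + b·x + c, and when a > 0 the estimated optimum is the vertex x = −b / (2a). The helper names `fit_quadratic` and `optimum` and the loss values are invented for illustration; they are not taken from the DeepSnap study.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums S[p] = sum(x^p) and moment sums T[p] = sum(y * x^p).
    S = [sum(x ** p for x in xs) for p in range(5)]
    T = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Normal equations, one row each for a, b, and c.
    A = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    rhs = [T[2], T[1], T[0]]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)  # (a, b, c)


def optimum(a, b):
    """Parameter value at the vertex of the parabola (a minimum if a > 0)."""
    if a <= 0:
        raise ValueError("curve opens downward or is flat; no minimum")
    return -b / (2.0 * a)


# Made-up validation losses at five settings of a single parameter.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
losses = [(x - 3.0) ** 2 + 0.5 for x in xs]

a, b, c = fit_quadratic(xs, losses)
best = optimum(a, b)
```

With real measurements the fit would be noisy, so the vertex is better read as a starting point for a finer search around that value than as an exact optimum.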

26 citations


Cites methods from "Review on lazy learning regressors ..."

  • ...Therefore, various approximation methods have been developed to obtain an optimal combination for an approximate solution (Yap et al., 2007; Kulkarni et al., 2009)....


Journal Article
David Hecht
TL;DR: In silico modeling of ADMET properties with QSAR and QSPR models has proven to be an effective approach for increasing the efficiency of small molecule drug discovery and development processes.
Abstract: In silico modeling of ADMET properties with QSAR and QSPR models has proven to be an effective approach for increasing the efficiency of small molecule drug discovery and development processes. Development of new, improved models and techniques is currently an active area of research. In recent years, there has been growing interest in adapting tools and techniques from the fields of computational intelligence and machine learning for use in drug discovery and development. This report reviews some of the more popular applications. Drug Dev Res 72: 53–65, 2011. © 2010 Wiley-Liss, Inc.

20 citations


Cites background or methods from "Review on lazy learning regressors ..."

  • ...The development of improved and more efficient strategies for QSAR and QSPR modeling is a very active area of research and recently there has been great interest in adopting many of the tools and techniques from the broad field of computational intelligence (CI) [Zernov et al., 2003; Plewczynski et al., 2006, 2009; Yap et al., 2006; Duch et al., 2007; Burton et al., 2009; Hecht and Fogel, 2009a,b; Hecht et al., 2009; Kulkarni et al., 2009; Ma et al., 2009; Mahé and Vert, 2009; Melville et al., 2009]....


Journal Article
TL;DR: A nearest neighbor variation of the Gaussian process model that extends its applicability to the larger data sets common in industrial drug discovery; it incorporates locality-sensitive hashing for fast nearest neighbor searches and is relatively novel in the quantitative structure-activity relationship (QSAR) field.
Abstract: While Gaussian process models are typically restricted to smaller data sets, we propose a variation which extends their applicability to the larger data sets common in the industrial drug discovery space, making it relatively novel in the quantitative structure-activity relationship (QSAR) field. By incorporating locality-sensitive hashing for fast nearest neighbor searches, the nearest neighbor Gaussian process model makes predictions with time complexity that is sub-linear in the sample size. The model can be built efficiently, permitting rapid updates to prevent degradation as new data are collected. Given its small number of hyperparameters, it is robust against overfitting and generalizes about as well as other common QSAR models. Like the usual Gaussian process model, it natively produces principled and well-calibrated uncertainty estimates on its predictions. We compare this new model with implementations of random forest, light gradient boosting, and k-nearest neighbors to highlight these promising advantages. The code for the nearest neighbor Gaussian process is available at https://github.com/Merck/nngp.
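The core idea of a nearest neighbor Gaussian process can be sketched in a few lines: for each query, select the k closest training points, condition a small Gaussian process on them alone, and report a predictive mean and variance. The sketch below is an illustration under stated assumptions and is not the code in the Merck/nngp repository: it uses a brute-force neighbor search in place of locality-sensitive hashing (so it does not have the sub-linear prediction cost described above), a squared-exponential kernel with fixed hyperparameters, and targets centred on the local mean. All function names and data are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance (an assumption)."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def nn_gp_predict(train_X, train_y, query, k=3, noise=1e-6):
    """Predictive mean and variance from a GP on the k nearest neighbors."""
    # Brute-force neighbor search; the abstract describes LSH instead.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))[:k]
    Xn = [train_X[i] for i in order]
    yn = [train_y[i] for i in order]
    mean_y = sum(yn) / len(yn)  # centre targets on the local mean
    # Kernel matrix of the neighbors, with noise added on the diagonal.
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(Xn)]
         for i, a in enumerate(Xn)]
    k_star = [rbf(a, query) for a in Xn]
    alpha = solve(K, [v - mean_y for v in yn])
    mean = mean_y + sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(query, query) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)


# Hypothetical one-descriptor data set.
X = [(0.0,), (1.0,), (2.0,), (10.0,)]
y = [0.0, 1.0, 2.0, 10.0]

near_mean, near_var = nn_gp_predict(X, y, query=(1.0,))   # at a training point
far_mean, far_var = nn_gp_predict(X, y, query=(100.0,))   # far from all data
```

The predictive variance is small at a query that coincides with a training compound and approaches the prior variance far from the data, which is the kind of uncertainty estimate the abstract refers to.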

6 citations