Author

Han van de Waterbeemd

Bio: Han van de Waterbeemd is an academic researcher who has contributed to research on quantitative structure–activity relationships (QSAR). The author has an h-index of 1 and has co-authored 1 publication, which has received 609 citations.

Papers
Book
01 Jan 1995
TL;DR: Molecular concepts; experimental design in synthesis planning and structure-property correlations; multivariate analysis of chemical and biological data; statistical validation of QSAR results.
Abstract: Molecular concepts; experimental design in synthesis planning and structure-property correlations; multivariate analysis of chemical and biological data; statistical validation of QSAR results.

614 citations


Cited by
Book
17 May 2013
TL;DR: This research presents a novel and scalable approach called “Smartfitting” that automates the labor-intensive, and therefore time-consuming and expensive, process of designing and implementing statistical regression models.
Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations

Journal Article
TL;DR: Evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes.
Abstract: The recent REACH Policy of the European Union has led scientists and regulators to focus their attention on establishing general validation principles for QSAR models in the context of chemical regulation (previously known as the Setubal, nowadays, the OECD principles). This paper gives a brief analysis of some principles: unambiguous algorithm, Applicability Domain (AD), and statistical validation. Some concerns related to QSAR algorithm reproducibility and an example of a fast check of the applicability domain for MLR models are presented. Common myths and misconceptions related to popular techniques for verifying internal predictivity, particularly for MLR models (for instance cross-validation, bootstrap), are commented on and compared with commonly used statistical techniques for external validation. The differences between the two validation approaches are highlighted, and evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes. (“Validation is one of those words...that is constantly used and seldom defined” as stated by A. R. Feinstein in the book Multivariate Analysis: An Introduction, Yale University Press, New Haven, 1996).
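
The abstract above mentions a fast applicability-domain check for MLR models without spelling it out. One widely used check of this kind is the leverage (hat-value) test, sketched below under that assumption; the function names and the warning threshold h* = 3(p + 1)/n are illustrative choices, not details taken from the abstract.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h = x^T (X^T X)^-1 x for each query row.

    An intercept column is added to both matrices, so the model has
    p descriptors plus one constant term.
    """
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    # Row-wise quadratic form x_i^T (X^T X)^-1 x_i
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def warning_leverage(X_train):
    """Assumed cut-off h* = 3(p + 1)/n, a common convention."""
    n, p = X_train.shape
    return 3.0 * (p + 1) / n

def inside_domain(X_train, X_query):
    """True where a query compound's leverage does not exceed h*."""
    return leverages(X_train, X_query) <= warning_leverage(X_train)
```

A compound flagged as outside the domain is one whose prediction is an extrapolation in descriptor space; the check says nothing about whether the prediction itself is accurate.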

1,697 citations

Journal Article
TL;DR: In this article, the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data is explored, where the compounds of the dataset were classified using K-means clustering technique applied on standardized descriptor matrix and ten combinations of training and test sets were generated based on the obtained clusters.
Abstract: This paper tries to explore the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data. The compounds of the dataset were classified using the K-means clustering technique applied to the standardized descriptor matrix, and ten combinations of training and test sets were generated based on the obtained clusters. For a particular training set, PLS models were developed with a number of components optimized by leave-one-out Q2 and then the developed models were validated (externally) using the test set compounds. For each set, a PLS model was initially constructed using all descriptors (variables). The variables having the least standardized values of regression coefficients were deleted and the next model was developed with a reduced set of variables. These steps were performed several times until further reduction in the number of variables did not improve the Q2 value. In each case, statistical parameters like predictive R2 (R2pred), squared correlation coefficient between observed and predicted values with (r2) and without (r02) intercept, and Root Mean Square Error of Prediction (RMSEP) were calculated from the test set compounds. For all ten sets, Q2 values steadily increase on deletion of variables while R2pred values do not show any specific trend. In no case do the highest Q2 and highest R2pred appear in the same trial, i.e., with the same combination of variables. This suggests that from the viewpoint of external predictability, choice of variables for PLS based on Q2 value may not be optimum. Moreover, a clear separation of r2 and r02 curves in some sets suggests that such models may not be truly predictive in spite of acceptable R2pred values. Another observation is that the coefficient of determination R2 for the training set is more immune to changes on deletion of variables than validation parameters like Q2 and R2pred. Finally, a new parameter rm2 has been suggested to indicate external predictability of QSAR models.
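
The test-set statistics named in this abstract can be sketched as follows. The abstract gives no formulas, so the definitions here are assumptions based on commonly cited forms: R2pred is referenced to the training-set mean, r02 comes from regressing observed on predicted values through the origin, and rm2 = r2 × (1 − √|r2 − r02|). The function name `external_metrics` is hypothetical.

```python
import numpy as np

def external_metrics(y_obs, y_pred, y_train_mean):
    """Test-set statistics for a QSAR model (assumed definitions)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    resid_ss = np.sum((y_obs - y_pred) ** 2)
    # Predictive R2: residuals relative to spread around the TRAINING mean
    r2_pred = 1.0 - resid_ss / np.sum((y_obs - y_train_mean) ** 2)
    # Root mean square error of prediction
    rmsep = np.sqrt(resid_ss / len(y_obs))
    # r2: squared Pearson correlation (regression with intercept)
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    # r02: regression of observed on predicted forced through the origin
    k = np.sum(y_obs * y_pred) / np.sum(y_pred ** 2)
    r02 = 1.0 - np.sum((y_obs - k * y_pred) ** 2) / np.sum(
        (y_obs - y_obs.mean()) ** 2
    )
    # rm2 penalizes r2 by the gap between r2 and r02 (abs() is a guard
    # added here against small negative differences)
    rm2 = r2 * (1.0 - np.sqrt(abs(r2 - r02)))
    return {"R2pred": r2_pred, "RMSEP": rmsep, "r2": r2, "r02": r02, "rm2": rm2}
```

A constant offset between observed and predicted values leaves r2 at 1 but lowers r02 and therefore rm2, which is the kind of r2/r02 separation the abstract warns about.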

683 citations

Journal Article
TL;DR: There is additional evidence that there exists no correlation between the values of q2 for the training set and accuracy of prediction (R2) for the test set and it is argued that this observation is a general property of any QSAR model developed with LOO cross-validation.
Abstract: Quantitative Structure–Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R2 (q2) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q2! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q2 for the training set and accuracy of prediction (R2) for the test set and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.
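
For readers unfamiliar with the leave-one-out q2 discussed above, the following is a minimal sketch of how it is typically computed. It uses ordinary least squares as a stand-in model, not the kNN method of the paper, and takes q2 = 1 − PRESS / SS_total with the total sum of squares measured around the overall training mean; both choices are assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def loo_q2(X, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # drop compound i
        coef = fit_ols(X[keep], y[keep])  # refit on the remaining n - 1
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

Because every compound still influences model selection, a high q2 from this procedure is an internal statistic only; the abstract's argument is that it should be complemented by predictions on a held-out test set.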

591 citations

Reference Entry
15 Apr 2002
TL;DR: Abbreviations: 3D = three-dimensional; C = molar concentration of a drug; CBG = corticosteroid binding globulin; CoMFA = comparative molecular field analysis; CoMSIA = comparative molecular similarity indices analysis; GOLPE = generating optimal linear PLS estimations; PLS = partial least squares.
Abstract: Abbreviations: 3D = three-dimensional; C = molar concentration of a drug; CBG = corticosteroid binding globulin; CoMFA = comparative molecular field analysis; CoMSIA = comparative molecular similarity indices analysis; GOLPE = generating optimal linear PLS estimations; PLS = partial least squares; PRESS = predictive residual sum of squares; RMS = root mean squares; TBG = testosterone binding globulin.

504 citations