Author

Han van de Waterbeemd

Bio: Han van de Waterbeemd is an academic researcher who has contributed to research on quantitative structure–activity relationships (QSAR). The author has an h-index of 1 and has co-authored 1 publication receiving 609 citations.

Papers

Book

Chemometric methods in molecular design

Han van de Waterbeemd

01 Jan 1995

Abstract: Molecular concepts; experimental design in synthesis planning and structure-property correlations; multivariate analysis of chemical and biological data; statistical validation of QSAR results.

614 citations

Cited by

Book

Applied Predictive Modeling

Max Kuhn, Kjell Johnson

17 May 2013

TL;DR: A book on predictive modeling, organized into general strategies, regression models, classification models, and other considerations.

Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations

Journal Article

Principles of QSAR models validation: internal and external

Paola Gramatica¹

University of Insubria¹

01 May 2007 - QSAR & Combinatorial Science

TL;DR: Evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes.

Abstract: The recent REACH Policy of the European Union has led scientists and regulators to focus their attention on establishing general validation principles for QSAR models in the context of chemical regulation (previously known as the Setubal, nowadays the OECD principles). This paper gives a brief analysis of some principles: unambiguous algorithm, Applicability Domain (AD), and statistical validation. Some concerns related to QSAR algorithm reproducibility and an example of a fast check of the applicability domain for MLR models are presented. Common myths and misconceptions related to popular techniques for verifying internal predictivity, particularly for MLR models (for instance cross-validation, bootstrap), are commented on and compared with commonly used statistical techniques for external validation. The differences between the two validation approaches are highlighted, and evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes. (“Validation is one of those words...that is constantly used and seldom defined” as stated by A. R. Feinstein in the book Multivariate Analysis: An Introduction, Yale University Press, New Haven, 1996).

1,697 citations
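
The abstract above mentions a fast applicability-domain check for MLR models but does not say which check is used. As a rough illustration only, the sketch below assumes a leverage-based check, a common choice for linear regression, restricted to a single descriptor plus intercept. The function names and the warning value 3(p+1)/n are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean


def leverages(x_train, x_query):
    """Leverage of each query value for a one-descriptor model with intercept.

    For this special case the hat value reduces to
    h = 1/n + (x - x_bar)^2 / Sxx, computed from the training descriptors.
    """
    n = len(x_train)
    x_bar = mean(x_train)
    sxx = sum((xi - x_bar) ** 2 for xi in x_train)
    return [1.0 / n + (xq - x_bar) ** 2 / sxx for xq in x_query]


def outside_domain(x_train, x_query):
    """Flag query compounds whose leverage exceeds the assumed warning value.

    The warning value is 3 * (p + 1) / n with p = 1 descriptor; this
    threshold is an assumption of the sketch.
    """
    h_star = 3.0 * (1 + 1) / len(x_train)
    return [h > h_star for h in leverages(x_train, x_query)]
```

A query compound flagged `True` lies far from the training descriptors, so a prediction for it would be an extrapolation under this assumed criterion.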

Journal Article

On Some Aspects of Variable Selection for Partial Least Squares Regression Models

Partha Pratim Roy, Kunal Roy

01 Mar 2008 - QSAR & Combinatorial Science

TL;DR: In this article, the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data is explored, where the compounds of the dataset were classified using K-means clustering technique applied on standardized descriptor matrix and ten combinations of training and test sets were generated based on the obtained clusters.

Abstract: This paper explores the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data. The compounds of the dataset were classified using the K-means clustering technique applied to the standardized descriptor matrix, and ten combinations of training and test sets were generated based on the obtained clusters. For a particular training set, PLS models were developed with a number of components optimized by leave-one-out Q2, and the developed models were then validated externally using the test set compounds. For each set, a PLS model was initially constructed using all descriptors (variables). The variables with the smallest standardized regression coefficients were deleted and the next model was developed with a reduced set of variables. These steps were repeated until further reduction in the number of variables did not improve the Q2 value. In each case, statistical parameters such as predictive R2 (R2pred), the squared correlation coefficient between observed and predicted values with (r2) and without (r02) intercept, and the Root Mean Square Error of Prediction (RMSEP) were calculated from the test set compounds. For all ten sets, Q2 values steadily increase on deletion of variables while R2pred values do not show any specific trend. In no case do the highest Q2 and the highest R2pred appear in the same trial, i.e., with the same combination of variables. This suggests that, from the viewpoint of external predictability, choosing variables for PLS based on the Q2 value may not be optimum. Moreover, a clear separation of the r2 and r02 curves in some sets suggests that such models may not be truly predictive in spite of acceptable R2pred values. Another observation is that the coefficient of determination R2 for the training set is less sensitive to the deletion of variables than validation parameters such as Q2 and R2pred. Finally, a new parameter rm2 has been suggested to indicate external predictability of QSAR models.

683 citations
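
The abstract above contrasts internal validation (leave-one-out Q2 on the training set) with external validation (R2pred on the test set). The sketch below shows how the two statistics are commonly computed. To stay self-contained it swaps PLS for an ordinary least-squares fit on a single descriptor, so it is not the authors' procedure, and all function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x for one descriptor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def loo_q2(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_total on the training set."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the held-out compound.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    y_bar = mean(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - press / ss_total


def r2_pred(x_train, y_train, x_test, y_test):
    """External R2pred: squared test-set residuals relative to squared
    deviations of the test responses from the training-set mean."""
    a, b = fit_line(x_train, y_train)
    y_train_bar = mean(y_train)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x_test, y_test))
    ss_ref = sum((yi - y_train_bar) ** 2 for yi in y_test)
    return 1.0 - ss_res / ss_ref
```

Both functions expect plain Python lists. Computing `loo_q2` and `r2_pred` after each variable-deletion step would give the two curves whose diverging trends the abstract describes.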

Journal Article

Rational selection of training and test sets for the development of validated QSAR models.

Alexander Golbraikh¹, Min Shen¹, Zhiyan Xiao¹, Yun De Xiao¹, Kuo Hsiung Lee¹, Alexander Tropsha¹

University of North Carolina at Chapel Hill¹

01 Feb 2003 - Journal of Computer-Aided Molecular Design

TL;DR: There is additional evidence that there exists no correlation between the values of q2 for the training set and accuracy of prediction (R2) for the test set and it is argued that this observation is a general property of any QSAR model developed with LOO cross-validation.

Abstract: Quantitative Structure–Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R2 (q2) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q2! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q2 for the training set and accuracy of prediction (R2) for the test set and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.

591 citations

Reference Entry

Comparative Molecular Field Analysis (CoMFA)

Hugo Kubinyi

15 Apr 2002

Abstract: Abbreviations: 3D = three-dimensional; C = molar concentration of a drug; CBG = corticosteroid binding globulin; CoMFA = comparative molecular field analysis; CoMSIA = comparative molecular similarity indices analysis; GOLPE = generating optimal linear PLS estimations; PLS = partial least squares; PRESS = predictive residual sum of squares; RMS = root mean squares; TBG = testosterone binding globulin.

504 citations
