
On Some Aspects of Variable Selection for Partial Least Squares Regression Models

Partha Pratim Roy, +1 more
01 Mar 2008, Vol. 27, Iss. 3, pp. 302-313
Abstract
This paper explores the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data. The compounds were classified by K-means clustering applied to the standardized descriptor matrix, and ten combinations of training and test sets were generated from the resulting clusters. For each training set, PLS models were developed with the number of components optimized by leave-one-out $Q^2$, and the models were then validated externally on the test set compounds. For each set, a PLS model was first constructed using all descriptors (variables). The variables with the smallest standardized regression coefficients were deleted, and the next model was developed with the reduced set of variables. These steps were repeated until further reduction in the number of variables no longer improved $Q^2$. In each case, statistical parameters such as predictive $R^2$ ($R^2_{pred}$), the squared correlation coefficient between observed and predicted values with ($r^2$) and without ($r_0^2$) intercept, and the Root Mean Square Error of Prediction (RMSEP) were calculated for the test set compounds. In all ten sets, $Q^2$ values increase steadily on deletion of variables, while $R^2_{pred}$ values show no specific trend. In no case do the highest $Q^2$ and the highest $R^2_{pred}$ appear in the same trial, i.e., with the same combination of variables. This suggests that, from the viewpoint of external predictability, choosing variables for PLS on the basis of $Q^2$ may not be optimal. Moreover, a clear separation of the $r^2$ and $r_0^2$ curves in some sets suggests that such models may not be truly predictive despite acceptable $R^2_{pred}$ values. Another observation is that the coefficient of determination $R^2$ for the training set is less sensitive to deletion of variables than validation parameters such as $Q^2$ and $R^2_{pred}$. Finally, a new parameter, $r_m^2$, is suggested as an indicator of the external predictability of QSAR models.
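
The external validation statistics named in the abstract can be illustrated with a minimal pure-Python sketch. The abstract gives no formulas, so everything here is an assumption rather than the authors' implementation: the function names are hypothetical, predictive R2 is taken relative to the training-set mean, the zero-intercept fit regresses observed on predicted values (one of two conventions in use), and rm2 follows the form commonly quoted in the QSAR literature, r2 * (1 - sqrt(|r2 - r02|)).

```python
import math


def mean(values):
    return sum(values) / len(values)


def rmsep(y_obs, y_pred):
    """Root Mean Square Error of Prediction over the test set."""
    sq_err = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return math.sqrt(sq_err / len(y_obs))


def r2_pred(y_obs, y_pred, y_train_mean):
    """Predictive R2: test-set errors scaled by deviation from the
    training-set mean (assumed definition)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_dev = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss_dev


def r2_with_intercept(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(y_obs), mean(y_pred)
    s_op = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    s_oo = sum((o - mo) ** 2 for o in y_obs)
    s_pp = sum((p - mp) ** 2 for p in y_pred)
    return s_op ** 2 / (s_oo * s_pp)


def r2_through_origin(y_obs, y_pred):
    """Coefficient of determination for a zero-intercept fit of observed
    on predicted values (assumed convention)."""
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    mo = mean(y_obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mo) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rm2(r2, r02):
    """Assumed form of rm2: penalizes r2 by its gap from r02."""
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


# Illustrative (made-up) test set: every prediction is too high by 1 unit.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]

r2 = r2_with_intercept(y_obs, y_pred)
r02 = r2_through_origin(y_obs, y_pred)
print("RMSEP :", rmsep(y_obs, y_pred))
print("R2pred:", r2_pred(y_obs, y_pred, y_train_mean=2.5))
print("r2    :", r2)
print("r02   :", r02)
print("rm2   :", rm2(r2, r02))
```

In this toy example the predictions correlate perfectly with the observations, so r2 alone looks ideal, while the systematic offset pulls r02 below r2 and lowers rm2. That gap is the kind of separation between the r2 and r02 curves that the abstract flags as a warning sign.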

