Author
Kunal Roy
Other affiliations: University of Calcutta, University of Manchester, Mario Negri Institute for Pharmacological Research
Bio: Kunal Roy is an academic researcher at Jadavpur University. The author has contributed to research on quantitative structure–activity relationships and partial least squares regression. The author has an h-index of 47 and has co-authored 369 publications that have received 10,758 citations. Previous affiliations of Kunal Roy include University of Calcutta & University of Manchester.
Papers
TL;DR: In this article, the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data is explored, where the compounds of the dataset were classified using K-means clustering technique applied on standardized descriptor matrix and ten combinations of training and test sets were generated based on the obtained clusters.
Abstract: This paper explores the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data. The compounds of the dataset were classified using the K-means clustering technique applied to the standardized descriptor matrix, and ten combinations of training and test sets were generated based on the obtained clusters. For a particular training set, PLS models were developed with a number of components optimized by leave-one-out Q2, and the developed models were then validated externally using the test set compounds. For each set, a PLS model was initially constructed using all descriptors (variables). The variables having the lowest standardized values of regression coefficients were deleted, and the next model was developed with a reduced set of variables. These steps were repeated until further reduction in the number of variables did not improve the Q2 value. In each case, statistical parameters like predictive R2 (R2pred), squared correlation coefficient between observed and predicted values with (r2) and without (r02) intercept, and Root Mean Square Error of Prediction (RMSEP) were calculated from the test set compounds. For all ten sets, Q2 values steadily increase on deletion of variables while R2pred values do not show any specific trend. In no case do the highest Q2 and highest R2pred appear in the same trial, i.e., with the same combination of variables. This suggests that, from the viewpoint of external predictability, the choice of variables for PLS based on the Q2 value may not be optimum. Moreover, a clear separation of the r2 and r02 curves in some sets suggests that such models may not be truly predictive in spite of acceptable R2pred values. Another observation is that the coefficient of determination R2 for the training set is more immune to changes on deletion of variables than validation parameters like Q2 and R2pred. Finally, a new parameter rm2 has been suggested to indicate external predictability of QSAR models.
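The abstract above names R2pred and RMSEP without giving formulas. The following is a minimal sketch, assuming the commonly used definitions (R2pred referenced to the training-set mean of the response, RMSEP as the root of the mean squared test-set residual). The function names are hypothetical and are not taken from the paper.

```python
import math


def r2_pred(y_obs_test, y_pred_test, train_mean):
    """Predictive R2 for an external test set.

    Assumed definition: 1 - PRESS / sum((y_obs - train_mean)^2),
    where train_mean is the mean response of the training set.
    """
    press = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    ss_dev = sum((o - train_mean) ** 2 for o in y_obs_test)
    return 1.0 - press / ss_dev


def rmsep(y_obs_test, y_pred_test):
    """Root Mean Square Error of Prediction over the test set."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    return math.sqrt(press / len(y_obs_test))


# Illustrative values only (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(r2_pred(observed, predicted, train_mean=2.5))
print(rmsep(observed, predicted))
```

Because the denominator of R2pred depends on how far the test responses lie from the training mean, the same residuals can give different R2pred values on different test sets.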
683 citations
TL;DR: This paper shows the problems associated with the R2 based validation metrics commonly used in QSAR studies, and proposes a guideline for determining the quality of predictions based on MAE and its standard deviation computed from the test set predictions after omitting 5% high residual data points to obviate the influence of any rarely occurring high prediction errors.
Abstract: Validation is the most crucial concept for development and application of quantitative structure–activity relationship (QSAR) models. The validation process confirms the reliability of the developed QSAR models along with the acceptability of each step during model development such as assessing the quality of input data, dataset diversity, predictability on an external set, domain of applicability and mechanistic interpretability. External validation or validation using an independent test set is usually considered as the gold standard in evaluating the quality of predictions from a QSAR model. The external predictivity of QSAR models is commonly described by employing various validation metrics, which can be broadly categorized into two major classes, viz., R2 based metrics namely R2test, Q2(ext_F1), and Q2(ext_F2), and purely error based measures like predicted residual sum of squares (PRESS), root mean square error (RMSE), and mean absolute error (MAE). The problem associated with the error based measures is the absence of any well-defined threshold for determining the quality of predictions making the R2 based metrics more suitable for use due to easy comprehension. However, in this paper, we show the problems associated with the R2 based validation metrics commonly used in QSAR studies, since their values are highly dependent on the range of the response values of the test set compounds and their distribution pattern around the training/test set mean. We also propose a guideline for determining the quality of predictions based on MAE and its standard deviation computed from the test set predictions after omitting 5% high residual data points in order to obviate the influence of any rarely occurring high prediction errors that may significantly obscure the quality of predictions for the whole test set. In this manner, we try to evaluate the prediction performance of a model on most (95%) of the data points present in the external set. 
An online tool (XternalValidationPlus) for computing the suggested MAE based criteria (along with other conventional metrics) for external validation has been made available at http://dtclab.webs.com/software-tools and http://teqip.jdvu.ac.in/QSAR_Tools/ . The MAE based criteria suggested here along with other commonly used validation metrics may be applied to evaluate predictive performance of QSAR models with a greater degree of confidence.
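The trimming step described in the abstract above (omit the 5% of test compounds with the highest residuals, then compute MAE and its standard deviation) can be sketched as follows. This is an illustration under stated assumptions, not the XternalValidationPlus implementation: the function name is hypothetical, the number of dropped points is rounded down, and the sample standard deviation is used.

```python
import math
import statistics


def mae_after_trimming(y_obs, y_pred, drop_fraction=0.05):
    """Return (MAE, SD of absolute errors) after dropping the largest residuals.

    Assumptions (not stated in the abstract): the count of dropped points
    is floor(drop_fraction * n), and SD is the sample standard deviation.
    """
    abs_errors = sorted(abs(o - p) for o, p in zip(y_obs, y_pred))
    # Small epsilon guards against floating-point error in the product.
    n_drop = math.floor(drop_fraction * len(abs_errors) + 1e-9)
    kept = abs_errors[: len(abs_errors) - n_drop]
    mae = statistics.mean(kept)
    sd = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return mae, sd


# Illustrative values only: 19 small errors and one rare large error.
observed = [0.0] * 20
predicted = [0.1] * 19 + [5.0]
print(mae_after_trimming(observed, predicted))
```

With 20 test compounds, exactly one point (the 5.0 residual) is dropped, so the single large error no longer dominates the reported MAE.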
527 citations
TL;DR: The present study reports that the web application can be easily used for identification of the X-outliers for training set compounds and detection of the test compounds residing outside the AD using the descriptor pool of the training and test sets.
Abstract: Quantitative structure–activity/property/toxicity relationship (QSAR/QSPR/QSTR) modeling has been used in medicinal chemistry, material sciences, environmental fate modeling, risk assessment and computational toxicology for a long time. The Organization for Economic Co-operation and Development (OECD) has recommended that for application of validated QSAR models for prediction of new data points, there is a strict requirement of defining the applicability domain (AD) according to the Principle 3. The AD is a theoretical region in chemical space encompassing both the model descriptors and modeled response which allows one to estimate the uncertainty in the prediction of a particular compound based on how similar it is to the training compounds employed in the model development. The AD is an important tool for reliable application of QSAR models, while characterization of interpolation space is significant in defining the AD. An attempt is made here to suggest a simple method for defining the X-outliers (in the case of the training set) and identifying the compounds that reside outside the AD (in the case of the test set) employing the basic theory of the standardization approach. Further, a standalone application named “Applicability domain using standardization approach” (available at http://dtclab.webs.com/software-tools and http://teqip.jdvu.ac.in/QSAR_Tools/ ) has been developed. The present study reports that the web application can be easily used for identification of the X-outliers for training set compounds and detection of the test compounds residing outside the AD using the descriptor pool of the training and test sets.
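The abstract above does not spell out the decision rule, so the following is only one plausible reading of a standardization-based applicability-domain check. The threshold of 3, the factor 1.28, the use of sample standard deviations, and all function names are assumptions made for illustration. They should be checked against the paper before use.

```python
import statistics


def standardize(train_rows, rows):
    """Absolute standardized descriptor values, using training mean and SD.

    Assumes every descriptor has a nonzero SD in the training set.
    """
    n_desc = len(train_rows[0])
    means = [statistics.mean([r[j] for r in train_rows]) for j in range(n_desc)]
    sds = [statistics.stdev([r[j] for r in train_rows]) for j in range(n_desc)]
    return [[abs(r[j] - means[j]) / sds[j] for j in range(n_desc)] for r in rows]


def is_outside_domain(s_row, threshold=3.0, z=1.28):
    """Assumed rule: flag a compound from its standardized descriptor values."""
    if max(s_row) <= threshold:
        return False  # every descriptor is close to the training mean
    if min(s_row) > threshold:
        return True  # every descriptor is far from the training mean
    # Mixed case: compare mean + z * SD of the values against the threshold.
    s_new = statistics.mean(s_row) + z * statistics.stdev(s_row)
    return s_new > threshold


# Illustrative descriptor matrices (rows are compounds, columns are descriptors).
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 20.0], [10.0, 100.0], [6.0, 20.0]]
flags = [is_outside_domain(s) for s in standardize(train, test)]
print(flags)
```

Applying the same function to the standardized training rows would flag X-outliers in the training set, while applying it to test rows flags compounds outside the domain.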
517 citations
TL;DR: A test for two novel parameters, rm2 and Rp2, is suggested to be a more stringent requirement than the traditional validation parameters to decide acceptability of a predictive QSAR model, especially when a regulatory decision is involved.
Abstract: Validation is a crucial aspect of quantitative structure-activity relationship (QSAR) modeling. The present paper shows that traditionally used validation parameters (leave-one-out Q2 for internal validation and predictive R2 for external validation) may be supplemented with two novel parameters, rm2 and Rp2, for a stricter test of validation. The parameter rm2(overall) penalizes a model for large differences between observed and predicted values of the compounds of the whole set (considering both training and test sets) while the parameter Rp2 penalizes model R2 for large differences between determination coefficient of nonrandom model and square of mean correlation coefficient of random models in case of a randomization test. Two other variants of the rm2 parameter, rm2(LOO) and rm2(test), penalize a model more strictly than Q2 and R2pred respectively. Three different data sets of moderate to large size have been used to develop multiple models in order to indicate the suitability of the novel parameters in QSAR studies. The results show that in many cases the developed models could satisfy the requirements of conventional parameters (Q2 and R2pred) but fail to achieve the required values for the novel parameters rm2 and Rp2. Moreover, these parameters also help in identifying the best models from among a set of comparable models. Thus, a test for these two parameters is suggested to be a more stringent requirement than the traditional validation parameters to decide acceptability of a predictive QSAR model, especially when a regulatory decision is involved.
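The abstract above describes what rm2 and Rp2 penalize but gives no formulas. The sketch below assumes the forms rm2 = r2 × (1 − sqrt(|r2 − r02|)) and Rp2 = R2 × sqrt(R2 − Rr2), with r02 taken from a regression of observed on predicted values through the origin and Rr2 taken as the square of the mean correlation coefficient of the randomized models. These forms, the axis choice, and the function names are assumptions for illustration.

```python
import math


def squared_correlations(y_obs, y_pred):
    """Return (r2, r02): squared correlation with and without intercept."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n
    s_op = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(y_obs, y_pred))
    s_oo = sum((o - mean_obs) ** 2 for o in y_obs)
    s_pp = sum((p - mean_pred) ** 2 for p in y_pred)
    r2 = s_op ** 2 / (s_oo * s_pp)
    # Through-origin fit y_obs ~ k * y_pred (axis choice is an assumption).
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    r02 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / s_oo
    return r2, r02


def rm2(y_obs, y_pred):
    """Assumed form: r2 * (1 - sqrt(|r2 - r02|))."""
    r2, r02 = squared_correlations(y_obs, y_pred)
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


def rp2(r2_model, random_r_values):
    """Assumed form: R2 * sqrt(R2 - Rr2), Rr2 = (mean R of random models)^2."""
    mean_r = sum(random_r_values) / len(random_r_values)
    rr2 = mean_r ** 2
    # Clamp at zero so a random model that beats the real one gives Rp2 = 0.
    return r2_model * math.sqrt(max(r2_model - rr2, 0.0))


# Illustrative values only: predictions shifted by a constant offset of 1.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]
print(squared_correlations(observed, shifted))
print(rm2(observed, shifted))
print(rp2(0.8, [0.2, 0.3, 0.1]))
```

In the shifted example r2 stays at 1 because the correlation is perfect, while r02 drops below 1, so rm2 falls below 1. This is the kind of systematic deviation between observed and predicted values that a correlation-only metric misses.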
474 citations
TL;DR: In this article, some additional variants of rm2 metrics have been proposed and their applications in judging the quality of predictions of QSPR models have been shown by analyzing results of the QSPR models obtained from three different data sets (n = 119, 90, and 384).
Abstract: Quantitative structure–property relationship (QSPR) models are widely used for prediction of properties, activities and/or toxicities of new chemicals. Validation strategies check the reliability of predictions of QSPR models. The classical metrics like Q2 and R2pred (Q2ext) are commonly used, besides other techniques, for internal validation (mostly leave-one-out) and external validation (test set validation) respectively. Recently, we have proposed a set of novel rm2 metrics which has been extensively used by us and other research groups for validation of QSPR models. In the present attempt, some additional variants of rm2 metrics have been proposed and their applications in judging the quality of predictions of QSPR models have been shown by analyzing results of the QSPR models obtained from three different data sets (n = 119, 90, and 384). In each case, 50 combinations of training and test sets have been generated, and models have been developed based on the training set compounds and subsequently applied for prediction of responses of the test set compounds. Finally, models for a particular data set have been ranked according to the quality of predictions. The role of different validation metrics (including classical metrics and different variants of rm2 metrics) in differentiating the “good” (predictive) models from the “bad” (low predictive) models has been studied. Finally, a set of guidelines has been proposed for checking the predictive quality of QSPR models.
467 citations
Cited by
TL;DR: The author reflects on the ethereal character of i, the square root of minus one, which seemed an odd beast when first encountered at school, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality.
Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …
33,785 citations
TL;DR: While the book is a standard fixture in most chemical and physical laboratories, including those in medical centers, it is not as frequently seen in the laboratories of physician's offices (those either in solo or group practice), and I believe that the Handbook can be useful in those laboratories.
Abstract: There is a special reason for reviewing this book at this time: it is the 50th edition of a compendium that is known and used frequently in most chemical and physical laboratories in many parts of the world. Surely, a publication that has been published for 56 years, withstanding the vagaries of science in this century, must have had something to offer. There is another reason: while the book is a standard fixture in most chemical and physical laboratories, including those in medical centers, it is not as frequently seen in the laboratories of physician's offices (those either in solo or group practice). I believe that the Handbook can be useful in those laboratories. One of the reasons, among others, is that the various basic items of information it offers may be helpful in new tests, either physical or chemical, which are continuously being published. The basic information may relate
2,493 citations
TL;DR: The accumulated data on the biological activity of ionic liquids, including their antimicrobial and cytotoxic properties, are discussed in view of possible applications in drug synthesis and drug delivery systems.
Abstract: Ionic liquids are remarkable chemical compounds, which find applications in many areas of modern science. Because of their highly tunable nature and exceptional properties, ionic liquids have become essential players in the fields of synthesis and catalysis, extraction, electrochemistry, analytics, biotechnology, etc. Apart from physical and chemical features of ionic liquids, their high biological activity has been attracting significant attention from biochemists, ecologists, and medical scientists. This Review is dedicated to biological activities of ionic liquids, with a special emphasis on their potential employment in pharmaceutics and medicine. The accumulated data on the biological activity of ionic liquids, including their antimicrobial and cytotoxic properties, are discussed in view of possible applications in drug synthesis and drug delivery systems. Dedicated attention is given to a novel active pharmaceutical ingredient-ionic liquid (API-IL) concept, which suggests using traditional drugs in ...
1,065 citations
TL;DR: The Merck Index of Chemicals and Drugs is an encyclopedia for the chemist, pharmacist, physician and allied professions.
Abstract: The Merck Index of Chemicals and Drugs: An Encyclopedia for the Chemist, Pharmacist, Physician and Allied Professions. Sixth edition. Pp. xiv + 1167 (Rahway, NJ: Merck and Company, Inc., 1952). 750 dollars; thumb-indexed, 8 dollars.
972 citations