Journal, ISSN: 1611-020X

QSAR & Combinatorial Science

Wiley
About: QSAR & Combinatorial Science is an academic journal. It publishes mainly in the areas of quantitative structure–activity relationships and molecular descriptors. Its ISSN is 1611-020X. Over its lifetime, the journal has published 638 papers, which have received 18,045 citations.

Papers
Journal Article
TL;DR: A set of simple guidelines for developing validated and predictive QSPR models is presented, highlighting the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and some algorithms that can be used for this purpose.
Abstract: This paper emphasizes the importance of rigorous validation as a crucial, integral component of Quantitative Structure Property Relationship (QSPR) model development. We consider some examples of published QSPR models, which in spite of their high fitted accuracy for the training sets and apparent mechanistic appeal, fail rigorous validation tests, and, thus, may lack practical utility as reliable screening tools. We present a set of simple guidelines for developing validated and predictive QSPR models. To this end, we discuss several validation strategies including (1) randomization of the modelled property, also called Y-scrambling, (2) multiple leave-many-out cross-validations, and (3) external validation using rational division of a dataset into training and test sets. We also highlight the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and discuss some algorithms that can be used for this purpose. We advocate the broad use of these guidelines in the development of predictive QSPR models.

1,838 citations
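
The Y-scrambling check named in the abstract above can be illustrated with a minimal sketch. It assumes an ordinary least-squares model and synthetic data; the function names `r_squared` and `y_scramble` are hypothetical and are not taken from the paper. The idea is that a model fitted to randomly permuted property values should fit much worse than the model fitted to the real values.

```python
import numpy as np

def r_squared(X, y):
    """Fitted R^2 of an ordinary least-squares model with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_scramble(X, y, n_rounds=100, seed=0):
    """Return the real R^2 and the R^2 values obtained after permuting y."""
    rng = np.random.default_rng(seed)
    real = r_squared(X, y)
    scrambled = np.array(
        [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]
    )
    return real, scrambled

# Synthetic example (assumption: 40 compounds, 3 descriptors, linear property).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=40)

real_r2, scrambled_r2 = y_scramble(X, y)
print(f"real R^2 = {real_r2:.3f}, mean scrambled R^2 = {scrambled_r2.mean():.3f}")
```

If the scrambled models fit nearly as well as the real one, the apparent fit is likely due to chance correlation rather than a real structure-property relationship.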

Journal Article
TL;DR: Evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes.
Abstract: The recent REACH Policy of the European Union has led scientists and regulators to focus their attention on establishing general validation principles for QSAR models in the context of chemical regulation (previously known as the Setubal principles, nowadays as the OECD principles). This paper gives a brief analysis of some of these principles: unambiguous algorithm, Applicability Domain (AD), and statistical validation. Some concerns related to QSAR algorithm reproducibility and an example of a fast check of the applicability domain for MLR models are presented. Common myths and misconceptions related to popular techniques for verifying internal predictivity, particularly for MLR models (for instance cross-validation and bootstrap), are commented on and compared with commonly used statistical techniques for external validation. The differences between the two validation approaches are highlighted, and evidence is presented that only models that have been validated externally, after their internal validation, can be considered reliable and applicable for both external prediction and regulatory purposes. (“Validation is one of those words...that is constantly used and seldom defined” as stated by A. R. Feinstein in the book Multivariate Analysis: An Introduction, Yale University Press, New Haven, 1996).

1,697 citations
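
The abstract above mentions a fast applicability-domain check for MLR models without naming it. One widely used option, shown here purely as an assumption about what such a check could look like, is the leverage (hat-matrix diagonal) with the conventional warning value h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The function names are hypothetical.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h = x^T (X^T X)^-1 x of each query row, with an intercept column."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q_i^T M q_i.
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)

def warning_leverage(X_train):
    """Conventional cut-off h* = 3 (p + 1) / n (an assumed convention)."""
    n, p = X_train.shape
    return 3.0 * (p + 1) / n

# Synthetic example: 30 training compounds, 2 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 2))
X_query = np.array([[0.0, 0.0],     # near the centre of the training data
                    [10.0, 10.0]])  # far outside the training data

h = leverages(X_train, X_query)
h_star = warning_leverage(X_train)
outside_domain = h > h_star
print(h, h_star, outside_domain)
```

Query compounds with leverage above h* are flagged as extrapolations, so their predictions would be treated with caution.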

Journal Article
TL;DR: This article explores the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data. The compounds were classified using the K-means clustering technique applied to the standardized descriptor matrix, and ten combinations of training and test sets were generated based on the obtained clusters.
Abstract: This paper explores the optimum variable selection strategy for Partial Least Squares (PLS) regression using a model dataset of cytoprotection data. The compounds of the dataset were classified using the K-means clustering technique applied to the standardized descriptor matrix, and ten combinations of training and test sets were generated based on the obtained clusters. For each training set, PLS models were developed with the number of components optimized by leave-one-out Q², and the developed models were then validated externally using the test set compounds. For each set, a PLS model was initially constructed using all descriptors (variables). The variables with the smallest standardized regression coefficients were deleted, and the next model was developed with the reduced set of variables. These steps were repeated until further reduction in the number of variables did not improve the Q² value. In each case, statistical parameters such as predictive R² (R²pred), the squared correlation coefficient between observed and predicted values with (r²) and without (r0²) intercept, and the Root Mean Square Error of Prediction (RMSEP) were calculated for the test set compounds. For all ten sets, Q² values steadily increase on deletion of variables, while R²pred values do not show any specific trend. In no case do the highest Q² and the highest R²pred appear in the same trial, i.e., with the same combination of variables. This suggests that, from the viewpoint of external predictability, choosing variables for PLS based on the Q² value may not be optimum. Moreover, a clear separation of the r² and r0² curves in some sets suggests that such models may not be truly predictive in spite of acceptable R²pred values. Another observation is that the coefficient of determination R² for the training set is less sensitive to the deletion of variables than validation parameters such as Q² and R²pred. Finally, a new parameter, rm², is suggested to indicate the external predictability of QSAR models.

683 citations
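
The test-set statistics listed in the abstract above (RMSEP, predictive R2, and the squared correlation with and without intercept) can be sketched as follows. The abstract does not give the formulas, so the definitions below are assumptions based on common usage: the through-origin fit regresses observed on predicted values, and the rm2 expression r2 * (1 - sqrt(|r2 - r02|)) is the form usually cited for this parameter. All function names are hypothetical.

```python
import numpy as np

def rmsep(y_obs, y_pred):
    """Root mean square error of prediction on the test set."""
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def r2_pred(y_obs, y_pred, y_train_mean):
    """Predictive R^2, referenced to the training-set mean of the property."""
    press = np.sum((y_obs - y_pred) ** 2)
    return float(1.0 - press / np.sum((y_obs - y_train_mean) ** 2))

def r2_with_intercept(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)

def r2_through_origin(y_obs, y_pred):
    """Squared correlation for the fit y_obs = k * y_pred (no intercept)."""
    k = np.sum(y_obs * y_pred) / np.sum(y_pred ** 2)
    ss_res = np.sum((y_obs - k * y_pred) ** 2)
    return float(1.0 - ss_res / np.sum((y_obs - y_obs.mean()) ** 2))

def rm2(y_obs, y_pred):
    """Assumed form of the rm2 parameter: r2 * (1 - sqrt(|r2 - r02|))."""
    r2 = r2_with_intercept(y_obs, y_pred)
    r02 = r2_through_origin(y_obs, y_pred)
    return float(r2 * (1.0 - np.sqrt(abs(r2 - r02))))

# Illustrative values only; they are not data from the paper.
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.2, 1.9, 3.3, 3.8, 5.1])
print(rmsep(y_obs, y_pred), r2_pred(y_obs, y_pred, y_train_mean=3.0),
      r2_with_intercept(y_obs, y_pred), r2_through_origin(y_obs, y_pred),
      rm2(y_obs, y_pred))
```

Under these definitions, a large gap between the with-intercept and through-origin values lowers rm2 even when the predictive R2 looks acceptable, which is the situation the abstract warns about.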

Journal Article
TL;DR: The review analyzes potential pitfalls of descriptor-based similarity analysis: the loss of information in representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity.
Abstract: Although the concept of similarity is convenient for humans, a formal definition of similarity between chemical compounds is needed to enable automatic decision-making. The objective of similarity measures in toxicology and drug design is to allow assessment of chemical activities. The ideal similarity measure should be relevant to the activity of interest. This relevance could be established by exploiting knowledge about the fundamental chemical and biological processes responsible for the activity. Unfortunately, this knowledge is rarely available, and therefore different approximations have been developed based on similarity between structures or descriptor values. Various methods are reviewed, ranging from two-dimensional, three-dimensional and field approaches to recent methods based on “Atoms in Molecules” theory. All these methods attempt to describe chemical compounds by a set of numerical values and define some means of comparing them. The review analyzes potential pitfalls of this methodology: the loss of information in representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity. A brief review of known methods for descriptor selection is also provided. The popular “neighborhood behavior” principle is criticized, since proximity with respect to descriptors does not necessarily mean proximity with respect to activity. Structural similarity should also be used with care, as it does not always imply similar activity, as shown by examples. We note that similarity measures and classification techniques based on distances rely on certain data distribution assumptions. If these assumptions are not satisfied for a given dataset, the results could be misleading. A discussion of similarity in descriptor space in the context of applicability domain assessment of QSAR models is also provided. Finally, it is shown that descriptor-based similarity analysis is prone to errors if the relationship between the activity and the descriptors has not been previously established. A justification for the use of a particular similarity measure should be provided for every specific activity, either by expert knowledge or by data modeling techniques.

365 citations

Journal Article
TL;DR: The characteristics of bioorthogonal click reactions as well as recent applications toward labeling biomolecules in cells and living organisms are described.
Abstract: The term “click chemistry” defines a powerful set of chemical reactions that are rapid, selective, and high-yielding. These reactions, some of which are less than 10 years old, have been applied in diverse areas, including drug discovery, materials science, and chemical biology. In chemical biology, click chemistry has been used in the selective labeling of biomolecules within living systems, allowing proteins, glycans, and other important biomolecules to be monitored in a physiologically relevant environment rather than in an in vitro setting. This demanding application requires not only the aforementioned characteristics of click chemistry but, additionally, that the reactions are bioorthogonal – that is, non-interacting with biological functionality while proceeding under physiological conditions – and that the reagents are non-toxic. Of the many extant click reactions, only a select few possess this unique combination of attributes, notably the Staudinger ligation of azides and triarylphosphines and [3+2] dipolar cycloadditions of azides with strained alkynes. This minireview describes the characteristics of bioorthogonal click reactions as well as recent applications toward labeling biomolecules in cells and living organisms.

297 citations

Network Information
Related Journals (5)
Journal of Chemical Information and Modeling: 6.1K papers, 231.1K citations, 88% related
Bioorganic & Medicinal Chemistry: 16.2K papers, 503.6K citations, 79% related
Journal of Molecular Structure: THEOCHEM: 11.6K papers, 164.4K citations, 79% related
European Journal of Medicinal Chemistry: 13.5K papers, 378.7K citations, 79% related
Journal of Computational Chemistry: 8.7K papers, 719.4K citations, 79% related
Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2009    120
2008    99
2007    94
2006    96
2005    84
2004    68