19 May 2005
TL;DR: This work proposes a unified conceptual framework to describe and quantify the Applicability Domains (AD) of Quantitative Structure-Activity Relationships (QSARs); a first use of the resulting untrustworthiness scores is the prioritization of predictions, without the need to specify a hard AD border.
Abstract: The present work proposes a unified conceptual framework to describe and quantify the important issue of the Applicability Domains (AD) of Quantitative Structure−Activity Relationships (QSARs). AD models are conceived as meta-models μμ designed to associate an untrustworthiness score to any molecule M subject to property prediction by a QSAR model μ. Untrustworthiness scores or “AD metrics” Ψμ(M) are an expression of the relationship between M (represented by its descriptors in chemical space) and the space zones populated by the training molecules at the basis of model μ. Scores integrating some of the classical AD criteria (similarity-based, box-based) were considered in addition to newly invented terms such as the consensus prediction variance, the dissimilarity to outlier-free training sets, and the correlation breakdown count (the former two being most successful). A loose correlation is expected to exist between this untrustworthiness and the error |Pμ(M)-Pexpt(M)| affecting the property Pμ(M) predi...
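As a rough illustration of the classical AD criteria named in the abstract above (similarity-based and box-based), the following minimal sketch computes an untrustworthiness score for a query molecule and uses it only to rank predictions, with no hard AD border. It is not the authors' implementation: the function names, the Euclidean metric, the additive combination, and the `box_weight` parameter are all assumptions made for this example, and molecules are represented as plain tuples of descriptor values.

```python
import math


def nearest_training_distance(query, training):
    """Similarity-based term: Euclidean distance to the closest training molecule."""
    return min(math.dist(query, t) for t in training)


def box_violations(query, training):
    """Box-based term: number of descriptors falling outside the training range."""
    count = 0
    for j, value in enumerate(query):
        column = [t[j] for t in training]
        if value < min(column) or value > max(column):
            count += 1
    return count


def untrustworthiness(query, training, box_weight=1.0):
    """Hypothetical combined score (assumption): distance plus weighted box violations."""
    return (nearest_training_distance(query, training)
            + box_weight * box_violations(query, training))


def prioritize(queries, training):
    """Return query indices ordered from most to least trustworthy prediction."""
    scores = [untrustworthiness(q, training) for q in queries]
    return sorted(range(len(queries)), key=lambda i: scores[i])


if __name__ == "__main__":
    training_set = [(0.0, 0.0), (1.0, 1.0)]
    candidates = [(3.0, 0.5), (0.5, 0.5)]
    print(prioritize(candidates, training_set))
```

Ranking by score rather than thresholding it mirrors the prioritization use described in the TL;DR: the predictions with the lowest scores are examined first, and no cutoff has to be chosen.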
TL;DR: This chapter reviews the most recent developments in pharmacophore modeling, covering both methodology and application; pharmacophore-based virtual screening is the most used virtual screening technique in medicinal chemistry, notably for "scaffold hopping" approaches, allowing the discovery of new chemical classes carrying a desired biological activity.
Abstract: This chapter is a review of the most recent developments in the field of pharmacophore modeling, covering both methodology and application. Pharmacophore-based virtual screening is nowadays a mature technology, very well accepted in the medicinal chemistry laboratory. Nevertheless, like any empirical approach, it has specific limitations and efforts to improve the methodology are still ongoing. Fundamentally, the core idea of "stripping" functional groups of their actual chemical nature in order to classify them into very few pharmacophore types, according to their dominant physico-chemical features, is both the main advantage and the main drawback of pharmacophore modeling. The advantage is one of simplicity - the complex nature of noncovalent ligand binding interactions is rendered intuitive and comprehensible to the human mind. Although computers are much better suited for comparisons of pharmacophore patterns, a chemist's intuition is primarily scaffold-oriented. Its underlying simplifications render pharmacophore modeling unable to provide perfect predictions of ligand binding propensities - not even if all its remaining technical problems were solved. Each step in pharmacophore modeling and exploitation has specific drawbacks: from insufficient or inaccurate conformational sampling to ambiguities in pharmacophore typing (mainly due to uncertainty regarding the tautomeric/protonation status of compounds), to computer time limitations in complex molecular overlay calculations, and to the choice of inappropriate anchoring points in active sites when ligand cocrystal structures are not available. Yet, imperfections notwithstanding, the approach is accurate enough to be practically useful and is in fact the most used virtual screening technique in medicinal chemistry - notably for "scaffold hopping" approaches, allowing the discovery of new chemical classes carrying a desired biological activity.
TL;DR: 2D-FPT displays excellent neighborhood behavior, outperforming 2D or 3D two-point pharmacophore descriptors or chemical fingerprints, and includes a new similarity scoring formula, acknowledging that the simultaneous absence of a triplet in two molecules is a less-constraining indicator of similarity than its simultaneous presence.
Abstract: This paper introduces a novel molecular description, topological (2D) fuzzy pharmacophore triplets (2D-FPT), using the number of interposed bonds as the measure of separation between the atoms representing pharmacophore types (hydrophobic, aromatic, hydrogen-bond donor and acceptor, cation, and anion). 2D-FPT features three key improvements with respect to the state-of-the-art pharmacophore fingerprints: (1) The first key novelty is fuzzy mapping of molecular triplets onto the basis set of pharmacophore triplets: unlike in the binary scheme where an atom triplet is set to highlight the bit of a single, best-matching basis triplet, the herein-defined fuzzy approach allows for gradual mapping of each atom triplet onto several related basis triplets, thus minimizing binary classification artifacts. (2) The second innovation is proteolytic equilibrium dependence, by explicitly considering all of the conjugated acids and bases (microspecies). 2D-FPTs are concentration-weighted (as predicted at pH = 7.4) averages o...
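The contrast between binary and fuzzy mapping described in point (1) of the abstract above can be sketched as follows. This is a toy illustration only, not the published 2D-FPT scheme: the triangular membership function, the `width` parameter, and the grid of basis edge lengths are assumptions chosen for the example, and pharmacophore types and triplet symmetry are ignored, so a triplet is reduced to its three topological distances (in bonds).

```python
from itertools import product


def membership(distance, basis_distance, width=2.0):
    """Triangular membership (assumption): 1 at an exact match, 0 at `width` bonds away."""
    return max(0.0, 1.0 - abs(distance - basis_distance) / width)


def fuzzy_map(edges, basis_edges, width=2.0):
    """Spread one atom triplet over every basis triplet with nonzero membership."""
    weights = {}
    for basis in product(basis_edges, repeat=3):
        w = 1.0
        for d, b in zip(edges, basis):
            w *= membership(d, b, width)
        if w > 0.0:
            weights[basis] = w
    return weights


def binary_map(edges, basis_edges):
    """Binary scheme for comparison: only the single best-matching basis triplet is set."""
    best = tuple(min(basis_edges, key=lambda b: abs(d - b)) for d in edges)
    return {best: 1.0}


if __name__ == "__main__":
    atom_triplet = (3, 4, 5)   # topological distances between the three atoms
    basis_grid = (3, 5, 7)     # hypothetical basis edge lengths
    print(binary_map(atom_triplet, basis_grid))
    print(fuzzy_map(atom_triplet, basis_grid))
```

In this toy case the edge of length 4 lies halfway between the basis lengths 3 and 5. The binary scheme has to pick one of them arbitrarily, whereas the fuzzy scheme splits the contribution evenly between the two neighboring basis triplets, which is the kind of classification artifact the abstract says gradual mapping minimizes.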
TL;DR: Thirteen data sets for which state-of-the-art QSAR models were reported in literature were revisited in order to benchmark 2D-FPT biological activity-explaining propensities and confirmed the higher robustness of nonlinear over linear SQS models.
Abstract: Topological fuzzy pharmacophore triplets (2D-FPT), using the number of interposed bonds to measure separation between the atoms representing pharmacophore types, were employed to establish and validate quantitative structure−activity relationships (QSAR). Thirteen data sets for which state-of-the-art QSAR models were reported in literature were revisited in order to benchmark 2D-FPT biological activity-explaining propensities. Linear and nonlinear QSAR models were constructed for each compound series (following the original author's splitting into training/validation subsets) with three different 2D-FPT versions, using the genetic algorithm-driven Stochastic QSAR sampler (SQS) to pick relevant triplets and fit their coefficients. 2D-FPT QSARs are computationally cheap, interpretable, and perform well in benchmarking. In a majority of cases (10/13), default 2D-FPT models validated better than or as well as the best among those reported, including 3D overlay-dependent approaches. Most of the analogues serie...
TL;DR: The final EU‐OPENSCREEN library, assembled by merging five independent selections of 40K compounds from various expert groups, represents an excellent example of a Europe‐wide collaborative effort toward the common objective of building best‐in‐class European open screening platforms.
Abstract: This work describes a collaborative effort to define and apply a protocol for the rational selection of a general-purpose screening library, to be used by the screening platforms affiliated with the EU-OPENSCREEN initiative. It is designed as a standard source of compounds for primary screening against novel biological targets, at the request of research partners. Given the general nature of the potential applications of this compound collection, the focus of the selection strategy lies on ensuring chemical stability, absence of reactive compounds, screening-compliant physicochemical properties, loose compliance with drug-likeness criteria (as drug design is a major, but not exclusive application), and maximal diversity/coverage of chemical space, aimed at providing hits for a wide spectrum of druggable targets. Finally, practical availability/cost issues cannot be avoided. The main goal of this publication is to inform potential future users of this library about its conception, sources, and characteristics. The outline of the selection procedure, notably of the filtering rules designed by a large committee of European medicinal chemists and chemoinformaticians, may be of general methodological interest for the screening/medicinal chemistry community. The selection task of 200K molecules out of a pre-filtered set of 1.4M candidates was shared by five independent European research groups, each picking a subset of 40K compounds according to their own in-house methodology and expertise. An in-depth analysis of chemical space coverage of the library serves not only to characterize the collection, but also to compare the various chemoinformatics-driven selection procedures of maximal diversity sets. Compound selections contributed by various participating groups were mapped onto general-purpose self-organizing maps (SOMs) built on the basis of marketed drugs and bioactive reference molecules.
In this way, the occupancy of chemical space by the EU-OPENSCREEN library could be directly compared with distributions of known bioactives of various classes. This mapping highlights the relevance of the selection and shows how the consensus reached by merging the five different 40K selections contributes to achieve this relevance. The approach also allows one to readily identify subsets of target- or target-class-oriented compounds from the EU-OPENSCREEN library to suit the needs of the diverse range of potential users. The final EU-OPENSCREEN library, assembled by merging five independent selections of 40K compounds from various expert groups, represents an excellent example of a Europe-wide collaborative effort toward the common objective of building best-in-class European open screening platforms.