
Showing papers in "Journal of Computer-Aided Molecular Design" (2007)


Journal Article
TL;DR: Extensions to the well-established Hammett and Taft approaches are used for pKa prediction, namely mesomer standardization, charge cancellation, and charge spreading, so that the predicted results reflect the nature of the molecule itself rather than just the particular Lewis structure used as input.
Abstract: Epik is a computer program for predicting pKa values for drug-like molecules. Epik can combine this capability with tautomerization technology to adjust the protonation state of small drug-like molecules, automatically generating one or more of the most probable forms for use in further molecular modeling studies. Many medicinal chemicals can exchange protons with their environment, resulting in various ionization and tautomeric states, collectively known as protonation states. The protonation state of a drug can affect its solubility and membrane permeability. In modeling, the protonation state of a ligand will also affect which conformations are predicted for the molecule, as well as predictions for binding modes and ligand affinities based upon protein–ligand interactions. Despite the importance of the protonation state, many databases of candidate molecules used in drug development do not store reliable information on the most probable protonation states. Epik is sufficiently rapid and accurate to process large databases of drug-like molecules to provide this information. Several new technologies are employed. Extensions to the well-established Hammett and Taft approaches are used for pKa prediction, namely mesomer standardization, charge cancellation, and charge spreading, so that the predicted results reflect the nature of the molecule itself rather than just the particular Lewis structure used as input. In addition, a new iterative technology for generating, ranking and culling the generated protonation states is employed.

1,309 citations
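The Epik entry above builds on the Hammett approach to pKa prediction. As a minimal sketch of only the textbook starting point (not Epik's mesomer standardization, charge cancellation or charge spreading), the Hammett relation pKa = pKa0 - rho * sum(sigma) can be paired with the Henderson-Hasselbalch equation to estimate which protonation state dominates at a given pH. The function names are illustrative; the benzoic acid reference values (pKa0 = 4.20, rho = 1.00) and sigma_para(NO2) = 0.78 are standard tabulated values.

```python
def hammett_pka(pka_parent, rho, sigmas):
    """Hammett estimate for a substituted acid: pKa = pKa0 - rho * sum(sigma)."""
    return pka_parent - rho * sum(sigmas)


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Benzoic acid reference series (rho = 1.00 by definition) with a para-nitro group.
pka = hammett_pka(4.20, 1.00, [0.78])
print(round(pka, 2))
# At physiological pH the acid is almost entirely deprotonated.
print(fraction_deprotonated(pka, 7.4))
```

A program like Epik must go further, because a single Lewis structure can hide equivalent resonance forms and several ionizable sites interact; the sketch treats one isolated acidic group.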


Journal Article
TL;DR: The Surflex flexible molecular docking method has been generalized and extended in two primary areas related to the search component of docking: a small-molecule force field has been incorporated, and knowledge of well-established molecular interactions between ligand fragments and a target protein can be directly exploited to guide the search process.
Abstract: The Surflex flexible molecular docking method has been generalized and extended in two primary areas related to the search component of docking. First, incorporation of a small-molecule force field extends the search into Cartesian coordinates constrained by internal ligand energetics. Whereas previous versions searched only the alignment and acyclic torsional space of the ligand, the new approach supports dynamic ring flexibility and all-atom optimization of docked ligand poses. Second, knowledge of well-established molecular interactions between ligand fragments and a target protein can be directly exploited to guide the search process. This offers advantages in some cases over the search strategy where ligand alignment is guided solely by a “protomol” (a pre-computed molecular representation of an idealized ligand). Results are presented on both docking accuracy and screening utility using multiple publicly available benchmark data sets that place Surflex’s performance in the context of other molecular docking methods. In terms of docking accuracy, Surflex-Dock 2.1 performs as well as the best available methods. In the area of screening utility, Surflex’s performance is extremely robust, and it is clearly superior to other methods within the set of cases for which comparative data are available, with roughly double the screening enrichment performance.

546 citations
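The Surflex abstract above reports "screening enrichment" without defining it here. A common definition is the enrichment factor at a fraction x of the ranked database: the hit rate among the top x divided by the hit rate of the whole database. The sketch below assumes that definition; the function name and the toy data are illustrative only.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor at a given fraction of a ranked screening database.

    scores: docking scores where higher means "more likely active" (assumption).
    labels: 1 for a known active, 0 for a decoy or inactive.
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds by score; ties keep their input order (stable sort).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(label for _, label in ranked[:n_top])
    actives_total = sum(labels)
    if actives_total == 0:
        raise ValueError("need at least one active to compute enrichment")
    return (actives_top / n_top) / (actives_total / n)


# Two actives ranked first among ten compounds: top 20% is all actives.
print(enrichment_factor([9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
                        [1, 1, 0, 0, 0, 0, 0, 0, 0, 0], 0.2))
```

An EF of 1 corresponds to random ranking, so "roughly double the enrichment" would mean about twice as many actives recovered in the same top slice of the database.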


Journal Article
TL;DR: The presented approach identifies optimal chemical feature pairs using distance and density characteristics obtained by correlating pharmacophoric geometries and thus proves to be faster than existing combinatorial alignment methods and creates more reasonable alignments than pure atom-based methods.
Abstract: Aligning and overlaying two or more bioactive molecules is one of the key tasks in computational drug discovery and bioactivity prediction. Chemical-functional characteristics of molecules, seen from the viewpoint of a macromolecular target and represented as a 3D pharmacophore, are an especially informative similarity measure when describing and analyzing macromolecule-ligand interactions. In this study, a novel approach for aligning rigid three-dimensional molecules according to their chemical-functional pharmacophoric features is presented and compared with the overlay of experimentally determined poses in a comparable macromolecular coordinate frame. The presented approach identifies optimal chemical feature pairs using distance and density characteristics obtained by correlating pharmacophoric geometries, and thus proves to be faster than existing combinatorial alignment methods and creates more reasonable alignments than pure atom-based methods. Examples are provided to demonstrate the feasibility, speed and intuitiveness of this method.

278 citations
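The alignment entry above pairs chemical features using distance characteristics. The published method also uses density characteristics and is not reproduced here; the following is only a hypothetical, simplified sketch of the underlying idea: describe each feature by its sorted distances to the other features of the same molecule, and propose same-type pairs whose distance environments agree within a tolerance. The names and the 1.0 Å tolerance are assumptions.

```python
import math
from itertools import product


def distance_profile(features, i):
    """Sorted distances from feature i to every other feature in the molecule."""
    _, xi = features[i]
    return sorted(math.dist(xi, xj) for j, (_, xj) in enumerate(features) if j != i)


def profile_mismatch(p, q):
    """Mean absolute difference over the shorter of two sorted distance profiles."""
    k = min(len(p), len(q))
    if k == 0:
        return float("inf")
    return sum(abs(a - b) for a, b in zip(p[:k], q[:k])) / k


def candidate_pairs(mol_a, mol_b, tol=1.0):
    """Same-type feature pairs whose geometric environments agree within tol (Å).

    mol_a, mol_b: lists of (feature_type, (x, y, z)).
    Returns (mismatch, index_in_a, index_in_b), best matches first.
    """
    prof_a = [distance_profile(mol_a, i) for i in range(len(mol_a))]
    prof_b = [distance_profile(mol_b, j) for j in range(len(mol_b))]
    pairs = []
    for i, j in product(range(len(mol_a)), range(len(mol_b))):
        if mol_a[i][0] != mol_b[j][0]:
            continue  # only features of the same chemical type can correspond
        mismatch = profile_mismatch(prof_a[i], prof_b[j])
        if mismatch <= tol:
            pairs.append((mismatch, i, j))
    return sorted(pairs)
```

Because the profiles depend only on internal distances, the candidate pairs do not depend on where each molecule sits in space; a rigid superposition of the paired points would follow as a separate step.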


Journal Article
TL;DR: A scoring method is devised that rapidly evaluates synthetic accessibility of structures based on structural complexity, similarity to available starting materials and assessment of strategic bonds where a structure can be decomposed to obtain simpler fragments.
Abstract: De novo design systems provide powerful methods to suggest a set of novel structures with high estimated binding affinity. One deficiency of these methods is that some of the suggested structures could be synthesized only with great difficulty. We devised a scoring method that rapidly evaluates synthetic accessibility of structures based on structural complexity, similarity to available starting materials and assessment of strategic bonds where a structure can be decomposed to obtain simpler fragments. These individual components were combined into an overall score of synthetic accessibility by an additive scheme. The weights of the scoring function components were calculated by linear regression analysis based on accessibility scores assigned by medicinal chemists. The calculated values for synthetic accessibility agree with the values proposed by chemists to an extent that compares well with how well chemists agree with each other.

151 citations
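The synthetic-accessibility entry above combines component scores additively, with weights fitted by linear regression against chemists' scores. A minimal sketch of that fitting step, assuming three hypothetical component scores per structure and ordinary least squares solved through the normal equations (plain Python, no numerical library):

```python
def fit_additive_weights(components, targets):
    """Ordinary least squares for target ~ w0 + sum_k w_k * component_k.

    components: one tuple of component scores per structure (hypothetical,
    e.g. complexity, starting-material similarity, strategic-bond score).
    targets: accessibility scores assigned by chemists.
    """
    rows = [[1.0, *c] for c in components]  # prepend an intercept column
    p = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, p):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    return [b[i] / a[i][i] for i in range(p)]


def sa_score(weights, component_values):
    """Additive synthetic-accessibility score from fitted weights."""
    w0, *ws = weights
    return w0 + sum(w * v for w, v in zip(ws, component_values))
```

With real data the chemists' scores are noisy, so the fitted weights minimize squared disagreement rather than reproducing each score exactly; the paper's actual components and weights are not given in the abstract.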


Journal Article
TL;DR: This perspectives article has been taken from a talk the author gave at the symposium in honor of Yvonne C. Martin’s retirement, held at the American Chemical Society spring meeting in Chicago on March 25, 2007.
Abstract: This perspectives article has been taken from a talk the author gave at the symposium in honor of Yvonne C. Martin's retirement, held at the American Chemical Society spring meeting in Chicago on March 25, 2007. The talk was intended as a somewhat lighthearted attempt to gaze into the future; inevitably, in print, things will come across more seriously than was intended. As we all know, the past is rarely predictive of the future.

140 citations


Journal Article
TL;DR: A kernel method is reported that allows the processing of molecules represented by binary, integer and real-valued descriptors, and it is shown that it is little different in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure.
Abstract: Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and here we discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it is little different in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains only a very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.

113 citations
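The kernel/NBC entry above finds group fusion preferable when only a few actives are available. One common form of group fusion, assumed here because the abstract does not name the fusion rule, scores each database molecule by its maximum Tanimoto similarity to any of the reference actives. Fingerprints are represented as sets of on-bit indices for simplicity.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def group_fusion_scores(references, database):
    """MAX group fusion: best similarity of each database molecule to any reference active."""
    return [max(tanimoto(ref, fp) for ref in references) for fp in database]


refs = [{1, 2, 3}, {7, 8}]
print(group_fusion_scores(refs, [{1, 2, 3, 4}, {7, 8}, {9}]))
```

Because each reference contributes independently, structurally heterogeneous actives do not have to be summarized by a single model, which is one plausible reason the approach copes well with diverse, sparse training data.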


Journal Article
TL;DR: Trends indicate that chemical probes are similar to leads with respect to some properties, e.g., complexity, solubility, and hydrophobicity.
Abstract: Academic and industrial research continues to focus on discovering new classes of compounds based on HTS. Post-HTS analyses need to prioritize the compounds that are progressed to chemical probe or lead status. We report trends in probe, lead and drug discovery by examining the following categories of compounds: 385 leads and the 541 drugs that emerged from them; "active" (152) and "inactive" (1488) compounds from the Molecular Libraries Initiative Small Molecule Repository (MLSMR) tested by HTS; "active" (46) and "inactive" (72) compounds from Nature Chemical Biology (NCB) tested by HTS; compounds in the drug development phase (I, II, III and launched), as indexed in MDDR; and medicinal chemistry compounds from WOMBAT, separated into high-activity (5,784 compounds with nanomolar activity or better) and low-activity (30,690 with micromolar activity or less). We examined molecular weight (MW), molecular complexity, flexibility, the number of hydrogen bond donors and acceptors, LogP (the octanol/water partition coefficient, estimated by ClogP and ALOGPS), LogSw (intrinsic water solubility, estimated by ALOGPS) and the number of Rule of five (Ro5) criteria violations. Based on the 50% and 90% distribution moments of the above properties, there were no significant differences between leads of known drugs and "actives" from MLSMR or NCB (chemical probes). "Inactives" from NCB and MLSMR were also found to exhibit similar properties. From these combined sets, we conclude that "actives" (569 compounds) are less complex, less flexible, and more soluble than drugs (1,651 drugs), and significantly smaller, less complex, less hydrophobic and more soluble than the 5,784 high-activity WOMBAT compounds. These trends indicate that chemical probes are similar to leads with respect to some properties, e.g., complexity, solubility, and hydrophobicity.

106 citations
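The probe/lead/drug entry above compares property distributions at their 50% and 90% moments and counts Rule-of-five violations. A minimal sketch, assuming the usual Lipinski cut-offs (MW > 500, ClogP > 5, more than 5 H-bond donors, more than 10 H-bond acceptors) and reading "50% and 90% distribution moments" as the median and 90th percentile (an interpretation, not stated in the abstract):

```python
def ro5_violations(mw, clogp, hbd, hba):
    """Number of Lipinski Rule-of-five criteria violated by one compound."""
    return sum([mw > 500, clogp > 5, hbd > 5, hba > 10])


def percentile(values, q):
    """Value at quantile q (0..1), interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Compare two compound sets on one property at the 50% and 90% levels.
mw_probes = [250, 310, 330, 360, 420]  # hypothetical values
print(percentile(mw_probes, 0.5), percentile(mw_probes, 0.9))
```

Comparing sets at two quantiles rather than only at the mean makes the comparison less sensitive to a few very large or very flexible compounds in the tail.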


Journal Article
TL;DR: A ‘global’ model of hERG K+ channel was built to satisfy three basic criteria for QSAR models in drug discovery: assessment of the applicability domain, assuring that model decisions can be interpreted by medicinal chemists and assessment of model performance after the model was built.
Abstract: A 'global' model of hERG K+ channel blockade was built to satisfy three basic criteria for QSAR models in drug discovery: (1) assessment of the applicability domain, (2) assuring that model decisions can be interpreted by medicinal chemists and (3) assessment of model performance after the model was built. A combination of D-optimal onion design and hierarchical partial least squares modelling was applied to construct a global model of hERG blockade in order to maximize the applicability domain of the model and to enhance its interpretability. Additionally, easily interpretable hERG-specific fragment-based descriptors were developed. Model performance was monitored over a period of 15 months after model implementation. After this period, a greater proportion of molecules fell outside the model's applicability domain, and these compounds had a markedly higher average prediction error than molecules within the applicability domain. The model's predictive performance deteriorated within 4 months of building, illustrating the need to update global models regularly within a drug discovery environment.

97 citations
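The hERG entry above tracks how many new compounds fall outside the applicability domain over time. The paper's domain measure is PLS-based and is not reproduced here; as a hypothetical stand-in, the sketch below treats a compound as in-domain when its Euclidean distance to the nearest training compound (in scaled descriptor space) is below a chosen threshold, and reports the out-of-domain share of each new batch.

```python
import math


def nn_distance(x, training):
    """Euclidean distance from descriptor vector x to its nearest training compound."""
    return min(math.dist(x, t) for t in training)


def in_domain(x, training, threshold):
    """Hypothetical applicability-domain rule: close enough to some training compound."""
    return nn_distance(x, training) <= threshold


def fraction_outside(batch, training, threshold):
    """Share of a batch of newly tested compounds that falls outside the domain."""
    flags = [not in_domain(x, training, threshold) for x in batch]
    return sum(flags) / len(flags)


training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(fraction_outside([(0.5, 0.5), (5.0, 5.0)], training, threshold=1.0))
```

Computing this fraction for each month's new compounds gives a simple drift signal: when it rises, as the abstract reports, the model's average error on recent compounds is likely to rise too, and retraining is due.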


Journal Article
TL;DR: A Bayesian classification model derived from more than 8,800 compounds that have been experimentally assessed for their potential to covalently modify protein targets is described and can be implemented in the large-scale assessment of compound libraries for purchase or design.
Abstract: Non-specific chemical modification of protein thiol groups continues to be a significant source of false positive hits from high-throughput screening campaigns and can even plague certain protein targets and chemical series well into lead optimization. While experimental tools exist to assess the risk and promiscuity associated with the chemical reactivity of existing compounds, computational tools are desired that can reliably identify substructures that are associated with chemical reactivity to aid in triage of HTS hit lists, external compound purchases, and library design. Here we describe a Bayesian classification model derived from more than 8,800 compounds that have been experimentally assessed for their potential to covalently modify protein targets. The resulting model can be implemented in the large-scale assessment of compound libraries for purchase or design. In addition, the individual substructures identified as highly reactive in the model can be used as look-up tables to guide chemists during hit-to-lead and lead optimization campaigns.

84 citations
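The thiol-reactivity entry above describes a Bayesian classification model whose substructures can double as look-up tables. The abstract does not give the estimator; one widely used choice for substructure-based Bayesian models is the Laplacian-corrected weight log((A_F + 1) / (T_F * P_base + 1)), where T_F compounds contain feature F, A_F of them are reactive, and P_base is the overall reactive rate. The sketch assumes that form; the feature names are hypothetical.

```python
import math
from collections import Counter


def laplacian_weights(samples):
    """Laplacian-corrected naive Bayes weight per substructure feature.

    samples: list of (feature_set, is_reactive) pairs.
    weight(F) = log((A_F + 1) / (T_F * P_base + 1))
    """
    p_base = sum(1 for _, y in samples if y) / len(samples)
    t_f, a_f = Counter(), Counter()
    for feats, y in samples:
        for f in feats:
            t_f[f] += 1
            if y:
                a_f[f] += 1
    return {f: math.log((a_f[f] + 1) / (t_f[f] * p_base + 1)) for f in t_f}


def bayes_score(features, weights):
    """Sum of feature weights; features never seen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


data = [({"michael_acceptor", "aryl"}, True), ({"michael_acceptor"}, True),
        ({"aryl"}, False), ({"aryl", "amide"}, False)]
w = laplacian_weights(data)
print(sorted(w.items(), key=lambda kv: -kv[1]))  # most reactivity-associated first
```

The sorted weight list is the kind of look-up table the abstract mentions: positive weights flag substructures enriched among reactive compounds, and the +1 terms keep rare features from receiving extreme weights.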


Journal Article
TL;DR: C(6) is suggested as the most promising position of the flavonoid scaffold to introduce chemical modifications to improve affinity, selectivity, and inhibition of PLA2-IIA by flavonoids.
Abstract: The human secretory phospholipase A2 group IIA (PLA2-IIA) is a lipolytic enzyme. Its inhibition leads to a decrease in eicosanoid levels and, thereby, to reduced inflammation. Therefore, PLA2-IIA is of high pharmacological interest in the treatment of chronic diseases such as asthma and rheumatoid arthritis. Quercetin and naringenin, amongst other flavonoids, are known for their anti-inflammatory activity through modulation of enzymes of the arachidonic acid cascade. However, the mechanism by which flavonoids inhibit phospholipase A2 (PLA2) has so far remained unclear. Flavonoids are widely produced in plant tissues and are therefore suitable targets for pharmaceutical extraction and chemical synthesis. Our work focuses on understanding the binding modes of flavonoids to PLA2, their inhibition mechanism, and the rationale for modifying them to obtain potent and specific inhibitors. Our computational and experimental studies focused on a set of 24 compounds including natural flavonoids and naringenin-based derivatives. Experimental results on PLA2 inhibition showed good inhibitory activity for quercetin, kaempferol, and galangin, but relatively poor activity for naringenin. Several naringenin derivatives were synthesized and tested for improved affinity and inhibitory activity. 6-(1,1-dimethylallyl)naringenin showed PLA2 inhibition comparable to that of quercetin-like compounds. We characterized the binding mode of these compounds and the determinants of their affinity, selectivity, and inhibitory potency. Based on our results, we suggest C(6) as the most promising position of the flavonoid scaffold at which to introduce chemical modifications to improve affinity, selectivity, and inhibition of PLA2-IIA by flavonoids.

82 citations


Journal Article
TL;DR: The studies suggest that the approach combining validated QSAR modeling and virtual screening could be successfully used as a general tool for the discovery of novel biologically active compounds.
Abstract: A combined approach of validated QSAR modeling and virtual screening was successfully applied to the discovery of novel tylophrine derivatives as anticancer agents. QSAR models were initially developed for 52 chemically diverse phenanthrine-based tylophrine derivatives (PBTs) with known experimental EC50, using chemical topological descriptors (calculated with the MolConnZ program) and the variable selection k nearest neighbor (kNN) method. Several validation protocols were applied to achieve robust QSAR models. The original dataset was divided into multiple training and test sets, and the models were considered acceptable only if the leave-one-out cross-validated R² (q²) values were greater than 0.5 for the training sets and the correlation coefficient R² values were greater than 0.6 for the test sets. Furthermore, the q² values for the actual dataset were shown to be significantly higher than those obtained for the same dataset with randomized target properties (Y-randomization test), indicating that the models were statistically significant. The ten best models were then employed to mine the commercially available ChemDiv database (ca. 500,000 compounds), resulting in 34 consensus hits with moderate to high predicted activities. Ten structurally diverse hits were experimentally tested and eight were confirmed active, with the highest experimental EC50 of 1.8 μM, implying an exceptionally high hit rate (80%). The same ten models were further applied to predict EC50 for four new PBTs, and the correlation coefficient (R²) between the experimental and predicted EC50 for these compounds plus the eight active consensus hits was as high as 0.57. Our studies suggest that the approach combining validated QSAR modeling and virtual screening could be successfully used as a general tool for the discovery of novel biologically active compounds.
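The tylophrine entry above accepts a kNN QSAR model only if its leave-one-out q² exceeds 0.5 on the training set and its test-set R² exceeds 0.6, and checks significance by Y-randomization. A minimal sketch of those checks, assuming unweighted kNN without the paper's variable selection, q² = 1 - PRESS / SS_total, and test-set R² as the squared Pearson correlation:

```python
import math
import random


def knn_predict(x, train_x, train_y, k):
    """Unweighted kNN: mean activity of the k nearest training compounds."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    return sum(train_y[i] for i in order[:k]) / k


def loo_q2(xs, ys, k):
    """Leave-one-out q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        press += (ys[i] - knn_predict(xs[i], rest_x, rest_y, k)) ** 2
    mean_y = sum(ys) / len(ys)
    return 1.0 - press / sum((y - mean_y) ** 2 for y in ys)


def r2_corr(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def is_acceptable(q2_train, r2_test):
    """Acceptance thresholds quoted in the abstract."""
    return q2_train > 0.5 and r2_test > 0.6


def y_randomized_q2(xs, ys, k, trials=10, seed=0):
    """q2 after shuffling activities; a real model should score clearly higher."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        scores.append(loo_q2(xs, shuffled, k))
    return scores
```

Descriptor vectors are tuples (one entry per descriptor) and activities are plain numbers such as pEC50; in practice the whole procedure would be repeated for every training/test split.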

Journal Article
TL;DR: This work investigates the use of different machine learning methods to construct models for aqueous solubility, evaluating all approaches in terms of their prediction accuracy and of how faithfully the individual error bars represent the actual prediction error.
Abstract: We investigate the use of different machine learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate way to obtain error bars, in order to estimate the domain of applicability (DOA) of each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation and on an external validation set of 536 molecules) and of how faithfully the individual error bars represent the actual prediction error.
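The solubility entry above derives error bars for SVM and Ridge Regression models from the Mahalanobis distance to the training data. A minimal sketch of that distance, restricted to two descriptors so the covariance inverse can be written out explicitly (a simplification for illustration; how distance maps to an error-bar width is not specified in the abstract):

```python
import math


def mean_and_cov2(points):
    """Mean vector and sample covariance matrix for two-descriptor training data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis2(x, mean, cov):
    """Mahalanobis distance of x from the training distribution (2 x 2 case)."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det.
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)


mean, cov = mean_and_cov2([(0, 0), (2, 0), (0, 2), (2, 2)])
print(mahalanobis2((1, 1), mean, cov), mahalanobis2((3, 1), mean, cov))
```

Unlike plain Euclidean distance, this measure accounts for the spread and correlation of the training descriptors, so a compound that is far away along a direction in which the training set barely varies is flagged as unusual.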

Journal Article
TL;DR: Highly predictive classification models for human liver microsomal stability using the apparent intrinsic clearance (CLint, app) as the end point are developed using Random Forest and Bayesian classification methods with MOE, E-state descriptors, ADME Keys, and ECFP_6 fingerprints.
Abstract: We developed highly predictive classification models for human liver microsomal (HLM) stability using the apparent intrinsic clearance (CLint,app) as the end point. HLM stability has been shown to be an important factor related to the metabolic clearance of a compound. Robust in silico models that predict metabolic clearance are very useful in early drug discovery to optimize compound structures and to select promising leads, avoiding costly drug development failures at later stages. Using Random Forest and Bayesian classification methods with MOE and E-state descriptors, ADME Keys, and ECFP_6 fingerprints, various highly predictive models were developed. The best models show 80% and 75% prediction accuracy for the test and validation sets, respectively. A detailed analysis of the results is presented, including an assessment of the prediction confidence, the significant descriptors, and the application of these models to drug discovery projects.
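The HLM stability entry above reports accuracy and an assessment of prediction confidence. For a Random Forest, one common confidence measure, assumed here because the abstract does not define one, is the fraction of trees voting for the winning class. The class labels below are hypothetical.

```python
from collections import Counter


def vote_with_confidence(tree_votes):
    """Majority class across trees plus the fraction of trees that voted for it."""
    label, n = Counter(tree_votes).most_common(1)[0]
    return label, n / len(tree_votes)


def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


print(vote_with_confidence(["stable"] * 7 + ["unstable"] * 3))
print(accuracy(["stable", "unstable", "stable", "stable"],
               ["stable", "unstable", "unstable", "stable"]))
```

Reporting accuracy separately for high-confidence and low-confidence predictions is a simple way to check whether the confidence measure is useful for prioritizing compounds.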

Journal Article
TL;DR: The generation and validation of pharmacophore models for PPARs, as well as a large-scale validation of the parallel screening approach by screening PPAR ligands against a large database of structure-based models, confirm the ability of parallel screening to forecast the correct pharmacological target for a set of compounds.
Abstract: We describe the generation and validation of pharmacophore models for PPARs, as well as a large-scale validation of the parallel screening approach by screening PPAR ligands against a large database of structure-based models. A large test set of 357 PPAR ligands was screened against 48 PPAR models to determine the best models for agonists of PPAR-α, PPAR-δ, and PPAR-γ. Afterwards, a parallel screen was performed using the 357 PPAR ligands and 47 structure-based models for PPARs, which were integrated into an in-house pharmacophore database comprising 1537 models, to assess the enrichment of PPAR ligands within the PPAR hypotheses. For these purposes, we categorized the 1537 database models into 181 protein targets and developed a score that ranks the retrieved targets for each ligand. In this way, we tested whether the concept of parallel screening is able to predict the correct pharmacological target for a set of compounds. The PPAR target was ranked first more often than any other target. This confirms the ability of parallel screening to forecast the correct pharmacological target for a set of compounds.
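The PPAR entry above groups database models by protein target and scores the retrieved targets for each ligand. The paper's own score is not given in the abstract; the hypothetical score below ranks each target by the fraction of its models that the ligand matched, which corrects for targets that simply have more models in the database.

```python
from collections import defaultdict


def rank_targets(hit_models, model_to_target):
    """Rank targets for one ligand by the fraction of each target's models it matches.

    hit_models: names of pharmacophore models the ligand fitted.
    model_to_target: mapping from every database model to its protein target.
    """
    models_per_target = defaultdict(int)
    for target in model_to_target.values():
        models_per_target[target] += 1
    hits = defaultdict(int)
    for model in hit_models:
        hits[model_to_target[model]] += 1
    scored = {t: hits[t] / models_per_target[t] for t in hits}
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


db = {"m1": "PPARg", "m2": "PPARg", "m3": "COX2", "m4": "COX2", "m5": "COX2"}
print(rank_targets({"m1", "m2", "m3"}, db))
```

Counting how often the known target ends up in first place across all 357 ligands would give the kind of summary the abstract reports.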

Journal Article
TL;DR: A homology model of the hH3R based on the crystal structure of bovine rhodopsin was generated and refined by molecular dynamics simulations in a dipalmitoylphosphatidylcholine (DPPC)/water membrane mimic before the resulting binding pocket was used for high-throughput docking using the program GOLD.
Abstract: The human histamine H3 receptor (hH3R) is a G-protein coupled receptor (GPCR) that modulates the release of various neurotransmitters in the central and peripheral nervous system and is therefore a potential target in the therapy of numerous diseases. Although ligands addressing this receptor are already known, the discovery of alternative lead structures represents an important goal in drug design. The goal of this work was to study the hH3R and its antagonists by means of molecular modelling tools. For this purpose, a strategy was pursued in which a homology model of the hH3R based on the crystal structure of bovine rhodopsin was generated and refined by molecular dynamics simulations in a dipalmitoylphosphatidylcholine (DPPC)/water membrane mimic before the resulting binding pocket was used for high-throughput docking using the program GOLD. Alternatively, a pharmacophore-based procedure was carried out in which the presumed bioactive conformations of three different potent hH3R antagonists were used as templates for the generation of pharmacophore models. A pharmacophore-based screening was then carried out using the program Catalyst. The performance of both strategies was validated using a database of 418 validated hH3R antagonists. Seven hits obtained during this screening procedure were purchased and experimentally tested in a [3H]Nα-methylhistamine binding assay. The compounds tested showed affinities at hH3R with Ki values ranging from 0.079 to 6.3 μM.

Journal Article
TL;DR: Investigating the inhibitors in the human CYP11B models using molecular docking and molecular dynamics simulations was able to predict a similar trend in potency for the inhibitors as found in the in vitro assays, and it was possible to understand the enantioselectivity of the human enzymes for the inhibitor fadrazole.
Abstract: Aldosterone is synthesised by aldosterone synthase (CYP11B2). CYP11B2 has a highly homologous isoform, steroid 11β-hydroxylase (CYP11B1), which is responsible for the biosynthesis of aldosterone precursors and glucocorticoids. To investigate aldosterone biosynthesis and facilitate the search for selective CYP11B2 inhibitors, we constructed three-dimensional models of CYP11B1 and CYP11B2 for both human and rat. The models were constructed based on the crystal structures of Pseudomonas putida CYP101 and Oryctolagus cuniculus CYP2C5. Small steric active-site differences between the isoforms were found to be the most important determinants of regioselective steroid synthesis. A possible explanation of how these steric differences lead to the selective synthesis of aldosterone by CYP11B2 is presented. The activities of the known CYP11B inhibitors metyrapone, R-etomidate, R-fadrazole and S-fadrazole were determined using assays of V79MZ cells that express human CYP11B1 and CYP11B2, respectively. By investigating the inhibitors in the human CYP11B models using molecular docking and molecular dynamics simulations, we were able to predict a similar trend in potency for the inhibitors as found in the in vitro assays. Importantly, based on the docking and dynamics simulations, it is possible to understand the enantioselectivity of the human enzymes for the inhibitor fadrazole, the R-enantiomer being selective for CYP11B2 and the S-enantiomer being selective for CYP11B1.

Journal Article
TL;DR: The results of the study presented here clearly indicate that pharmacophore-based parallel screening constitutes a reliable in silico method to predict the potential biological activities of a compound or a compound library by screening it against a series of pharmacophore queries.
Abstract: In order to assess bioactivity profiles for small organic molecules, we propose using parallel pharmacophore-based virtual screening. Our aim is to provide a fast, reliable and scalable system that allows for rapid in silico activity profile prediction of virtual molecules. In this proof-of-principle study, carried out with the new structure-based pharmacophore modelling tool LigandScout and the high-performance database mining platform Catalyst, we present a model application of parallel pharmacophore-based virtual screening to a set of 50 structure-based pharmacophore models built for various viral targets and 100 antiviral compounds. The latter were screened against all pharmacophore models in order to determine whether their known biological targets could be correctly predicted via an enrichment of the corresponding pharmacophores matching these ligands. The results demonstrate that the desired enrichment, i.e. successful activity profiling, was achieved for approximately 90% of all input molecules. Additionally, we discuss descriptors for output validation, various aspects influencing the analysis of the obtained activity profiles, and the effect of the search mode used for screening. The results of the study presented here clearly indicate that pharmacophore-based parallel screening constitutes a reliable in silico method to predict the potential biological activities of a compound or a compound library by screening it against a series of pharmacophore queries.

Journal Article
TL;DR: In silico models were generated to predict the extent of inhibition of cytochrome P450 isoenzymes using a set of relatively interpretable descriptors in conjunction with partial least squares (PLS) and regression trees (RT).
Abstract: In silico models were generated to predict the extent of inhibition of cytochrome P450 isoenzymes using a set of relatively interpretable descriptors in conjunction with partial least squares (PLS) and regression trees (RT). PLS was chosen because of the conservative nature of the resulting models, and RT to account more effectively for any non-linearity between dependent and independent variables. All models are statistically significant and agree with the known SAR, and they could be used as a guide to P450 liability through a classification based on the continuous pIC50 prediction given by the model. A compound is classified as having either a high or a low P450 liability if the predicted pIC50 is at least one root mean square error (RMSE) away from the high/low pIC50 cut-off of 5. If the prediction falls within one RMSE of the cut-off, we cannot be confident whether the compound will be experimentally low or high, so an indeterminate classification is given. Hybrid models using bulk descriptors and fragmental descriptors do significantly better in modeling CYP450 inhibition than bulk-property QSAR descriptors alone.
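The classification rule in the P450 entry above can be written directly. This sketch follows the abstract's description (cut-off pIC50 = 5, a one-RMSE band on either side treated as indeterminate); the function name and the example RMSE of 0.6 are hypothetical.

```python
def p450_liability(pred_pic50, rmse, cutoff=5.0):
    """Map a continuous pIC50 prediction to a high/low/indeterminate P450 liability.

    Predictions at least one RMSE above the cut-off are "high", at least one
    RMSE below it are "low"; anything inside the band is "indeterminate".
    """
    if pred_pic50 >= cutoff + rmse:
        return "high"
    if pred_pic50 <= cutoff - rmse:
        return "low"
    return "indeterminate"


for p in (5.8, 5.3, 4.2):
    print(p, p450_liability(p, rmse=0.6))
```

The band width comes from the model's own RMSE, so a less accurate model automatically leaves more compounds unclassified instead of producing overconfident calls.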

Journal Article
TL;DR: Comparisons of pharmacophore multiplet searching using random conformations with multiplet searching using single conformations derived from GALAHAD models show that, while query hypotheses based on random conformations are quite effective, hypotheses based on aligned conformations do a better job of discriminating between active and inactive compounds.
Abstract: Pharmacophore multiplets are useful tools for 3D database searching, with the queries ordinarily being derived from ensembles of random conformations of active ligands. It seems reasonable to expect that their usefulness can be augmented by instead using queries derived from single ligand conformations obtained from aligned ligands. Comparisons of pharmacophore multiplet searching using random conformations with multiplet searching using single conformations derived from GALAHAD (a genetic algorithm with linear assignment for hypermolecular alignment of datasets) models do indeed show that, while query hypotheses based on random conformations are quite effective, hypotheses based on aligned conformations do a better job of discriminating between active and inactive compounds. In particular, the hypothesis created from a neuraminidase inhibitor model was more similar to half of 18 known actives than to all but 0.2% of the compounds in a structurally diverse subset of the World Drug Index. Similarly, a model developed from five angiotensin II antagonists yielded hypotheses that placed 65 known antagonists within the top 0.1-1% of decoy databases. The differences in discriminating power ranged from 2- to 20-fold, depending on the protein target and the type of pharmacophore multiplet used.
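The multiplet entry above searches databases with pharmacophore multiplets. The encoding used in the paper is not described in the abstract; the following hypothetical sketch shows the general idea for three-point multiplets: every triple of features becomes a key made of its feature-type pairs and binned inter-feature distances, and two key sets are compared with a Tanimoto coefficient. The 1.5 Å bin width is an assumption.

```python
import math
from itertools import combinations


def triplet_keys(features, bin_width=1.5):
    """Three-point pharmacophore keys for one conformation.

    features: list of (feature_type, (x, y, z)).
    Each key is the sorted list of triangle edges, every edge being
    (sorted type pair, distance bin), so it does not depend on feature order.
    """
    keys = set()
    for a, b, c in combinations(features, 3):
        edges = []
        for (t1, x1), (t2, x2) in ((a, b), (a, c), (b, c)):
            pair = tuple(sorted((t1, t2)))
            edges.append((pair, int(math.dist(x1, x2) // bin_width)))
        keys.add(tuple(sorted(edges)))
    return keys


def multiplet_similarity(keys_a, keys_b):
    """Tanimoto similarity between two multiplet key sets."""
    union = len(keys_a | keys_b)
    return len(keys_a & keys_b) / union if union else 0.0
```

A query built from one aligned conformation yields a small, specific key set, whereas pooling keys over many random conformations yields a larger, less specific one; that difference is one way to read the abstract's finding that aligned-conformation hypotheses discriminate better.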

Journal Article
TL;DR: A comparative QSAR analysis of PAMPA/modified PAMPA for high-throughput profiling of drugs with respect to Caco-2 cells and human intestinal absorption is provided.
Abstract: Despite the dramatic increase in the speed of synthesis and biological evaluation of new chemical entities, the number of compounds that survive the rigorous processes associated with drug development is low. Thus, an increased emphasis on thorough ADMET (absorption, distribution, metabolism, excretion and toxicity) studies based on in vitro and in silico approaches allows for early evaluation of new drugs in the development phase. Artificial membrane permeability measurements offer a high-throughput, relatively low-cost but labor-intensive alternative for in vitro determination of drug absorption potential; parallel artificial membrane permeability assays (PAMPA) have been extensively used to determine drug absorption potentials. The present study provides a comparative QSAR analysis of PAMPA/modified PAMPA for high-throughput profiling of drugs with respect to Caco-2 cells and human intestinal absorption.

Journal Article
TL;DR: Results suggest that SVILP is able to extract additional knowledge from the data, further improving classification results, with generally far superior specificity and precision.
Abstract: We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently, while ILP combines structural fragments and then creates new features with higher predictive power. SVILP is a recently introduced method that adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is the superior method for seven of the 11 classes. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model, on the other hand, has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar’s test, which shows that SVILP performs significantly (p < 5%) better than both other methods for six of the 11 activity classes, while being superior with less significance for three of the remaining classes. While the Bayes Classifier was previously shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
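The SVILP entry above evaluates classifiers by recall, specificity, precision, F-measure and Matthews Correlation Coefficient, and compares methods with McNemar's test. A minimal sketch of these statistics from per-compound predictions; the McNemar form used here (chi-square with continuity correction, one degree of freedom) is one common variant, assumed because the abstract does not say which was used.

```python
import math


def confusion(y_true, y_pred):
    """Counts (tp, tn, fp, fn) for binary labels (truthy = active)."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(p and not t for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Recall, specificity, precision, F-measure and MCC from confusion counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"recall": recall, "specificity": specificity, "precision": precision,
            "f_measure": f_measure, "mcc": mcc}


def mcnemar_p(correct_a, correct_b):
    """Two-sided McNemar p-value with continuity correction (chi-square, 1 dof).

    correct_a, correct_b: per-compound booleans, True if that method was right.
    """
    b = sum(ca and not cb for ca, cb in zip(correct_a, correct_b))
    c = sum(cb and not ca for ca, cb in zip(correct_a, correct_b))
    if b + c == 0:
        return 1.0
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

McNemar's test looks only at compounds on which the two methods disagree (b and c), which is why it suits paired comparisons of classifiers evaluated on the same benchmark.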

Journal ArticleDOI
TL;DR: By providing possible synthetic routes to a structure along with its design rationale, AllChem encourages simultaneous consideration of both costs and benefits during each lead discovery and optimization decision, thereby promising to be effective with synthetic chemists among its primary users.
Abstract: AllChem is a system that is intended to make practical the generation and searching of an unprecedentedly vast number (approximately 10^20) of synthetically accessible and medicinally relevant structures. Also, by providing possible synthetic routes to a structure along with its design rationale, AllChem encourages simultaneous consideration of both costs and benefits during each lead discovery and optimization decision, thereby promising to be effective with synthetic chemists among its primary users. AllChem is still under intensive development, so the following initial description necessarily has more the character of an interim progress report than of a finished research publication.

Journal ArticleDOI
TL;DR: Predictive r² values from an exploratory new “series trajectory” analysis of these 3D-QSARs, though highly variable, do not differ much from their q² values, a phenomenon that seems to encourage prediction even when a 3D-QSAR rests on so few structures that almost all of its information is unique.
Abstract: Based primarily on further studies of a collection of eleven publications reporting fifteen successful 3D-QSAR relations, several phenomena are preliminarily described. The RMS error of 133 ligand binding energy predictions based on these successful 3D-QSARs is 0.75 kcal/mol, which compares favorably to the prediction accuracies of approaches that include the receptor. A similar result is obtained when topomer alignments are substituted for those published, with seemingly profound implications for the future of 3D-QSAR. The “alignment-averaged” molecular properties, log P and molar refractivity, have very little correlative power for these data sets, either alone or in combination with the 3D-QSAR field descriptors. Using the q² metric to select the number of PLS components necessarily tends to discard any unique or unconfirmed SAR information, so large drops in q² are to be expected whenever such unique information is first encountered. Predictive r² values from an exploratory new “series trajectory” analysis of these 3D-QSARs, though highly variable, do not differ much from their q² values, a phenomenon that seems to encourage prediction even when a 3D-QSAR rests on so few structures that almost all of its information is unique.
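The RMS error, q², and predictive r² quantities mentioned above can be sketched as follows. This assumes the leave-one-out (or external test-set) predictions have already been produced by the 3D-QSAR model (the PLS fitting itself is not shown), that q² is referenced to the mean of all observed values, and that predictive r² is referenced to the training-set mean, as is common in CoMFA-style work; the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def rms_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def press(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Predictive residual sum of squares (PRESS)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def q_squared(observed: Sequence[float], loo_predicted: Sequence[float]) -> float:
    """Cross-validated r^2: 1 - PRESS / total sum of squares about the mean."""
    mean_y = fmean(observed)
    ss_total = sum((o - mean_y) ** 2 for o in observed)
    return 1.0 - press(observed, loo_predicted) / ss_total


def predictive_r2(test_observed: Sequence[float],
                  test_predicted: Sequence[float],
                  training_mean: float) -> float:
    """Predictive r^2 for new structures, referenced to the training-set mean."""
    ss = sum((o - training_mean) ** 2 for o in test_observed)
    return 1.0 - press(test_observed, test_predicted) / ss


# Usage with made-up binding energies (kcal/mol).
obs = [1.0, 2.0, 3.0, 4.0]
loo = [1.1, 1.9, 3.2, 3.8]
print(rms_error(obs, loo), q_squared(obs, loo))
```

Because predictive r² divides by deviations from the training mean, it can fall sharply (even below zero) when new structures carry SAR information the training set never saw, which is one way to read the "large drops" described above.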

Journal ArticleDOI
TL;DR: The use of the Hybrid Structure Based (HSB) method to screen and identify prospective ligands that bind to GPCRs is demonstrated, and the results show that the HSB method provides a realistic solution to bridge the gap between the ever-increasing demand for new drugs to treat psychiatric disorders and the lack of efficient screening methods.
Abstract: G-protein coupled receptors (GPCRs) comprise a large superfamily of proteins that are targets for nearly 50% of drugs in clinical use today. In the past, the use of structure-based drug design strategies to develop better drug candidates has been severely hampered by the absence of the receptor’s three-dimensional structure. However, with recent advances in molecular modeling techniques and better computing power, atomic-level details of these receptors can be derived from computational molecular models. Using information from these models coupled with experimental evidence, it has become feasible to build receptor pharmacophores. In this study, we demonstrate that the Hybrid Structure Based (HSB) method can be used effectively to screen and identify prospective ligands that bind to GPCRs. Essentially, this multi-step method combines ligand-based methods for building enriched libraries of small molecules and structure-based methods for screening molecules against the GPCR target. The HSB method was validated by identifying retinal and its analogues from a random dataset of ∼300,000 molecules. The results from this study showed that the nine top-ranking molecules are indeed analogues of retinal. The method was also tested by identifying analogues of dopamine binding to the dopamine D2 receptor. Six of the ten top-ranking molecules are known analogues of dopamine, including a prodrug, while the other thirty-four molecules are currently being tested for their activity against all dopamine receptors. The results from both test cases have proved that the HSB method provides a realistic solution to bridge the gap between the ever-increasing demand for new drugs to treat psychiatric disorders and the lack of efficient screening methods for GPCRs.

Journal ArticleDOI
TL;DR: Four different ligand-based virtual screening scenarios are studied: prioritizing compounds for subsequent high-throughput screening (HTS); selecting a predefined (small) number of potentially active compounds from a large chemical database; assessing the probability that a given structure will exhibit a given activity; selecting the most active structure for a biological assay.
Abstract: Four different ligand-based virtual screening scenarios are studied: (1) prioritizing compounds for subsequent high-throughput screening (HTS); (2) selecting a predefined (small) number of potentially active compounds from a large chemical database; (3) assessing the probability that a given structure will exhibit a given activity; (4) selecting the most active structure(s) for a biological assay. Each of the four scenarios is exemplified by performing retrospective ligand-based virtual screening for eight different biological targets using two large databases, MDDR and WOMBAT. A comparison between the chemical spaces covered by these two databases is presented. The performance of two techniques for ligand-based virtual screening is investigated: similarity search with subsequent data fusion (SSDF) and novelty detection with Self-Organizing Maps (ndSOM). Three different structure representations are compared: 2,048-dimensional Daylight fingerprints; topological autocorrelation weighted by atomic physicochemical properties (sigma electronegativity, polarizability, partial charge, and identity); and radial distribution functions weighted by the same atomic physicochemical properties. Both methods were found applicable in scenario one. The similarity search was found to perform slightly better in scenario two, while the SOM novelty detection is preferred in scenario three. No method/descriptor combination achieved significant success in scenario four.
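In its simplest form, similarity search with data fusion scores each database compound against several known actives and combines those scores into one ranking. The sketch below is a generic illustration, not the SSDF implementation used in the study: fingerprints are modeled as Python sets of on-bit indices, similarity is the Tanimoto coefficient, and the fusion rules offered (maximum or mean) are assumptions.

```python
from typing import Mapping, Sequence


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fused_score(references: Sequence[frozenset], candidate: frozenset,
                rule: str = "max") -> float:
    """Combine a candidate's similarities to all reference actives."""
    sims = [tanimoto(ref, candidate) for ref in references]
    if rule == "max":
        return max(sims)
    if rule == "mean":
        return sum(sims) / len(sims)
    raise ValueError(f"unknown fusion rule: {rule}")


def rank_database(references: Sequence[frozenset],
                  database: Mapping[str, frozenset],
                  rule: str = "max") -> list[str]:
    """Return compound names ordered from most to least similar (ties keep input order)."""
    scores = {name: fused_score(references, fp, rule) for name, fp in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Usage with toy fingerprints (bit indices are arbitrary).
refs = [frozenset({1, 2, 3}), frozenset({4, 5})]
db = {"x": frozenset({1, 2}), "y": frozenset({4, 5}), "z": frozenset({7})}
print(rank_database(refs, db))
```

For scenario (2), taking the top N names of this ranking gives the predefined-size selection; the fusion rule matters mainly when the reference actives are structurally diverse.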

Journal ArticleDOI
TL;DR: A search of ligand-bound X-ray crystal structures from the protein structure database shows that an unusual binding mode could be a source of outliers.
Abstract: Lead optimization is usually carried out through structure-activity relationship (SAR) and/or quantitative structure-activity relationship (QSAR) studies. One of the assumptions in SAR and QSAR studies is that similar analogs bind to the same binding site in a similar binding mode. Outliers are often observed, especially in QSAR. However, most QSAR studies focus on developing the QSAR itself and leave the outliers largely unexamined. We searched a number of ligand-bound X-ray crystal structures from the protein structure database to find evidence that could indicate a possible source of outliers in SAR or QSAR. Our results show that an unusual binding mode could be a source of outliers.

Journal ArticleDOI
TL;DR: A detailed 3D model of the I7L ligand binding site is built based on the exceptionally high structural conservation of this site in proteases of the ULP family, and fragments predicted to bind in the prime portion of the active site can be combined with fragments on the non-prime side to yield compounds with improved activity and specificity.
Abstract: Essential for viral replication and highly conserved among Poxviridae, the vaccinia virus I7L ubiquitin-like proteinase (ULP) is an attractive target for development of smallpox antiviral drugs. At the same time, the I7L proteinase exemplifies several interesting challenges from the rational drug design perspective. In the absence of a published I7L X-ray structure, we have built a detailed 3D model of the I7L ligand binding site (S2–S2′ pocket) based on the exceptionally high structural conservation of this site in proteases of the ULP family. The accuracy and limitations of this model were assessed through comparative analysis of available X-ray structures of ULPs, as well as energy-based conformational modeling. The 3D model of the I7L ligand binding site was used to perform covalent docking and virtual ligand screening (VLS) of a comprehensive library of about 230,000 available ketone and aldehyde compounds. Out of 456 predicted ligands, 97 inhibitors of I7L proteinase activity were confirmed in biochemical assays (∼20% overall hit rate). These experimental results both validate our I7L ligand binding model and provide initial leads for rational optimization of poxvirus I7L proteinase inhibitors. In particular, fragments predicted to bind in the prime portion of the active site can be combined with fragments on the non-prime side to yield compounds with improved activity and specificity.
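The quoted overall hit rate follows directly from the reported counts of confirmed inhibitors and predicted ligands:

```latex
\text{hit rate} = \frac{\text{confirmed inhibitors}}{\text{predicted ligands}}
               = \frac{97}{456} \approx 0.213 \approx 21\%
```

The abstract rounds this to ∼20%.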

Journal ArticleDOI
TL;DR: The computer program, SPARC, uses computational algorithms based on fundamental chemical structure theory to estimate a large number of chemical reactivity parameters and physical properties for a wide range of organic molecules strictly from molecular structure.
Abstract: Mathematical models for predicting the transport and fate of pollutants in the environment require reactivity parameter values, that is, the values of the physical and chemical constants that govern reactivity. Although empirical structure-activity relationships have been developed that allow estimation of some constants, such relationships are generally valid only within limited families of chemicals. The computer program SPARC uses computational algorithms based on fundamental chemical structure theory to estimate a large number of chemical reactivity parameters and physical properties for a wide range of organic molecules strictly from molecular structure. Resonance models were developed and calibrated using measured light absorption spectra, whereas electrostatic interaction models were developed using measured ionization pKa values in water. Solvation models (e.g., dispersion, induction, and H-bonding) have been developed using various measured physical property data. At present, SPARC's physical property models can predict vapor pressure and heat of vaporization (as a function of temperature), boiling point (as a function of pressure), diffusion coefficient (as a function of pressure and temperature), activity coefficient, solubility, partition coefficient and chromatographic retention time as a function of solvent and temperature. This prediction capability crosses chemical family boundaries to cover a broad range of organic compounds.

Journal ArticleDOI
TL;DR: The discovery of the anticonvulsant activities of methylparaben and propylparaben in the MES test might be useful for the development of new anticonvulsant medications, especially considering the well-known toxicological profile of these compounds.
Abstract: A discriminant function based on topological descriptors was derived from a training set composed of anticonvulsants in clinical use or in clinical development and compounds with other therapeutic uses. This model was internally and externally validated and applied to the virtual screening of chemical compounds from the 13th edition of the Merck Index. Methylparaben (Nipagin), a preservative widely used in food, cosmetics and pharmaceuticals, was flagged as active by the discriminant function and tested in mice in the Maximal Electroshock (MES) test (i.p. administration), according to the NIH Program for Anticonvulsant Drug Development. Based on the results for methylparaben, propylparaben (Nipasol), another preservative usually used in combination with the former, was also tested. Both methylparaben and propylparaben were found active in mice at doses of 30, 100, and 300 mg/kg. The discovery of the anticonvulsant activities of methylparaben and propylparaben in the MES test might be useful for the development of new anticonvulsant medications, especially considering the well-known toxicological profile of these compounds.
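A linear discriminant function of the kind described above gives each compound a score from a weighted sum of its descriptors and flags it as active when the score exceeds a cutoff; virtual screening then amounts to scoring every library entry. The sketch below assumes a linear form with a cutoff at zero; the descriptor names (D1, D2) and coefficients are hypothetical placeholders, not the published model, and computing the topological descriptors themselves is out of scope.

```python
from dataclasses import dataclass


@dataclass
class LinearDiscriminant:
    """DF = b0 + sum_i b_i * x_i, with hypothetical coefficients."""
    intercept: float
    weights: dict[str, float]  # descriptor name -> coefficient

    def score(self, descriptors: dict[str, float]) -> float:
        return self.intercept + sum(w * descriptors[name]
                                    for name, w in self.weights.items())

    def is_active(self, descriptors: dict[str, float],
                  threshold: float = 0.0) -> bool:
        return self.score(descriptors) > threshold


def virtual_screen(model: LinearDiscriminant,
                   library: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (name, score) for compounds flagged active, highest score first."""
    hits = [(name, model.score(d)) for name, d in library.items()
            if model.is_active(d)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Usage with placeholder descriptors and coefficients.
model = LinearDiscriminant(intercept=-1.0, weights={"D1": 2.0, "D2": -0.5})
library = {
    "a": {"D1": 1.0, "D2": 0.0},
    "b": {"D1": 0.2, "D2": 1.0},
    "c": {"D1": 2.0, "D2": 2.0},
}
print(virtual_screen(model, library))
```

Ranking the flagged compounds by score is one simple way to prioritize which hits, such as the parabens here, go forward to in vivo testing.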

Journal ArticleDOI
TL;DR: The results showed the possibility of conformational changes in a flexible binding site as one possible source of outliers in SAR or QSAR.
Abstract: Structure-activity relationship (SAR) and/or quantitative structure-activity relationship (QSAR) studies play an important role in lead optimization in drug discovery research. When ligand-bound protein structural information is lacking, one of the assumptions in SAR and QSAR studies is that similar analogs bind to the same binding site in a similar binding mode. In such studies, outliers have often been observed, especially in QSAR. However, most of these studies have focused on developing the QSAR and left the outliers unexamined. We searched ligand-bound X-ray crystal structures from the protein structure database to find evidence that could indicate a possible source of outliers in SAR or QSAR. Our results indicate that conformational changes in a flexible binding site may be one source of outliers.