
Showing papers in "Journal of Computer-aided Molecular Design in 2008"


Journal ArticleDOI
TL;DR: This review analyzes recent literature evaluating 3D virtual screening methods, with focus on molecular docking, and highlights problematic issues and provides guidelines on how to improve the quality of computational studies.
Abstract: Within the last few years a considerable number of evaluative studies have been published that investigate the performance of 3D virtual screening approaches. In particular, assessments of protein-ligand docking have attracted remarkable interest in the scientific community. However, comparing virtual screening approaches is a non-trivial task. Several publications, especially in the field of molecular docking, suffer from shortcomings that are likely to affect the significance of the results considerably. These quality issues often arise from poor study design, from bias, from the use of improper or uninformative enrichment descriptors, and from errors in interpreting the data output. In this review we analyze recent literature evaluating 3D virtual screening methods, with a focus on molecular docking. We highlight problematic issues and provide guidelines on how to improve the quality of computational studies. Since 3D virtual screening protocols are in general assessed by their ability to discriminate between active and inactive compounds, we summarize the impact of the composition and preparation of test sets on the outcome of evaluations. Moreover, we investigate the significance of both classic enrichment parameters and advanced descriptors for the performance of 3D virtual screening methods. Furthermore, we review the significance and suitability of RMSD as a measure of the accuracy of protein-ligand docking algorithms and of conformational-space sub-sampling algorithms.

328 citations


Journal ArticleDOI
TL;DR: The relaxed complex scheme (RCS) is reviewed and new extensions and improvements of this methodology are discussed in the context of ligand binding to two example targets: kinetoplastid RNA editing ligase 1 and the W191G cavity mutant of cytochrome c peroxidase.
Abstract: The interactions among associating (macro)molecules are dynamic, which adds to the complexity of molecular recognition. While ligand flexibility is well accounted for in computational drug design, the effective inclusion of receptor flexibility remains an important challenge. The relaxed complex scheme (RCS) is a promising computational methodology that combines the advantages of docking algorithms with dynamic structural information provided by molecular dynamics (MD) simulations, therefore explicitly accounting for the flexibility of both the receptor and the docked ligands. Here, we briefly review the RCS and discuss new extensions and improvements of this methodology in the context of ligand binding to two example targets: kinetoplastid RNA editing ligase 1 and the W191G cavity mutant of cytochrome c peroxidase. The RCS improvements include its extension to virtual screening, more rigorous characterization of local and global binding effects, and methods to improve its computational efficiency by reducing the receptor ensemble to a representative set of configurations. The choice of receptor ensemble, its influence on the predictive power of RCS, and the current limitations for an accurate treatment of the solvent contributions are also briefly discussed. Finally, we outline potential methodological improvements that we anticipate will assist future development.

313 citations


Journal ArticleDOI
TL;DR: A modest beginning is proposed, with recommendations for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage.
Abstract: The field of computational chemistry, particularly as applied to drug design, has become increasingly important in terms of the practical application of predictive modeling to pharmaceutical research and development. Tools for exploiting protein structures or sets of ligands known to bind particular targets can be used for binding-mode prediction, virtual screening, and prediction of activity. A serious weakness within the field is a lack of standards with respect to quantitative evaluation of methods, data set preparation, and data set sharing. Our goal should be to report new methods or comparative evaluations of methods in a manner that supports decision making for practical applications. Here we propose a modest beginning, with recommendations for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage.

298 citations


Journal ArticleDOI
TL;DR: The modeling field has a long way to go to provide effective assessment of its approaches, either to itself or to a broader audience, but there are no technical reasons why progress cannot be made.
Abstract: Two essential aspects of virtual screening are considered: experimental design and performance metrics. In the design of any retrospective virtual screen, choices have to be made as to the purpose of the exercise. Is the goal to compare methods? Is the interest in a particular type of target or all targets? Are we simulating a ‘real-world’ setting, or teasing out distinguishing features of a method? What are the confidence limits for the results? What should be reported in a publication? In particular, what criteria should be used to decide between different performance metrics? Comparing the field of molecular modeling to other endeavors, such as medical statistics, criminology, or computer hardware evaluation, indicates some clear directions. Taken together, these suggest the modeling field has a long way to go to provide effective assessment of its approaches, either to itself or to a broader audience, but that there are no technical reasons why progress cannot be made.

216 citations


Journal ArticleDOI
TL;DR: The effects of the myriad variables controlling such studies, which place significant limits on the interpretability of results, are investigated, including analysis of calculation setup variation, the effect of target choice, active/decoy set selection and enrichment data interpretation.
Abstract: Over the last few years many articles have been published in an attempt to provide performance benchmarks for virtual screening tools. While this research has imparted useful insights, the myriad variables controlling said studies place significant limits on results interpretability. Here we investigate the effects of these variables, including analysis of calculation setup variation, the effect of target choice, active/decoy set selection (with particular emphasis on the effect of analogue bias) and enrichment data interpretation. In addition the optimization of the publicly available DUD benchmark sets through analogue bias removal is discussed, as is their augmentation through the addition of large diverse data sets collated using WOMBAT.

176 citations


Journal ArticleDOI
TL;DR: Several ways that DUD can be improved to provide better telemetry to investigators seeking to understand both the strengths and the weaknesses of current docking methods are outlined.
Abstract: Ligand enrichment among top-ranking hits is a key metric of virtual screening. To avoid bias, decoys should resemble ligands physically, so that enrichment is not attributable to simple differences of gross features. We therefore created a directory of useful decoys (DUD) by selecting decoys that resembled annotated ligands physically but not topologically to benchmark docking performance. DUD has 2950 annotated ligands and 95,316 property-matched decoys for 40 targets. It is by far the largest and most comprehensive public data set for benchmarking virtual screening programs that I am aware of. This paper outlines several ways that DUD can be improved to provide better telemetry to investigators seeking to understand both the strengths and the weaknesses of current docking methods. I also highlight several pitfalls for the unwary: a risk of over-optimization, questions about chemical space, and the proper scope for using DUD. Careful attention to both the composition of benchmarks and how they are used is essential to avoid being misled by overfitting and bias.
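Since the abstract above centers on ligand enrichment among top-ranking hits as the key virtual screening metric, a minimal illustration of the enrichment factor may be useful. The function name, the 1% cutoff and the toy ranking are assumptions chosen for illustration, not values taken from the paper.

```python
# Minimal sketch of the enrichment factor (EF) commonly reported for DUD-style
# benchmarks. Names, cutoff and data are illustrative assumptions only.

def enrichment_factor(ranked_is_active, fraction=0.01):
    """EF at a given fraction of the ranked database (best-scored compound first)."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    n_top = max(1, round(fraction * n_total))
    hit_rate_top = sum(ranked_is_active[:n_top]) / n_top   # actives found in the top slice
    hit_rate_all = n_actives / n_total                     # actives expected by chance
    return hit_rate_top / hit_rate_all

# Example: 10 actives hidden among 1000 compounds, 4 of them ranked in the top 1%.
ranking = [True] * 4 + [False] * 6 + [True] * 6 + [False] * 984
print(enrichment_factor(ranking))  # (4/10) / (10/1000) = 40.0
```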

167 citations


Journal ArticleDOI
TL;DR: This paper examines the reliability of recent comparisons of docking tools in cognate re-docking (pose prediction) and virtual screening, as well as comparisons of ligand-based virtual screening approaches, and the factors that critically affect them.
Abstract: The recent literature is replete with papers evaluating computational tools (often those operating on 3D structures) for their performance in a certain set of tasks. Most commonly these papers compare a number of docking tools for their performance in cognate re-docking (pose prediction) and/or virtual screening. Related papers have been published on ligand-based tools: pose prediction by conformer generators and virtual screening using a variety of ligand-based approaches. The reliability of these comparisons is critically affected by a number of factors usually ignored by the authors, including bias in the datasets used for virtual screening, the metrics used to assess performance in virtual screening and pose prediction, and errors in the crystal structures used.

152 citations


Journal ArticleDOI
TL;DR: This paper presents detailed examples of pitfalls in each area of data sharing, data set design and preparation, and statistical reporting and makes recommendations as to best practices.
Abstract: Computational methods for docking ligands to protein binding sites have become ubiquitous in drug discovery. Despite the age of the field, no standards have been established with respect to methodological evaluation of docking accuracy, virtual screening utility, or scoring accuracy. There are critical issues relating to data sharing, data set design and preparation, and statistical reporting that have an impact on the degree to which a report will translate into real-world performance. These issues also have an impact on whether there is a transparent relationship between methodological changes and reported performance improvements. This paper presents detailed examples of pitfalls in each area and makes recommendations as to best practices.

140 citations


Journal ArticleDOI
TL;DR: The advantages of the HiT QSAR approach reported here are the absence of “molecular alignment” problems, consideration of different physical–chemical properties of atoms, the high adequacy and good interpretability of obtained models and clear ways for molecular design.
Abstract: This article is about the hierarchical quantitative structure–activity relationship technology (HiT QSAR) based on the Simplex representation of molecular structure (SiRMS) and its application for different QSAR/QSP(property)R tasks. The essence of this technology is a sequential solution (with the use of the information obtained on the previous steps) to the QSAR problem by the series of enhanced models of molecular structure description [from one dimensional (1D) to four dimensional (4D)]. It is a system of permanently improved solutions. In the SiRMS approach, every molecule is represented as a system of different simplexes (tetratomic fragments with fixed composition, structure, chirality and symmetry). The level of simplex descriptors detailing increases consecutively from the 1D to 4D representation of the molecular structure. The advantages of the approach reported here are the absence of “molecular alignment” problems, consideration of different physical–chemical properties of atoms (e.g. charge, lipophilicity, etc.), the high adequacy and good interpretability of obtained models and clear ways for molecular design. The efficiency of the HiT QSAR approach is demonstrated by comparing it with the most popular modern QSAR approaches on two representative examination sets. The examples of successful application of the HiT QSAR for various QSAR/QSPR investigations on the different levels (1D–4D) of the molecular structure description are also highlighted. The reliability of developed QSAR models as predictive virtual screening tools and their ability to serve as the base of directed drug design was validated by subsequent synthetic and biological experiments, among others. The HiT QSAR is realized as a complex of computer programs known as HiT QSAR software that also includes a powerful statistical block and a number of useful utilities.

134 citations


Journal ArticleDOI
TL;DR: The focus is the QSAR enigma, wherein model predictivity is not a necessary component of a model’s usefulness.
Abstract: This perspective concerns the methods employed within the current drug discovery community to develop predictive quantitative structure–activity relationships (QSAR). Specifically, a number of cautions are provided which may circumvent misuse and misunderstanding of the technique. Ignorance of such caveats has led to a discouraging tendency of the methods to result in poorly predictive models. Among these pitfalls are the fondness with which we associate correlation with causation, the mesmerizing influence of large numbers of molecular descriptors, the incessant misuse of the leave-one-out paradigm, and finally, the QSAR enigma wherein model predictivity is not a necessary component of a model’s usefulness.

130 citations


Journal ArticleDOI
TL;DR: Launched in late 2007, spinetoram provides both improved efficacy and an expanded spectrum while maintaining the exceptional environmental and toxicological profile already established for the spinosyn chemistry.
Abstract: Improvement of the efficacy and spectrum of the spinosyns, a novel class of fermentation-derived insecticides, has long been a goal within Dow AgroSciences. Because the spinosyns are large and complex fermentation products, identifying specific modifications likely to result in improved activity was a difficult process, since most modifications decreased the activity. A variety of approaches were investigated to identify new synthetic directions for the spinosyn chemistry, including several explorations of the quantitative structure–activity relationships (QSAR) of spinosyns, which initially were unsuccessful. However, application of artificial neural networks (ANN) to the spinosyn QSAR problem identified new directions for improved activity in the chemistry, which subsequent synthesis and testing confirmed. The ANN-based analogs, coupled with other information on substitution effects from spinosyn structure–activity relationships, led to the discovery of spinetoram (XDE-175). Launched in late 2007, spinetoram provides both improved efficacy and an expanded spectrum while maintaining the exceptional environmental and toxicological profile already established for the spinosyn chemistry.

Journal ArticleDOI
TL;DR: Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed: replacing the usual linear plots with semi-logarithmic ones (pROC plots), including in “area under the curve” (AUC) calculations, and weighting each active by the size of the lead series to which it belongs.
Abstract: Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed. The first is to replace the linear plots usually used with semi-logarithmic ones (pROC plots), including when doing “area under the curve” (AUC) calculations. Doing so is a simple way to bias the statistic to favor identification of “hits” early in the recovery curve rather than late. A second suggested modification entails weighting each active based on the size of the lead series to which it belongs. Two weighting schemes are described: arithmetic, in which the weight for each active is inversely proportional to the size of the cluster from which it comes; and harmonic, in which weights are inversely proportional to the rank of each active within its class. Either scheme is able to distinguish biased from unbiased screening statistics, but the harmonically weighted AUC in particular emphasizes the ability to place representatives of each class of active early in the recovery curve.
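As a rough companion to the pROC idea described above, the sketch below computes a semi-logarithmic early-recognition statistic alongside the ordinary ROC AUC. The exact formulation in the paper may differ; the floor min_fpr and the synthetic data are assumptions added so the snippet runs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative "pROC"-style statistic: average -log10 of the false-positive
# rate at which each active is retrieved, so early hits are rewarded more than
# late ones. This follows the spirit of the abstract, not its exact formula.

def proc_style_auc(scores, labels, min_fpr=1e-3):
    order = np.argsort(-np.asarray(scores, dtype=float))   # best score first
    y = np.asarray(labels)[order]
    fpr = np.cumsum(1 - y) / (len(y) - y.sum())             # FPR after each compound
    fpr_at_actives = np.clip(fpr[y == 1], min_fpr, None)    # floor avoids log10(0)
    return float(np.mean(-np.log10(fpr_at_actives)))

rng = np.random.RandomState(0)
scores = rng.rand(1000)                                     # synthetic screening scores
labels = np.zeros(1000, dtype=int)
labels[:20] = 1                                             # 20 "actives"
print(roc_auc_score(labels, scores), proc_style_auc(scores, labels))
```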

Journal ArticleDOI
Istvan J. Enyedy, William Egan
TL;DR: The relationship of docking scores with experimentally determined IC50 values measured in-house was tested; for the test sets considered, MW and sometimes ClogP were as useful as GlideScores, and no significant difference was observed between SP and XP scores for differentiating between actives and inactives.
Abstract: Docking and scoring is currently one of the tools used for hit finding and hit-to-lead optimization when structural information about the target is known. Docking scores have been found useful for optimizing ligand binding to reproduce experimentally observed binding modes. The question is, can docking and scoring be used reliably for hit-to-lead optimization? To illustrate the challenges of scoring for hit-to-lead optimization, the relationship of docking scores with experimentally determined IC50 values measured in-house was tested. The influences of the particular target, the crystal structure, and the precision of the scoring function on the ability to differentiate between actives and inactives were analyzed by calculating the area under the curve (AUC) of receiver operating characteristic (ROC) curves for docking scores. It was found that, for the test sets considered, MW and sometimes ClogP were as useful as GlideScores, and no significant difference was observed between SP and XP scores for differentiating between actives and inactives. Interpretation by an expert is still required to successfully utilize docking and scoring in hit-to-lead optimization.
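The comparison described above, ROC AUCs for a docking score versus a simple property such as molecular weight, can be reproduced in outline with a few lines of scikit-learn. The numbers below are invented for illustration; glide_score, mol_weight and is_active are hypothetical columns, not data from the study.

```python
from sklearn.metrics import roc_auc_score

# Hedged sketch of a null-model comparison: ROC AUC for separating actives from
# inactives using a docking score, and again using molecular weight alone.
glide_score = [-9.1, -8.7, -7.9, -7.5, -6.8, -6.2, -5.9, -5.1]   # lower = better
mol_weight  = [480, 455, 390, 410, 350, 310, 290, 260]
is_active   = [1, 1, 1, 0, 1, 0, 0, 0]

# Negate GlideScore so that "higher = more active" for roc_auc_score.
auc_docking = roc_auc_score(is_active, [-s for s in glide_score])
auc_mw      = roc_auc_score(is_active, mol_weight)
print(f"AUC (GlideScore): {auc_docking:.2f}  AUC (MW baseline): {auc_mw:.2f}")
```

In this toy example the molecular-weight baseline comes close to the docking score, which is the kind of comparison the abstract warns should always be checked.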

Journal ArticleDOI
TL;DR: A general induced fit docking protocol is described that requires only one initial pocket conformation, identifies most of the correct ligand positions as the lowest-scoring poses, and does not deteriorate with a substantial increase in pocket variability.
Abstract: Protein binding sites undergo ligand-specific conformational changes upon ligand binding. However, most docking protocols rely on a fixed conformation of the receptor, on prior knowledge of multiple conformations representing the variation of the pocket, or on a known bounding box for the ligand. Here we describe a general induced fit docking protocol that requires only one initial pocket conformation and identifies most of the correct ligand positions as the lowest score. We expanded a previously used diverse “cross-docking” benchmark to thirty ligand–protein pairs extracted from different crystal structures. The algorithm systematically scans pairs of neighbouring side chains, replaces them by alanines, and docks the ligand to each ‘gapped’ version of the pocket. All docked positions are scored, refined with the original side chains and a flexible backbone, and re-scored. In the optimal version of the protocol, pairs of residues were replaced by alanines and only one best-scoring conformation was selected from each ‘gapped’ pocket for refinement. The optimal SCARE (SCan Alanines and REfine) protocol identifies a near-native conformation (under 2 Å RMSD) as the lowest rank for 80% of pairs if the docking bounding box is defined by the predicted pocket envelope, and for as many as 90% of the pairs if the bounding box is derived from the known answer with a ∼5 Å margin as used in most previous publications. The fully automated algorithm takes about 2 h of single-processor time per pose, requires only one pocket structure and needs no prior knowledge about the binding site location. Furthermore, the results for conformationally conserved pockets do not deteriorate despite the substantial increase in pocket variability.
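A very schematic sketch of the scan-and-refine loop summarised above is given below. Every helper used here (mutate_to_ala, dock_best_pose, refine_and_rescore) is a hypothetical placeholder rather than the API of any real docking package, and looping over all residue pairs stands in for the authors' scan of pairs of neighbouring side chains.

```python
from itertools import combinations

# Schematic outline only: not runnable without real docking machinery behind
# the hypothetical placeholder functions.
def scare_sketch(pocket, pocket_residues, ligand):
    candidates = []
    for res_a, res_b in combinations(pocket_residues, 2):
        gapped = mutate_to_ala(pocket, (res_a, res_b))      # open up the pocket
        pose = dock_best_pose(ligand, gapped)               # one best pose per 'gapped' pocket
        score, refined = refine_and_rescore(pocket, pose)   # restore side chains, flexible
        candidates.append((score, refined))                 # backbone, re-score
    return min(candidates, key=lambda c: c[0])[1]           # lowest score = predicted pose
```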

Journal ArticleDOI
Rajarshi Guha
TL;DR: The need for interpretation and the factors that affect the interpretability of QSAR models are discussed, along with a number of case studies in which workers have provided some form of interpretation of a QSAR model.
Abstract: The goal of a quantitative structure–activity relationship (QSAR) model is to encode the relationship between molecular structure and biological activity or physical property. Based on this encoding, such models can be used for predictive purposes. Assuming the use of relevant and meaningful descriptors, and a statistically significant model, extraction of the encoded structure–activity relationships (SARs) can provide insight into what makes a molecule active or inactive. Such analyses of QSAR models are useful in a number of scenarios, such as suggesting structural modifications to enhance activity, explanation of outliers and exploratory analysis of novel SARs. In this paper we discuss the need for interpretation and give an overview of the factors that affect the interpretability of QSAR models. We then describe interpretation protocols for different types of models, highlighting the different types of interpretations, ranging from very broad global trends to very specific, case-by-case descriptions of the SAR, using examples from the training set. Finally, we discuss a number of case studies where workers have provided some form of interpretation of a QSAR model.

Journal ArticleDOI
TL;DR: A method is presented to select the ensembles that produce the best enrichments without relying on knowledge of active compounds or sophisticated analyses of the 3D receptor structures.
Abstract: While it may seem intuitive that using an ensemble of multiple conformations of a receptor in structure-based virtual screening experiments would necessarily yield improved enrichment of actives relative to using just a single receptor, it turns out that, at least in the p38 MAP kinase model system studied here, a very large majority of all possible ensembles do not yield improved enrichment of actives. However, there are combinations of receptor structures that do lead to improved enrichment results. We present here a method to select the ensembles that produce the best enrichments that does not rely on knowledge of active compounds or sophisticated analyses of the 3D receptor structures. In the system studied here, the small fraction of ensembles of up to 3 receptors that do yield good enrichments of actives was identified by selecting ensembles that have the best mean GlideScore for the top 1% of the docked ligands in a database screen of actives and drug-like “decoy” ligands. Ensembles of two receptors identified using this mean GlideScore metric generally outperform single receptors, while ensembles of three receptors identified using this metric consistently give optimal enrichment factors in which, for example, 40% of the known actives outrank all the other ligands in the database.
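The selection heuristic described above, ranking candidate ensembles by the mean docking score of the top 1% of the screened database, can be sketched in a few lines. The score matrix below is random stand-in data and the function and variable names are assumptions; lower scores are treated as better, as for GlideScore.

```python
import numpy as np
from itertools import combinations

rng = np.random.RandomState(1)
n_receptors, n_ligands = 6, 2000
docking_scores = rng.normal(-6.0, 1.5, size=(n_receptors, n_ligands))  # invented scores

def ensemble_metric(receptor_idx, scores, top_fraction=0.01):
    best_per_ligand = scores[list(receptor_idx)].min(axis=0)   # best score over the ensemble
    n_top = max(1, int(top_fraction * scores.shape[1]))
    return np.sort(best_per_ligand)[:n_top].mean()             # mean score of the top 1%

# Enumerate all ensembles of up to 3 receptors and keep the best-scoring one.
candidates = [c for k in (1, 2, 3) for c in combinations(range(n_receptors), k)]
best = min(candidates, key=lambda c: ensemble_metric(c, docking_scores))
print("selected ensemble:", best)
```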

Journal ArticleDOI
TL;DR: Some considerations to be taken into account in QSAR modeling of drug metabolism are described, such as the accuracy/consistency of the entire data set, the representation and diversity of the training and test sets, and variable selection.
Abstract: Quantitative structure-activity relationship (QSAR) methods are urgently needed for predicting ADME/T (absorption, distribution, metabolism, excretion and toxicity) properties to select lead compounds for optimization at the early stage of drug discovery, and to screen drug candidates for clinical trials. Use of suitable QSAR models ultimately results in a lower time cost and a lower attrition rate during drug discovery and development. In the case of ADME/T parameters, drug metabolism is a key determinant of metabolic stability, drug-drug interactions, and drug toxicity. QSAR models for predicting drug metabolism have undergone significant advances recently. However, most of the models used lack sufficient interpretability and offer poor predictability for novel drugs. In this review, we describe some considerations to be taken into account in QSAR modeling of drug metabolism, such as the accuracy/consistency of the entire data set, the representation and diversity of the training and test sets, and variable selection. We also describe some novel statistical techniques (ensemble methods, multivariate adaptive regression splines and graph machines), which are not yet used frequently to develop QSAR models for drug metabolism. Subsequently, rational recommendations for developing predictable and interpretable QSAR models are made. Finally, the recent advances in QSAR models for cytochrome P450-mediated drug metabolism prediction, including in vivo hepatic clearance, in vitro metabolic stability, and inhibitors and substrates of cytochrome P450 families, are briefly summarized.

Journal ArticleDOI
TL;DR: By analyzing the molecular similarities of known drugs, it is shown that the historic drug discovery process has a very strong 2D inductive bias.
Abstract: Inductive bias is the set of assumptions that a person or procedure makes in making a prediction based on data. Different methods for ligand-based predictive modeling have different inductive biases, with a particularly sharp contrast between 2D and 3D similarity methods. A unique aspect of ligand design is that the data that exist to test methodology have been largely man-made, and that this process of design involves prediction. By analyzing the molecular similarities of known drugs, we show that the inductive bias of the historic drug discovery process has a very strong 2D bias. In studying the performance of ligand-based modeling methods, it is critical to account for this issue in dataset preparation, use of computational controls, and in the interpretation of results. We propose specific strategies to explicitly address the problems posed by inductive bias considerations.

Journal ArticleDOI
TL;DR: The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models, a small set of in vivo data and a large set of physico-chemical data.
Abstract: In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modeling techniques and selection of the best model. We apply this automatic process to data sets of blood–brain barrier penetration and aqueous solubility and compare the resulting automatically generated models with ‘manually’ built models using external test sets. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models, a small set of in vivo data and a large set of physico-chemical data.
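A minimal sketch of this kind of automated build-and-validate workflow, using scikit-learn's Gaussian Process regressor on synthetic descriptor data, follows. The descriptor matrix, the synthetic property and the kernel choice are assumptions; the descriptor-calculation, filtering and model-selection stages of the actual system are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(300, 20)                                   # 300 compounds x 20 descriptors
y = X[:, 0] * 2.0 - X[:, 1] + 0.1 * rng.randn(300)      # synthetic "property" to model

# Split off an external test set, fit the GP on the training set, score externally.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(
    StandardScaler(),
    GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True),
)
model.fit(X_train, y_train)
print("external R^2:", model.score(X_test, y_test))
```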

Journal ArticleDOI
TL;DR: In the present study the desirability function is used for the first time for the analysis of the effects of the electron structure in the process of pattern recognition of active and inactive compounds.
Abstract: A new paradigm is suggested for pattern recognition of drugs. The approach is based on the combined application of the 4D/3D quantitative structure–activity relationship (QSAR) algorithms BiS and ConGO. The first algorithm, BiS/MC (multiconformational), is used to search for the conformers interacting with a receptor. The second algorithm, ConGO, has been suggested for the detailed study of the selected conformers' electron density and for the search for the electron-structure fragments that determine the pharmacophore and antipharmacophore parts of the compounds. In this work we suggest using a new AlteQ method for the evaluation of the molecular electron density. AlteQ describes the experimental electron density (determined by low-temperature, highly accurate X-ray analysis) much better than a number of quantum approaches; herein this is shown by comparing the computed electron density with the results of highly accurate X-ray analysis. In the present study the desirability function is used for the first time for the analysis of the effects of the electron structure in the process of pattern recognition of active and inactive compounds. The suggested method for pattern recognition has been used for the investigation of various sets of compounds such as DNA antimetabolites, fXa inhibitors, and 5-HT1A and α1-AR receptor inhibitors. The pharmacophore and antipharmacophore fragments have been found in the electron structures of the compounds. It has been shown that the pattern-recognition cross-validation quality for the datasets is unity.

Journal ArticleDOI
TL;DR: It is seen that the relative effectiveness of virtual screening methods, as measured by the enrichment factor, is highly dependent on the particular crystal structure or ligand, and on the database being searched.
Abstract: As an extension to a previously published study (McGaughey et al., J Chem Inf Model 47:1504–1519, 2007) comparing 2D and 3D similarity methods to docking, we apply a subset of those virtual screening methods (TOPOSIM, SQW, ROCS-color, and Glide) to a set of protein/ligand pairs where the protein is the target for docking and the cocrystallized ligand is the target for the similarity methods. Each protein is represented by a maximum of five crystal structures. We search a diverse subset of the MDDR as well as a diverse small subset of the MCIDB, Merck's proprietary database. It is seen that the relative effectiveness of virtual screening methods, as measured by the enrichment factor, is highly dependent on the particular crystal structure or ligand, and on the database being searched. 2D similarity methods appear very good for the MDDR, but poor for the MCIDB. However, ROCS-color (a 3D similarity method) does well for both databases.

Journal ArticleDOI
TL;DR: A novel method of scoring-function optimization that supports the use of additional information to constrain scoring function parameters, which can be used to focus a scoring function’s training towards a particular application, such as screening enrichment.
Abstract: Empirical scoring functions used in protein-ligand docking calculations are typically trained on a dataset of complexes with known affinities with the aim of generalizing across different docking applications. We report a novel method of scoring-function optimization that supports the use of additional information to constrain scoring function parameters, which can be used to focus a scoring function’s training towards a particular application, such as screening enrichment. The approach combines multiple instance learning, positive data in the form of ligands of protein binding sites of known and unknown affinity and binding geometry, and negative (decoy) data of ligands thought not to bind particular protein binding sites or known not to bind in particular geometries. Performance of the method for the Surflex-Dock scoring function is shown in cross-validation studies and in eight blind test cases. Tuned functions optimized with a sufficient amount of data exhibited either improved or undiminished screening performance relative to the original function across all eight complexes. Analysis of the changes to the scoring function suggests that modifications can be learned that are related to protein-specific features such as active-site mobility.

Journal ArticleDOI
TL;DR: This report investigates various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN, presents the results of random forest ensemble models developed using different cell proliferation datasets, and highlights protocols to take into account their extremely imbalanced nature.
Abstract: Computational toxicology is emerging as an encouraging alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN) as part of the NIH Molecular Libraries Roadmap has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models would be useful in evaluating cell-based screening results in general (for example from reporter assays) and could be used as an aid to identify and eliminate potentially undesired compounds. Specifically we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols to take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed we were able to achieve percentage correct classification rates between 70% and 85% on the prediction set, though the accuracy rate dropped significantly when the models were applied to in vivo data. In this context we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity can be correlated and potentially predicted by proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset to the training set of the models to decide whether the new dataset may be reliably predicted.
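As an illustration of fitting an ensemble classifier to a heavily imbalanced screening-style dataset, the sketch below uses scikit-learn's random forest with balanced class weights. The descriptors and labels are synthetic, and the class_weight choice is one simple option for handling imbalance, not necessarily the protocol the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(5000, 50)                          # 5000 compounds x 50 descriptors
y = (rng.rand(5000) < 0.05).astype(int)         # ~5% "toxic" class: extreme imbalance

# Stratified split keeps the minority class represented in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)

# Balanced accuracy is less misleading than raw accuracy on imbalanced labels.
print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))
```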

Journal ArticleDOI
TL;DR: The present study applies the Hierarchical Technology for Quantitative Structure–Activity Relationships (HiT QSAR) to evaluate how the characteristics of 28 nitroaromatic compounds influence their toxicity and to predict the toxicity of new nitroaromatic derivatives.
Abstract: The present study applies the Hierarchical Technology for Quantitative Structure–Activity Relationships (HiT QSAR) for (i) evaluation of the influence of the characteristics of 28 nitroaromatic compounds (some of which belong to a widely known class of explosives) on their toxicity; (ii) prediction of toxicity for new nitroaromatic derivatives; (iii) analysis of the effects of substituents in nitroaromatic compounds on their toxicity in vivo. The 50% lethal dose concentration for rats (LD50) was used to develop the QSAR models based on simplex representation of molecular structure. The preliminary 1D QSAR results show that even the information on the composition of molecules reveals the main tendencies of changes in toxicity. The statistical characteristics of the partial least squares 2D QSAR models are quite satisfactory (R² = 0.96–0.98; Q² = 0.91–0.93; R²test = 0.89–0.92), which allows us to carry out the prediction of activity for 41 novel compounds designed by the application of new combinations of substituents represented in the training set. The comprehensive analysis of toxicity changes as a function of substituent position and nature was carried out. Molecular fragments that promote and interfere with toxicity were defined on the basis of the obtained models. It was shown that the mutual influence of substituents in the benzene ring plays a crucial role regarding toxicity. The influence of different substituents on toxicity can be mediated via different C–H fragments of the aromatic ring.

Journal ArticleDOI
TL;DR: A novel algorithm for the connecting of fragment molecules is presented and validated for a number of test systems and the general applicability of this approach within the field of fragment-based de novo drug design is discussed.
Abstract: A novel algorithm for the connecting of fragment molecules is presented and validated for a number of test systems. Within the CONFIRM (Connecting Fragments Found in Receptor Molecules) approach a pre-prepared library of bridges is searched to extract those which match a search criterion derived from known experimental or computational binding information about fragment molecules within a target binding site. The resulting bridge ‘hits’ are then connected, in an automated fashion, to the fragments and docked into the target receptor. Docking poses are assessed in terms of root-mean-squared deviation from the known positions of the fragment molecules, as well as docking score should known inhibitors be available. The creation of the bridge library, the full details and novelty of the CONFIRM algorithm, and the general applicability of this approach within the field of fragment-based de novo drug design are discussed.

Journal ArticleDOI
TL;DR: The homology modeling technique has been used to construct the structure of potato 5-LOX and the results correlated well with the experimental data reported, which proved the quality of the model.
Abstract: Lipoxygenases (LOXs) are a group of enzymes involved in the oxygenation of polyunsaturated fatty acids. Among these, 5-lipoxygenase (5-LOX) is the key enzyme leading to the formation of pharmacologically important leukotrienes and lipoxins, the mediators of inflammatory and allergic disorders. In view of its close functional similarity to mammalian lipoxygenase, potato 5-LOX is used extensively. In this study, the homology modeling technique has been used to construct the structure of potato 5-LOX. The amino acid sequence identity between the target protein and the sequence of the template protein 1NO3 (soybean LOX-3), found by NCBI protein BLAST, was 63%. Based on the template structure, the protein model was constructed using the Homology program in InsightII. The protein model was briefly refined by energy minimization steps and validated using Profile-3D, ERRAT and PROCHECK. The results showed that 99.3% of the amino acids were in allowed regions of the Ramachandran plot, suggesting that the model is accurate and its stereochemical quality good. Like all LOXs, 5-LOX also has a two-domain structure: a small N-terminal beta-barrel domain and a larger catalytic domain containing a single atom of non-heme iron coordinating with His525, His530, His716 and Ile864. Asn720 is present in the fifth coordination position of iron. The sixth coordination position faces the open cavity occupied here by the docked ligands. Our model of the enzyme is further validated by examining the interactions of earlier reported inhibitors and by energy minimization studies carried out using molecular mechanics calculations. Four ligands were selected for our docking and energy minimization studies: nordihydroguaiaretic acid (NDGA), with an IC50 of 1.5 μM, and analogs of benzyl propargyl ethers with IC50 values of 760 μM, 45 μM, and no inhibition, respectively. Our results correlated well with the experimental data reported earlier, which proved the quality of the model. The model generated can be further used for the design and development of more potent 5-LOX inhibitors.

Journal ArticleDOI
TL;DR: These studies ask whether true binders and binding decoys can be distinguished based only on their structural chemical descriptors, and suggest that validated QSAR models could complement structure-based docking and scoring approaches in identifying promising hits by virtual screening of molecular libraries.
Abstract: The use of inaccurate scoring functions in docking algorithms may result in the selection of compounds with high predicted binding affinity that nevertheless are known experimentally not to bind to the target receptor. Such falsely predicted binders have been termed 'binding decoys'. We posed the question of whether true binders and decoys could be distinguished based only on their structural chemical descriptors using approaches commonly used in ligand-based drug design. We have applied the k-Nearest Neighbor (kNN) classification QSAR approach to a dataset of compounds characterized as binders or binding decoys of AmpC beta-lactamase. Models were subjected to rigorous internal and external validation as part of our standard workflow, and a special QSAR modeling scheme was employed that took into account the imbalanced ratio of inhibitors to non-binders (1:4) in this dataset. 342 predictive models were obtained with a correct classification rate (CCR) of 0.90 or higher for both training and test sets. The prediction accuracy was as high as 100% (CCR = 1.00) for the external validation set composed of 10 compounds (5 true binders and 5 decoys) selected randomly from the original dataset. For an additional external set of 50 known non-binders, we achieved a CCR of 0.87 using a very conservative model applicability domain threshold. The validated binary kNN QSAR models were further employed for mining the NCGC AmpC screening dataset (69,653 compounds). The consensus prediction of 64 compounds identified as screening hits in the AmpC PubChem assay disagreed with their annotation in PubChem but was in agreement with the results of secondary assays. At the same time, 15 compounds were identified as potential binders contrary to their annotation in PubChem. Five of them were tested experimentally and showed inhibitory activities in the millimolar range, with the highest binding constant Ki of 135 μM. Our studies suggest that validated QSAR models could complement structure-based docking and scoring approaches in identifying promising hits by virtual screening of molecular libraries.
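A hedged sketch of a kNN classification model evaluated with the correct classification rate (CCR = (sensitivity + specificity) / 2), the balanced metric quoted above, follows. The synthetic data mimic the roughly 1:4 binder-to-decoy ratio; descriptor selection and the applicability-domain filtering used in the paper are not reproduced.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def ccr(y_true, y_pred):
    """Correct classification rate: mean of sensitivity and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = (y_pred[y_true == 1] == 1).mean()      # binders recovered
    spec = (y_pred[y_true == 0] == 0).mean()      # decoys rejected
    return 0.5 * (sens + spec)

rng = np.random.RandomState(0)
X = rng.rand(500, 30)                             # synthetic descriptors
y = (rng.rand(500) < 0.2).astype(int)             # ~1:4 binder-to-decoy ratio

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("CCR:", ccr(y_te, model.predict(X_te)))
```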

Journal ArticleDOI
TL;DR: Experimental results show that cyclo[(-d-Phe-l-Ala)n = 4-] peptides self-assemble into nanotube bundles, and molecular modeling results indicate that cyclic peptide nanotubes with n = 3, 4, 5 and 6 are very stable.
Abstract: In order to investigate the structures and properties of cyclic peptide nanotubes of cyclo[(-d-Phe-l-Ala)n = 3,4,5,6-], cyclo[(-d-Phe-l-Ala)n = 4-] was synthesized and self-assembled into nanotubes, and the structure and morphology of the nanotube were characterized by mass spectrometry (MS), Fourier transform infrared spectroscopy (FT-IR) and scanning electron microscopy (SEM). On the basis of these experimental results, the structures of cyclo[(-d-Phe-l-Ala)n = 3,4,5,6-] were characterized by molecular dynamics. In addition, the motion behavior of H2O molecules in the nanotubes was investigated by molecular dynamics using a COMPASS force field. Experimental results show that cyclo[(-d-Phe-l-Ala)n = 4-] peptides self-assemble into nanotube bundles. Molecular modeling results indicate that cyclic peptide nanotubes with n = 3, 4, 5 and 6 are very stable; these nanotubes have internal diameters of 5.9 Å, 8.1 Å, 10.8 Å and 13.1 Å and outer diameters of 18.2 Å, 21.7 Å, 23.4 Å and 25.9 Å, respectively. Modeling results demonstrate that H2O molecules move cooperatively within a single nanotube and diffuse in one dimension, but they do not diffuse unilaterally due to the antiparallel ring-stacking arrangement.

Journal ArticleDOI
TL;DR: The LDA-assisted QSAR models presented here could significantly reduce the number of synthesized and tested compounds and could increase the chance of finding new chemical entities with anti-trichomonal activity.
Abstract: Trichomonas vaginalis (Tv) is the causative agent of the most common, non-viral, sexually transmitted disease in women and men worldwide. Since 1959, metronidazole (MTZ) has been the drug of choice in the systemic treatment of trichomoniasis. However, resistance to MTZ in some patients and the great cost associated with the development of new trichomonacidals make necessary the development of computational methods that shorten the drug discovery pipeline. Toward this end, bond-based linear indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis were used to discover novel trichomonacidal chemicals. The obtained models, using non-stochastic and stochastic indices, are able to classify correctly 89.01% (87.50%) and 82.42% (84.38%) of the chemicals in the training (test) sets, respectively. These results validate the models for use in ligand-based virtual screening. In addition, they show large Matthews' correlation coefficients (C) of 0.78 (0.71) and 0.65 (0.65) for the training (test) sets, correspondingly. The result of predictions on the 10% full-out cross-validation test also evidences the robustness of the obtained models. Later, both models are applied to the virtual screening of 12 compounds already proved against Tv. As a result, they correctly classify 10 out of 12 (83.33%) and 9 out of 12 (75.00%) of the chemicals, respectively, which is the most important criterion for validating the models. Besides, these classification functions are applied to a library of seven chemicals in order to find novel antitrichomonal agents. These compounds are synthesized and tested for in vitro activity against Tv. As a result, experimental observations approached the theoretical predictions, since a correct classification of 85.71% (6 out of 7) of the chemicals was obtained. Moreover, out of the
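To make the quoted figures of merit concrete, the sketch below scores a linear discriminant classifier with the Matthews correlation coefficient (the C quoted above). The descriptor matrix is synthetic, and the bond-based TOMOCOMD-CARDD linear indices used by the authors are not computed here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(200, 15)                                     # 200 compounds x 15 descriptors
y = (X[:, 0] + 0.3 * rng.randn(200) > 0.5).astype(int)    # active / inactive labels

# Fit the discriminant function on a training split and score the held-out set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print("MCC:", matthews_corrcoef(y_te, lda.predict(X_te)))
```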

Journal ArticleDOI
TL;DR: It is shown that over a wide range of receptor families, eHiTS LASSO is consistently able to enrich screened databases and provides scaffold hopping ability.
Abstract: Virtual Ligand Screening (VLS) has become an integral part of the drug discovery process for many pharmaceutical companies. Ligand similarity searches provide a very powerful method of screening large databases of ligands to identify possible hits. If these hits belong to new chemotypes, the method is deemed even more successful. eHiTS LASSO uses a new interacting surface point types (ISPT) molecular descriptor that is generated from the 3D structure of the ligand, but unlike most 3D descriptors it is conformation independent. Combined with a neural network machine learning technique, LASSO screens molecular databases at an ultra-fast speed of 1 million structures in under 1 min on a standard PC. The results obtained from eHiTS LASSO trained on relatively small training sets of just 2, 4 or 8 actives are presented using the diverse directory of useful decoys (DUD) dataset. It is shown that over a wide range of receptor families, eHiTS LASSO is consistently able to enrich screened databases and provides scaffold hopping ability.