Showing papers in "Journal of Computer-aided Molecular Design in 2018"


Journal ArticleDOI
TL;DR: The outcome of GC2 underscores the pressing need for methods development in pose prediction, particularly for ligand scaffolds not currently represented in the Protein Data Bank (pdb.org), and in affinity ranking and scoring of bound ligands.
Abstract: The Drug Design Data Resource (D3R) ran Grand Challenge 2 (GC2) from September 2016 through February 2017. This challenge was based on a dataset of structures and affinities for the nuclear receptor farnesoid X receptor (FXR), contributed by F. Hoffmann-La Roche. The dataset contained 102 IC50 values, spanning six orders of magnitude, and 36 high-resolution co-crystal structures with representatives of four major ligand classes. Strong global participation was evident, with 49 participants submitting 262 prediction submission packages in total. Procedurally, GC2 mimicked Grand Challenge 2015 (GC2015), with a Stage 1 subchallenge testing ligand pose prediction methods and ranking and scoring methods, and a Stage 2 subchallenge testing only ligand ranking and scoring methods after the release of all blinded co-crystal structures. Two smaller curated sets of 18 and 15 ligands were developed to test alchemical free energy methods. This overview summarizes all aspects of GC2, including the dataset details, challenge procedures, and participant results. We also consider implications for progress in the field, while highlighting methodological areas that merit continued development. Similar to GC2015, the outcome of GC2 underscores the pressing need for methods development in pose prediction, particularly for ligand scaffolds not currently represented in the Protein Data Bank (http://www.pdb.org), and in affinity ranking and scoring of bound ligands.

151 citations


Journal ArticleDOI
TL;DR: An overview of the SAMPL6 host–guest binding affinity prediction challenge, which featured three supramolecular hosts and an overall improvement in the correlation obtained by the affinity predictions for OA and TEMOA systems, but a surprising lack of improvement regarding root mean square error over the past several challenge rounds.
Abstract: Accurately predicting the binding affinities of small organic molecules to biological macromolecules can greatly accelerate drug discovery by reducing the number of compounds that must be synthesized to realize desired potency and selectivity goals. Unfortunately, the process of assessing the accuracy of current computational approaches to affinity prediction against binding data to biological macromolecules is frustrated by several challenges, such as slow conformational dynamics, multiple titratable groups, and the lack of high-quality blinded datasets. Over the last several SAMPL blind challenge exercises, host-guest systems have emerged as a practical and effective way to circumvent these challenges in assessing the predictive performance of current-generation quantitative modeling tools, while still providing systems capable of possessing tight binding affinities. Here, we present an overview of the SAMPL6 host-guest binding affinity prediction challenge, which featured three supramolecular hosts: octa-acid (OA), the closely related tetra-endo-methyl-octa-acid (TEMOA), and cucurbit[8]uril (CB8), along with 21 small organic guest molecules. A total of 119 entries were received from ten participating groups employing a variety of methods that spanned from electronic structure and movable type calculations in implicit solvent to alchemical and potential of mean force strategies using empirical force fields with explicit solvent models. While empirical models tended to obtain better performance than first-principle methods, it was not possible to identify a single approach that consistently provided superior results across all host-guest systems and statistical metrics. Moreover, the accuracy of the methodologies generally displayed a substantial dependence on the system considered, emphasizing the need for host diversity in blind evaluations. 
Several entries exploited previous experimental measurements of similar host-guest systems in an effort to improve their physical-based predictions via some manner of rudimentary machine learning; while this strategy succeeded in reducing systematic errors, it did not correspond to an improvement in statistical correlation. Comparison to previous rounds of the host-guest binding free energy challenge highlights an overall improvement in the correlation obtained by the affinity predictions for OA and TEMOA systems, but a surprising lack of improvement regarding root mean square error over the past several challenge rounds. The data suggests that further refinement of force field parameters, as well as improved treatment of chemical effects (e.g., buffer salt conditions, protonation states), may be required to further enhance predictive accuracy.

99 citations


Journal ArticleDOI
TL;DR: A modified version of the contact-based binding affinity predictor PRODIGY is developed, using the number of interatomic contacts classified by their type and the intermolecular electrostatic energy, which results in an enrichment factor of 2.5 compared to a random predictor for ranking ligands within the top 25%, making it a promising approach to identify lead compounds in virtual screening.
Abstract: We present the performance of HADDOCK, our information-driven docking software, in the second edition of the D3R Grand Challenge. In this blind experiment, participants were requested to predict the structures and binding affinities of complexes between the Farnesoid X nuclear receptor and 102 different ligands. The models obtained in Stage 1 with HADDOCK and a ligand-specific protocol show an average ligand RMSD of 5.1 Å from the crystal structure. Only 6/35 targets were within 2.5 Å RMSD from the reference, which prompted us to investigate the limiting factors and revise our protocol for Stage 2. The choice of the receptor conformation appeared to have the strongest influence on the results. Our Stage 2 models were of higher quality (13 out of 35 were within 2.5 Å), with an average RMSD of 4.1 Å. The docking protocol was applied to all 102 ligands to generate poses for binding affinity prediction. We developed a modified version of our contact-based binding affinity predictor PRODIGY, using the number of interatomic contacts classified by their type and the intermolecular electrostatic energy. This simple structure-based binding affinity predictor shows a Kendall’s Tau correlation of 0.37 in ranking the ligands (7th best out of 77 methods, 5th/25 groups). Those results were obtained from the average prediction over the top 10 poses, irrespective of their similarity/correctness, underscoring the robustness of our simple predictor. This results in an enrichment factor of 2.5 compared to a random predictor for ranking ligands within the top 25%, making it a promising approach to identify lead compounds in virtual screening.

90 citations
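
The enrichment factor quoted in the abstract above can be illustrated with a minimal Python sketch. It assumes (the abstract does not spell this out) that the "true" top 25% is defined by experimental affinity and that lower values mean tighter binding in both lists. The function name and the toy data are hypothetical.

```python
def enrichment_factor(predicted, experimental, fraction=0.25):
    """Enrichment of true top binders within the predicted top fraction.

    Both lists are indexed by ligand; lower values are assumed to mean
    tighter binding (e.g. IC50 or a predicted binding energy).
    """
    n = len(predicted)
    n_top = max(1, round(n * fraction))
    indices = range(n)
    # Ligands that are truly in the top fraction, and those predicted to be.
    true_top = set(sorted(indices, key=lambda i: experimental[i])[:n_top])
    pred_top = set(sorted(indices, key=lambda i: predicted[i])[:n_top])
    hit_rate = len(true_top & pred_top) / n_top
    random_rate = n_top / n  # expected hit rate of a random ranking
    return hit_rate / random_rate


# Hypothetical data: 8 ligands, one of the two best binders is misranked.
experimental = [1, 2, 3, 4, 5, 6, 7, 8]
predicted = [1, 9, 3, 4, 5, 6, 7, 8]
ef = enrichment_factor(predicted, experimental)
```

With this definition, a perfect ranking gives an enrichment factor of 1/fraction (4 for the top 25%), and a random ranking gives about 1.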


Journal ArticleDOI
TL;DR: A comparison of the performance of seven different meta-classifiers for their ability to handle imbalanced datasets, whereby Random Forest is used as base-classifier and Stratified bagging, MetaCost and CostSensitiveClassifier were found to be the best performing among all the methods.
Abstract: Cheminformatics datasets used in classification problems, especially those related to biological or physicochemical properties, are often imbalanced. This presents a major challenge in development of in silico prediction models, as the traditional machine learning algorithms are known to work best on balanced datasets. The class imbalance introduces a bias in the performance of these algorithms due to their preference towards the majority class. Here, we present a comparison of the performance of seven different meta-classifiers for their ability to handle imbalanced datasets, whereby Random Forest is used as base-classifier. Four different datasets that are directly (cholestasis) or indirectly (via inhibition of organic anion transporting polypeptide 1B1 and 1B3) related to liver toxicity were chosen for this purpose. The imbalance ratio in these datasets ranges between 4:1 and 20:1 for negative and positive classes, respectively. Three different sets of molecular descriptors for model development were used, and their performance was assessed in 10-fold cross-validation and on an independent validation set. Stratified bagging, MetaCost and CostSensitiveClassifier were found to be the best performing among all the methods. While MetaCost and CostSensitiveClassifier provided better sensitivity values, Stratified Bagging resulted in high balanced accuracies.

39 citations
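
To see why the abstract above reports sensitivity and balanced accuracy rather than plain accuracy, consider a minimal Python sketch. The 20:1 toy dataset and the function names are hypothetical; the metric definitions are the standard ones.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; requires both classes to be present."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical 20:1 imbalance: a classifier that always predicts "negative".
y_true = [0] * 20 + [1]
y_pred = [0] * 21
acc = accuracy(y_true, y_pred)           # looks good despite missing every positive
bal = balanced_accuracy(y_true, y_pred)  # exposes the failure on the minority class
```

Here the always-negative classifier scores about 95% accuracy but only 50% balanced accuracy, which is the bias toward the majority class that the meta-classifiers are meant to counter.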


Journal ArticleDOI
TL;DR: This study has developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the use of ligand-based atomic property field (APF) method with receptor structure-based docking.
Abstract: Ligand docking to flexible protein molecules can be efficiently carried out through ensemble docking to multiple protein conformations, either from experimental X-ray structures or from in silico simulations. The success of ensemble docking often requires the careful selection of complementary protein conformations, through docking and scoring of known co-crystallized ligands. False positives, in which a ligand in a wrong pose achieves a better docking score than that of the native pose, arise as additional protein conformations are added. In the current study, we developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the use of ligand-based atomic property field (APF) method with receptor structure-based docking. This method helps us to correctly dock 30 out of 36 ligands presented by the D3R docking challenge. For the six mis-docked ligands, the cognate receptor structures prove to be too different from the 40 available experimental Pocketome conformations used for docking and could be identified only by receptor sampling beyond experimentally explored conformational subspace.

39 citations


Journal ArticleDOI
TL;DR: Two similar quantum chemical approaches based on the high accuracy calculation of standard reaction free energies and the subsequent determination of those pKa values via a linear free energy relationship are presented.
Abstract: Recent advances in the development of low-cost quantum chemical methods have made the prediction of conformational preferences and physicochemical properties of medium-sized drug-like molecules routinely feasible, with significant potential to advance drug discovery. In the context of the SAMPL6 challenge, macroscopic pKa values were blindly predicted for a set of 24 such molecules. In this paper we present two similar quantum chemical approaches based on the high accuracy calculation of standard reaction free energies and the subsequent determination of those pKa values via a linear free energy relationship. Both approaches use extensive conformational sampling and apply hybrid and double-hybrid density functional theory with continuum solvation to calculate free energies. The blindly calculated macroscopic pKa values were in excellent agreement with the experiment.

37 citations
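
A linear free energy relationship of the kind mentioned in the abstract above can be sketched as a straight-line fit between computed reaction free energies and reference pKa values. This is an illustration under stated assumptions, not the authors' protocol: the temperature, the scaling by RT ln 10, the function names, and all numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def free_energy_to_pk_units(delta_g_kcal, temperature=298.15):
    """Convert a deprotonation free energy (kcal/mol) to pK units: dG / (RT ln 10)."""
    return delta_g_kcal / (R_KCAL * temperature * math.log(10))


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_pka(delta_g_kcal, slope, intercept, temperature=298.15):
    """Apply the fitted linear relationship to a new computed free energy."""
    return slope * free_energy_to_pk_units(delta_g_kcal, temperature) + intercept


# Hypothetical training data: computed free energies (kcal/mol) and reference pKa values.
computed_dg = [5.0, 8.0, 11.0, 14.0]
reference_pka = [3.1, 4.9, 7.2, 9.0]
slope, intercept = fit_line(
    [free_energy_to_pk_units(dg) for dg in computed_dg], reference_pka
)
estimate = predict_pka(9.5, slope, intercept)
```

The fitted slope and intercept absorb systematic errors of the underlying free energy calculation, which is the usual motivation for calibrating against reference data.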


Journal ArticleDOI
TL;DR: This work used UV absorbance-based pKa measurements to construct a high-quality experimental reference dataset of macroscopic pKas for the evaluation of computational pKa prediction methodologies that was utilized in the SAMPL6 pKa challenge.
Abstract: Determining the net charge and protonation states populated by a small molecule in an environment of interest or the cost of altering those protonation states upon transfer to another environment is a prerequisite for predicting its physicochemical and pharmaceutical properties. The environment of interest can be aqueous, an organic solvent, a protein binding site, or a lipid bilayer. Predicting the protonation state of a small molecule is essential to predicting its interactions with biological macromolecules using computational models. Incorrectly modeling the dominant protonation state, shifts in dominant protonation state, or the population of significant mixtures of protonation states can lead to large modeling errors that degrade the accuracy of physical modeling. Low accuracy hinders the use of physical modeling approaches for molecular design. For small molecules, the acid dissociation constant (pKa) is the primary quantity needed to determine the ionic states populated by a molecule in an aqueous solution at a given pH. As part of the SAMPL6 community challenge, we organized a blind pKa prediction component to assess the accuracy with which contemporary pKa prediction methods can predict this quantity, with the ultimate aim of assessing the expected impact on modeling errors this would induce. While a multitude of approaches for predicting pKa values currently exist, predicting the pKas of drug-like molecules can be difficult due to challenging properties such as multiple titratable sites, heterocycles, and tautomerization. For this challenge, we focused on a set of 24 small molecules selected to resemble selective kinase inhibitors—an important class of therapeutics replete with titratable moieties.
Using a Sirius T3 instrument that performs automated acid–base titrations, we used UV absorbance-based pKa measurements to construct a high-quality experimental reference dataset of macroscopic pKas for the evaluation of computational pKa prediction methodologies that was utilized in the SAMPL6 pKa challenge. For several compounds in which the microscopic protonation states associated with macroscopic pKas were ambiguous, we performed follow-up NMR experiments to disambiguate the microstates involved in the transition. This dataset provides a useful standard benchmark dataset for the evaluation of pKa prediction methodologies on kinase inhibitor-like compounds.

36 citations


Journal ArticleDOI
TL;DR: Estimates of ligand binding kinetics rates are obtained that are consistent across multiple simulations, with an average log10-scale standard deviation of 0.28 for on-rates and 0.56 for off-rates, which is well within an order of magnitude and far better than previously observed for previous applications of the WExplore algorithm.
Abstract: Interest in ligand binding kinetics has been growing rapidly, as it is being discovered in more and more systems that ligand residence time is the crucial factor governing drug efficacy. Many enhanced sampling methods have been developed with the goal of predicting ligand binding rates ($k_{\text{on}}$) and/or ligand unbinding rates ($k_{\text{off}}$) through explicit simulation of ligand binding pathways, and these methods work by very different mechanisms. Although there is not yet a blind challenge for ligand binding kinetics, here we take advantage of experimental measurements and rigorously computed benchmarks to compare estimates of $K_D$ calculated as the ratio of two rates: $k_{\text{off}}/k_{\text{on}}$. These rates were determined using a new enhanced sampling method based on the weighted ensemble framework that we call “REVO”: Reweighting of Ensembles by Variance Optimization. This is a further development of the WExplore enhanced sampling method, in which trajectory cloning and merging steps are guided not by the definition of sampling regions, but by maximizing trajectory variance. Here we obtain estimates of $k_{\text{on}}$ and $k_{\text{off}}$ that are consistent across multiple simulations, with an average log10-scale standard deviation of 0.28 for on-rates and 0.56 for off-rates, which is well within an order of magnitude and far better than previously observed for previous applications of the WExplore algorithm. Our rank ordering of the three host–guest pairs agrees with the reference calculations; however, our predicted $\Delta G$ values were systematically lower than the reference by an average of 4.2 kcal/mol. Using tree network visualizations of the trajectories in the REVO algorithm, and conformation space networks for each system, we analyze the results of our sampling, and hypothesize sources of discrepancy between our $K_D$ values and the reference.
We also motivate the direct inclusion of $k_{\text{on}}$ and $k_{\text{off}}$ challenges in future iterations of SAMPL, to further develop the field of ligand binding kinetics prediction and modeling.

34 citations
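
The relationship between rates, dissociation constant, and binding free energy used in the abstract above can be sketched in a few lines of Python. The sketch assumes a 1 M standard concentration, a temperature of 298.15 K, and the sample standard deviation for the log10-scale spread. The function names and rate values are hypothetical and are not taken from the paper.

```python
import math
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(k_on, k_off):
    """K_D in M, from k_on in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def binding_free_energy(k_on, k_off, temperature=298.15):
    """Standard binding free energy in kcal/mol: RT ln(K_D / 1 M)."""
    return R_KCAL * temperature * math.log(dissociation_constant(k_on, k_off))


def log10_std(rates):
    """Sample standard deviation of log10(rate) across independent simulations."""
    return statistics.stdev(math.log10(r) for r in rates)


# Hypothetical rates: K_D = 1e-6 M, i.e. roughly -8.2 kcal/mol at 298.15 K.
dg = binding_free_energy(k_on=1e8, k_off=1e2)
spread = log10_std([1e2, 1e3, 1e4])  # one log10 unit, i.e. one order of magnitude
```

A log10-scale standard deviation below 1 means the independent estimates agree to within an order of magnitude, which is the sense in which the abstract describes 0.28 and 0.56 as consistent.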


Journal ArticleDOI
TL;DR: The generation of a multi-class random forest model is described to predict which, out of a list of the seven leading Cytochrome P450 isoforms, would be the major metabolising isoforms for a novel compound.
Abstract: In the development of novel pharmaceuticals, the knowledge of how many, and which, Cytochrome P450 isoforms are involved in the phase I metabolism of a compound is important. Potential problems can arise if a compound is metabolised predominantly by a single isoform in terms of drug–drug interactions or genetic polymorphisms that would lead to variations in exposure in the general population. Combined with models of regioselectivities of metabolism by each isoform, such a model would also aid in the prediction of the metabolites likely to be formed by P450-mediated metabolism. We describe the generation of a multi-class random forest model to predict which, out of a list of the seven leading Cytochrome P450 isoforms, would be the major metabolising isoforms for a novel compound. The model has a 76% success rate with a top-1 criterion and an 88% success rate for a top-2 criterion and shows significant enrichment over randomised models.

33 citations
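
The top-1 and top-2 success criteria quoted in the abstract above can be made concrete with a short Python sketch. It assumes the classifier returns a probability per isoform for each compound. The isoform labels, probabilities, and function name are hypothetical examples, not results from the paper.

```python
def top_k_success_rate(probabilities, true_labels, k):
    """Fraction of compounds whose true class is among the k most probable classes.

    probabilities: one dict per compound mapping class label -> predicted probability.
    true_labels: the experimentally determined major class for each compound.
    """
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        ranked = sorted(probs, key=probs.get, reverse=True)[:k]
        if truth in ranked:
            hits += 1
    return hits / len(true_labels)


# Hypothetical predictions for three compounds over three isoforms.
predictions = [
    {"3A4": 0.6, "2D6": 0.3, "2C9": 0.1},
    {"3A4": 0.5, "2D6": 0.4, "2C9": 0.1},
    {"3A4": 0.2, "2D6": 0.3, "2C9": 0.5},
]
truth = ["3A4", "2D6", "3A4"]
top1 = top_k_success_rate(predictions, truth, k=1)
top2 = top_k_success_rate(predictions, truth, k=2)
```

By construction the top-2 rate can never be lower than the top-1 rate, which matches the ordering of the 76% and 88% figures reported above.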


Journal ArticleDOI
TL;DR: A novel and powerful computational tool for accurately uncovering the potential associations between drugs and diseases using high-dimensional and heterogeneous omics data as information sources, which can serve as a useful bioinformatic tool for identifying potential drug-disease associations and guiding drug repositioning.
Abstract: Finding new candidate diseases for known drugs provides an effective route to fast and low-risk drug development. However, experimental identification of drug-disease associations is expensive and time-consuming. This motivates the need for developing in silico computational methods that can infer true drug-disease pairs with high confidence. In this study, we presented a novel and powerful computational tool, DR2DI, for accurately uncovering the potential associations between drugs and diseases using high-dimensional and heterogeneous omics data as information sources. Based on a unified and extended similarity kernel framework, DR2DI inferred the unknown relationships between drugs and diseases using a Regularized Kernel Classifier. Importantly, DR2DI employed a semi-supervised and global learning algorithm which can be applied to uncover the diseases (drugs) associated with known and novel drugs (diseases). In silico global validation experiments showed that DR2DI significantly outperforms two recent approaches for predicting drug-disease associations. Detailed case studies further demonstrated that the therapeutic indications and side effects of drugs predicted by DR2DI could be validated by existing database records and literature, suggesting that DR2DI can serve as a useful bioinformatic tool for identifying potential drug-disease associations and guiding drug repositioning. Our software and comparison codes are freely available at https://github.com/huayu1111/DR2DI.

31 citations


Journal ArticleDOI
TL;DR: Data is analyzed across non-homologous proteins in complex with small biological ligands to address observations made in inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity.
Abstract: Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein-ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze data across non-homologous proteins in complex with small biological ligands to address observations made in our inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity. The resulting clear and significant chemical group matching preferences elucidate the code for protein-native ligand binding, similar to the dominant patterns found in nucleic acid base-pairing. On average, 90% of the keto and carboxylate oxygens occurring in the biological ligands formed direct H-bonds to the protein. A two-fold preference was found for protein atoms to act as H-bond donors and ligand atoms to act as acceptors, and 76% of all intermolecular H-bonds involved an amine donor. Together, the tight chemical and geometric constraints associated with satisfying donor groups generate a hydrogen-bonding lock that can be matched only by ligands bearing the right acceptor-rich key. Measuring an index of H-bond preference based on the observed chemical trends proved sufficient to predict other protein-ligand complexes and can be used to guide molecular design. The resulting Hbind and Protein Recognition Index software packages are being made available for rigorously defining intermolecular H-bonds and measuring the extent to which H-bonding patterns in a given complex match the preference key.

Journal ArticleDOI
TL;DR: The FSDAM blind prediction, relying on the normality assumption for the annihilation work distributions, ranked fairly well among the submitted blind predictions that were not adjusted with a linear correction obtained from retrospective data on similar host–guest systems.
Abstract: In this paper, we compute, by means of a non-equilibrium alchemical technique called the fast switching double annihilation method (FSDAM), the absolute standard dissociation free energies of the octa-acid host–guest systems in the SAMPL6 challenge initiative. FSDAM is based on the production of canonical configurations of the bound and unbound states via enhanced sampling and on the subsequent generation of hundreds of fast non-equilibrium ligand annihilation trajectories. The annihilation free energies of the ligand when bound to the receptor and in bulk solvent are obtained from the collection of work values using an estimate based on the Crooks theorem for driven non-equilibrium processes. The FSDAM blind prediction, relying on the normality assumption for the annihilation work distributions, ranked fairly well among the submitted blind predictions that were not adjusted with a linear correction obtained from retrospective data on similar host–guest systems. Improved results for FSDAM can be obtained by post-processing the work data assuming mixtures of normal components.
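
When the work distribution is assumed to be normal, as in the abstract above, the Crooks theorem yields a closed-form free energy estimate: the mean work minus half the work variance times the inverse temperature. The Python sketch below illustrates that textbook expression only. It is not the authors' code. The temperature, units (kcal/mol), use of the sample variance, function name, and work values are all assumptions.

```python
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def gaussian_free_energy(work_values, temperature=298.15):
    """Free energy estimate for normally distributed work: <W> - beta * var(W) / 2.

    work_values: work (kcal/mol) from independent fast-switching trajectories.
    """
    beta = 1.0 / (R_KCAL * temperature)
    mean_work = statistics.mean(work_values)
    variance = statistics.variance(work_values)  # sample variance
    return mean_work - 0.5 * beta * variance


# Hypothetical annihilation work values in kcal/mol.
estimate = gaussian_free_energy([10.0, 12.0, 14.0])
```

The variance term is the dissipated work, so broader work distributions pull the estimate further below the mean. A mixture-of-normals post-processing, as mentioned in the abstract, would replace this single-Gaussian formula.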

Journal ArticleDOI
TL;DR: A hybrid QM and MM approach to predict pKa of small drug-like molecules in explicit solvent and shows that further optimization of the protocol needs to be done before this method can be used as an alternative approach to the well established approaches of a full quantum level or empirical pKa prediction methods.
Abstract: In this work we have developed a hybrid QM and MM approach to predict pKa of small drug-like molecules in explicit solvent. The gas phase free energy of deprotonation is calculated using the M06-2X density functional theory level with Pople basis sets. The solvation free energy difference of the acid and its conjugate base is calculated at MD level using thermodynamic integration. We applied this method to the 24 drug-like molecules in the SAMPL6 blind pKa prediction challenge. We achieved an overall RMSE of 2.4 pKa units in our prediction. Our results show that further optimization of the protocol needs to be done before this method can be used as an alternative approach to the well established approaches of a full quantum level or empirical pKa prediction methods.

Journal ArticleDOI
TL;DR: The AMOEBA protocol for determining absolute binding free energies benefitted from participation in the SAMPL6 host–guest blind challenge and the results suggest the implementation of the methodology in future host-guest calculations.
Abstract: As part of the SAMPL6 host–guest blind challenge, the AMOEBA force field was applied to calculate the absolute binding free energy for a cucurbit[8]uril host complexed with 14 diverse guests, ranging from small, rigid structures to drug molecules. The AMOEBA results from the initial submission prompted an investigation into aspects of the methodology and parameterization employed. Lessons learned from the blind challenge include: a double annihilation scheme (electrostatics and van der Waals) is needed to obtain proper sampling of guest conformations, annihilation of key torsion parameters of the guest is recommended for flexible guests, and a more thorough analysis of torsion parameters is warranted. When put into practice with the AMOEBA model, the lessons learned improved the MUE from 2.63 to 1.20 kcal/mol and the RMSE from 3.62 to 1.68 kcal/mol. Overall, the AMOEBA protocol for determining absolute binding free energies benefitted from participation in the SAMPL6 host–guest blind challenge and the results suggest the implementation of the methodology in future host–guest calculations.
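
The two error statistics quoted in the abstract above, mean unsigned error (MUE) and root mean square error (RMSE), can be computed as in this minimal Python sketch. The function names and the toy free energies are hypothetical and do not reproduce the paper's data.

```python
import math


def mean_unsigned_error(predicted, experimental):
    """Average absolute deviation between predictions and experiment."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def root_mean_square_error(predicted, experimental):
    """Square root of the mean squared deviation; weights large errors more heavily."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binding free energies in kcal/mol; one guest is a 3 kcal/mol outlier.
predicted = [-6.0, -8.0, -7.0]
experimental = [-6.0, -8.0, -10.0]
mue = mean_unsigned_error(predicted, experimental)
rmse = root_mean_square_error(predicted, experimental)
```

RMSE is never smaller than MUE, and the gap widens when a few guests have large errors, so reporting both gives a rough sense of how outlier-driven the error is.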

Journal ArticleDOI
TL;DR: The combined biophysical and quantum chemical studies in this work supported the results of previous experimental studies, thereby indicating an action of resveratrol on mutant SOD1 and paving the way for the design of highly potent inhibitors against fALS.
Abstract: Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease that has been associated with mutations in metalloenzyme superoxide dismutase (SOD1) causing protein structural destabilization and aggregation. However, the mechanistic action and the cure for the disease still remain obscure. Herein, we initially studied the conformational preferences of SOD1 protein structures upon substitution of Ala at Gly93 in comparison with that of wild type. Our results corroborated the previous experimental studies on the aggregation and the destabilizing activity of mutant SOD1 protein G93A. From the therapeutic point of view, we computationally analyzed the influence of resveratrol, a natural polyphenol widely found in red wine, on mutant SOD1 relative to wild type, using molecular docking studies. Further, FMO calculations were performed using GAMESS to study the pair residual interaction in the wild type and mutant complex systems. Resveratrol showed greater interaction with the mutant than with the wild type. Subsequently, we evaluated the conformational preferences of the wild type and mutant complex systems: the mutant, which was earlier found to lose its conformational stability, regained it upon binding with resveratrol. A similar trend was found in the 2-D free energy landscapes of both the wild type and mutant systems. Hence, the combined biophysical and quantum chemical studies in our work supported the results of previous experimental studies, thereby indicating an action of resveratrol on mutant SOD1 and paving the way for the design of highly potent inhibitors against fALS.

Journal ArticleDOI
TL;DR: Several blinded binding free energy predictions were made for two congeneric series of Farnesoid X Receptor (FXR) inhibitors with a semi-automated alchemical free energy calculation workflow featuring FESetup and SOMD software tools.
Abstract: The Drug Design Data Resource (D3R) consortium organises blinded challenges to address the latest advances in computational methods for ligand pose prediction, affinity ranking, and free energy calculations. Within the context of the second D3R Grand Challenge several blinded binding free energy predictions were made for two congeneric series of Farnesoid X Receptor (FXR) inhibitors with a semi-automated alchemical free energy calculation workflow featuring FESetup and SOMD software tools. Reasonable performance was observed in retrospective analyses of literature datasets. Nevertheless, blinded predictions on the full D3R datasets were poor due to difficulties encountered with the ranking of compounds that vary in their net-charge. Performance increased for predictions that were restricted to subsets of compounds carrying the same net-charge. Disclosure of X-ray crystallography derived binding modes maintained or improved the correlation with experiment in subsequent rounds of predictions. The best performing protocols on D3R set1 and set2 were comparable or superior to predictions made on the basis of analysis of literature structure activity relationships (SARs) only, and comparable or slightly inferior to the best submissions from other groups.

Journal ArticleDOI
TL;DR: The “embedded cluster reference interaction site model” (EC-RISM) integral equation theory is applied to the problem of predicting aqueous pKa values for drug-like molecules based on an ensemble of tautomers and it is concluded that these numbers are probably near the ultimate accuracy achievable with the simple 3-parameter model using a single or the two best-ranking conformations per tautomer or microstate.
Abstract: The “embedded cluster reference interaction site model” (EC-RISM) integral equation theory is applied to the problem of predicting aqueous pKa values for drug-like molecules based on an ensemble of tautomers. EC-RISM is based on self-consistent calculations of a solute’s electronic structure and the distribution function of surrounding water. Following up on the workflow developed after the SAMPL5 challenge on cyclohexane-water distribution coefficients we extended and improved the methodology by taking into account exact electrostatic solute–solvent interactions taken from the wave function in solution. As before, the model is calibrated against Gibbs energies of hydration from the “Minnesota Solvation Database” and a public dataset of acidity constants of organic acids and bases by adjusting in total 4 parameters, among which only 3 are relevant for predicting pKa values. While the best-performing training model yields a root-mean-square error (RMSE) of 1 pK unit, the corresponding test set prediction on the full SAMPL6 dataset of macroscopic pKa values using the same level of theory exhibits slightly larger error (1.7 pK units) than the best test set model submitted (1.7 pK units for corresponding training set vs. test set performance of 1.6). Post-submission analysis revealed a number of physical optimization options regarding the numerical treatment of electrostatic interactions and conformational sampling. While the experimental test set data revealed after submission was not used for reparametrizing the methodology, the best physically optimized models consequentially result in RMSEs of 1.5 if only improved electrostatic interactions are considered and of 1.1 if, in addition, conformational sampling accounts for quantum-chemically derived rankings. We conclude that these numbers are probably near the ultimate accuracy achievable with the simple 3-parameter model using a single or the two best-ranking conformations per tautomer or microstate.
Finally, relations of the present macrostate approach to microstate pKa results are discussed and some illustrative results for microstate populations are presented.

Journal ArticleDOI
TL;DR: Results are encouraging and show that paying attention to the choice of the fundamental components of a docking simulation improves the results of the binding mode predictions.
Abstract: Molecular docking is a powerful tool in the field of computer-aided molecular design. In particular, it is the technique of choice for the prediction of a ligand pose within its target binding site. A multitude of docking methods is available nowadays, whose performance may vary depending on the data set. Therefore, some non-trivial choices should be made before starting a docking simulation. In the same framework, the selection of the target structure to use could be challenging, since the number of available experimental structures is increasing. Both issues have been explored within this work. The pose prediction of a pool of 36 compounds provided by D3R Grand Challenge 2 organizers was preceded by a pipeline to choose the best protein/docking-method pair for each blind ligand. An integrated benchmark approach including ligand shape comparison and cross-docking evaluations was implemented inside our DockBench software. The results are encouraging and show that paying attention to the choice of the fundamental components of a docking simulation improves the results of the binding mode predictions.

Journal ArticleDOI
TL;DR: This work sought to use force matching to generate MM parameters for the SAMPL6 CB[8] host–guest binding challenge, classically compute binding free energies, and apply energetic end state corrections to obtain QM/MM binding free energy differences.
Abstract: The use of quantum mechanical/molecular mechanical (QM/MM) methods in binding free energy calculations, particularly in the SAMPL challenge, often fails to achieve improvement over standard additive (MM) force fields. Frequently, the implementation is through use of reference potentials, or the so-called “indirect approach”, and inherently relies on sufficient overlap existing between MM and QM/MM configurational spaces. This overlap is generally poor, particularly when free energy perturbation is used to perform the MM to QM/MM free energy correction at the end states of interest (e.g., bound and unbound states). However, by utilizing MM parameters that best reproduce forces obtained at the desired QM level of theory, it is possible to lessen the configurational disparity between MM and QM/MM. To this end, we sought to use force matching to generate MM parameters for the SAMPL6 CB[8] host–guest binding challenge, classically compute binding free energies, and apply energetic end state corrections to obtain QM/MM binding free energy differences. For the standard set of 11 molecules and the bonus set (including three additional challenge molecules), error statistics, such as the root mean square error (RMSE), were moderately poor (5.5 and 5.4 kcal/mol). Correlation statistics, however, were in the top two for both standard and bonus set submissions ($R^{2}$ of 0.42 and 0.26, $\tau$ of 0.64 and 0.47, respectively). High RMSE and moderate correlation strongly indicated the presence of systematic error. Identifiable issues were ameliorated for two of the guest molecules, resulting in a reduction of error and pointing to strong prospects for the future use of this methodology.
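The end state correction described in this abstract rests on free energy perturbation from the MM ensemble to the QM/MM Hamiltonian. A minimal sketch of the textbook Zwanzig exponential-averaging estimator, ΔA = −kT ln⟨exp(−ΔU/kT)⟩_MM, is given below. It is not the authors' implementation; the function name `fep_correction` and the default `kT` value (roughly 298 K, in kcal/mol) are assumptions made for illustration.

```python
import math

def fep_correction(delta_u, kT=0.5925):
    """Zwanzig estimate of the MM -> QM/MM free energy correction.

    delta_u: list of energy differences U_QM/MM - U_MM (kcal/mol),
             evaluated on snapshots sampled from the MM ensemble.
    kT:      thermal energy in kcal/mol (assumed value, about 298 K).
    """
    scaled = [-du / kT for du in delta_u]
    # Shift by the maximum before exponentiating to avoid overflow.
    m = max(scaled)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kT * log_mean
```

When the MM and QM/MM ensembles overlap poorly, a few snapshots with very negative energy differences dominate the average, which is the convergence problem the abstract attributes to the indirect approach.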

Journal ArticleDOI
TL;DR: In this article, the authors present a software pipeline for visualizing protein structures through VR, which combines VR visualization with fast algorithms for simulating intramolecular motions of protein flexibility, in an effort to further improve structure-led drug design.
Abstract: The ability to precisely visualize the atomic geometry of the interactions between a drug and its protein target in structural models is critical in predicting the correct modifications in previously identified inhibitors to create more effective next generation drugs. It is currently common practice among medicinal chemists, when attempting this, to access the information contained in three-dimensional structures by using two-dimensional projections, which can preclude disclosure of useful features. A more accessible and intuitive visualization of the three-dimensional configuration of the atomic geometry in the models can be achieved through the implementation of immersive virtual reality (VR). While bespoke commercial VR suites are available, in this work, we present a freely available software pipeline for visualising protein structures through VR. New consumer hardware, such as the HTC Vive and the Oculus Rift utilized in this study, is available at reasonable prices. As an instructive example, we have combined VR visualization with fast algorithms for simulating intramolecular motions of protein flexibility, in an effort to further improve structure-led drug design by exposing molecular interactions that might be hidden in the less informative static models. This is a paradigmatic test case scenario for many similar applications in computer-aided molecular studies and design.

Journal ArticleDOI
TL;DR: It is shown that FEP/MD calculations hold predictive value and can nowadays be used in a high throughput mode in a lead optimization project provided that crystal structures of sufficiently high quality are available.
Abstract: Computer-aided drug design has become an integral part of drug discovery and development in the pharmaceutical and biotechnology industry, and is nowadays extensively used in the lead identification and lead optimization phases. The drug design data resource (D3R) organizes challenges against blinded experimental data to prospectively test computational methodologies as an opportunity for improved methods and algorithms to emerge. We participated in Grand Challenge 2 to predict the crystallographic poses of 36 Farnesoid X Receptor (FXR)-bound ligands and the relative binding affinities for two designated subsets of 18 and 15 FXR-bound ligands. Here, we present our methodology for pose and affinity predictions and its evaluation after the release of the experimental data. For predicting the crystallographic poses, we used docking and physics-based pose prediction methods guided by the binding poses of native ligands. For FXR ligands with known chemotypes in the PDB, we accurately predicted their binding modes, while for those with unknown chemotypes the predictions were more challenging. Our group ranked first (based on the median RMSD) out of 46 groups that submitted complete entries for the binding pose prediction challenge. For the relative binding affinity prediction challenge, we performed free energy perturbation (FEP) calculations coupled with molecular dynamics (MD) simulations. FEP/MD calculations displayed a high success rate in identifying compounds with better or worse binding affinity than the reference (parent) compound. Our studies suggest that when ligands with chemical precedent are available in the literature, binding pose predictions using docking and physics-based methods are reliable; however, predictions are challenging for ligands with completely unknown chemotypes. We also show that FEP/MD calculations hold predictive value and can nowadays be used in a high throughput mode in a lead optimization project provided that crystal structures of sufficiently high quality are available.

Journal ArticleDOI
TL;DR: In this paper, quantum mechanical methods were used to predict the microscopic and macroscopic pKa values for a set of 24 molecules as a part of the SAMPL6 blind challenge.
Abstract: In this work, quantum mechanical methods were used to predict the microscopic and macroscopic pKa values for a set of 24 molecules as a part of the SAMPL6 blind challenge. The SMD solvation model was employed with M06-2X and different basis sets to evaluate three pKa calculation schemes (direct, vertical, and adiabatic). The adiabatic scheme is the most accurate approach (RMSE = 1.40 pKa units) and has high correlation (R² = 0.93) with respect to experiment. This approach can be improved by applying a linear correction to yield an RMSE of 0.73 pKa units. Additionally, we consider including explicit solvent representation and multiple lower-energy conformations to improve the predictions for outliers. Adding three water molecules explicitly can reduce the error by 2–4 pKa units, with respect to experiment, whereas including multiple local minima conformations does not necessarily improve the pKa prediction.
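The linear correction mentioned in this abstract can be illustrated with an ordinary least-squares fit of experimental against computed pKa values. The sketch below is a generic version of that idea, not the authors' fitting procedure; the function names are hypothetical, and any numbers passed to them would be the user's own data.

```python
import math

def fit_linear_correction(predicted, experimental):
    """Least-squares slope and intercept mapping predicted pKa to experiment."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted, experimental))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def apply_correction(predicted, slope, intercept):
    """Apply the fitted linear correction to raw predicted pKa values."""
    return [slope * x + intercept for x in predicted]
```

A correction of this kind removes systematic scaling and offset errors but cannot repair individual outliers, which is consistent with the abstract's separate treatment of outliers via explicit water molecules.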

Journal ArticleDOI
TL;DR: A new cross-docking pipeline suitable to dock a large library of molecules while taking advantage of multiple target protein structures is proposed and displayed not only decent pose prediction performance but also better virtual screening performance over several other methods.
Abstract: Pose prediction and virtual screening performance of a molecular docking method depend on the choice of protein structures used for docking. Multiple structures for a target protein are often used to take into account the receptor flexibility and problems associated with a single receptor structure. However, the use of multiple receptor structures is computationally expensive when docking a large library of small molecules. Here, we propose a new cross-docking pipeline suitable to dock a large library of molecules while taking advantage of multiple target protein structures. Our method involves the selection of a suitable receptor for each ligand in a screening library utilizing ligand 3D shape similarity with crystallographic ligands. We have prospectively evaluated our method in D3R Grand Challenge 2 and demonstrated that our cross-docking pipeline can achieve similar or better performance than using either single or multiple-receptor structures. Moreover, our method displayed not only decent pose prediction performance but also better virtual screening performance over several other methods.
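The receptor-selection step described in this abstract amounts to assigning each library ligand to the receptor structure whose crystallographic ligand it most resembles in 3D shape. A minimal sketch of that assignment follows, assuming the shape-similarity scores have already been computed by some external tool; the function name and the nested-dictionary layout are hypothetical and not taken from the paper.

```python
def pick_receptors(shape_similarity):
    """Choose one receptor per ligand by maximum shape similarity.

    shape_similarity: dict mapping ligand id -> dict mapping receptor id
                      -> similarity score against that receptor's
                      crystallographic ligand (higher means more similar).
    Returns a dict mapping ligand id -> selected receptor id.
    """
    return {
        ligand: max(scores, key=scores.get)
        for ligand, scores in shape_similarity.items()
    }
```

Each ligand is then docked only into its selected receptor, so the cost scales with the library size rather than with library size times the number of receptor structures.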

Journal ArticleDOI
TL;DR: The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
Abstract: Quantitative structure–activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, whose aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended for QSAR models to define new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria, which exploits compounds with both known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods to three QSAR data sets: serine/threonine–protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the descriptors selected by the proposed semi-supervised method perform better than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
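For orientation, the classical Fisher score that this abstract takes as its starting point can be sketched as follows. This is the standard classification form (between-class scatter divided by within-class scatter, per descriptor), not the continuous-activity or semi-supervised extension the paper proposes; the function name is an assumption.

```python
def fisher_scores(X, y):
    """Classical Fisher score for each descriptor (column of X).

    X: list of rows, each a list of descriptor values.
    y: list of class labels, one per row.
    Score = sum_c n_c * (mean_c - mean)^2 / sum_c sum_{i in c} (x_i - mean_c)^2
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # A descriptor that is constant within every class separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores
```

Descriptors are then ranked by score and the top-ranked ones retained; a descriptor whose class means coincide scores zero regardless of its spread.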

Journal ArticleDOI
TL;DR: HTMoL is a plug-in-free, secure GPU-accelerated web application specifically designed to stream and visualize MD trajectory data on a web browser, which can also be used as a visualization interface to directly access MD trajectories generated at a high-performance computing center.
Abstract: Research on biology has seen significant advances with the use of molecular dynamics (MD) simulations. The MD methodology enables explanation and discovery of molecular mechanisms in a wide range of natural processes and biological systems. The need to readily share the ever-increasing amount of MD data has been hindered by the lack of specialized bioinformatic tools. The difficulty lies in the efficient management of the data, i.e., in sending and processing 3D information for its visualization. In this work, we present HTMoL, a plug-in-free, secure GPU-accelerated web application specifically designed to stream and visualize MD trajectory data on a web browser. Now, individual research labs can publish MD data on the Internet, or use HTMoL to profoundly improve scientific reports by including supplemental MD data in a journal publication. HTMoL can also be used as a visualization interface to access MD trajectories generated on a high-performance computer center directly. Furthermore, the HTMoL architecture can be leveraged with educational efforts to improve learning in the fields of biology, chemistry, and physics.

Journal ArticleDOI
TL;DR: A general model for predicting microscopic and macroscopic pKa values using a Gaussian process regression trained on physical and chemical features of each ionizable group; the model reports large uncertainties for molecules outside its domain of applicability and shows good agreement in quantile–quantile plots, indicating it can predict its own accuracy.
Abstract: A variety of fields would benefit from accurate pKa predictions, especially drug design, due to the effect a change in ionization state can have on a molecule's physicochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for microscopic and macroscopic pKa values of 24 drug-like small molecules. We recently built a general model for predicting pKa values using a Gaussian process regression trained using physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic pKa values, which are then used to analytically determine macroscopic pKa values. Our Gaussian process is trained on a set of 2700 macroscopic pKa values from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic. Of particular importance to us when building this model was to include an uncertainty estimate based on the chemistry of the molecule that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for the molecules that appear to have chemistry outside our domain of applicability, along with good agreement in quantile-quantile plots, indicating it can predict its own accuracy. The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering what functional groups we do or do not identify as ionizable.
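The analytical step from microscopic to macroscopic pKa values can be illustrated with the textbook relations for a molecule with two ionizable sites: the first macroscopic constant is the sum of the microscopic constants for losing either proton, and the reciprocal of the second is the sum of the reciprocals of the microscopic constants for losing the remaining proton. The sketch below encodes only these standard relations; it is not the authors' pipeline, and the function names are assumptions.

```python
import math

def macro_pka_first(micro_pkas):
    """Macroscopic pKa for leaving the fully protonated state.

    micro_pkas: microscopic pKa values for removing the first proton
                from each site. K_macro = sum of the microscopic K values.
    """
    return -math.log10(sum(10.0 ** (-pk) for pk in micro_pkas))

def macro_pka_last(micro_pkas):
    """Macroscopic pKa for reaching the fully deprotonated state.

    micro_pkas: microscopic pKa values for removing the last proton
                from each singly protonated microstate.
                1 / K_macro = sum of the reciprocal microscopic K values.
    """
    return math.log10(sum(10.0 ** pk for pk in micro_pkas))
```

For two equivalent sites this reproduces the familiar statistical factor: the first macroscopic pKa lies log10(2) below the microscopic value and the second lies log10(2) above it.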

Journal ArticleDOI
TL;DR: This work reports the first direct QSPR modeling of equilibrium constants of tautomeric transformations (logKT) in different solvents and at different temperatures, which does not require intermediate assessment of acidity (basicity) constants for all tautomers, and performs well both in cross-validation and on two external test sets.
Abstract: We report the first direct QSPR modeling of equilibrium constants of tautomeric transformations (logKT) in different solvents and at different temperatures, which does not require intermediate assessment of acidity (basicity) constants for all tautomeric forms. The key step of the modeling consisted of merging two tautomers into a single molecular graph ("condensed reaction graph"), which enables the computation of molecular descriptors characterizing the entire equilibrium. The support vector regression method was used to build the models. The training set consisted of 785 transformations belonging to 11 types of tautomeric reactions with equilibrium constants measured in different solvents and at different temperatures. The models obtained perform well both in cross-validation (Q² = 0.81, RMSE = 0.7 logKT units) and on two external test sets. Benchmarking studies demonstrate that our models outperform results obtained with DFT B3LYP/6-311++G(d,p) and ChemAxon Tautomerizer applicable only in water at room temperature.

Journal ArticleDOI
TL;DR: A binding mode of compound 1 is proposed by combining results from MD, ab initio calculations, and MM/PBSA; the results provide insights for further optimization of 1, an interesting lead compound for the development of new cruzain inhibitors.
Abstract: Chagas disease remains a major health problem in South America, and throughout the world. The two drugs clinically available for its treatment have limited efficacy and cause serious adverse effects. Cruzain is an established therapeutic target of Trypanosoma cruzi, the protozoan that causes Chagas disease. Our group recently identified a competitive cruzain inhibitor (compound 1) with an IC50 = 15 µM that is also more synthetically accessible than the previously reported lead, compound 2. Prior studies, however, did not propose a binding mode for compound 1, hindering understanding of the structure–activity relationship and optimization. Here, the cruzain binding mode of compound 1 was investigated using docking, molecular dynamics (MD) simulations with ab initio derived parameters, ab initio calculations, and MM/PBSA. Two ligand protonation states and four binding poses were evaluated. A careful ligand parameterization method was employed to derive more physically meaningful parameters than those obtained by automated tools. The poses of unprotonated 1 were unstable in MD, showing large conformational changes and diffusing away from the binding site, whereas the protonated form showed higher stability and interaction with negatively charged residues Asp161 and Cys25. MM/PBSA also suggested that these two residues contribute favorably to binding of compound 1. By combining results from MD, ab initio calculations, and MM/PBSA, a binding mode of 1 is proposed. The results also provide insights for further optimization of 1, an interesting lead compound for the development of new cruzain inhibitors.

Journal ArticleDOI
TL;DR: This work provides a map of the structural elements relevant for the design of more selective ATP-competitive MTPK inhibitors, and suggests schemes suitable for a better understanding of these proteins as antituberculotic targets.
Abstract: In recent decades, human protein kinases (PKs) have been relevant as targets in the development of novel therapies against many diseases, but the study of Mycobacterium tuberculosis PKs (MTPKs) involved in tuberculosis pathogenesis began much later and has not yet reached an advanced stage of development. To increase knowledge of these enzymes, in this work we studied the structural features of MTPKs, with focus on their ATP-binding sites and their interactions with inhibitors. PknA, PknB, and PknG are the most studied MTPKs, which were previously crystallized; ATP-competitive inhibitors have been designed against them in the last decade. In the current work, reported PknA, PknB, and PknG inhibitors were extracted from the literature and their orientations inside the ATP-binding site were proposed by docking. With this information, interaction fingerprints were elaborated, which reveal the most relevant residues for establishing chemical interactions with inhibitors. The non-crystallized MTPKs PknD, PknF, PknH, PknJ, PknK, and PknL were also studied; their three-dimensional structural models were developed by homology modeling. The main characteristics of MTPK ATP-binding sites (the non-crystallized and crystallized MTPKs, including PknE and PknI) were described; schemes of the main polar and nonpolar groups inside their ATP-binding sites were constructed, which are suitable for a better understanding of these proteins as antituberculotic targets. These schemes could be used for establishing comparisons between MTPKs and human PKs in order to increase selectivity of MTPK inhibitors. As a key tool for guiding medicinal chemists interested in the design of novel MTPK inhibitors, our work provides a map of the structural elements relevant for the design of more selective ATP-competitive MTPK inhibitors.

Journal ArticleDOI
TL;DR: The absolute binding free energies of tetra-methylated octa-acid host–guest systems are calculated as a part of the SAMPL6 blind challenge using two different free energy simulation methods, i.e., umbrella sampling (US) and the double decoupling method (DDM).
Abstract: We calculate the absolute binding free energies of tetra-methylated octa-acid host–guest systems as a part of the SAMPL6 blind challenge (receipt ID vq30p). We employed two different free energy simulation methods, i.e., umbrella sampling (US) and the double decoupling method (DDM). The US method was used with the weighted histogram analysis method (WHAM) (US-WHAM scheme). In the DDM scheme, the Hamiltonian replica-exchange method (HREM) was combined with the Bennett acceptance ratio (BAR) (HREM-BAR scheme). We obtained initial binding poses via molecular docking using the GalaxyDock-HG program, which was developed for the SAMPL challenge. The root mean square deviation (RMSD) and the mean absolute deviation (MAD) using the US-WHAM scheme were 1.33 and 1.02 kcal/mol, respectively. The MAD was the best among all submissions; however, the correlation with respect to experiment was unexceptional. While the RMSD and MAD from the HREM-BAR scheme were greater than those from the US-WHAM scheme (2.09 and 1.76 kcal/mol, respectively), its correlations were slightly better than those of US-WHAM. The correlation between the two methods was high. Further discussion on the DDM method can be found in a companion paper by Han et al. (receipt ID 3z83m) in the same issue.
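The two error statistics quoted in this abstract, RMSD and MAD between predicted and experimental binding free energies, can be computed as in the short sketch below. The function names are assumptions, and the abstract does not specify the authors' own analysis scripts.

```python
import math

def rmsd(predicted, experimental):
    """Root mean square deviation between predictions and experiment."""
    n = len(predicted)
    return math.sqrt(
        sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n
    )

def mad(predicted, experimental):
    """Mean absolute deviation between predictions and experiment."""
    n = len(predicted)
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / n
```

Because RMSD squares each deviation, it is never smaller than MAD and grows faster when a few guests are badly mispredicted, which is why the two numbers are usually reported together.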