
Showing papers in "Journal of Computer-aided Molecular Design in 2002"


Journal Article
TL;DR: The results show that this consensus scoring function, X-CSCORE, improves the docking accuracy considerably when compared to the conventional force field computation used for molecular docking.
Abstract: New empirical scoring functions have been developed to estimate the binding affinity of a given protein-ligand complex with known three-dimensional structure. These scoring functions include terms accounting for van der Waals interaction, hydrogen bonding, deformation penalty, and hydrophobic effect. A special feature is that three different algorithms have been implemented to calculate the hydrophobic effect term, which results in three parallel scoring functions. All three scoring functions are calibrated through multivariate regression analysis of a set of 200 protein-ligand complexes and they reproduce the binding free energies of the entire training set with standard deviations of 2.2 kcal/mol, 2.1 kcal/mol, and 2.0 kcal/mol, respectively. These three scoring functions are further combined into a consensus scoring function, X-CSCORE. When tested on an independent set of 30 protein-ligand complexes, X-CSCORE is able to predict their binding free energies with a standard deviation of 2.2 kcal/mol. The potential application of X-CSCORE to molecular docking is also investigated. Our results show that this consensus scoring function improves the docking accuracy considerably when compared to the conventional force field computation used for molecular docking.

1,074 citations
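The abstract states that the three scoring functions are combined into X-CSCORE but does not give the combination rule. A minimal sketch, assuming a plain arithmetic mean of the three predicted binding free energies per complex; the error measure used here is a root-mean-square deviation, which may differ from the exact standard-deviation statistic reported in the paper. Function names and all numbers are hypothetical.

```python
import math

def consensus_score(scores):
    """Combine one complex's predictions from several scoring functions.

    Assumption: a simple arithmetic mean; the abstract does not state the rule.
    """
    return sum(scores) / len(scores)

def rms_error(predicted, experimental):
    """Root-mean-square deviation (kcal/mol) between predicted and measured values."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)

# Hypothetical data: each tuple holds three scoring-function predictions for one complex.
per_function = [(-8.0, -9.0, -10.0), (-6.5, -6.0, -7.0), (-11.0, -10.0, -12.0)]
experimental = [-9.5, -6.5, -10.0]

consensus = [consensus_score(s) for s in per_function]
print(consensus)  # [-9.0, -6.5, -11.0]
print(round(rms_error(consensus, experimental), 3))
```

Averaging tends to cancel uncorrelated errors of the individual functions, which is one common rationale for consensus scoring.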


Journal Article
TL;DR: An overview of current docking techniques is presented with a description of applications including single docking experiments and the virtual screening of databases.
Abstract: The binding of small molecule ligands to large protein targets is central to numerous biological processes. The accurate prediction of the binding modes between the ligand and protein (the docking problem) is of fundamental importance in modern structure-based drug design. An overview of current docking techniques is presented with a description of applications including single docking experiments and the virtual screening of databases.

633 citations


Journal Article
TL;DR: It is demonstrated that QSAR models built and validated with the approach have statistically better predictive power than models generated with either random or activity ranking based selection of the training and test sets.
Abstract: One of the most important characteristics of Quantitative Structure-Activity Relationships (QSAR) models is their predictive power. The latter can be defined as the ability of a model to accurately predict the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rational division of an experimental SAR dataset into the training and test sets, which are used for model development and validation, respectively. Given that all compounds are represented by points in multidimensional descriptor space, we argue that training and test sets must satisfy the following criteria: (i) representative points of the test set must be close to those of the training set; (ii) representative points of the training set must be close to representative points of the test set; (iii) the training set must be diverse. For quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For rational division of a dataset into the training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.

466 citations
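The sphere-exclusion idea can be sketched as follows. This is a simplified illustration, not the authors' three algorithms: it assumes the most active compound seeds the first sphere, that each sphere center joins the training set while the other points inside its sphere join the test set, and that the next center is the remaining point nearest to the current one (the choice of next center is an assumption). The radius controls the train/test balance; `sphere_exclusion_split` and the toy coordinates are hypothetical.

```python
import math

def sphere_exclusion_split(points, activities, radius):
    """Split compound indices into training and test sets by sphere exclusion.

    points: descriptor-space coordinates, one tuple per compound.
    activities: one activity value per compound; the most active seeds the first sphere.
    """
    remaining = set(range(len(points)))
    center = max(remaining, key=lambda i: activities[i])
    train, test = [], []
    while remaining:
        train.append(center)
        remaining.discard(center)
        # Other points inside the sphere around the current center go to the test set.
        inside = [i for i in remaining if math.dist(points[i], points[center]) <= radius]
        test.extend(inside)
        remaining.difference_update(inside)
        if remaining:
            # Assumed rule: next center is the remaining point closest to the current one.
            prev = center
            center = min(remaining, key=lambda i: math.dist(points[i], points[prev]))
    return sorted(train), sorted(test)

# Hypothetical 2D descriptor coordinates and activities.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.0), (10.0, 0.0)]
activities = [5.0, 4.0, 6.0, 3.0, 2.0]
print(sphere_exclusion_split(points, activities, radius=1.0))  # ([1, 2, 4], [0, 3])
```

Because every test point lies within `radius` of some training point, criterion (i) of the abstract is satisfied by construction; a smaller radius moves more compounds into the training set.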


Journal Article
John W. Raymond, Peter Willett
TL;DR: A classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and recommendations regarding their applicability to typical chemoinformatics tasks are made.
Abstract: The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks.

459 citations


Journal Article
TL;DR: In this article, the authors examined the current trends in lead discovery by comparing MW (molecular weight), LogP (octanol/water partition coefficient), and LogSw (intrinsic water solubility) for the following categories: 62 leads and 75 drugs; compounds in the development phases (I, II, III and launched), as indexed in MDDR; and compounds indexed in medicinal chemistry journals, categorized according to their biological activity.
Abstract: The new drug discovery paradigm is based on high-throughput technologies, both with respect to synthesis and screening. The progression HTS hits → lead series → candidate drug → marketed drug appears to indicate that the probability of reaching launched status is one in a million. This has shifted the focus from good quality candidate drugs to good quality leads. We examined the current trends in lead discovery by comparing MW (molecular weight), LogP (octanol/water partition coefficient, estimated by Kowwin [17]) and LogSw (intrinsic water solubility, estimated by Wskowwin [18]) for the following categories: 62 leads and 75 drugs [11]; compounds in the development phase (I, II, III and launched), as indexed in MDDR; and compounds indexed in medicinal chemistry journals [ref. 20], categorized according to their biological activity. Comparing the distribution of the above properties, the 62 lead structures show the lowest median with respect to MW (smaller) and LogP (less hydrophobic), and the highest median with respect to LogSw (more soluble). By contrast, over 50% of the medicinal chemistry compounds with activities above 1 nanomolar have MW > 425, LogP > 4.25 and LogSw < -4.75, indicating that the reported active compounds are larger, more hydrophobic and less soluble when compared to time-tested quality leads. In the MDDR set, a progressive constraint to reduce MW and LogP, and to increase LogSw, can be observed when examining trends in the developmental sequence: phase I, II, III and launched drugs. These trends indicate that other properties besides binding affinity, e.g., solubility and hydrophobicity, need to be considered when choosing the appropriate leads.

218 citations
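The "over 50%" comparison amounts to counting, per property, how many compounds lie beyond each cut-off. A minimal sketch using the cut-offs quoted in the abstract on a hypothetical three-compound set; computing MW, LogP and LogSw themselves (estimated with Kowwin and Wskowwin in the paper) is outside its scope, and `fraction_beyond_cutoffs` is a hypothetical helper.

```python
# Cut-offs quoted in the abstract for the medicinal chemistry compounds.
CUTOFFS = {
    "MW": lambda v: v > 425,
    "LogP": lambda v: v > 4.25,
    "LogSw": lambda v: v < -4.75,
}

def fraction_beyond_cutoffs(compounds):
    """Fraction of compounds lying beyond each property cut-off."""
    n = len(compounds)
    return {prop: sum(test(c[prop]) for c in compounds) / n for prop, test in CUTOFFS.items()}

# Hypothetical property values.
hypothetical = [
    {"MW": 480.0, "LogP": 5.1, "LogSw": -5.6},
    {"MW": 390.0, "LogP": 3.2, "LogSw": -3.9},
    {"MW": 455.0, "LogP": 4.6, "LogSw": -5.1},
]
fractions = fraction_beyond_cutoffs(hypothetical)
print({prop: frac > 0.5 for prop, frac in fractions.items()})  # {'MW': True, 'LogP': True, 'LogSw': True}
```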


Journal Article
TL;DR: From the analysis of data, it is estimated that the barrier to binding, due to the loss of rigid-body entropy, is 15–20 kJ/mol, i.e. around 3 orders of magnitude in affinity at 298 K.
Abstract: When a small molecule binds to a protein, it loses a significant amount of rigid-body translational and rotational entropy. Estimates of the associated energy barrier vary widely in the literature, yet accurate estimates are important in the interpretation of results from fragment-based drug discovery techniques. This paper describes an analysis that allows the estimation of the rigid-body entropy barrier from the increase in binding affinity that results when two fragments of known affinity and known binding mode are joined together. The paper reviews the relatively small number of examples where good-quality data are available. From the analysis of these data, we estimate that the barrier to binding, due to the loss of rigid-body entropy, is 15-20 kJ/mol, i.e. around 3 orders of magnitude in affinity at 298 K. This large barrier explains why it is comparatively rare to observe multiple fragments binding to non-overlapping adjacent sites in enzymes. The barrier is also consistent with medicinal chemistry experience where small changes in the critical binding regions of ligands are often poorly tolerated by enzymes.

186 citations
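The conversion from the 15-20 kJ/mol barrier to "around 3 orders of magnitude" follows from ΔG = -RT ln K: a free-energy penalty ΔG lowers the affinity by a factor of exp(ΔG/RT), i.e. by ΔG/(RT ln 10) log10 units. A short check at 298 K (the helper name is illustrative):

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)
T = 298.0           # temperature in K

def log10_affinity_loss(delta_g):
    """Orders of magnitude of affinity lost for a free-energy penalty delta_g (kJ/mol)."""
    return delta_g / (R * T * math.log(10))

for barrier in (15.0, 20.0):
    print(barrier, round(log10_affinity_loss(barrier), 1))
```

At 298 K, RT ln 10 is about 5.7 kJ/mol, so the 15-20 kJ/mol range corresponds to roughly 2.6 to 3.5 log units, consistent with the abstract's estimate.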


Journal Article
TL;DR: FlexX-Pharm, an extended version of the flexible docking tool FlexX, allows the incorporation of information about important characteristics of protein-ligand binding modes into a docking calculation by applying a series of look-ahead checks during the flexible construction of ligand fragments within the active site.
Abstract: FLEXX-PHARM, an extended version of the flexible docking tool FLEXX, allows the incorporation of information about important characteristics of protein-ligand binding modes into a docking calculation. This information is introduced as a simple set of constraints derived from receptor-based pharmacophore features. The constraints are determined by selected FLEXX interactions and inclusion volumes in the receptor active site. They guide the docking process to produce a set of docking solutions with particular properties. By applying a series of look-ahead checks during the flexible construction of ligand fragments within the active site, FLEXX-PHARM determines which partially built docking solutions can potentially obey the constraints. Solutions that cannot obey the constraints are deleted as early as possible, often decreasing the calculation time and enabling new docking solutions to emerge. FLEXX-PHARM was evaluated on various individual protein-ligand complexes where the top docking solutions generated by FLEXX had high root-mean-square deviations (RMSD) from the experimentally observed binding modes. FLEXX-PHARM showed an improvement in the RMSD of the top solutions in most cases, along with a reduction in run time. We also tested FLEXX-PHARM as a database screening tool on a small dataset of molecules for three target proteins. In two cases, FLEXX-PHARM missed one or two of the active molecules due to the constraints selected. However, in general FLEXX-PHARM maintained or improved the enrichment shown with FLEXX, while completing the screen in considerably less run time.

173 citations


Journal Article
TL;DR: AstexViewer™ is a Java molecular graphics program that can be used for visualisation in many aspects of structure-based drug design and as part of a structure based design platform.
Abstract: AstexViewer is a Java molecular graphics program that can be used for visualisation in many aspects of structure-based drug design. This paper describes its functionality, implementation and examples of its use. The program can run as an Applet in a web browser allowing structures to be displayed without installing additional software. Applications of its use are described for visualisation and as part of a structure based design platform. The software is being made freely available to the community and may be downloaded from http://www.astex-technology.com/AstexViewer.

142 citations


Journal Article
TL;DR: Three commercially available pharmacophore generation programs, Catalyst/HipHop, DISCO and GASP, were compared on their ability to generate known pharmacophores deduced from protein-ligand complexes extracted from the Protein Data Bank; GASP and Catalyst outperformed DISCO at reproducing the five target pharmacophores.
Abstract: Three commercially available pharmacophore generation programs, Catalyst/HipHop, DISCO and GASP, were compared on their ability to generate known pharmacophores deduced from protein-ligand complexes extracted from the Protein Data Bank. Five different protein families were included: thrombin, cyclin-dependent kinase 2, dihydrofolate reductase, HIV reverse transcriptase and thermolysin. Target pharmacophores were defined through visual analysis of the data sets. The pharmacophore models produced were evaluated qualitatively through visual inspection and according to their ability to generate the target pharmacophores. Our results show that GASP and Catalyst outperformed DISCO at reproducing the five target pharmacophores.

121 citations


Journal Article
TL;DR: This review outlines the strategies by which both macrocyclic peptides and cyclic dipeptides (diketopiperazines) have been synthesised in combinatorial libraries, and briefly describes biological applications that justify their inclusion as privileged structures.
Abstract: Head-to-tail cyclic peptides have been reported to bind to multiple, unrelated classes of receptor with high affinity. They may therefore be considered privileged structures. This review outlines the strategies by which both macrocyclic peptides and cyclic dipeptides, or diketopiperazines, have been synthesised in combinatorial libraries. It also briefly outlines some of the biological applications of these molecules, thereby justifying their inclusion as privileged structures.

104 citations


Journal Article
TL;DR: An evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases, suggests that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.
Abstract: This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.
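The abstract does not say which coefficient was used with the BCI, Daylight and Unity fingerprints. A common choice for binary fingerprints is the Tanimoto coefficient; the sketch below assumes it, represents each fingerprint as a set of on-bit indices, and ranks a hypothetical database against a query, which is the basic operation of similarity-based virtual screening. The graph-based RASCAL measures are not reproduced here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def rank_by_similarity(query, database):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: tanimoto(query, database[name]), reverse=True)

# Hypothetical fingerprints.
query = {1, 4, 7, 9}
database = {"mol_a": {1, 4, 7}, "mol_b": {2, 3, 5}, "mol_c": {1, 4, 7, 9, 12}}
print(rank_by_similarity(query, database))  # ['mol_c', 'mol_a', 'mol_b']
```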

Journal Article
TL;DR: A review of tools for visualizing and modeling in vitro absorption, distribution, metabolism, excretion and toxicity (ADME/TOX) data can be found in this paper.
Abstract: With the continual pressure to ensure follow-up molecules to billion-dollar blockbuster drugs, there is a hurdle in profitability and growth for pharmaceutical companies in the next decades. With each success and failure we increasingly appreciate that a key to the success of synthesized molecules through the research and development process is the possession of drug-like properties. These properties include adequate bioactivity as well as adequate solubility, an ability to cross critical membranes (intestinal and sometimes blood-brain barrier), reasonable metabolic stability and, of course, safety in humans. Depending on the therapeutic area being investigated, it might also be desirable to avoid certain enzymes or transporters to circumvent potential drug-drug interactions. It may also be important to limit the induction of these same proteins, which can result in further toxicities. We have clearly moved the assessment of in vitro absorption, distribution, metabolism, excretion and toxicity (ADME/TOX) parameters much earlier in the discovery organization than a decade ago with the inclusion of higher-throughput systems. We are also now faced with huge amounts of ADME/TOX data for each molecule that need interpretation and also provide a valuable resource for generating predictive computational models for future drug discovery. The present review aims to show what tools exist today for visualizing and modeling ADME/TOX data, what tools need to be developed, and how both present and future tools are valuable for virtual filtering using ADME/TOX and bioactivity properties in parallel as a viable addition to present practices.

Journal Article
TL;DR: It is shown that 880 compounds from Prestwick chemical library represent a very diverse pharmacological space, and on this basis, the selection of compounds with required and without unwanted properties is possible.
Abstract: Due to the directed way of testing chemical compounds in drug research and development, many projects fail because serious adverse effects and toxicity are discovered too late, and many existing prospective activities remain unstudied. Evaluation of the general biological potential of molecules is possible using the computer program PASS, which predicts more than 780 pharmacological effects, mechanisms of action, mutagenicity, carcinogenicity, etc. on the basis of the structural formulae of compounds, with an average accuracy of ∼85%. PASS applications are described both for databases of available samples, including hundreds of thousands of compounds, and for small collections of compounds synthesized by individual medicinal chemists. It is shown that 880 compounds from the Prestwick chemical library represent a very diverse pharmacological space. New activities can be found in existing compounds by prediction. Therefore, on this basis, the selection of compounds with required and without unwanted properties is possible. Even when PASS cannot predict very new activities, it may recognize some unwanted actions at an early stage of R&D, providing the medicinal chemist with the means to increase the efficiency of projects.

Journal Article
TL;DR: This paper presents an alternative method of protein template and ligand interaction point design that identifies the most favorable points for making hydrophobic and hydrogen-bond interactions by using a knowledge base.
Abstract: For the successful identification and docking of new ligands to a protein target by virtual screening, the essential features of the protein and ligand surfaces must be captured and distilled in an efficient representation. Since the running time for docking increases exponentially with the number of points representing the protein and each ligand candidate, it is important to place these points where the best interactions can be made between the protein and the ligand. This definition of favorable points of interaction can also guide protein structure-based ligand design, which typically focuses on which chemical groups provide the most energetically favorable contacts. In this paper, we present an alternative method of protein template and ligand interaction point design that identifies the most favorable points for making hydrophobic and hydrogen-bond interactions by using a knowledge base. The knowledge-based protein and ligand representations have been incorporated in version 2.0 of SLIDE and resulted in dockings closer to the crystal structure orientations when screening a set of 57 known thrombin and glutathione S-transferase (GST) ligands against the apo structures of these proteins. There was also improved scoring enrichment of the dockings, meaning better differentiation between the chemically diverse known ligands and a ∼15,000-molecule dataset of randomly chosen small organic molecules. This approach for identifying the most important points of interaction between proteins and their ligands can equally well be used in other docking and design techniques. While much recent effort has focused on improving scoring functions for protein-ligand docking, our results indicate that improving the representation of the chemistry of proteins and their ligands is another avenue that can lead to significant improvements in the identification, docking, and scoring of ligands.

Journal Article
TL;DR: It is shown that the Skelgen algorithm generates representatives of many inhibitor classes within a very short time and that the new similarity measure is useful for comparing and clustering designed structures.
Abstract: The de novo design program Skelgen has been used to design inhibitor structures for four targets of pharmaceutical interest. The designed structures are compared to modeled binding modes of known inhibitors (i) visually and (ii) by means of a novel similarity measure considering the size and spatial proximity of the maximum common substructure of two small molecules. It is shown that the Skelgen algorithm generates representatives of many inhibitor classes within a very short time and that the new similarity measure is useful for comparing and clustering designed structures. The results demonstrate the necessity of properly defining search constraints in practical applications of de novo design.

Journal Article
TL;DR: It is expected that concepts from receptor-based 3D QSAR will be valuable tools for the analysis of high-throughput screening as well as virtual screening data.
Abstract: One of the major challenges in computational approaches to drug design is the accurate prediction of the binding affinity of novel biomolecules. In the present study, an automated procedure that combines docking and 3D-QSAR methods was applied to several drug targets. The developed receptor-based 3D-QSAR methodology was tested on several sets of ligands for which the three-dimensional structure of the target protein has been solved, namely estrogen receptor, acetylcholinesterase and protein tyrosine phosphatase 1B. The molecular alignments of the studied ligands were determined using the docking program AutoDock and were compared with the X-ray structures of the corresponding protein-ligand complexes. The automatically generated protein-based ligand alignment was subsequently taken as the basis for a comparative field analysis applying the GRID/GOLPE approach. Using GRID interaction fields and applying variable selection procedures, highly predictive models were obtained. It is expected that concepts from receptor-based 3D-QSAR will be valuable tools for the analysis of high-throughput screening as well as virtual screening data.

Journal Article
TL;DR: A new genetic algorithm that has been tailored to meet the demands of de novo drug design, i.e. efficient optimization based on small training sets that are analyzed in only a small number of design cycles is proposed.
Abstract: The design of molecules with desired properties is still a challenge because of the largely unpredictable end results. Computational methods can be used to assist and speed up this process. In particular, genetic algorithms have proved to be powerful tools with a wide range of applications, e.g. in the field of drug development. Here, we propose a new genetic algorithm that has been tailored to meet the demands of de novo drug design, i.e. efficient optimization based on small training sets that are analyzed in only a small number of design cycles. The efficiency of the design algorithm was demonstrated in the context of several different applications. First, RNA molecules were optimized with respect to folding energy. Second, a spin glass was optimized as a model system for the optimization of biopolymers with multiletter alphabets, such as peptides. Finally, the feasibility of the computer-assisted molecular design approach was demonstrated for the de novo construction of peptidic thrombin inhibitors using an iterative process of 4 design cycles of computer-guided optimization. Synthesis and experimental fitness determination of only 600 different compounds from a virtual library of more than 10^17 molecules was necessary to achieve this goal.

Journal Article
TL;DR: The data supports the use of a single protein structure for virtual screening with GOLD in some applications involving induced fit effects, although care must be taken to identify the protein structure that performs best against a wide variety of ligands.
Abstract: Many proteins undergo small side chain or even backbone movements on binding of different ligands into the same protein structure. This is known as induced fit and is potentially problematic for virtual screening of databases against protein targets. In this report we investigate the limits of the rigid protein approximation used by the docking program GOLD, through cross-docking using protein structures of influenza neuraminidase. Neuraminidase is known to exhibit small but significant induced fit effects on ligand binding. Some neuraminidase crystal structures caused concern due to the bound ligand conformation, and GOLD performed poorly on these complexes. A 'clean' set, which contained unique, unambiguous complexes, was defined. For this set, the lowest energy structure was correctly docked (i.e. RMSD < 1.5 Å from the crystal reference structure) in 84% of proteins, and the most promiscuous protein (1mwe) was able to dock all 15 ligands accurately, including those that normally required an induced fit movement. This is considerably better than the 70% success rate seen with GOLD against general validation sets. Inclusion of specific water molecules involved in water-mediated hydrogen bonds did not significantly improve the docking performance for ligands that formed water-mediated contacts, but it did prevent docking of ligands that displaced these waters. Our data support the use of a single protein structure for virtual screening with GOLD in some applications involving induced fit effects, although care must be taken to identify the protein structure that performs best against a wide variety of ligands. The performance of GOLD was significantly better than the GOLD implementation of ChemScore and the reasons for this are discussed. Overall, GOLD has shown itself to be an extremely good, robust docking program for this system.
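The success criterion used above (RMSD < 1.5 Å from the crystal pose) can be computed as below. A minimal sketch, assuming both poses list the same atoms in the same order and share the protein's coordinate frame, so no superposition is applied; symmetry-equivalent atom mappings, which real evaluations often need to handle, are ignored. Function names and coordinates are hypothetical.

```python
import math

def rmsd(pose, reference):
    """RMSD (Å) between two poses with matched atom order, without superposition."""
    if len(pose) != len(reference):
        raise ValueError("poses must have the same number of atoms")
    total = sum(math.dist(p, r) ** 2 for p, r in zip(pose, reference))
    return math.sqrt(total / len(pose))

def success_rate(pairs, cutoff=1.5):
    """Fraction of (docked, crystal) pose pairs whose RMSD is below the cutoff."""
    hits = sum(rmsd(docked, crystal) < cutoff for docked, crystal in pairs)
    return hits / len(pairs)

# Hypothetical three-atom ligand: one pose shifted by 0.5 Å, one by 3.0 Å.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
good = [(x, y + 0.5, z) for x, y, z in crystal]
bad = [(x, y + 3.0, z) for x, y, z in crystal]
print(rmsd(good, crystal), success_rate([(good, crystal), (bad, crystal)]))  # 0.5 0.5
```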



Journal Article
TL;DR: It is shown that in the case where a given loop from two different GPCRs has approximately the same length and some degree of sequence identity, the fold adopted by the loops can be similar, and in such special cases homology modeling might be used to obtain initial structures of these loops.
Abstract: Some key concerns raised by molecular modeling and computational simulation of functional mechanisms for membrane proteins are discussed and illustrated for members of the family of G protein coupled receptors (GPCRs). Of particular importance are issues related to the modeling and computational treatment of loop regions. These are demonstrated here with results from different levels of computational simulations applied to the structures of rhodopsin and a model of the 5-HT2A serotonin receptor, 5-HT2AR. First, comparative Molecular Dynamics (MD) simulations are reported for rhodopsin in vacuum and embedded in an explicit representation of the membrane and water environment. It is shown that in spite of a partial accounting of solvent screening effects by neutralization of charged side chains, vacuum MD simulations can lead to severe distortions of the loop structures. The primary source of the distortion appears to be formation of artifactual H-bonds, as has been repeatedly observed in vacuum simulations. To address such shortcomings, a recently proposed approach for calculating the structure of segments that connect elements of secondary structure with known coordinates is applied to 5-HT2AR to obtain an initial representation of the loops connecting the transmembrane (TM) helices. The approach combines simulated annealing with a biased scaled collective variables Monte Carlo technique, and is applied to loops connecting the TM segments on both the extracellular and the cytoplasmic sides of the receptor. Although this initial calculation treats the loops as independent structural entities, the final structure exhibits a number of interloop interactions that may have functional significance.
Finally, it is shown here that in the case where a given loop from two different GPCRs (here rhodopsin and 5-HT2AR) has approximately the same length and some degree of sequence identity, the fold adopted by the loops can be similar. Thus, in such special cases homology modeling might be used to obtain initial structures of these loops. Notably, however, all other loops in these two receptors appear to be very different in sequence and structure, so that their conformations can be found reliably only by ab initio, energy based methods and not by homology modeling.

Journal Article
TL;DR: The basic principles of hierarchical modelling by means of PCA and PLS are reviewed and one objective of the paper is to disseminate this concept to a broader QSAR audience.
Abstract: Multivariate PCA- and PLS-models involving many variables are often difficult to interpret, because plots and lists of loadings, coefficients, VIPs, etc, rapidly become messy and hard to overview. ...

Journal Article
TL;DR: Binding to largely hydrophobic sites, such as the active site of p38, was significantly improved by introducing a correction factor selectively affecting only carbon and hydrogen energy grids, thus, providing an effective, although approximate, treatment of solvation.
Abstract: Protein kinases are an important class of enzymes controlling virtually all cellular signaling pathways. Consequently, selective inhibitors of protein kinases have attracted significant interest as potential new drugs for many diseases. Computational methods, including molecular docking, have increasingly been used in the inhibitor design process [1]. We have considered several docking packages in order to strengthen our kinase inhibitor work with computational capabilities. In our experience, AutoDock offered a reasonable combination of accuracy and speed, as opposed to methods that specialize either in fast database searches or detailed and computationally intensive calculations. However, AutoDock did not perform well in cases where extensive hydrophobic contacts were involved, such as docking of SB203580 to its target protein kinase p38. Another shortcoming was a hydrogen bonding energy function, which underestimated the attraction component and, thus, did not allow for sufficiently accurate modeling of the key hydrogen bonds in the kinase-inhibitor complexes. We have modified the parameter set used to model hydrogen bonds, which increased the accuracy of AutoDock and appeared to be generally applicable to many kinase-inhibitor pairs without customization. Binding to largely hydrophobic sites, such as the active site of p38, was significantly improved by introducing a correction factor selectively affecting only carbon and hydrogen energy grids, thus, providing an effective, although approximate, treatment of solvation.

Journal Article
TL;DR: Alignment-independent descriptors are used to obtain qualitative and quantitative predictions of the competitive inhibition of CYP2C9 for a series of highly structurally diverse compounds; the 3D-QSAR model will be used during lead optimization to avoid chemistry that results in inhibition of cytochrome P450 2C9.
Abstract: Discriminant and quantitative PLS analysis of competitive CYP2C9 inhibitors versus non-inhibitors using alignment independent GRIND descriptors

Journal Article
TL;DR: A homology-based model of the 5-HT2A receptor was produced utilizing an activated form of the bovine rhodopsin (Rh) crystal structure, and the final binding orientations were observed to be compatible with much of the data acquired through both diversified ligand design and site directed mutagenesis.
Abstract: A homology-based model of the 5-HT2A receptor was produced utilizing an activated form of the bovine rhodopsin (Rh) crystal structure. In silico activation of the Rh structure was accomplished by isomerization of the 11-cis-retinal (1) chromophore, followed by constrained molecular dynamics to relax the resultant high energy structure. The activated form of Rh was then used as a structural template for development of a human 5-HT2A receptor model. Both the 5-HT2A receptor and Rh are members of the G-protein coupled receptor (GPCR) superfamily. The resulting homology model of the receptor was then used for docking studies of compounds representing a cross-section of structural classes that activate the 5-HT2A receptor, including ergolines, tryptamines, and amphetamines. The ligand/receptor complexes that ensued were refined and the final binding orientations were observed to be compatible with much of the data acquired through both diversified ligand design and site-directed mutagenesis.

Journal ArticleDOI
TL;DR: The proposed EpiDock method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture and can be used to predict potential T-cell epitopes from viral genomes and roughly predict still unknown peptide binding motifs for novel class I MHC alleles.
Abstract: A new computational method (EpiDock) is proposed for predicting peptide binding to class I MHC proteins from the amino acid sequence of any protein of immunological interest. Starting from the primary structure of the target protein, individual three-dimensional structures of all possible MHC-peptide complexes (8-, 9- and 10-mers) are obtained by homology modelling. A free energy scoring function (Fresno) is then used to predict the absolute binding free energy of all possible peptides to the class I MHC restriction protein. Assuming that immunodominant epitopes are usually found among the top MHC binders, the method can thus be applied to predict the location of immunogenic peptides on the sequence of the target protein. When applied to the prediction of HLA-A*0201-restricted T-cell epitopes from the Hepatitis B virus, EpiDock was able to recover 92% of known high-affinity binders and 80% of known epitopes within a filtered subset of all possible nonapeptides corresponding to about one tenth of the full theoretical list. The proposed method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture. As it requires very few starting experimental data, EpiDock can be used (i) to predict potential T-cell epitopes from viral genomes and (ii) to roughly predict still unknown peptide binding motifs for novel class I MHC alleles.

Journal ArticleDOI
TL;DR: This paper discusses a class of algorithms for subset selection rooted in the principles of multiobjective optimization, and employs an objective function that encodes all of the desired selection criteria, and then uses a simulated annealing or evolutionary approach to identify the optimal subset from among the vast number of possibilities.
Abstract: Combinatorial chemistry and high-throughput screening have caused a fundamental shift in the way chemists contemplate experiments. Designing a combinatorial library is a controversial art that involves a heterogeneous mix of chemistry, mathematics, economics, experience, and intuition. Although there seems to be little agreement as to what constitutes an ideal library, one thing is certain: a single property or measure seldom defines the quality of the design. In most real-world applications, a good experiment requires the simultaneous optimization of several, often conflicting, design objectives, some of which may be vague and uncertain. In this paper, we discuss a class of algorithms for subset selection rooted in the principles of multiobjective optimization. Our approach is to employ an objective function that encodes all of the desired selection criteria, and then use a simulated annealing or evolutionary approach to identify the optimal (or a nearly optimal) subset from among the vast number of possibilities. Many design criteria can be accommodated, including diversity, similarity to known actives, predicted activity and/or selectivity determined by quantitative structure-activity relationship (QSAR) models or receptor binding models, enforcement of certain property distributions, reagent cost and availability, and many others. The method is robust, convergent, and extensible, offers the user full control over the relative significance of the various objectives in the final design, and permits the simultaneous selection of compounds from multiple libraries in full- or sparse-array format.
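A minimal sketch of the general idea, assuming a weighted-sum objective and plain simulated annealing; this is not the authors' implementation. The descriptor points, the three criteria (diversity, distance to one known active, reagent cost), the weights `w_div`, `w_sim`, `w_cost` and the function names `objective` and `anneal_subset` are all hypothetical. The weights play the role of the user-controlled "relative significance" of each objective, and a move swaps one selected compound for an unselected one.

```python
import math
import random

def objective(subset, points, active, cost, w_div=1.0, w_sim=1.0, w_cost=0.1):
    """Hypothetical weighted-sum objective; lower is better. Needs len(subset) >= 2."""
    idx = list(subset)
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    # Diversity: mean pairwise distance within the subset (rewarded, hence the minus sign).
    div = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
    # Similarity to a known active: mean distance to it (penalized).
    sim = sum(math.dist(points[i], active) for i in idx) / len(idx)
    # Reagent cost: total cost of the selected compounds (penalized).
    total_cost = sum(cost[i] for i in idx)
    return -w_div * div + w_sim * sim + w_cost * total_cost

def anneal_subset(points, active, cost, k, steps=5000, t0=1.0, t_end=1e-3, seed=0):
    """Select k of len(points) compounds (2 <= k < len(points)) by simulated annealing."""
    rng = random.Random(seed)
    n = len(points)
    current = set(rng.sample(range(n), k))
    e_cur = objective(current, points, active, cost)
    best, e_best = set(current), e_cur
    for step in range(steps):
        t = t0 * (t_end / t0) ** (step / (steps - 1))  # geometric cooling schedule
        out = rng.choice(sorted(current))  # compound to drop
        inn = rng.choice([i for i in range(n) if i not in current])  # compound to add
        trial = (current - {out}) | {inn}
        e_trial = objective(trial, points, active, cost)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if e_trial <= e_cur or rng.random() < math.exp((e_cur - e_trial) / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = set(current), e_cur
    return sorted(best), e_best

# Usage with random 2-D "descriptor" points standing in for a virtual library.
data_rng = random.Random(42)
pts = [(data_rng.random(), data_rng.random()) for _ in range(30)]
costs = [data_rng.uniform(1.0, 5.0) for _ in range(30)]
subset, score = anneal_subset(pts, active=(0.5, 0.5), cost=costs, k=5)
print(subset, round(score, 3))
```

Raising `w_div` relative to `w_sim` pushes the selection toward a spread-out subset; raising `w_sim` pulls it toward the known active. An evolutionary search could replace the annealing loop while reusing the same objective function.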

Journal ArticleDOI
TL;DR: Ab initio and density functional theory methods were used to study the tautomers of barbituric acid in the gas phase and in a polar medium, and the ability of the maximum hardness principle to predict the stable tautomer was examined.
Abstract: Ab initio and density functional theory (DFT) methods were used to study the tautomers of barbituric acid in the gas phase and in a polar medium. In the gas phase, the tautomers were optimized at the HF/6-31G*, MP2/6-31G*, B3LYP/6-31G* and B3PW91/6-31G* levels of theory. Self-consistent reaction field (SCRF) theory at the HF/6-31G* level was used to optimize the tautomers in a polar medium. The relative stabilities of the tautomers were compared in the gaseous and polar media. The ability of the maximum hardness principle to predict the stable tautomer was examined. The ¹³C NMR chemical shifts of the carbon atoms in the tautomers were calculated and the results are discussed.

Journal ArticleDOI
TL;DR: The 3D-QSAR CoMSIA technique was applied to a set of 458 peptides binding to the five most widespread HLA-A2-like alleles, allowing an A2-supermotif to be identified based on common favoured and disfavoured areas.
Abstract: The 3D-QSAR CoMSIA technique was applied to a set of 458 peptides binding to the five most widespread HLA-A2-like alleles: A*0201, A*0202, A*0203, A*0206 and A*6802. Models comprising the main physicochemical properties (steric bulk, electron density, hydrophobicity and hydrogen-bond formation abilities) were obtained with acceptable predictivity (q² ranged from 0.385 to 0.683). The use of coefficient contour maps allowed an A2-supermotif to be identified based on common favoured and disfavoured areas. The CoMSIA definition for the best HLA-A2 binder is as follows: hydrophobic aromatic amino acid at position 1; hydrophobic bulky side chains at positions 2, 6 and 9; non-hydrogen-bond-forming amino acids at position 3; small aliphatic hydrogen-bond donors at position 4; aliphatic amino acids at position 5; small aliphatic side chains at position 7; and small aliphatic hydrophilic and hydrogen-bond-forming amino acids at position 8.

Journal ArticleDOI
TL;DR: A new method is presented that docks molecular fragments to a rigid protein receptor using a probabilistic procedure based on statistical thermodynamic principles to place ligand atom triplets at the lowest energy sites.
Abstract: A new method is presented that docks molecular fragments to a rigid protein receptor. It uses a probabilistic procedure based on statistical thermodynamic principles to place ligand atom triplets at the lowest energy sites. The probabilistic method ranks receptor binding modes so that the lowest energy ones are sampled first. This allows constraints to be introduced to limit the depth of the search, leading to a computationally efficient method of sampling low energy conformational space. This is combined with energy minimization of the initial fragment placement to arrive at a low energy conformation for the molecular fragment. Two different search methods are tested: (i) geometric hashing and (ii) pose clustering. Ten molecular fragments that have commonly been used to test docking methods were docked. The success rate for generating a close solution ranked first was 8/10 and 10/10 with the two sampling procedures, respectively. In general, all five of the top-ranked solutions reproduce the observed binding mode, which increases confidence in the predictions. A set of ten molecular fragments that have previously been identified as problematic were also docked. Success was achieved in 3/10 and 4/10 cases with the two methods. Again there is a high level of agreement between the two methods, and again in the successful cases the top-ranked solutions are correct, whilst in the failures none are. The geometric hashing and pose clustering methods are fast, averaging ~13 s and ~11 s per placement, respectively, using conservative parameters. The results are very encouraging and will facilitate the process of finding novel small molecule lead compounds by virtual screening of chemical databases.