Showing papers in "Proteins in 2003"


Journal ArticleDOI
15 Feb 2003-Proteins
TL;DR: Geometrical validation around the Cα is described, with a new Cβ measure and updated Ramachandran plot, and favored and allowed ϕ,ψ regions are also defined for Pro, pre‐Pro, and Gly (important because Gly ϕ‐ψ angles are more permissive but less accurately determined).
Abstract: Geometrical validation around the Cα is described, with a new Cβ measure and updated Ramachandran plot. Deviation of the observed Cβ atom from ideal position provides a single measure encapsulating the major structure-validation information contained in bond angle distortions. Cβ deviation is sensitive to incompatibilities between sidechain and backbone caused by misfit conformations or inappropriate refinement restraints. A new ϕ,ψ plot using density-dependent smoothing for 81,234 non-Gly, non-Pro, and non-prePro residues with B < 30 from 500 high-resolution proteins shows sharp boundaries at critical edges and clear delineation between large empty areas and regions that are allowed but disfavored. One such region is the γ-turn conformation near +75°, −60°, counted as forbidden by common structure-validation programs; however, it occurs in well-ordered parts of good structures, it is overrepresented near functional sites, and strain is partly compensated by the γ-turn H-bond. Favored and allowed ϕ,ψ regions are also defined for Pro, pre-Pro, and Gly (important because Gly ϕ,ψ angles are more permissive but less accurately determined). Details of these accurate empirical distributions are poorly predicted by previous theoretical calculations, including a region left of α-helix, which rates as favorable in energy yet rarely occurs. A proposed factor explaining this discrepancy is that crowding of the two-peptide NHs permits donating only a single H-bond. New calculations by Hu et al. [Proteins 2002 (this issue)] for Ala and Gly dipeptides, using mixed quantum mechanics and molecular mechanics, fit our nonrepetitive data in excellent detail. To run our geometrical evaluations on a user-uploaded file, see MOLPROBITY (http://kinemage.biochem.duke.edu) or RAMPAGE (http://www-cryst.bioc.cam.ac.uk/rampage).

3,963 citations
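
Note: the ϕ,ψ angles plotted in a Ramachandran diagram are backbone dihedral angles: ϕ is defined by the atoms C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common atan2-based way to compute a dihedral from four points. It is an illustration only, not the code used by MOLPROBITY or RAMPAGE; the function name `dihedral_deg` and the helper names are assumptions made for this example.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For phi pass (C of residue i-1, N, CA, C); for psi pass
    (N, CA, C, N of residue i+1). Hypothetical helper for illustration.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Four points with a quarter-turn twist about the central bond.
angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters because the Ramachandran plot distinguishes, for example, ϕ = +75° from ϕ = −75°.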


Journal ArticleDOI
01 Sep 2003-Proteins
TL;DR: In terms of producing binding energy estimates, the Goldscore function appears to perform better than the Chemscore function and the two consensus protocols, particularly for faster search settings.
Abstract: The Chemscore function was implemented as a scoring function for the protein-ligand docking program GOLD, and its performance compared to the original Goldscore function and two consensus docking protocols, "Goldscore-CS" and "Chemscore-GS," in terms of docking accuracy, prediction of binding affinities, and speed. In the "Goldscore-CS" protocol, dockings produced with the Goldscore function are scored and ranked with the Chemscore function; in the "Chemscore-GS" protocol, dockings produced with the Chemscore function are scored and ranked with the Goldscore function. Comparisons were made for a "clean" set of 224 protein-ligand complexes, and for two subsets of this set, one for which the ligands are "drug-like," the other for which they are "fragment-like." For "drug-like" and "fragment-like" ligands, the docking accuracies obtained with Chemscore and Goldscore functions are similar. For larger ligands, Goldscore gives superior results. Docking with the Chemscore function is up to three times faster than docking with the Goldscore function. Both combined docking protocols give significant improvements in docking accuracy over the use of the Goldscore or Chemscore function alone. "Goldscore-CS" gives success rates of up to 81% (top-ranked GOLD solution within 2.0 Å of the experimental binding mode) for the "clean list," but at the cost of long search times. For most virtual screening applications, "Chemscore-GS" seems optimal; search settings that give docking speeds of around 0.25-1.3 min/compound have success rates of about 78% for "drug-like" compounds and 85% for "fragment-like" compounds. In terms of producing binding energy estimates, the Goldscore function appears to perform better than the Chemscore function and the two consensus protocols, particularly for faster search settings. Even at docking speeds of around 1-2 min/compound, the Goldscore function predicts binding energies with a standard deviation of approximately 10.5 kJ/mol.

2,505 citations


Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: A new scoring function for the initial stage of unbound docking is presented that combines the recently developed pairwise shape complementarity with desolvation and electrostatics and shows superior performance, especially for the antibody‐antigen category of test cases.
Abstract: The development of scoring functions is of great importance to protein docking. Here we present a new scoring function for the initial stage of unbound docking. It combines our recently developed pairwise shape complementarity with desolvation and electrostatics. We compare this scoring function with three other functions on a large benchmark of 49 nonredundant test cases and show its superior performance, especially for the antibody-antigen category of test cases. For 44 test cases (90% of the benchmark), we can retain at least one near-native structure within the top 2000 predictions at the 6° rotational sampling density, with an average of 52 near-native structures per test case. The remaining five difficult test cases can be explained by a combination of poor binding affinity, large backbone conformational changes, and our algorithm's strong tendency for identifying large concave binding pockets. All four scoring functions have been integrated into our Fast Fourier Transform based docking algorithm ZDOCK, which is freely available to academic users at http://zlab.bu.edu/~rong/dock.

1,305 citations


Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: The motivations for launching CAPRI, the rules that were applied to select targets and run the experiment, and some conclusions that can already be drawn are described; the results stress the need for new scoring functions and for methods handling the conformation changes that were observed in some of the target systems.
Abstract: CAPRI is a communitywide experiment to assess the capacity of protein-docking methods to predict protein-protein interactions. Nineteen groups participated in rounds 1 and 2 of CAPRI and submitted blind structure predictions for seven protein-protein complexes based on the known structure of the component proteins. The predictions were compared to the unpublished X-ray structures of the complexes. We describe here the motivations for launching CAPRI, the rules that we applied to select targets and run the experiment, and some conclusions that can already be drawn. The results stress the need for new scoring functions and for methods handling the conformation changes that were observed in some of the target systems. CAPRI has already been a powerful drive for the community of computational biologists who develop docking algorithms. We hope that this issue of Proteins will also be of interest to the community of structural biologists, which we call upon to provide new targets for future rounds of CAPRI, and to all molecular biologists who view protein-protein recognition as an essential process.

625 citations


Journal ArticleDOI
15 Feb 2003-Proteins
TL;DR: The results demonstrate the significant improvement of structure quality by a short refinement in a thin layer of solvent and show that a dihedral angle energy term in the force field is beneficial for structure calculation and refinement.
Abstract: We present a CPU efficient protocol for refinement of protein structures in a thin layer of explicit solvent and energy parameters with completely revised dihedral angle terms. Our approach is suitable for protein structures determined by theoretical (e.g., homology modeling or threading) or experimental methods (e.g., NMR). In contrast to other recently proposed refinement protocols, we put a strong emphasis on consistency with widely accepted covalent parameters and computational efficiency. We illustrate the method for NMR structure calculations of three proteins: interleukin-4, ubiquitin, and crambin. We show a comparison of their structure ensembles before and after refinement in water with and without a force field energy term for the dihedral angles; crambin was also refined in DMSO. Our results demonstrate the significant improvement of structure quality by a short refinement in a thin layer of solvent. Further, they show that a dihedral angle energy term in the force field is beneficial for structure calculation and refinement. We discuss the optimal weight for the energy constant for the backbone angle omega and include an extensive discussion of meaning and relevance of the calculated validation criteria, in particular root mean square Z scores for covalent parameters such as bond lengths.

615 citations


Journal ArticleDOI
03 Sep 2003-Proteins
TL;DR: This study successfully isolates 1046 functional modules from the known protein interaction network of Saccharomyces cerevisiae involving 8046 individual pair‐wise interactions by using an entirely automated and unsupervised graph clustering algorithm.
Abstract: Complex cellular processes are modular and are accomplished by the concerted action of functional modules (Ravasz et al., Science 2002;297:1551-1555; Hartwell et al., Nature 1999;402:C47-52). These modules encompass groups of genes or proteins involved in common elementary biological functions. One important and largely unsolved goal of functional genomics is the identification of functional modules from genomewide information, such as transcription profiles or protein interactions. To cope with the ever-increasing volume and complexity of protein interaction data (Bader et al., Nucleic Acids Res 2001;29:242-245; Xenarios et al., Nucleic Acids Res 2002;30:303-305), new automated approaches for pattern discovery in these densely connected interaction networks are required (Ravasz et al., Science 2002;297:1551-1555; Bader and Hogue, Nat Biotechnol 2002;20:991-997; Snel et al., Proc Natl Acad Sci USA 2002;99:5890-5895). In this study, we successfully isolate 1046 functional modules from the known protein interaction network of Saccharomyces cerevisiae involving 8046 individual pair-wise interactions by using an entirely automated and unsupervised graph clustering algorithm. This systems biology approach is able to detect many well-known protein complexes or biological processes, without reference to any additional information. We use an extensive statistical validation procedure to establish the biological significance of the detected modules and explore this complex, hierarchical network of modular interactions from which pathways can be inferred. Proteins 2004;54:49-57. © 2003 Wiley-Liss, Inc.

453 citations


Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: This experiment supports the predictability of intrinsic disorder from amino acid sequence by making blind predictions of intrinsic order and disorder on 42 proteins subsequently revealed to contain 9,044 ordered residues and 284 disordered residues.
Abstract: Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence.

427 citations
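
Note: the abstract above reports "the average of the order and disorder predictions", that is, the mean of the two per-class accuracies rather than a single per-residue accuracy. The sketch below illustrates why that distinction matters on a data set this unbalanced, using only the residue counts quoted in the abstract (9,044 ordered; 284 + 281 = 565 disordered). The "always ordered" predictor is a hypothetical baseline invented for this illustration, not one of the six predictors in the experiment, and the function names are assumptions.

```python
# Residue counts quoted in the abstract.
N_ORDERED = 9044
N_DISORDERED = 284 + 281

def per_residue_accuracy(ordered_correct, disordered_correct):
    """Fraction of all residues predicted correctly."""
    return (ordered_correct + disordered_correct) / (N_ORDERED + N_DISORDERED)

def averaged_accuracy(ordered_correct, disordered_correct):
    """Mean of the ordered-class and disordered-class accuracies."""
    acc_ordered = ordered_correct / N_ORDERED
    acc_disordered = disordered_correct / N_DISORDERED
    return (acc_ordered + acc_disordered) / 2

# Hypothetical baseline: label every residue "ordered".
overall = per_residue_accuracy(N_ORDERED, 0)
averaged = averaged_accuracy(N_ORDERED, 0)
```

For this baseline the per-residue figure is above 94% even though no disordered residue is found, while the averaged figure is 50%, so the averaged measure is the harder one to inflate.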


Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: The current status of docking procedures for predicting protein–protein interactions starting from their three‐dimensional structure is assessed from a first major evaluation of blind predictions, which reveals genuine progress but also illustrates the remaining serious limitations and points out the need for better scoring functions and more effective ways for handling conformational flexibility.
Abstract: The current status of docking procedures for predicting protein-protein interactions starting from their three-dimensional structure is assessed from a first major evaluation of blind predictions. This evaluation was performed as part of a communitywide experiment on Critical Assessment of PRedicted Interactions (CAPRI). Seven newly determined structures of protein-protein complexes were available as targets for this experiment. These were the complexes between a kinase and its protein substrate, between a T-cell receptor beta-chain and a superantigen, and five antigen-antibody complexes. For each target, the predictors were given the experimental structures of the free components, or of one free and one bound component in a random orientation. The structure of the complex was revealed only at the time of the evaluation. A total of 465 predictions submitted by 19 groups were evaluated. These groups used a wide range of algorithms and scoring functions, some of which were completely novel. The quality of the predicted interactions was evaluated by comparing residue-residue contacts and interface residues to those in the X-ray structures and by analyzing the fit of the ligand molecules (the smaller of the two proteins in the complex) or of interface residues only, in the predicted versus target complexes. A total of 14 groups produced predictions ranging from acceptable to highly accurate for five of the seven targets. The use of available biochemical and biological information, and in one instance structural information, played a key role in achieving this result. It was essential for identifying the native binding modes for the five correctly predicted targets, including the kinase-substrate complex where the enzyme changes conformation on association. But it was also the cause for missing the correct solution for the two remaining unpredicted targets, which involve unexpected antigen-antibody binding modes. Overall, this analysis reveals genuine progress in docking procedures but also illustrates the remaining serious limitations and points out the need for better scoring functions and more effective ways for handling conformational flexibility.

408 citations


Journal ArticleDOI
01 Sep 2003-Proteins
TL;DR: An algorithm is developed that partitions protein disorder into flavors based on competition among increasing numbers of predictors, with prediction accuracy determining both the number of distinct predictors and the partitioning of the individual proteins.
Abstract: Intrinsically disordered proteins are characterized by long regions lacking 3-D structure in their native states, yet they have been so far associated with 28 distinguishable functions. Previous studies showed that protein predictors trained on disorder from one type of protein often achieve poor accuracy on disorder of proteins of a different type, thus indicating significant differences in sequence properties among disordered proteins. Important biological problems are identifying different types, or flavors, of disorder and examining their relationships with protein function. Innovative use of computational methods is needed in addressing these problems due to relative scarcity of experimental data and background knowledge related to protein disorder. We developed an algorithm that partitions protein disorder into flavors based on competition among increasing numbers of predictors, with prediction accuracy determining both the number of distinct predictors and the partitioning of the individual proteins. Using 145 variously characterized proteins with long (>30 amino acids) disordered regions, 3 flavors, called V, C, and S, were identified by this approach, with the V subset containing 52 segments and 7743 residues, C containing 39 segments and 3402 residues, and S containing 54 segments and 5752 residues. The V, C, and S flavors were distinguishable by amino acid compositions, sequence locations, and biological function. For the sequences in SwissProt and 28 genomes, their protein functions exhibit correlations with the commonness and usage of different disorder flavors, suggesting different flavor-function sets across these protein groups. Overall, the results herein support the flavor-function approach as a useful complement to structural genomics as a means for automatically assigning possible functions to sequences.

385 citations


Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: A nonredundant benchmark for testing protein–protein docking algorithms is presented; it should benefit the docking community not only as a large curated test set but also as a common ground for comparing different algorithms.
Abstract: We have developed a nonredundant benchmark for testing protein-protein docking algorithms. Currently it contains 59 test cases: 22 enzyme-inhibitor complexes, 19 antibody-antigen complexes, 11 other complexes, and 7 difficult test cases. Thirty-one of the test cases, for which the unbound structures of both the receptor and ligand are available, are classified as follows: 16 enzyme-inhibitor, 5 antibody-antigen, 5 others, and 5 difficult. Such a centralized resource should benefit the docking community not only as a large curated test set but also as a common ground for comparing different algorithms. The benchmark is available at (http://zlab.bu.edu/~rong/dock/benchmark.shtml).

360 citations


Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: This interactive model building procedure has several advantages and suggests important ways in which the authors' and other methods can be improved, examples of which are provided.
Abstract: We participated in the fold recognition and homology sections of CASP5 using primarily in-house software. The central feature of our structure prediction strategy involved the ability to generate good sequence-to-structure alignments and to quickly transform them into models that could be evaluated both with energy-based methods and manually. The in-house tools we used include: a) HMAP (Hybrid Multidimensional Alignment Profile), a profile-to-profile alignment method that is derived from sequence-enhanced multiple structure alignments in core regions, and sequence motifs in non-structurally conserved regions. b) NEST, a fast model building program that applies an "artificial evolution" algorithm to construct a model from a given template and alignment. c) GRASP2, a new structure and alignment visualization program incorporating multiple structure superposition and domain database scanning modules. These methods were combined with model evaluation based on all atom and simplified physical-chemical energy functions. All of these methods were under development during CASP5 and consequently a great deal of manual analysis was carried out at each stage of the prediction process. This interactive model building procedure has several advantages and suggests important ways in which our and other methods can be improved, examples of which are provided.

Journal ArticleDOI
15 Nov 2003-Proteins
TL;DR: This work presents a simple and effective algorithm RDOCK, which makes substantial improvement upon the top predictions by ZDOCK with all three scoring functions and the improvement is observed across all three categories of test cases in a large benchmark of 49 non‐redundant unbound test cases.
Abstract: We present a simple and effective algorithm RDOCK for refining unbound predictions generated by a rigid-body docking algorithm ZDOCK, which has been developed earlier by our group. The main component of RDOCK is a three-stage energy minimization scheme, followed by the evaluation of electrostatic and desolvation energies. Ionic side chains are kept neutral in the first two stages of minimization, and reverted to their full charge states in the last stage of brief minimization. Without side chain conformational search or filtering/clustering of resulting structures, RDOCK represents the simplest approach toward refining unbound docking predictions. Despite its simplicity, RDOCK makes substantial improvement upon the top predictions by ZDOCK with all three scoring functions and the improvement is observed across all three categories of test cases in a large benchmark of 49 non-redundant unbound test cases. RDOCK makes the most powerful combination with ZDOCK2.1, which uses pairwise shape complementarity as the scoring function. Collectively, they rank a near-native structure as the number-one prediction for 18 test cases (37% of the benchmark), and within the top 4 predictions for 24 test cases (49% of the benchmark). To various degrees, funnel-like energy landscapes are observed for these 24 test cases. To the best of our knowledge, this is the first report of binding funnels starting from global searches for a broad range of test cases. These results are particularly exciting, given that we have not used any biological information that is specific to individual test cases and the whole process is entirely automated. Among three categories of test cases, the best results are seen for enzyme/inhibitor, with a near-native structure ranked as the number-one prediction for 48% of test cases, and within the top 10 predictions for 78% of test cases. RDOCK is freely available to academic users at http://zlab.bu.edu/~rong/dock.

Journal ArticleDOI
Ruhong Zhou
01 Nov 2003-Proteins
TL;DR: The β‐hairpin from C‐terminus of protein G is used as an example to explore the folding free energy landscape with various GB models, and the results are compared to the explicit solvent simulations and experiments.
Abstract: The Generalized Born (GB) continuum solvent model is arguably the most widely used implicit solvent model in protein folding and protein structure prediction simulations; however, it still remains an open question how well the model behaves in these large-scale simulations. The current study uses the β-hairpin from C-terminus of protein G as an example to explore the folding free energy landscape with various GB models, and the results are compared to the explicit solvent simulations and experiments. All free energy landscapes are obtained from extensive conformation space sampling with a highly parallel replica exchange method. Because solvation model parameters are strongly coupled with force fields, five different force field/solvation model combinations are examined and compared in this study, namely the explicit solvent model: OPLSAA/SPC model, and the implicit solvent models: OPLSAA/SGB (Surface GB), AMBER94/GBSA (GB with Solvent Accessible Surface Area), AMBER96/GBSA, and AMBER99/GBSA. Surprisingly, we find that the free energy landscapes from implicit solvent models are quite different from that of the explicit solvent model. Except for AMBER96/GBSA, all other implicit solvent models find a lowest free energy state that is not the native state. All implicit solvent models show erroneous salt-bridge effects between charged residues, particularly in OPLSAA/SGB model, where the overly strong salt-bridge effect results in an overweighting of a non-native structure with one hydrophobic residue F52 expelled from the hydrophobic core in order to make better salt bridges. On the other hand, both AMBER94/GBSA and AMBER99/GBSA models turn the β-hairpin into an α-helix, and the α-helical content is much higher than the previously reported α-helices in an explicit solvent simulation with AMBER94 (AMBER94/TIP3P). Only AMBER96/GBSA shows a reasonable free energy landscape with the lowest free energy structure the native one despite an erroneous salt-bridge between D47 and K50. Detailed results on free energy contour maps, lowest free energy structures, distribution of native contacts, α-helical content during the folding process, NOE comparison with NMR, and temperature dependences are reported and discussed for all five models.
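
Note: the abstract above relies on a replica exchange method for sampling but does not spell out its details. As background, the sketch below shows the textbook Metropolis criterion for swapping configurations between two temperature replicas, with acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]) and β = 1/(k_B T). It is a generic illustration under that assumption, not the paper's implementation; the function name, the energy units (kcal/mol), and the example energies are chosen here for illustration only.

```python
import math

# Boltzmann constant in kcal/(mol*K); units are an assumption for this sketch.
KB = 0.0019872041

def swap_probability(e_i, t_i, e_j, t_j):
    """Probability of accepting an exchange between two replicas.

    e_i is the potential energy of the configuration currently held at
    temperature t_i, and e_j the energy of the one held at t_j.
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    if exponent >= 0.0:
        return 1.0
    return math.exp(exponent)

# Cold replica holds the higher-energy configuration: the swap is always taken.
p_downhill = swap_probability(-100.0, 300.0, -110.0, 320.0)
# Cold replica already holds the lower-energy configuration: swap is less likely.
p_uphill = swap_probability(-110.0, 300.0, -100.0, 320.0)
```

The criterion moves low-energy configurations toward low temperatures while still letting trapped configurations escape to higher temperatures, which is what makes the method useful for rugged folding landscapes.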

Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: The Robetta server produced quite reasonable predictions for targets in the recent CASP‐5 and CAFASP‐3 experiments, some of which were at the level of the best human predictions.
Abstract: Robetta is a fully automated protein structure prediction server that uses the Rosetta fragment-insertion method. It combines template-based and de novo structure prediction methods in an attempt to produce high quality models that cover every residue of a submitted sequence. The first step in the procedure is the automatic detection of the locations of domains and selection of the appropriate modeling protocol for each domain. For domains matched to a homolog with an experimentally characterized structure by PSI-BLAST or Pcons2, Robetta uses a new alignment method, called K*Sync, to align the query sequence onto the parent structure. It then models the variable regions by allowing them to explore conformational space with fragments in fashion similar to the de novo protocol, but in the context of the template. When no structural homolog is available, domains are modeled with the Rosetta de novo protocol, which allows the full length of the domain to explore conformational space via fragment-insertion, producing a large decoy ensemble from which the final models are selected. The Robetta server produced quite reasonable predictions for targets in the recent CASP-5 and CAFASP-3 experiments, some of which were at the level of the best human predictions.

Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: An overview of the SAM‐T02 method for protein fold recognition and the UNDERTAKER program for ab initio predictions is presented, along with results on a few selected targets for which this combined method worked particularly well.
Abstract: This article presents an overview of the SAM-T02 method for protein fold recognition and the UNDERTAKER program for ab initio predictions. The SAM-T02 server is an automatic method that uses two-track hidden Markov models (HMMs) to find and align template proteins from PDB to the target protein. The two-track HMMs use an amino acid alphabet and one of several different local structure alphabets. The UNDERTAKER program is a new fragment-packing program that can use short or long fragments and alignments to create protein conformations. The HMMs and fold-recognition alignments from the SAM-T02 method were used to generate the fragment and alignment libraries used by UNDERTAKER. We present results on a few selected targets for which this combined method worked particularly well: T0129, T0181, T0135, T0130, and T0139.

Journal ArticleDOI
15 May 2003-Proteins
TL;DR: Without any post‐processing or biological information about the binding site except the complementarity‐determining region of antibodies, PSC predicts the complex structure correctly for 6 test cases, and ranks at least one near‐native structure in the top 20 predictions for 18 test cases.
Abstract: Shape complementarity is the most basic ingredient of the scoring functions for protein-protein docking. Most grid-based docking algorithms use the total number of grid points at the binding interface to quantify shape complementarity. We have developed a novel Pairwise Shape Complementarity (PSC) function that is conceptually simple and rapid to compute. The favorable component of PSC is the total number of atom pairs between the receptor and the ligand within a distance cutoff. When applied to a benchmark of 49 test cases, PSC consistently ranks near-native structures higher and produces more near-native structures than the traditional grid-based function, and the improvement was seen across all prediction levels and in all categories of the benchmark. Without any post-processing or biological information about the binding site except the complementarity-determining region of antibodies, PSC predicts the complex structure correctly for 6 test cases, and ranks at least one near-native structure in the top 20 predictions for 18 test cases. Our docking program ZDOCK has been parallelized and the average computing time is 4 minutes using sixteen IBM SP3 processors. Both ZDOCK and the benchmark are freely available to academic users (http://zlab.bu.edu/~rong/dock).
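
Note: the favorable component of PSC is described above as the total number of receptor-ligand atom pairs within a distance cutoff. The sketch below counts such pairs by brute force for one fixed pose. It is only an illustration of that sentence: the abstract does not give the cutoff value or describe the unfavorable component, so the cutoff is left as a required argument, and the function name and toy coordinates are assumptions. ZDOCK itself evaluates its scores on a grid with fast Fourier transforms rather than with an explicit double loop.

```python
def count_close_pairs(receptor_atoms, ligand_atoms, cutoff):
    """Count receptor-ligand atom pairs separated by at most `cutoff`.

    Atoms are (x, y, z) tuples; `cutoff` is in the same length unit.
    """
    cutoff_sq = cutoff * cutoff
    pairs = 0
    for rx, ry, rz in receptor_atoms:
        for lx, ly, lz in ligand_atoms:
            dist_sq = (rx - lx) ** 2 + (ry - ly) ** 2 + (rz - lz) ** 2
            if dist_sq <= cutoff_sq:
                pairs += 1
    return pairs

# Toy coordinates, invented for the example.
receptor = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (20.0, 0.0, 0.0)]
n_pairs = count_close_pairs(receptor, ligand, cutoff=3.0)
```

Comparing squared distances avoids a square root per pair; for real complexes a spatial grid or neighbor list would replace the quadratic loop.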

Journal ArticleDOI
15 Nov 2003-Proteins
TL;DR: On average, subunit interfaces in homodimers are twice as large as in complexes, and much less polar due to the large fraction belonging to the core, although the amino acid compositions of the cores are similar in the two types of interfaces.
Abstract: The subunit interfaces of 122 homodimers of known three-dimensional structure are analyzed and dissected into sets of surface patches by clustering atoms at the interface; 70 interfaces are single-patch, the others have up to six patches, often contributed by different structural domains. The average interface buries 1,940 Å² of the surface of each monomer, contains one or two patches burying 600-1,600 Å², is 65% nonpolar and includes 18 hydrogen bonds. However, the range of size and of hydrophobicity is wide among the 122 interfaces. Each interface has a core made of residues with atoms buried in the dimer, surrounded by a rim of residues with atoms that remain accessible to solvent. The core, which constitutes 77% of the interface on average, has an amino acid composition that resembles the protein interior except for the presence of arginine residues, whereas the rim is more like the protein surface. These properties of the interfaces in homodimers, which are permanent assemblies, are compared to those of protein-protein complexes where the components associate after they have independently folded. On average, subunit interfaces in homodimers are twice as large as in complexes, and much less polar due to the large fraction belonging to the core, although the amino acid compositions of the cores are similar in the two types of interfaces.

Journal ArticleDOI
15 Feb 2003-Proteins
TL;DR: For both solutes, the distribution from the QM/MM simulation shows greater similarity with the distribution in high‐resolution protein structures than is the case for any of the MM simulations.
Abstract: We compare the conformational distributions of Ace-Ala-Nme and Ace-Gly-Nme sampled in long simulations with several molecular mechanics (MM) force fields and with a fast combined quantum mechanics/molecular mechanics (QM/MM) force field, in which the solute's intramolecular energy and forces are calculated with the self-consistent charge density functional tight binding method (SCC-DFTB), and the solvent is represented by either one of the well-known SPC and TIP3P models. All MM force fields give two main states for Ace-Ala-Nme, β and α, separated by free energy barriers, but the ratio in which these are sampled varies by a factor of 30, from a high in favor of β of 6 to a low of 1/5. The frequency of transitions between states is particularly low with the amber and charmm force fields, for which the distributions are noticeably narrower, and the energy barriers between states higher. The lower of the two barriers lies between α and β at values of ψ near 0 for all MM simulations except for charmm22. The results of the QM/MM simulations vary less with the choice of MM force field; the ratio β/α varies between 1.5 and 2.2, the easy pass lies at ψ near 0, and transitions between states are more frequent than for amber and charmm, but less frequent than for cedar. For Ace-Gly-Nme, all force fields locate a diffuse stable region around ϕ = π and ψ = π, whereas the amber force field gives two additional densely sampled states near ϕ = ±100° and ψ = 0, which are also found with the QM/MM force field. For both solutes, the distribution from the QM/MM simulation shows greater similarity with the distribution in high-resolution protein structures than is the case for any of the MM simulations.

Journal ArticleDOI
01 Aug 2003-Proteins
TL;DR: The resulting energy function (IMM1) reproduces the preference of Trp and Tyr for the membrane interface, gives reasonable energies of insertion into or adsorption onto a membrane, and allows stable 1‐ns MD simulations of the glycophorin A dimer.
Abstract: A simple extension of the EEF1 energy function to heterogeneous membrane-aqueous media is proposed. The extension consists of (a) development of solvation parameters for a nonpolar phase using experimental data for the transfer of amino acid side-chains from water to cyclohexane, (b) introduction of a heterogeneous membrane-aqueous system by making the reference solvation free energy of each atom dependent on the vertical coordinate, (c) a modification of the distance-dependent dielectric model to account for reduced screening of electrostatic interactions in the membrane, and (d) an adjustment of the EEF1 aqueous model in light of recent calculations of the potential of mean force between amino acid side-chains in water. The electrostatic model is adjusted to match experimental observations for polyalanine, polyleucine, and the glycophorin A dimer. The resulting energy function (IMM1) reproduces the preference of Trp and Tyr for the membrane interface, gives reasonable energies of insertion into or adsorption onto a membrane, and allows stable 1-ns MD simulations of the glycophorin A dimer. We find that the lowest-energy orientation of melittin in bilayers varies, depending on the thickness of the hydrocarbon layer.

Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: A very efficient rigid "unbound" soft docking methodology is presented, based on detection of geometric shape complementarity with liberal steric clash allowed at the interface; local shape feature matching avoids an exhaustive search of the 6D transformation space.
Abstract: We present a very efficient rigid "unbound" soft docking methodology, which is based on detection of geometric shape complementarity, allowing liberal steric clash at the interface. The method is based on local shape feature matching, avoiding the exhaustive search of the 6D transformation space. Our experiments at CAPRI rounds 1 and 2 show that although the method does not perform an exhaustive search of the 6D transformation space, the "correct" solution is never lost. However, such a solution might rank low for large proteins, because there are alternatives with significantly larger geometrically compatible interfaces. In many cases this problem can be resolved by successful a priori focusing on the vicinity of potential binding sites as well as the extension of the technique to flexible (hinge-bent) docking. This is demonstrated in the experiments performed as a lesson from our CAPRI experience.

Journal ArticleDOI
24 Nov 2003-Proteins
TL;DR: The interplay between strong and weak interactions in ligand binding possibly leads to a satisfactory enthalpy–entropy balance in macromolecular structures, and the implications to crystallographic refinement and molecular dynamics software are discussed.
Abstract: The characteristics of N-H...O, O-H...O, and C-H...O hydrogen bonds are examined in a group of 28 high-resolution crystal structures of protein-ligand complexes from the Protein Data Bank and compared with interactions found in small-molecule crystal structures from the Cambridge Structural Database. It is found that both strong and weak hydrogen bonds are involved in ligand binding. Because of the prevalence of multifurcation, the restrictive geometrical criteria set up for hydrogen bonds in small-molecule crystal structures may need to be relaxed in macromolecular structures. For example, there are definite deviations from linearity for the hydrogen bonds in protein-ligand complexes. The formation of C-H...O hydrogen bonds is influenced by the activation of the C(alpha)-H atoms and by the flexibility of the side-chain atoms. In contrast to small-molecule structures, anticooperative geometries are common in the macromolecular structures studied here, and there is a gradual lengthening as the extent of furcation increases. C-H...O bonds formed by Gly, Phe, and Tyr residues are noteworthy. The numbers of hydrogen bond donors and acceptors agree with Lipinski's "rule of five" that predicts drug-like properties. Hydrogen bonds formed by water are also seen to be relevant in ligand binding. Ligand C-H...O(w) interactions are abundant when compared to N-H...O(w) and O-H...O(w). This suggests that ligands prefer to use their stronger hydrogen bond capabilities for use with the protein residues, leaving the weaker interactions to bind with water. In summary, the interplay between strong and weak interactions in ligand binding possibly leads to a satisfactory enthalpy-entropy balance. The implications of these results to crystallographic refinement and molecular dynamics software are discussed.

Journal ArticleDOI
01 Nov 2003-Proteins
TL;DR: In this paper, molecular dynamics simulations of a polyalanine model, which is an α-helix in its native state, were performed, and a metastable β-hairpin intermediate was observed.
Abstract: The aggregation of α-helix-rich proteins into β-sheet-rich amyloid fibrils is associated with fatal diseases, such as Alzheimer's disease and prion disease. During an aggregation process, protein secondary structure elements (α-helices) undergo conformational changes to β-sheets. The fact that proteins with different sequences and structures undergo a similar transition on aggregation suggests that the sequence-nonspecific hydrogen bond interaction among protein backbones is an important factor. We perform molecular dynamics simulations of a polyalanine model, which is an α-helix in its native state, and observe a metastable β-hairpin intermediate. Although a β-hairpin has larger potential energy than an α-helix, the entropy of a β-hairpin is larger because of fewer constraints imposed by the hydrogen bonds. In the vicinity of the transition temperature, we observe the interconversion of the α-helix and β-sheet states via a random coil state. We also study the effect of the environment by varying the relative strength of side-chain interactions for a designed peptide (an α-helix in its native state). For a certain range of side-chain interaction strengths, we find that the intermediate β-hairpin state is destabilized and even disappears, suggesting an important role of the environment in the aggregation propensity of a peptide. Proteins 2003;53:220-228.

Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: This article introduces the special issue of the journal Proteins dedicated to the fifth CASP experiment, which assesses the state of the art in protein structure prediction.
Abstract: This article provides an introduction to the special issue of the journal Proteins dedicated to the fifth CASP experiment to assess the state of the art in protein structure prediction. The article describes the conduct, the categories of prediction, and the evaluation and assessment procedures of the experiment. A brief summary of progress over the five CASP experiments is provided. Related developments in the field are also described.

Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: The overall two-state prediction accuracy of DISOPRED is very high (90%), but this is skewed by the fact that most residues are observed to be ordered; the Matthews correlation coefficient of 0.34 gives a more realistic impression of the overall accuracy of the method.
Abstract: We describe here the results of using a neural network based method (DISOPRED) for predicting disordered regions in 55 proteins in the 5th CASP experiment. A set of 715 highly resolved proteins with regions of disorder was used to train the network. The inputs to the network were derived from sequence profiles generated by PSI-BLAST. A post-filter was applied to the output of the network to prevent regions being predicted as disordered in regions of confidently predicted alpha helix or beta sheet structure. The overall two-state prediction accuracy for the method is very high (90%) but this is highly skewed by the fact that most residues are observed to be ordered. The overall Matthews' correlation coefficient for the submitted predictions is 0.34, which gives a more realistic impression of the overall accuracy of the method, though still indicates significant predictive power. Proteins 2003;53:573-578. © 2003 Wiley-Liss, Inc. Key words: protein structure prediction; folding; disorder; neural networks; sequence analysis
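
The gap between a high two-state accuracy and a modest Matthews correlation coefficient can be reproduced with a small class-imbalanced example. The sketch below uses invented confusion-matrix counts (they are not DISOPRED's CASP5 results) and hypothetical function names; it only illustrates the two standard formulas.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of residues assigned to the correct state."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical data set: 1000 residues, of which 100 are disordered (positive class).
tp, fn = 30, 70    # disordered residues found / missed
tn, fp = 880, 20   # ordered residues kept / wrongly flagged as disordered

print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.2f}")

# A trivial predictor that calls every residue ordered still scores 90% accuracy
# on this data set, but its MCC is 0.
print(f"all-ordered accuracy = {accuracy(0, 900, 0, 100):.2f}")
print(f"all-ordered MCC      = {matthews_cc(0, 900, 0, 100):.2f}")
```

Because accuracy rewards the majority class, the correlation coefficient is the more informative figure when most residues are ordered.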

Journal ArticleDOI
12 Dec 2003-Proteins
TL;DR: Simple methods to characterize quantitatively the arc shape of LRR domains are developed and applied to all known LRR proteins; the quantity 2R sin(φ/2), in which R is the radius of the LRR arc and φ is the rotation angle about the central axis per repeating unit, is highly conserved in all the LRR proteins regardless of the large variety of repeat numbers and arc radii.
Abstract: LRRs are present in over 2000 proteins from viruses to eukaryotes. Most LRRs are 20-30 amino acids long, and the repeat number ranges from 2 to 42. The known structures of 14 LRR proteins, each containing 4-17 repeats, have revealed that the LRR domains fold into a horseshoe (or arc) shape with a parallel beta-sheet on the concave face and with various secondary structures, including alpha-helix, 3(10)-helix, and pII helix on the convex face. We developed simple methods to characterize quantitatively the arc shape of LRR and then applied them to all known LRR proteins. A quantity of 2R sin(phi/2), in which R and phi are the radius of the LRR arc and the rotation angle about the central axis per repeating unit, respectively, is highly conserved in all the LRR proteins regardless of a large variety of repeat number and the radius of the LRR arc. The radii of the LRR arc with beta-alpha structural units are smaller than those with beta-3(10) or beta-pII units. The concave face of the LRR beta-sheet forms a surface analogous to a part of a Möbius strip.
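
Geometrically, 2R sin(phi/2) is the length of the chord subtended by an angle phi on a circle of radius R, so for an idealized planar arc it is the straight-line distance between equivalent points on adjacent repeats. The sketch below illustrates how a wide and a tight arc can share the same value; the radii and angles are made up for illustration, and the function names are hypothetical.

```python
import math

def chord_per_repeat(radius, phi_deg):
    """Chord length 2*R*sin(phi/2) for rotation angle phi (degrees) per repeat."""
    return 2.0 * radius * math.sin(math.radians(phi_deg) / 2.0)

def angle_for_chord(radius, chord):
    """Rotation angle (degrees) per repeat that gives the requested chord length."""
    return math.degrees(2.0 * math.asin(chord / (2.0 * radius)))

# Illustrative values only (not measurements from the paper).
tight_radius, tight_phi = 20.0, 30.0
chord = chord_per_repeat(tight_radius, tight_phi)

# A wider arc needs a smaller rotation per repeat to keep the same chord.
wide_radius = 30.0
wide_phi = angle_for_chord(wide_radius, chord)

print(f"chord = {chord:.2f}")
print(f"R = {wide_radius:.0f} needs phi = {wide_phi:.1f} degrees for the same chord")
```

A conserved chord therefore means that a larger arc radius is compensated by a smaller rotation per repeat, under the planar-arc assumption made here.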

Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: This article describes and reviews the efforts using Hex 3.1 to predict the docking modes of the seven target protein–protein complexes presented in the CAPRI (Critical Assessment of Predicted Interactions) blind docking trial, and describes several enhancements to the original spherical polar Fourier docking correlation algorithm.
Abstract: This article describes and reviews our efforts using Hex 3.1 to predict the docking modes of the seven target protein-protein complexes presented in the CAPRI (Critical Assessment of Predicted Interactions) blind docking trial. For each target, the structure of at least one of the docking partners was given in its unbound form, and several of the targets involved large multimeric structures (e.g., Lactobacillus HPr kinase, hemagglutinin, bovine rotavirus VP6). Here we describe several enhancements to our original spherical polar Fourier docking correlation algorithm. For example, a novel surface sphere smothering algorithm is introduced to generate multiple local coordinate systems around the surface of a large receptor molecule, which may be used to define a small number of initial ligand-docking orientations distributed over the receptor surface. High-resolution spherical polar docking correlations are performed over the resulting receptor surface patches, and candidate docking solutions are refined by using a novel soft molecular mechanics energy minimization procedure. Overall, this approach identified two good solutions at rank 5 or less for two of the seven CAPRI complexes. Subsequent analysis of our results shows that Hex 3.1 is able to place good solutions within a list of

Journal ArticleDOI
01 Mar 2003-Proteins
TL;DR: This work presents a new method with which to predict real value ASAs for residues, based on neighborhood information, and observed that the ASA of a residue could be predicted within a 23.7% mean absolute error, even when no information about its neighbors is included.
Abstract: The solvent accessibility of amino acid residues has been predicted in the past by classifying them into exposure states with varying thresholds. This classification provides a wide range of values for the accessible surface area (ASA) within which a residue may fall. Thus far, no attempt has been made to predict real values of ASA from the sequence information without a priori classification into exposure states. Here, we present a new method with which to predict real value ASAs for residues, based on neighborhood information. Our real value prediction neural network could estimate the ASA for four different nonhomologous, nonredundant data sets of varying size, with 18.0-19.5% mean absolute error, defined as per residue absolute difference between the predicted and experimental values of relative ASA. Correlation between the predicted and experimental values ranged from 0.47 to 0.50. It was observed that the ASA of a residue could be predicted within a 23.7% mean absolute error, even when no information about its neighbors is included. Prediction of real values answers the issue of arbitrary choice of ASA state thresholds, and carries more information than category prediction. Prediction error for each residue type strongly correlates with the variability in its experimental ASA values.
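
The two evaluation measures named above, the per-residue mean absolute error and the correlation between predicted and experimental relative ASA, can be computed as in the sketch below. The five data points are invented for illustration and are not from the paper; the function names are hypothetical.

```python
import math

def mean_absolute_error(predicted, observed):
    """Average per-residue absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical relative ASA values in percent, one per residue.
predicted = [12.0, 45.0, 30.0, 70.0, 5.0]
observed = [20.0, 40.0, 55.0, 60.0, 10.0]

print(f"MAE = {mean_absolute_error(predicted, observed):.1f}%")
print(f"r   = {pearson_r(predicted, observed):.2f}")
```

Both measures operate directly on real values, so no exposure-state threshold has to be chosen.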

Journal ArticleDOI
01 Jun 2003-Proteins
TL;DR: Predicted local structure, a generalization of secondary structure, is incorporated into two‐track profile hidden Markov models (HMMs) and a variety of local structure descriptions are experimented with, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality.
Abstract: An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile hidden Markov models (HMMs). We did not rely on a simple helix-strand-coil definition of secondary structure, but experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3-letter STRIDE alphabet improved fold recognition accuracy by 15% over amino-acid-only HMMs and 23% over PSI-BLAST, measured by ROC-65 numbers. We compared two-track HMMs to amino-acid-only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar with 3-24% sequence identity). HMMs with a 6-letter STRIDE secondary track improved alignment quality by 62%, relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved by 40% relative to CE.

Journal ArticleDOI
01 Jul 2003-Proteins
TL;DR: The ICM-DISCO docking method handles the induced changes of surface side-chains but is less successful if the backbone undergoes large-scale rearrangements.
Abstract: The ICM-DISCO (Docking and Interface Side-Chain Optimization) protein-protein-docking method is a direct stochastic global energy optimization from multiple starting positions of the ligand. The first step is performed by docking of a rigid all-atom ligand molecule to a set of soft receptor potentials precalculated on a 0.5 Å grid from realistic solvent-corrected force-field energies. This step finds the correct solution as the lowest energy conformation in almost 100% of the cases in which interfaces do not change on binding. The second step is needed to deal with the induced changes and includes the global optimization of the interface side-chains of up to 400 best solutions. The CAPRI predictions were performed fully automatically with this method. Available experimental information was included as a filtering step to favor expected docking surfaces. In three of the seven proposed targets, the ICM-DISCO method found a good solution (>50% of correct contacts) within the five submitted models. The procedure is global and fully automated. We demonstrate that the algorithm handles the induced changes of surface side-chains but is less successful if the backbone undergoes large-scale rearrangements.

Journal ArticleDOI
01 Jan 2003-Proteins
TL;DR: A fully automated version of the CASP5 protocol produced results that were comparable to the human‐assisted predictions for most of the targets, suggesting that automated genomic‐scale, de novo protein structure prediction may soon be worthwhile.
Abstract: We describe predictions of the structures of CASP5 targets using Rosetta. The Rosetta fragment insertion protocol was used to generate models for entire target domains without detectable sequence similarity to a protein of known structure and to build long loop insertions (and N- and C-terminal extensions) in cases where a structural template was available. Encouraging results were obtained both for the de novo predictions and for the long loop insertions; we describe here the successes as well as the failures in the context of current efforts to improve the Rosetta method. In particular, de novo predictions failed for large proteins that were incorrectly parsed into domains and for topologically complex (high contact order) proteins with swapping of segments between domains. However, for the remaining targets, at least one of the five submitted models had a long fragment with significant similarity to the native structure. A fully automated version of the CASP5 protocol produced results that were comparable to the human-assisted predictions for most of the targets, suggesting that automated genomic-scale, de novo protein structure prediction may soon be worthwhile. For the three targets where the human-assisted predictions were significantly closer to the native structure, we identify the steps that remain to be automated. Proteins 2003;53:457-468. © 2003 Wiley-Liss, Inc.