Showing papers in "Proteins in 2007"


Journal ArticleDOI
01 Sep 2007-Proteins
TL;DR: The scope of this review is to summarize all the available information regarding hot spots for a better atomic understanding of their structure and function, to improve the rational design of complexes of high affinity and specificity as well as that of small molecules, which can mimic the functional epitopes of the proteic complexes.
Abstract: Proteins' tendency to bind to one another in a highly specific manner, forming stable complexes, is fundamental to all biological processes. A better understanding of complex formation has many practical applications, which include the rational design of new therapeutic agents and the analysis of metabolic and signal transduction networks. Alanine-scanning mutagenesis made possible the detection of the functional epitopes, and demonstrated that most of the protein-protein binding energy is related only to a group of few amino acids at intermolecular protein interfaces: the hot spots. The scope of this review is to summarize all the available information regarding hot spots for a better atomic understanding of their structure and function. The ultimate objective is to improve the rational design of complexes of high affinity and specificity as well as that of small molecules, which can mimic the functional epitopes of the proteic complexes.

689 citations


Journal ArticleDOI
01 Oct 2007-Proteins
TL;DR: FireDock's prediction results are comparable to current state‐of‐the‐art refinement methods while its running time is significantly lower, and its refinement procedure significantly improves the ranking of the rigid‐body PatchDock algorithm for these cases.
Abstract: Here, we present FireDock, an efficient method for the refinement and rescoring of rigid-body docking solutions. The refinement process consists of two main steps: (1) rearrangement of the interface side-chains and (2) adjustment of the relative orientation of the molecules. Our method accounts for the observation that most interface residues that are important in recognition and binding do not change their conformation significantly upon complexation. Allowing full side-chain flexibility, a common procedure in refinement methods, often causes excessive conformational changes. These changes may distort preformed structural signatures, which have been shown to be important for binding recognition. Here, we restrict side-chain movements, and thus manage to reduce the false-positive rate noticeably. In the later stages of our procedure (orientation adjustments and scoring), we smooth the atomic radii. This allows for the minor backbone and side-chain movements and increases the sensitivity of our algorithm. FireDock succeeds in ranking a near-native structure within the top 15 predictions for 83% of the 30 enzyme-inhibitor test cases, and for 78% of the 18 semiunbound antibody-antigen complexes. Our refinement procedure significantly improves the ranking of the rigid-body PatchDock algorithm for these cases. The FireDock program is fully automated. In particular, to our knowledge, FireDock's prediction results are comparable to current state-of-the-art refinement methods while its running time is significantly lower. The method is available at http://bioinfo3d.cs.tau.ac.il/FireDock/.

630 citations


Journal ArticleDOI
01 Dec 2007-Proteins
TL;DR: Version 2.0 of HADDOCK incorporates considerable improvements and new features, including two additional ab initio docking modes based on either random patch definition or center-of-mass restraints.
Abstract: Here we present version 2.0 of HADDOCK, which incorporates considerable improvements and new features. HADDOCK is now able to model not only protein-protein complexes but also other kinds of biomolecular complexes and multi-component (N > 2) systems. In the absence of any experimental and/or predicted information to drive the docking, HADDOCK now offers two additional ab initio docking modes based on either random patch definition or center-of-mass restraints. The docking protocol has been considerably improved, supporting, among other things, solvated docking, automatic definition of semi-flexible regions, and inclusion of a desolvation energy term in the scoring scheme. The performance of HADDOCK2.0 is evaluated on the targets of CAPRI rounds 4-11, run in a semi-automated mode using the original information we used in our CAPRI submissions. This enables a direct assessment of the progress made since the previous versions. Although HADDOCK performed very well in CAPRI (65% and 71% success rates, overall and for unbound targets only, respectively), a substantial improvement was achieved with HADDOCK2.0.

542 citations


Journal ArticleDOI
Yang Zhang1
01 Jan 2007-Proteins
TL;DR: For the first time, the automated server prediction generates models as good as the human‐expert does in all the categories, which shows the robustness of the method and the potential of the application to genome‐wide structure prediction.
Abstract: We developed and tested the I-TASSER protein structure prediction algorithm in the CASP7 experiment, where targets are first threaded through the PDB library and continuous fragments in the threading alignments are exploited to assemble the global structure. The final models are obtained from the progressive refinements started from the last round structure clusters. A majority of the targets in the template-based modeling (TBM) category have the templates drawn closer to the native structure by more than 1 Å within the aligned regions. For the free-modeling (FM) targets, I-TASSER builds correct topology for 7/19 cases with sequence up to 155 residues long. For the first time, the automated server prediction generates models as good as the human-expert does in all the categories, which shows the robustness of the method and the potential of the application to genome-wide structure prediction. Despite the success, the accuracy of I-TASSER modeling is still dominated by the similarity of the template and target structures with a strong correlation coefficient (approximately 0.9) between the root-mean-squared deviation (RMSD) to native of the templates and the final models. Especially, there is no high-resolution model below 2 Å for the FM targets. These problems highlight the issues that need to be addressed in the next generation of atomic-level I-TASSER development especially for the FM target modeling.

476 citations


Journal ArticleDOI
01 Jun 2007-Proteins
TL;DR: A scoring function that utilizes detailed electrostatics, van der Waals, and desolvation to rescore initial‐stage docking predictions is developed and tested and is shown to significantly improve the success rate over the initial ZDOCK rankings across a large benchmark.
Abstract: Protein-protein docking requires fast and effective methods to quickly discriminate correct from incorrect predictions generated by initial-stage docking. We have developed and tested a scoring function that utilizes detailed electrostatics, van der Waals, and desolvation to rescore initial-stage docking predictions. Weights for the scoring terms were optimized for a set of test cases, and this optimized function was then tested on an independent set of nonredundant cases. This program, named ZRANK, is shown to significantly improve the success rate over the initial ZDOCK rankings across a large benchmark. The number of test cases with No. 1 ranked hits increased from 2 to 11 and from 6 to 12 when predictions from two ZDOCK versions were considered. ZRANK can be applied either as a refinement protocol in itself or as a preprocessing stage to enrich the well-ranked hits prior to further refinement.

435 citations


Journal ArticleDOI
01 Jan 2007-Proteins
TL;DR: This introduction to the seventh CASP experiment, which assesses the state of the art in protein structure prediction, highlights improvements in model accuracy relative to that obtainable from knowledge of a single best template structure.
Abstract: This paper is an introduction to the supplemental issue of the journal PROTEINS, dedicated to the seventh CASP experiment to assess the state of the art in protein structure prediction. The paper describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are improvements in model accuracy relative to that obtainable from knowledge of a single best template structure; convergence of the accuracy of models produced by automatic servers toward that produced by human modeling teams; the emergence of methods for predicting the quality of models; and rapidly increasing practical applications of the methods.

347 citations


Journal ArticleDOI
01 Dec 2007-Proteins
TL;DR: Evaluation of blind predictions performed during 2005–2007 as part of Rounds 6–12 of the community‐wide experiment on Critical Assessment of PRedicted Interactions (CAPRI) shows that handling protein flexibility remains a major challenge and that current scoring methods are probably not sensitive enough to identify the best models.
Abstract: The performance of methods for predicting protein-protein interactions at the atomic scale is assessed by evaluating blind predictions performed during 2005-2007 as part of Rounds 6-12 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). These Rounds also included a new scoring experiment, where a larger set of models contributed by the predictors was made available to groups developing scoring functions. These groups scored the uploaded set and submitted their own best models for assessment. The structures of nine protein complexes including one homodimer were used as targets. These targets represent biologically relevant interactions involved in gene expression, signal transduction, RNA or protein processing, and membrane maintenance. For all the targets except one, predictions started from the experimentally determined structures of the free (unbound) components or from models derived by homology, making it mandatory for docking methods to model the conformational changes that often accompany association. In total, 63 groups and eight automatic servers, a substantial increase from previous years, submitted docking predictions, of which 1994 were evaluated here. Fifteen groups submitted 305 models for five targets in the scoring experiment. Assessment of the predictions reveals that 31 different groups produced models of acceptable and medium accuracy (but only one high accuracy submission) for all the targets, except the homodimer. In the latter, none of the docking procedures reproduced the large conformational adjustment required for correct assembly, underscoring yet again that handling protein flexibility remains a major challenge. In the scoring experiment, a large fraction of the groups attained the set goal of singling out the correct association modes from incorrect solutions in the limited ensembles of contributed models. But in general they seemed unable to identify the best models, indicating that current scoring methods are probably not sensitive enough. With the increased focus on protein assemblies, in particular by structural genomics efforts, the growing community of CAPRI predictors is engaged more actively than ever in the development of better scoring functions and means of modeling conformational flexibility, which hold promise for much progress in the future.

340 citations


Journal ArticleDOI
15 Nov 2007-Proteins
TL;DR: This work shows that the complexity of model representation can be reduced, making the computation tractable with minimal loss of predictive performance, and introduces a pair‐wise statistical potential suitable for docking that builds on previous work and is incorporated into the fast Fourier transform‐based docking algorithm ZDOCK.
Abstract: The biophysical study of protein-protein interactions and docking has important implications in our understanding of most complex cellular signaling processes. Most computational approaches to protein docking involve a tradeoff between the level of detail incorporated into the model and computational power required to properly handle that level of detail. In this work, we seek to optimize that balance by showing that we can reduce the complexity of model representation and thus make the computation tractable with minimal loss of predictive performance. We also introduce a pair-wise statistical potential suitable for docking that builds on previous work and show that this potential can be incorporated into our fast Fourier transform-based docking algorithm ZDOCK. We use the Protein Docking Benchmark to illustrate the improved performance of this potential compared with other, less detailed scoring functions. Furthermore, we show that the new potential performs well on antibody-antigen complexes, with most predictions clustering around the Complementarity Determining Regions of antibodies without any manual intervention.

310 citations


Journal ArticleDOI
01 Aug 2007-Proteins
TL;DR: A simple approach to scoring of rigid‐body docking poses, which is able to detect a near‐native solution from 12,000 docking poses and place it within the 100 lowest‐energy docking solutions in 56% of the cases, in a completely unrestricted manner and without any other additional information.
Abstract: The accurate scoring of rigid-body docking orientations represents one of the major difficulties in protein-protein docking prediction. Other challenges are the development of faster and more efficient sampling methods and the introduction of receptor and ligand flexibility during simulations. Overall, good discrimination of near-native docking poses from the very early stages of rigid-body protein docking is an essential step before applying more costly interface refinement to the correct docking solutions. Here we explore a simple approach to scoring of rigid-body docking poses, which has been implemented in a program called pyDock. The scheme is based on Coulombic electrostatics with a distance-dependent dielectric constant, and implicit desolvation energy with atomic solvation parameters previously adjusted for rigid-body protein-protein docking. This scoring function is not highly dependent on the specific geometry of the docking poses and therefore can be used in rigid-body docking sets generated by a variety of methods. We have tested the procedure in a large benchmark set of 80 unbound docking cases. The method is able to detect a near-native solution from 12,000 docking poses and place it within the 100 lowest-energy docking solutions in 56% of the cases, in a completely unrestricted manner and without any other additional information. More specifically, a near-native solution will lie within the top 20 solutions in 37% of the cases. The simplicity of the approach allows for a better understanding of the physical principles behind protein-protein association, and provides a fast tool for the evaluation of large sets of rigid-body docking poses in search of the near-native orientation.

282 citations
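
The two energy terms named in the pyDock abstract above (Coulombic electrostatics with a distance-dependent dielectric, plus an implicit desolvation term built from atomic solvation parameters) can be illustrated with a minimal sketch. This is not pyDock's code: the dielectric form eps(r) = 4r, the short-range distance floor, the function names, and the input layout are all assumptions made for illustration; only the Coulomb conversion factor is a standard constant.

```python
import math

# Standard Coulomb conversion factor: kcal*Angstrom / (mol * e^2).
COULOMB_KCAL = 332.0637


def coulomb_distance_dielectric(receptor, ligand, eps_slope=4.0, min_dist=1.0):
    """Sum pairwise Coulomb energies using a dielectric eps(r) = eps_slope * r.

    receptor, ligand: lists of (x, y, z, partial_charge) tuples.
    eps_slope and min_dist are illustrative assumptions, not pyDock parameters.
    """
    energy = 0.0
    for xr, yr, zr, qr in receptor:
        for xl, yl, zl, ql in ligand:
            r = math.dist((xr, yr, zr), (xl, yl, zl))
            r = max(r, min_dist)  # floor the distance to avoid the singularity
            # q_i * q_j / (eps(r) * r) with eps(r) = eps_slope * r
            energy += COULOMB_KCAL * qr * ql / (eps_slope * r * r)
    return energy


def desolvation(solvation_params, buried_areas):
    """Implicit desolvation: sum over atoms of parameter * buried surface area.

    Both inputs are hypothetical per-atom lists of equal length.
    """
    return sum(p * a for p, a in zip(solvation_params, buried_areas))


# Two opposite unit charges 2 Angstrom apart.
e_elec = coulomb_distance_dielectric([(0.0, 0.0, 0.0, 1.0)], [(2.0, 0.0, 0.0, -1.0)])
e_desolv = desolvation([0.5, -1.0], [10.0, 2.0])
```

Because both terms depend only on inter-atomic distances and buried areas, a function of this shape can rescore poses from any rigid-body sampler, which matches the abstract's remark that the scoring is not tied to a specific docking geometry.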


Journal ArticleDOI
01 Jan 2007-Proteins
TL;DR: In this article, the authors describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction.
Abstract: We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions.

217 citations


Journal ArticleDOI
01 Nov 2007-Proteins
TL;DR: Overall, OPEP correctly identifies 24 native or native‐like states for 29 targets and has very similar capability to the all‐atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.
Abstract: We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models. Proteins 2007. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
01 Sep 2007-Proteins
TL;DR: Two knowledge‐based models are presented that improve the ability to predict hot spots: K‐FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K‐CON uses biochemical contact features; the combined K‐FADE/CON (KFC) model displays better overall predictive accuracy than computational alanine scanning (Robetta–Ala).
Abstract: Protein–protein interactions can be altered by mutating one or more “hot spots,” the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting the practical value of hot spot predictions. We present two knowledge-based models that improve the ability to predict hot spots: K-FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K-CON uses biochemical contact features. The combined K-FADE/CON (KFC) model displays better overall predictive accuracy than computational alanine scanning (Robetta–Ala). In addition, because these methods predict different subsets of known hot spots, a large and significant increase in accuracy is achieved by combining KFC and Robetta–Ala. The KFC analysis is applied to the calmodulin (CaM)/smooth muscle myosin light chain kinase (smMLCK) interface, and to the bone morphogenetic protein-2 (BMP-2)/BMP receptor-type I (BMPR-IA) interface. The results indicate a strong correlation between KFC hot spot predictions and mutations that significantly reduce the binding affinity of the interface. Proteins 2007. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
01 Apr 2007-Proteins
TL;DR: Strong and weak hydrogen bonds are ubiquitous in protein–ligand recognition, and with suitable computational tools very large numbers of strong and weak intermolecular interactions in the ligand–protein interface may be analyzed reliably.
Abstract: The characteristics of N–H···O, O–H···O, and C–H···O hydrogen bonds and other weak intermolecular interactions are analyzed in a large and diverse group of 251 protein-ligand complexes using a new computer program that was developed in-house for this purpose. The interactions examined in the present study are those which occur in the active sites, defined here as a sphere of 10 Å radius around the ligand. Notably, N–H···O and O–H···O bonds tend towards linearity. Multifurcated interactions are common, especially multifurcated acceptors, and the average degree of furcation is 2.6 hydrogen bonds per furcated acceptor. A significant aspect of this study is that we have been able to assess the reliability of hydrogen bond geometry as a function of crystallographic resolution. Thresholds of 2.3 and 2.0 Å are established for strong and weak hydrogen bonds, below which hydrogen bond geometries may be safely considered for detailed analysis. Interactions involving water as donor or acceptor, and C–H···O bonds with Gly and Tyr as donors are ubiquitous in the active site. A similar trend was observed in an external test set of 233 protein-ligand complexes belonging to the kinase family. Weaker interactions like X–H···π (X = C, N, O) and those involving halogen atoms as electrophiles or nucleophiles have also been studied. We conclude that strong and weak hydrogen bonds are ubiquitous in protein-ligand recognition, and that with suitable computational tools very large numbers of strong and weak intermolecular interactions in the ligand-protein interface may be analyzed reliably. Results confirm trends reported previously by us, but the extended nature of the present data set means that the observed trends are more reliable.

Journal ArticleDOI
01 Jul 2007-Proteins
TL;DR: The results indicate that the ethanol‐induced conformation transition of silk fibroin in films and solutions is a three‐phase process, which gives support to the previous evidence that natural silk spinning in silkworms is nucleation‐dependent, and that silkworms use concentrated silk protein solutions to speed up the nucleation step.
Abstract: Time-resolved FTIR analysis was used to monitor the conformation transition induced by treating regenerated Bombyx mori silk fibroin films and solutions with different concentrations of ethanol. The resulting curves showing the kinetics of the transition for both films and fibroin solutions were influenced by the ethanol concentration. In addition, for silk fibroin solutions the protein concentration also had an effect on the kinetics. At low ethanol concentrations (for example, less than 40% v/v in the case of film), films and fibroin solutions showed a phase in which beta-sheets slowly formed at a rate dependent on the ethanol concentration. Reducing the concentration of the fibroin in solutions also slowed the formation of beta-sheets. These observations suggest that this phase represents a nucleation step. Such a nucleation phase was not seen in the conformation transition at ethanol concentrations > 40% in films or > 50% in silk fibroin solutions. Our results indicate that the ethanol-induced conformation transition of silk fibroin in films and solutions is a three-phase process. The first phase is the initiation of beta-sheet structure (nucleation), the second is a fast phase of beta-sheet growth while the third phase represents a slow perfection of previously formed beta-sheet structure. The nucleation step can be very fast or relatively slow, depending on factors that influence protein chain mobility and intermolecular hydrogen bond formation. The findings give support to the previous evidence that natural silk spinning in silkworms is nucleation-dependent, and that silkworms (like spiders) use concentrated silk protein solutions, and careful control of the pH value and metallic ion content of the processing environment to speed up the nucleation step to produce a rapid conformation transition to convert the water soluble spinning dope to a tough solid silk fiber.

Journal ArticleDOI
01 Apr 2007-Proteins
TL;DR: It is shown that rhodopsin's fluctuations are not well described by 100 ns of dynamics, and that the sampling is not fully converged even for individual loops, a reminder of the caution required when interpreting molecular dynamics simulations of macromolecules.
Abstract: The central question in evaluating almost any result from a molecular dynamics simulation is whether the calculation has converged. Unfortunately, assessing the ergodicity of a single trajectory is very difficult to do. In this work, we assess the sampling of molecular dynamics simulations of the membrane protein rhodopsin by comparing the results from 26 independent trajectories, each run for 100 ns. By examining principal components and cluster populations, we show that rhodopsin's fluctuations are not well described by 100 ns of dynamics, and that the sampling is not fully converged even for individual loops. The results serve as a reminder of the caution required when interpreting molecular dynamics simulations of macromolecules.

Journal ArticleDOI
01 Jun 2007-Proteins
TL;DR: The ability of EADock to accurately predict binding modes on a real application was illustrated by the successful docking of the RGD cyclic pentapeptide on the αVβ3 integrin, starting far away from the binding pocket.
Abstract: In recent years, protein-ligand docking has become a powerful tool for drug development. Although several approaches suitable for high throughput screening are available, there is a need for methods able to identify binding modes with high accuracy. This accuracy is essential to reliably compute the binding free energy of the ligand. Such methods are needed when the binding mode of lead compounds is not determined experimentally but is needed for structure-based lead optimization. We present here a new docking software, called EADock, that aims at this goal. It uses a hybrid evolutionary algorithm with two fitness functions, in combination with a sophisticated management of the diversity. EADock is interfaced with the CHARMM package for energy calculations and coordinate handling. A validation was carried out on 37 crystallized protein-ligand complexes featuring 11 different proteins. The search space was defined as a sphere of 15 Å around the center of mass of the ligand position in the crystal structure, and, contrary to other benchmarks, our algorithm was fed with optimized ligand positions up to 10 Å root mean square deviation (RMSD) from the crystal structure, excluding the latter. This validation illustrates the efficiency of our sampling strategy, as correct binding modes, defined by an RMSD to the crystal structure lower than 2 Å, were identified and ranked first for 68% of the complexes. The success rate increases to 78% when considering the five best ranked clusters, and 92% when all clusters present in the last generation are taken into account. Most failures could be explained by the presence of crystal contacts in the experimental structure. Finally, the ability of EADock to accurately predict binding modes on a real application was illustrated by the successful docking of the RGD cyclic pentapeptide on the αVβ3 integrin, starting far away from the binding pocket.
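
The success criterion in the EADock abstract above (a binding mode counts as correct when its RMSD to the crystal structure is below 2 Å, evaluated for the top-ranked cluster or the five best) can be sketched as follows. The function names and the toy data are hypothetical, and computing the RMSD without superposition, in the receptor's frame, is an assumption about how such benchmarks are usually scored rather than a statement about EADock itself.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered coordinate lists, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def success_rate(ranked_rmsds_per_complex, top_n=1, cutoff=2.0):
    """Fraction of complexes with at least one RMSD below cutoff in the top_n ranks."""
    hits = sum(
        any(value < cutoff for value in ranked[:top_n])
        for ranked in ranked_rmsds_per_complex
    )
    return hits / len(ranked_rmsds_per_complex)


# Toy example: RMSD (in Angstrom) of each ranked cluster, for three complexes.
toy_rankings = [[1.5, 3.0], [2.5, 1.0], [4.0, 5.0]]
top1 = success_rate(toy_rankings, top_n=1)
top2 = success_rate(toy_rankings, top_n=2)
shifted = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

Widening `top_n` can only keep or raise the success rate, which is the pattern the abstract reports when moving from the first-ranked cluster to the five best and then to all clusters.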

Journal ArticleDOI
01 Sep 2007-Proteins
TL;DR: The Library is shown to outperform BLASTP and a general Pfam hidden Markov model of the kinase catalytic domain in the retrieval and family‐level classification of protein kinases and provides novel insights on the early evolution and subsequent adaptations of the various protein kinase families in eukaryotes.
Abstract: Reversible protein phosphorylation by protein kinases and phosphatases is a ubiquitous signaling mechanism in all eukaryotic cells. A multilevel hidden Markov model library is presented which is able to classify protein kinases into one of 12 families, with a misclassification rate of zero on the characterized kinomes of H. sapiens, M. musculus, D. melanogaster, C. elegans, S. cerevisiae, D. discoideum, and P. falciparum. The Library is shown to outperform BLASTP and a general Pfam hidden Markov model of the kinase catalytic domain in the retrieval and family-level classification of protein kinases. The application of the Library to the 38 unclassified kinases of yeast enriches the yeast kinome in protein kinases of the families AGC (5), CAMK (17), CMGC (4), and STE (1), thereby raising the family-level classification of yeast conventional protein kinases from 66.96 to 90.43%. The application of the Library to 21 eukaryotic genomes shows seven families (AGC, CAMK, CK1, CMGC, STE, PIKK, and RIO) to be present in all genomes analyzed, and so are likely to be essential to eukaryotes. Putative tyrosine kinases (TKs) are found in the plants A. thaliana (2), O. sativa ssp. Indica (6), and O. sativa ssp. Japonica (7), and in the amoeba E. histolytica (7). To our knowledge, TKs have not been predicted in plants before. This also suggests that a primitive set of TKs might have predated the radiation of eukaryotes. Putative tyrosine kinase-like kinases (TKLs) are found in the fungi C. neoformans (2), P. chrysosporium (4), in the Apicomplexans C. hominis (4), P. yoelii (4), and P. falciparum (6), the amoeba E. histolytica (109), and the alga T. pseudonana (6). TKLs are found to be abundant in plants (776 in A. thaliana, 1010 in O. sativa ssp. Indica, and 969 in O. sativa ssp. Japonica). TKLs might have predated the radiation of eukaryotes too and have been lost secondarily from some fungi. The application of the Library facilitates the annotation of kinomes and has provided novel insights on the early evolution and subsequent adaptations of the various protein kinase families in eukaryotes. Proteins 2007. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
01 Jan 2007-Proteins
TL;DR: The accuracy of predicted protein models for 108 target domains was assessed based on a detailed comparison between the experimental and predicted structures and it showed that the best groups produced models closer to the target structure than the best single template for a significant number of targets.
Abstract: This manuscript presents the assessment of the template-based modeling category of the seventh Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The accuracy of predicted protein models for 108 target domains was assessed based on a detailed comparison between the experimental and predicted structures. The assessment was performed using numerical measures for backbone and structural alignment accuracy, and by scoring correctly modeled hydrogen bond interactions in the predictions. Based on these criteria, our statistical analysis identified a number of groups whose predictions were on average significantly more accurate. Furthermore, the predictions for six target proteins were evaluated for the accuracy of their modeled cofactor binding sites. We also assessed the ability of predictors to improve over the best available single template structure, which showed that the best groups produced models closer to the target structure than the best single template for a significant number of targets. In addition, we assessed the accuracy of the error estimates (local confidence values) assigned to predictions on a per residue basis. Finally, we discuss some general conclusions about the state of the art of template-based modeling methods and their usefulness for practical applications.

Journal ArticleDOI
01 Jan 2007-Proteins
TL;DR: There was a sense of progress in template FM relative to CASP6, but this progress could not be demonstrated objectively.
Abstract: In CASP7, protein structure prediction targets that lacked substantial similarity to a protein in the PDB at the time of assessment were considered to be free modeling targets (FM). We assessed predictions for 14 FM targets as well as four other targets that were deemed to be on the borderline between FM targets and template based modeling targets (TBM/FM). GDT_TS was used as one measure of model quality. Model quality was also assessed by visual inspection. Visual inspection was performed by three independent assessors who were blinded to GDT_TS scores and other quantitative measures of model quality. The best models by visual inspection tended to rank among the top few percent by GDT_TS, but were typically not the highest scoring models. Thus, visual inspection remains an essential component of assessment for FM targets. Overall, group TS020 (Baker) performed best, but success on individual targets was widely distributed among many groups. Among these other groups, TS024 and TS025 (Zhang and Zhang server) performed notably well without exceptionally large computing resources. This should be considered encouraging for future CASPs. There was a sense of progress in template FM relative to CASP6, but we were unable to demonstrate this progress objectively. Proteins 2007. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
01 Oct 2007-Proteins
TL;DR: A mixed elastic network model (MENM) is developed to study large‐scale conformational transitions of proteins between two (or more) known structures and is computationally efficient and generally applicable even for large protein systems that undergo highly collective structural changes.
Abstract: We develop a mixed elastic network model (MENM) to study large-scale conformational transitions of proteins between two (or more) known structures. Elastic network potentials for the beginning and end states of a transition are combined, in effect, by adding their respective partition functions. The resulting effective MENM energy function smoothly interpolates between the original surfaces, and retains the beginning and end structures as local minima. Saddle points, transition paths, potentials of mean force, and partition functions can be found efficiently by largely analytic methods. To characterize the protein motions during a conformational transition, we follow "transition paths" on the MENM surface that connect the beginning and end structures and are invariant to parameterizations of the model and the mathematical form of the mixing scheme. As illustrations of the general formalism, we study large-scale conformation changes of the motor proteins KIF1A kinesin and myosin II. We generate possible transition paths for these two proteins that reveal details of their conformational motions. The MENM formalism is computationally efficient and generally applicable even for large protein systems that undergo highly collective structural changes.
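
The idea of combining two potentials by adding their partition functions can be sketched in one dimension. The example below is an illustration only: the harmonic wells, the mixing temperature `t_mix`, and the function names are assumptions, not the authors' elastic network parameters. It shows that the mixed energy stays close to each well near its minimum and forms a smooth barrier in between.

```python
import math


def mixed_energy(e1, e2, t_mix=1.0):
    """Mix two energies via -T * ln(exp(-e1/T) + exp(-e2/T)).

    t_mix is a hypothetical mixing temperature. The lower energy is
    factored out first so the exponentials cannot overflow.
    """
    lo = min(e1, e2)
    return lo - t_mix * math.log(
        math.exp(-(e1 - lo) / t_mix) + math.exp(-(e2 - lo) / t_mix)
    )


def harmonic(x, x0, k=1.0):
    """Toy 1D stand-in for an elastic network potential centred at x0."""
    return 0.5 * k * (x - x0) ** 2


# Two toy end states at x = 0 and x = 4; scan the mixed surface between them.
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, round(mixed_energy(harmonic(x, 0.0), harmonic(x, 4.0)), 4))
```

Because the mixed energy never exceeds the lower of the two inputs, both end structures remain minima of the combined surface in this toy case.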

Journal ArticleDOI
01 Jun 2007-Proteins
TL;DR: Improvement can be quite reliably achieved when the initial models are sufficiently close to the native basin (e.g., 3–4 Å Cα RMSD) and reliable structural information is incorporated into the simulation protocol.
Abstract: Recent advances in efficient and accurate treatment of solvent with the generalized Born approximation (GB) have made it possible to substantially refine the protein structures generated by various prediction tools through detailed molecular dynamics simulations. As demonstrated in a recent CASPR experiment, improvement can be quite reliably achieved when the initial models are sufficiently close to the native basin (e.g., 3-4 Å Cα RMSD). A key element of effective refinement is to incorporate reliable structural information into the simulation protocol. Without intimate knowledge of the target and prediction protocol used to generate the initial structural models, it can be assumed that the regular secondary structure elements (helices and strands) and overall fold topology are largely correct to start with, such that the protocol limits itself to the scope of refinement and focuses the sampling in the vicinity of the initial structure. The secondary structures can be enforced by dihedral restraints and the topology through structural contacts, implemented as either multiple pair-wise Cα distance restraints or a single sidechain distance matrix restraint. The restraints are weakly imposed with flat-bottom potentials to allow sufficient flexibility for structural rearrangement. Refinement is further facilitated by enhanced sampling with advanced techniques such as the replica exchange method (REX). In general, for single domain proteins of small to medium sizes, 3-5 nanoseconds of REX/GB refinement simulations appear to be sufficient for reasonable convergence. Clustering of the resulting structural ensembles can yield refined models over 1.0 Å closer to the native structure in Cα RMSD. Substantial improvement of sidechain contacts and rotamer states can also be achieved in most cases. Additional improvement is possible with longer sampling and knowledge of the robust structural features in the initial models for a given prediction protocol. Nevertheless, limitations still exist in sampling as well as force field accuracy, manifested as difficulty in refinement of long and flexible loops.
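
A flat-bottom restraint of the kind described above can be sketched as follows. The abstract does not give the functional form outside the flat region, so the harmonic walls, the `half_width` tolerance, and the force constant `k` are assumptions made for illustration.

```python
def flat_bottom_energy(d, d_ref, half_width=1.0, k=1.0):
    """Restraint energy for a distance d with reference value d_ref.

    No penalty is applied while d stays within half_width of d_ref;
    beyond that, a harmonic penalty grows with the excess deviation.
    All parameter names and defaults are hypothetical.
    """
    excess = abs(d - d_ref) - half_width
    return 0.5 * k * excess ** 2 if excess > 0 else 0.0


# A C-alpha pair restrained near 5 angstroms: small deviations cost nothing.
for d in (5.0, 5.8, 7.0):
    print(d, flat_bottom_energy(d, d_ref=5.0))
```

The flat region is what lets the structure rearrange locally without being pulled back toward the initial model.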

Journal ArticleDOI
17 Sep 2007-Proteins
TL;DR: FCA should provide improved collective degrees of freedom for dimension‐reduced descriptions of macromolecular dynamics; its improved resolution is shown to be due to a strongly increased anharmonicity of FCA modes as compared to the respective PCA modes.
Abstract: Correlated motions in biomolecules are often essential for their function, for example, allosteric signal transduction or mechanical/thermodynamic energy transport. Principal component analysis (PCA) is a widely used method to extract functionally relevant collective motions from a molecular dynamics (MD) trajectory. Being based on the covariance matrix, however, PCA detects only linear correlations. Here we present a new method, full correlation analysis (FCA), which is based on mutual information and thus quantifies all correlations, including nonlinear and higher order correlations. For comparison, we applied both PCA and FCA to approximately 100 ns MD trajectories of T4 lysozyme and the hexapeptide neurotensin. For both systems, FCA yielded better resolved conformational substates and aligned its modes more often with actual transition pathways. This improved resolution is shown to be due to a strongly increased anharmonicity of FCA modes as compared to the respective PCA modes. The high anharmonicity further suggests that the motions extracted by FCA are functionally more relevant than those captured by PCA. In summary, FCA should provide improved collective degrees of freedom for dimension-reduced descriptions of macromolecular dynamics.
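
The difference between covariance-based and mutual-information-based measures can be seen on a toy pair of coordinates. In the sketch below, `y = x²` depends entirely on `x`, yet the linear (Pearson) correlation vanishes, while a simple histogram estimate of mutual information is clearly positive. The binned estimator and all function names are assumptions for illustration; the abstract does not describe how FCA estimates mutual information.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Linear correlation coefficient, the quantity a covariance matrix captures."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def binned_mutual_information(xs, ys, bins=4):
    """Histogram estimate of mutual information (in nats) between two variables."""
    def digitize(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in vals]

    bx, by = digitize(xs), digitize(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in pxy.items()
    )


# Toy data: a symmetric coordinate and a purely nonlinear function of it.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
print(pearson(xs, ys), binned_mutual_information(xs, ys))
```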

Journal ArticleDOI
12 Dec 2007-Proteins
TL;DR: Protein classification is generally a multi‐class problem that can be reduced to a set of binary problems, one classifier per class; this creates a class imbalance, because the number of proteins in one class is usually much smaller than the number outside it.
Abstract: Generally, protein classification is a multi-class classification problem and can be reduced to a set of binary classification problems, where one classifier is designed for each class. The proteins in one class are seen as positive examples while those outside the class are seen as negative examples. However, an imbalance problem arises in this case because the number of proteins in one class is usually much smaller than that of the proteins outside the class. As a result, the imbalanced data cause classifiers to overfit and to perform poorly, particularly on the minority class. This article presents a new technique for protein classification with imbalanced data. First, we propose a new algorithm to overcome the imbalance problem in protein classification with a new sampling technique and a committee of classifiers. Then, classifiers trained in different feature spaces are combined together to further improve the accuracy of protein classification. The numerical experiments on benchmark datasets show promising results, which confirm the effectiveness of the proposed method in terms of accuracy. The Matlab code and supplementary materials are available at http://eserver2.sat.iis.u-tokyo.ac.jp/~xmzhao/proteins.html.

Journal ArticleDOI
01 May 2007-Proteins
TL;DR: An improved method, PIER, predicts interfaces from a single protein structure using local statistical properties of the protein surface derived at the level of atomic groups; its accuracy, efficiency, and dependence on structure alone make it a suitable tool for automated high‐throughput annotation of protein structures emerging from structural proteomics projects.
Abstract: Recent advances in structural proteomics call for development of fast and reliable automatic methods for prediction of functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected on protein surface. The common drawback of such methods is their limited applicability to the proteins with a sparse set of sequential homologs, as well as inability to detect interfaces in evolutionary variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved the overall precision of 60% at the recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation only took seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. Thorough benchmarking using other datasets from literature showed that PIER yielded improved performance as compared with several alignment-free or alignment-dependent predictions. 
The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects. Proteins 2007. © 2007 Wiley-Liss, Inc.
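
For readers less familiar with the residue-level metrics quoted above, the following minimal sketch computes precision and recall for a predicted interface. The residue numbers are invented for illustration and do not come from the PIER benchmark.

```python
def precision_recall(predicted, actual):
    """Residue-level precision and recall of a predicted interface.

    predicted: residue identifiers the method labels as interface.
    actual:    residue identifiers observed at the interface.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: 5 residues predicted, 6 truly at the interface, 3 shared.
print(precision_recall({1, 2, 3, 4, 5}, {3, 4, 5, 6, 7, 8}))
```

In this made-up case, three of five predictions are correct and three of six true interface residues are recovered, which corresponds to the 60% precision at 50% recall operating point reported in the abstract.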

Journal ArticleDOI
01 Oct 2007-Proteins
TL;DR: Testing on a set of 187 diverse protein‐ligand complexes has shown that the AutoLigand method is successful in predicting the location and approximate volume of the binding site in 73% of cases.
Abstract: We present a method, termed AutoLigand, for the prediction of ligand-binding sites in proteins of known structure. The method searches the space surrounding the protein and finds the contiguous envelope with the specified volume of atoms, which has the largest possible interaction energy with the protein. It uses a full atomic representation, with atom types for carbon, hydrogen, oxygen, nitrogen and sulfur (and others, if desired), and is designed to minimize the need for artificial geometry. Testing on a set of 187 diverse protein-ligand complexes has shown that the method is successful in predicting the location and approximate volume of the binding site in 73% of cases. Additional testing was performed on a set of 96 protein-ligand complexes with crystallographic structures of apo and holo forms, and AutoLigand was able to predict the binding site in 80% of the apo structures.

Journal ArticleDOI
01 Jun 2007-Proteins
TL;DR: The proposed activity of R207910 against Mycobacterium tuberculosis is based on interference of the compound with the escapement geometry of the proton transfer chain, corroborated by the good agreement between the computed interaction energies and the observed pattern of stereo‐specificity in the model of the binding region.
Abstract: Diarylquinolines (DARQs) are a new class of potent inhibitors of the ATPase of Mycobacterium tuberculosis. We have created a homology model of a binding site for this class of compounds located on the contact area of the a-subunit (gene atpB) and c-subunits (gene atpE) of Mycobacterium tuberculosis ATPase. The binding pocket that was identified from the analysis of the homology model is formed by 4 helices of three c-subunits and 2 helices of the a-subunit. The lead compound of the DARQ series, R207910, was docked into the pocket using a simulated annealing, multiple conformer, docking algorithm. Different stereoisomers were treated separately. The best docking pose for each stereoisomer was optimized by molecular dynamics simulation on the 5300 atoms of the binding region and ligand. The interaction energies in the computed complexes enable us to rank the different stereoisomers in order of interaction strength with the ATPase binding pockets. We propose that the activity of R207910 against Mycobacterium tuberculosis is based on interference of the compound with the escapement geometry of the proton transfer chain. Upon binding, the compound mimics the conserved Arg-186 residue of the a-subunit and interacts in its place with the conserved acidic residue Glu-61 of the c-subunit. This mode of action is corroborated by the good agreement between the computed interaction energies and the observed pattern of stereo-specificity in the model of the binding region. Proteins 2007. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
01 Jan 2007-Proteins
TL;DR: Evaluation of the predictions submitted to the model quality assessment (QA) category in CASP7 demonstrates that a respectable accuracy in this task can be achieved by methods relying on the comparison of different models for the same target.
Abstract: The article presents our evaluation of the predictions submitted to the model quality assessment (QA) category in CASP7. In this newly introduced category, predictors were asked to provide quality estimates for protein structure models. The QA category uses the automatically produced models that are traditionally distributed to CASP participants as input for predictions. Predictors were asked to provide an index of the quality of these individual models (QM1) as well as an index for the expected correctness of each of their residues (QM2). We computed the correlation between the observed and predicted quality of the models and of the individual residues achieved by the participating groups and evaluated the statistical significance of the differences. We also compared the results with those obtained by a "naive predictor" that assigns a quality score related to how close the model is to the structure of the most similar protein of known structure. The aims of a method for assessing the overall quality of a model can be twofold: selecting the best (or one of the best) model(s) among a set of plausible choices, or assigning a nonrelative quality value to an individual model. The applications of the two strategies are different, albeit equally important. Our assessment of the QA category demonstrates that methods for addressing the first task effectively do exist, while there is room for improvement as far as the second aspect is concerned. Notwithstanding the limited number of groups submitting predictions for residue-level accuracy, our data demonstrate that a respectable accuracy in this task can be achieved by methods relying on the comparison of different models for the same target.
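
The two tasks distinguished above, picking the best model from a set versus assigning an absolute quality value, can be told apart with two simple scores. The sketch below is an illustration under assumptions: `selection_loss` and `mean_absolute_error` are hypothetical helper names rather than the assessors' measures (the assessment itself relied on correlations), and the scores are invented.

```python
def selection_loss(predicted, observed):
    """Observed-quality gap between the truly best model and the one ranked first."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(observed) - observed[top]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed quality."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical quality scores on a 0-1 scale for three models of one target.
predicted = [0.40, 0.90, 0.70]
observed = [0.35, 0.60, 0.65]
print(selection_loss(predicted, observed), mean_absolute_error(predicted, observed))
```

A method can do well on the first score while doing poorly on the second: here the top-ranked model is nearly the best available, yet its absolute quality is overestimated by a wide margin.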

Journal ArticleDOI
15 Nov 2007-Proteins
TL;DR: An approach for detecting statistically significant structural differences between crystal and NMR structural models is presented, based on structural superposition and the analysis of the distributions of atomic positions relative to a mean structure, and finds that repulsive crystal packing plays a minor role in the observed differences.
Abstract: The existence of a large number of proteins for which both nuclear magnetic resonance (NMR) and X-ray crystallographic coordinates have been deposited into the Protein Data Bank (PDB) makes the statistical comparison of the corresponding crystal and NMR structural models over a large data set possible, and facilitates the study of the effect of the crystal environment and other factors on structure. We present an approach for detecting statistically significant structural differences between crystal and NMR structural models which is based on structural superposition and the analysis of the distributions of atomic positions relative to a mean structure. We apply this to a set of 148 protein structure pairs (crystal vs NMR), and analyze the results in terms of methodological and physical sources of structural difference. For every one of the 148 structure pairs, the backbone root-mean-square distance (RMSD) over core atoms of the crystal structure to the mean NMR structure is larger than the average RMSD of the members of the NMR ensemble to the mean, with 76% of the structure pairs having an RMSD of the crystal structure to the mean more than a factor of two larger than the average RMSD of the NMR ensemble. On average, the backbone RMSD over core atoms of the crystal structure to the mean NMR structure is approximately 1 Å. If non-core atoms are included, this increases to 1.4 Å due to the presence of variability in loops and similar regions of the protein. The observed structural differences are only weakly correlated with the age and quality of the structural model and differences in conditions under which the models were determined. We examine steric clashes when a putative crystalline lattice is constructed using a representative NMR structure, and find that repulsive crystal packing plays a minor role in the observed differences between crystal and NMR structures. The observed structural differences likely have a combination of physical and methodological causes. Stabilizing attractive interactions arising from intermolecular crystal contacts, which shift the equilibrium of the crystal structure relative to the NMR structure, are a likely physical source that can account for some of the observed differences. Methodological sources of apparent structural difference include insufficient sampling or other issues which could give rise to errors in the estimates of the precision and/or accuracy.
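
The central comparison in this abstract, the RMSD of the crystal structure to the mean NMR structure versus the average RMSD of ensemble members to that mean, can be sketched as below. The sketch assumes all structures are already superimposed and given as equal-length lists of `(x, y, z)` tuples; the function names and toy coordinates are hypothetical, and the authors' superposition and core-atom selection are not reproduced.

```python
import math


def rmsd(a, b):
    """Root-mean-square distance between two equal-length lists of (x, y, z) points."""
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def mean_structure(ensemble):
    """Average the coordinates of each atom over all ensemble members."""
    n = len(ensemble)
    return [
        tuple(sum(model[i][k] for model in ensemble) / n for k in range(3))
        for i in range(len(ensemble[0]))
    ]


def crystal_to_ensemble_ratio(crystal, ensemble):
    """Ratio of crystal-to-mean RMSD over the average member-to-mean RMSD."""
    mean = mean_structure(ensemble)
    spread = sum(rmsd(model, mean) for model in ensemble) / len(ensemble)
    return rmsd(crystal, mean) / spread


# Toy two-atom example: two NMR models and one crystal structure.
nmr = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 2), (1, 0, 2)]]
crystal = [(0, 0, 4), (1, 0, 4)]
print(crystal_to_ensemble_ratio(crystal, nmr))
```

A ratio above 1 means the crystal structure lies further from the ensemble mean than a typical ensemble member does; the abstract reports a ratio above 2 for 76% of the pairs.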

Journal ArticleDOI
15 Aug 2007-Proteins
TL;DR: A method is presented that efficiently generates realistic all‐atom protein structures starting from the Cα atom positions, as obtained from extensive coarse‐grain simulations, and shows good correspondence and little distortion in the protein folding landscape.
Abstract: Multiscale methods are becoming increasingly promising as a way to characterize the dynamics of large protein systems on biologically relevant time-scales. The underlying assumption in multiscale simulations is that it is possible to move reliably between different resolutions. We present a method that efficiently generates realistic all-atom protein structures starting from the Cα atom positions, as obtained for instance from extensive coarse-grain simulations. The method, a reconstruction algorithm for coarse-grain structures (RACOGS), is validated by reconstructing ensembles of coarse-grain structures obtained during folding simulations of the proteins src-SH3 and S6. The results show that RACOGS consistently produces low energy, all-atom structures. A comparison of the free energy landscapes calculated using the coarse-grain structures versus the all-atom structures shows good correspondence and little distortion in the protein folding landscape.

Journal ArticleDOI
01 May 2007-Proteins
TL;DR: A set of 51 pairs of known inactive and active allosteric protein structures from the Protein Data Bank is compiled and local conformational differences between the two structures of each protein are calculated using simple metrics, such as backbone and side‐chain Cartesian displacement, and torsion angle change and rearrangement in residue–residue contacts.
Abstract: Allosteric proteins have been studied extensively in the last 40 years, but so far, no systematic analysis of conformational changes between allosteric structures has been carried out. Here, we compile a set of 51 pairs of known inactive and active allosteric protein structures from the Protein Data Bank. We calculate local conformational differences between the two structures of each protein using simple metrics, such as backbone and side-chain Cartesian displacement, and torsion angle change and rearrangement in residue-residue contacts. Thresholds for each metric arise from distributions of motions in two control sets of pairs of protein structures in the same biochemical state. Statistical analysis of motions in allosteric proteins quantifies the magnitude of allosteric effects and reveals simple structural principles about allostery. For example, allosteric proteins exhibit substantial conformational changes comprising about 20% of the residues. In addition, motions in allosteric proteins show strong bias toward weakly constrained regions such as loops and the protein surface. Correlation functions show that motions communicate through protein structures over distances averaging 10-20 residues in sequence space and 10-20 Å in Cartesian space. Comparison of motions in the allosteric set and a set of 21 nonallosteric ligand-binding proteins shows that nonallosteric proteins also exhibit bias of motion toward weakly constrained regions and local correlation of motion. However, allosteric proteins exhibit twice as much percent motion on average as nonallosteric proteins with ligand-induced motion. These observations may guide efforts to design flexibility and allostery into proteins. Proteins 2007;67:385-399. © 2007
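
Two of the simple metrics named above, Cartesian displacement and torsion angle change, can be sketched as follows. The sketch assumes the two structures are already superimposed with residues in matching order; the threshold value and the toy coordinates are invented, whereas in the study thresholds come from control sets of structure pairs.

```python
import math


def fraction_moved(inactive, active, threshold):
    """Fraction of residues displaced by more than threshold (same units as coordinates)."""
    moved = sum(math.dist(p, q) > threshold for p, q in zip(inactive, active))
    return moved / len(inactive)


def torsion_change(angle_a, angle_b):
    """Smallest absolute difference between two torsion angles, in degrees."""
    return abs((angle_a - angle_b + 180.0) % 360.0 - 180.0)


# Toy five-residue chain in which only the last residue shifts by 3 angstroms.
inactive = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
active = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 3, 0)]
print(fraction_moved(inactive, active, threshold=1.0))
print(torsion_change(170.0, -170.0))
```

The wrap-around in `torsion_change` matters because angles of 170° and -170° differ by only 20°, not 340°.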