
Showing papers in "Proteins in 2010"


Journal ArticleDOI
01 Jun 2010-Proteins
TL;DR: A new force field, which is termed Amber ff99SB‐ILDN, exhibits considerably better agreement with the NMR data and is validated against a large set of experimental NMR measurements that directly probe side‐chain conformations.
Abstract: Recent advances in hardware and software have enabled increasingly long molecular dynamics (MD) simulations of biomolecules, exposing certain limitations in the accuracy of the force fields used for such simulations and spurring efforts to refine these force fields. Recent modifications to the Amber and CHARMM protein force fields, for example, have improved the backbone torsion potentials, remedying deficiencies in earlier versions. Here, we further advance simulation accuracy by improving the amino acid side-chain torsion potentials of the Amber ff99SB force field. First, we used simulations of model alpha-helical systems to identify the four residue types whose rotamer distribution differed the most from expectations based on Protein Data Bank statistics. Second, we optimized the side-chain torsion potentials of these residues to match new, high-level quantum-mechanical calculations. Finally, we used microsecond-timescale MD simulations in explicit solvent to validate the resulting force field against a large set of experimental NMR measurements that directly probe side-chain conformations. The new force field, which we have termed Amber ff99SB-ILDN, exhibits considerably better agreement with the NMR data. Proteins 2010. © 2010 Wiley-Liss, Inc.

4,590 citations


Journal ArticleDOI
01 Feb 2010-Proteins
TL;DR: It is shown that gain and loss of predicted ubiquitination sites may represent a molecular mechanism behind a number of disease‐associated mutations.
Abstract: Ubiquitination plays an important role in many cellular processes and is implicated in many diseases. Experimental identification of ubiquitination sites is challenging due to rapid turnover of ubiquitinated proteins and the large size of the ubiquitin modifier. We identified 141 new ubiquitination sites using a combination of liquid chromatography, mass spectrometry, and mutant yeast strains. Investigation of the sequence biases and structural preferences around known ubiquitination sites indicated that their properties were similar to those of intrinsically disordered protein regions. Using a combined set of new and previously known ubiquitination sites, we developed a random forest predictor of ubiquitination sites, UbPred. The class-balanced accuracy of UbPred reached 72%, with the area under the ROC curve at 80%. The application of UbPred showed that high confidence Rsp5 ubiquitin ligase substrates and proteins with very short half-lives were significantly enriched in the number of predicted ubiquitination sites. Proteome-wide prediction of ubiquitination sites in Saccharomyces cerevisiae indicated that highly ubiquitinated substrates were prevalent among transcription/enzyme regulators and proteins involved in cell cycle control. In the human proteome, cytoskeletal, cell cycle, regulatory, and cancer-associated proteins display a higher extent of ubiquitination than proteins from other functional categories. We show that gain and loss of predicted ubiquitination sites may represent a molecular mechanism behind a number of disease-associated mutations. UbPred is available at http://www.ubpred.org.

538 citations
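
The class-balanced accuracy quoted in the UbPred abstract above can be illustrated with a minimal sketch. It assumes the usual definition (the mean of sensitivity and specificity); the function name and the toy labels are invented for illustration and are not taken from the paper or from UbPred itself.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = site, 0 = non-site)."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (true_pos / positives + true_neg / negatives)


# Toy, invented data: 2 true sites among 10 candidate lysines.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.7
print(balanced_accuracy(y_true, y_pred))  # 0.625
```

Because non-sites usually far outnumber sites, plain accuracy rewards predicting "non-site" everywhere; averaging the per-class rates removes that bias.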


Journal ArticleDOI
15 Nov 2010-Proteins
TL;DR: The protein–protein docking benchmark is updated to include complexes that became available since the previous release, and provides 176 unbound–unbound cases that can be used for protein–protein docking method development and assessment.
Abstract: We updated our protein–protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are nonredundant at the family–family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Thus, benchmark 4.0 provides 176 unbound–unbound cases that can be used for protein–protein docking method development and assessment. Seventeen of the newly added cases are enzyme-inhibitor complexes, and we found no new antigen-antibody complexes. Classifying the new cases according to expected difficulty for protein–protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/ Proteins 2010. © 2010 Wiley-Liss, Inc.

478 citations
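
As a quick consistency check, the counts quoted in the Benchmark 4.0 abstract above can be recomputed directly. The variable names are illustrative only; the numbers are those stated in the abstract.

```python
# Counts quoted in the Benchmark 4.0 abstract.
benchmark_3_cases = 124
new_cases = 52
difficulty_counts = {"rigid_body": 33, "medium": 11, "difficult": 8}

total_cases = benchmark_3_cases + new_cases
percent_increase = round(100 * new_cases / benchmark_3_cases)

print(total_cases)                                   # 176
print(percent_increase)                              # 42
print(sum(difficulty_counts.values()) == new_cases)  # True
```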


Journal ArticleDOI
01 May 2010-Proteins
TL;DR: This review examines the dynamical proposal in a critical way, considering basically all reasonable definitions, including (but not limited to) such proposed effects as “coupling between conformational and chemical motions,” “landscape searches” and “entropy funnels.”
Abstract: Enzymes play a key role in almost all biological processes, accelerating a variety of metabolic reactions as well as controlling energy transduction, the transcription, and translation of genetic information, and signaling. They possess the remarkable capacity to accelerate reactions by many orders of magnitude compared to their uncatalyzed counterparts, making feasible crucial processes that would otherwise not occur on biologically relevant timescales. Thus, there is broad interest in understanding the catalytic power of enzymes on a molecular level. Several proposals have been put forward to try to explain this phenomenon, and one that has rapidly gained momentum in recent years is the idea that enzyme dynamics somehow contributes to catalysis. This review examines the dynamical proposal in a critical way, considering basically all reasonable definitions, including (but not limited to) such proposed effects as "coupling between conformational and chemical motions," "landscape searches" and "entropy funnels." It is shown that none of these proposed effects have been experimentally demonstrated to contribute to catalysis, nor are they supported by consistent theoretical studies. On the other hand, it is clarified that careful simulation studies have excluded most (if not all) dynamical proposals. This review places significant emphasis on clarifying the role of logical definitions of different catalytic proposals, and on the need for a clear formulation in terms of the assumed potential surface and reaction coordinate. Finally, it is pointed out that electrostatic preorganization actually accounts for the observed catalytic effects of enzymes, through the corresponding changes in the activation free energies.

428 citations


Journal ArticleDOI
01 Jul 2010-Proteins
TL;DR: Rosetta FlexPepDock is presented, a novel tool for refining coarse peptide–protein models that allows significant changes in both peptide backbone and side chains and is expected to have significant impact on structure‐based functional characterization, controlled manipulation of peptide interactions, and on peptide‐based drug design.
Abstract: A wide range of regulatory processes in the cell are mediated by flexible peptides that fold upon binding to globular proteins. Computational efforts to model these interactions are hindered by the large number of rotatable bonds in flexible peptides relative to typical ligand molecules, and the fact that different peptides assume different backbone conformations within the same binding site. In this study, we present Rosetta FlexPepDock, a novel tool for refining coarse peptide-protein models that allows significant changes in both peptide backbone and side chains. We obtain high resolution models, often of sub-angstrom backbone quality, over an extensive and general benchmark that is based on a large nonredundant dataset of 89 peptide-protein interactions. Importantly, side chains of known binding motifs are modeled particularly well, typically with atomic accuracy. In addition, our protocol has improved modeling quality for the important application of cross docking to PDZ domains. We anticipate that the ability to create high resolution models for a wide range of peptide-protein complexes will have significant impact on structure-based functional characterization, controlled manipulation of peptide interactions, and on peptide-based drug design.

373 citations


Journal ArticleDOI
15 Nov 2010-Proteins
TL;DR: By selecting the top ranked models, the current protocol reliably generates high‐quality structures of protein–protein complexes from the structures of separately crystallized proteins, even in the absence of biological information, provided that there is limited backbone conformational change.
Abstract: Our approach to protein-protein docking includes three main steps. First, we run PIPER, a rigid body docking program based on the Fast Fourier Transform (FFT) correlation approach, extended to use pairwise interaction potentials. Second, the 1000 best energy conformations are clustered, and the 30 largest clusters are retained for refinement. Third, the stability of the clusters is analyzed by short Monte Carlo simulations, and the structures are refined by the medium-range optimization method SDU. The first two steps of this approach are implemented in the ClusPro 2.0 protein-protein docking server. Despite being fully automated, the last step is computationally too expensive to be included in the server. When comparing the models obtained in CAPRI rounds 13-19 by ClusPro, by the refinement of the ClusPro predictions and by all predictor groups, we arrived at three conclusions. First, for the first time in CAPRI history, our automated ClusPro server was able to compete with the best human predictor groups. Second, selecting the top ranked models, our current protocol reliably generates high-quality structures of protein-protein complexes from the structures of separately crystallized proteins, even in the absence of biological information, provided that there is limited backbone conformational change. Third, despite occasional successes, homology modeling requires further improvement to achieve reliable docking results.

238 citations
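
The second step described in the ClusPro abstract above (keep the lowest-energy poses, cluster them, retain the largest clusters) can be sketched as a greedy neighbour-count clustering. This is a hedged illustration, not the ClusPro code: the function name, the `distance` callback, the `radius` parameter, and the one-dimensional toy "poses" are all assumptions made for the example.

```python
def greedy_cluster(energies, distance, radius, n_lowest=1000, n_clusters=30):
    """Greedy clustering of the n_lowest lowest-energy poses.

    distance(i, j) returns a pose-to-pose distance (for example an RMSD);
    the clustering radius is an assumed, user-chosen cutoff.
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    kept = sorted(range(len(energies)), key=lambda i: energies[i])[:n_lowest]
    neighbors = {i: {j for j in kept if distance(i, j) <= radius} for i in kept}
    remaining = set(kept)
    clusters = []
    while remaining and len(clusters) < n_clusters:
        # The pose with the most unassigned neighbours becomes the next center.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i] & remaining))
        members = neighbors[center] & remaining
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Toy example: six "poses" on a line, invented energies.
coords = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
energies = [-3.0, -2.0, -1.0, -2.5, -1.5, -0.5]
result = greedy_cluster(
    energies,
    distance=lambda i, j: abs(coords[i] - coords[j]),
    radius=0.5,
    n_lowest=6,
    n_clusters=2,
)
print(result)  # [(0, [0, 1, 2]), (3, [3, 4])]
```

Ranking by cluster size rather than by energy alone favours broad energy wells, which is the rationale for retaining the largest clusters for refinement.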


Journal ArticleDOI
15 Nov 2010-Proteins
TL;DR: The ability of these algorithms to sample docking poses and to single out specific association modes in 14 targets, representing 11 distinct protein complexes, was evaluated, revealing that 67% of the groups, more than ever before, produced acceptable models or better for at least one target.
Abstract: Protein docking algorithms are assessed by evaluating blind predictions performed during 2007-2009 in Rounds 13-19 of the community-wide experiment on critical assessment of predicted interactions (CAPRI). We evaluated the ability of these algorithms to sample docking poses and to single out specific association modes in 14 targets, representing 11 distinct protein complexes. These complexes play important biological roles in RNA maturation, G-protein signal processing, and enzyme inhibition and function. One target involved protein-RNA interactions not previously considered in CAPRI, several others were hetero-oligomers, or featured multiple interfaces between the same protein pair. For most targets, predictions started from the experimentally determined structures of the free (unbound) components, or from models built from known structures of related or similar proteins. To succeed they therefore needed to account for conformational changes and model inaccuracies. In total, 64 groups and 12 web-servers submitted docking predictions of which 4420 were evaluated. Overall our assessment reveals that 67% of the groups, more than ever before, produced acceptable models or better for at least one target, with many groups submitting multiple high- and medium-accuracy models for two to six targets. Forty-one groups including four web-servers participated in the scoring experiment with 1296 evaluated models. Scoring predictions also show signs of progress evidenced from the large proportion of correct models submitted. But singling out the best models remains a challenge, which also adversely affects the ability to correctly rank docking models. With the increased interest in translating abstract protein interaction networks into realistic models of protein assemblies, the growing CAPRI community is actively developing more efficient and reliable docking and scoring methods for everyone to use. © 2010 Wiley-Liss, Inc.

226 citations


Journal ArticleDOI
01 Sep 2010-Proteins
TL;DR: The findings rationalize the efforts of correlating the pH of maximal stability and the characteristic pH of subcellular compartments, as only pH of activity is subject to evolutionary pressure.
Abstract: Biological macromolecules evolved to perform their function in a specific cellular environment (subcellular compartments or tissues); therefore, they should be adapted to the biophysical characteristics of the corresponding environment, one of them being the characteristic pH. Many macromolecular properties are pH dependent, such as activity and stability. However, only activity is biologically important, while stability may not be crucial for the corresponding reaction. Here we show that the pH-optimum of activity (the pH of maximal activity) is correlated with the pH-optimum of stability (the pH of maximal stability) on a set of 310 proteins with available experimental data. We speculate that such a correlation is needed to allow the corresponding macromolecules to tolerate small pH fluctuations that are inevitable with cellular function. Our findings rationalize the efforts of correlating the pH of maximal stability and the characteristic pH of subcellular compartments, since only the pH of activity is subject to evolutionary pressure. In addition, our analysis confirmed the previous observation that the pH-optima of activity and stability are not correlated with the isoelectric point, pI, or with the optimal temperature.

218 citations


Journal ArticleDOI
15 May 2010-Proteins
TL;DR: A comparative study of the LRA, the LIE, the PDLD/S‐LRA/β, and the more widely used MM/PBSA assesses their abilities to estimate absolute binding energies and finds that the PDLD/S‐LRA/β appears to offer an appealing option for the final stages of massive screening approaches.
Abstract: Calculating the absolute binding free energies is a challenging task. Reliable estimates of binding free energies should provide a guide for rational drug design. They should also provide us with a deeper understanding of the correlation between protein structure and its function. Further applications may include identifying novel molecular scaffolds and optimizing lead compounds in computer-aided drug design. Available options to evaluate the absolute binding free energies range from the rigorous but expensive free energy perturbation to the microscopic linear response approximation (LRA/beta version) and related approaches including the linear interaction energy (LIE) to the more approximated and considerably faster scaled protein dipoles Langevin dipoles (PDLD/S-LRA version) as well as the less rigorous molecular mechanics Poisson-Boltzmann/surface area (MM/PBSA) and generalized Born/surface area (MM/GBSA) to the less accurate scoring functions. There is a need for an assessment of the performance of different approaches in terms of computer time and reliability. We present a comparative study of the LRA/beta, the LIE, the PDLD/S-LRA/beta, and the more widely used MM/PBSA and assess their abilities to estimate the absolute binding energies. The LRA and LIE methods perform reasonably well but require specialized parameterization for the nonelectrostatic term. The PDLD/S-LRA/beta performs effectively without the need for reparameterization. Our assessment of the MM/PBSA is less optimistic. This approach appears to provide erroneous estimates of the absolute binding energies because of its incorrect entropies and the problematic treatment of electrostatic energies. Overall, the PDLD/S-LRA/beta appears to offer an appealing option for the final stages of massive screening approaches.

193 citations


Journal ArticleDOI
01 Apr 2010-Proteins
TL;DR: This study invented an efficient algorithm for calculating deep and shallow pockets simultaneously, using several different sizes of spherical probes, and implemented it as a new program, ghecom (grid‐based HECOMi finder), which detected binding pockets better than four other popular pocket‐finding programs proposed previously.
Abstract: Detection of pockets on protein surfaces is an important step toward finding the binding sites of small molecules. In a previous study, we defined a pocket as a space into which a small spherical probe can enter, but a large probe cannot. The radius of the large probes corresponds to the shallowness of pockets. We showed that each type of binding molecule has a characteristic shallowness distribution. In this study, we introduced fundamental changes to our previous algorithm by using a 3D grid representation of proteins and probes, and the theory of mathematical morphology. We invented an efficient algorithm for calculating deep and shallow pockets (multiscale pockets) simultaneously, using several different sizes of spherical probes (multiscale probes). We implemented our algorithm as a new program, ghecom (grid-based HECOMi finder). The statistics of calculated pockets for the structural dataset showed that our program performed better at detecting binding pockets than four other popular pocket-finding programs proposed previously. The ghecom program also calculates the shallowness of binding ligands, R(inaccess) (minimum radius of inaccessible spherical probes), which can be obtained from the multiscale molecular volume. We showed that each part of the binding molecule had a bias toward a specific range of shallowness. These findings will be useful for predicting the types of molecules that will be most likely to bind putative binding pockets, as well as the configurations of binding molecules. The program ghecom is available through the Web server (http://biunit.naist.jp/ghecom).

181 citations
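
The pocket definition in the ghecom abstract above (space a small probe can enter but a large probe cannot) can be sketched on a grid. The following is a 2D toy under stated assumptions, not the ghecom algorithm: ghecom works on a 3D grid with spherical probes, whereas here the probes are discrete disks, the function names are invented, and the "protein" is a hand-made block with a one-cell-wide groove.

```python
def disk(radius):
    """Grid offsets covered by a discrete disk probe of the given radius."""
    span = range(-radius, radius + 1)
    return [(dx, dy) for dx in span for dy in span if dx * dx + dy * dy <= radius * radius]


def probe_accessible(protein, width, height, radius):
    """Cells inside the window that a probe can cover without touching the protein.

    This is a morphological opening of the free space; cells outside the
    window are treated as open solvent.
    """
    shape = disk(radius)
    covered = set()
    for cx in range(-radius, width + radius):
        for cy in range(-radius, height + radius):
            cells = [(cx + dx, cy + dy) for dx, dy in shape]
            if not any(cell in protein for cell in cells):
                covered.update(
                    (x, y) for x, y in cells if 0 <= x < width and 0 <= y < height
                )
    return covered


def find_pockets(protein, width, height, small_radius, large_radius):
    """Cells reachable by the small probe but not by the large probe."""
    small = probe_accessible(protein, width, height, small_radius)
    large = probe_accessible(protein, width, height, large_radius)
    return small - large


# Toy protein: a solid block with a narrow groove at x = 4 opening toward y = 0.
protein = {(x, y) for x in range(1, 8) for y in range(2, 6)}
protein -= {(4, 2), (4, 3), (4, 4)}

pocket = find_pockets(protein, width=9, height=7, small_radius=0, large_radius=1)
print(sorted(pocket))  # [(4, 3), (4, 4)]
```

The mouth of the groove, cell (4, 2), is still reachable by the large probe, so only the two deeper cells count as pocket. Repeating the calculation with several large-probe radii would give the multiscale picture the abstract describes.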


Journal ArticleDOI
01 May 2010-Proteins
TL;DR: The results show that the FiberDock method successfully models backbone movements that occur during molecular interactions and considerably improves the accuracy and the ranking of rigid‐docking models of protein–protein complexes.
Abstract: Upon binding, proteins undergo conformational changes. These changes often prevent rigid-body docking methods from predicting the 3D structure of a complex from the unbound conformations of its proteins. Handling protein backbone flexibility is a major challenge for docking methodologies, as backbone flexibility adds a huge number of degrees of freedom to the search space, and therefore considerably increases the running time of docking algorithms. Normal mode analysis permits description of protein flexibility as a linear combination of discrete movements (modes). Low-frequency modes usually describe the large-scale conformational changes of the protein. Therefore, many docking methods model backbone flexibility by using only a few modes, which have the lowest frequencies. However, studies show that due to molecular interactions, many proteins also undergo local and small-scale conformational changes, which are described by high-frequency normal modes. Here we present a new method, FiberDock, for docking refinement which models backbone flexibility by an unlimited number of normal modes. The method iteratively minimizes the structure of the flexible protein along the most relevant modes. The relevance of a mode is calculated according to the correlation between the chemical forces, applied on each atom, and the translation vector of each atom, according to the normal mode. The results show that the method successfully models backbone movements that occur during molecular interactions and considerably improves the accuracy and the ranking of rigid-docking models of protein-protein complexes. A web server for the FiberDock method is available at: http://bioinfo3d.cs.tau.ac.il/FiberDock.
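
The mode-relevance idea in the FiberDock abstract above can be sketched as follows. The abstract does not give the exact formula, so this example assumes a normalized dot product (cosine) between the per-atom force vectors and the per-atom displacement vectors of a mode; the function name and the two-atom toy data are invented for illustration.

```python
import math


def mode_relevance(forces, mode):
    """Absolute cosine between per-atom forces and per-atom mode displacements.

    forces and mode are equal-length lists of (x, y, z) tuples, one per atom.
    The sign is dropped because a mode can be followed in either direction.
    """
    dot = sum(f * m for fv, mv in zip(forces, mode) for f, m in zip(fv, mv))
    force_norm = math.sqrt(sum(f * f for fv in forces for f in fv))
    mode_norm = math.sqrt(sum(m * m for mv in mode for m in mv))
    if force_norm == 0.0 or mode_norm == 0.0:
        return 0.0
    return abs(dot) / (force_norm * mode_norm)


# Toy system: two atoms pushed apart along x.
forces = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
modes = {
    "stretch_x": [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    "shift_y": [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
}

ranked = sorted(modes, key=lambda name: mode_relevance(forces, modes[name]), reverse=True)
print(ranked)  # ['stretch_x', 'shift_y']
```

Ranking every mode this way, rather than keeping only the lowest-frequency ones, is what lets a high-frequency mode be selected when the forces happen to act along it.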

Journal ArticleDOI
01 Jun 2010-Proteins
TL;DR: The calculated free energy function exhibits remarkably good agreement with the experimental folding transition temperature, free energy, and specific heat changes; however, changes in enthalpy and entropy are significantly different from the experimental values.
Abstract: We study the unbiased folding/unfolding thermodynamics of the Trp-cage miniprotein using detailed molecular dynamics simulations of an all-atom model of the protein in explicit solvent using the Amber ff99SB force field. Replica-exchange molecular dynamics simulations are used to sample the protein ensembles over a broad range of temperatures covering the folded and unfolded states at two densities. The obtained ensembles are shown to reach equilibrium in the 1 μs/replica timescale. The total simulation time used in the calculations exceeds 100 μs. Ensemble averages of the fraction folded, pressure, and energy differences between the folded and unfolded states as a function of temperature are used to model the free energy of the folding transition, ΔG(P, T), over the whole region of temperatures and pressures sampled in the simulations. The ΔG(P, T) diagram describes an ellipse over the range of temperatures and pressures sampled, predicting that the system can undergo pressure-induced unfolding and cold denaturation at low temperatures and high pressures, and unfolding at low pressures and high temperatures. The calculated free energy function exhibits remarkably good agreement with the experimental folding transition temperature (T_f = 321 K), free energy, and specific heat changes. However, changes in enthalpy and entropy are significantly different from the experimental values. We speculate that these differences may be due to the simplicity of the semiempirical force field used in the simulations and that more elaborate force fields may be required to describe appropriately the thermodynamics of proteins.

Journal ArticleDOI
01 Apr 2010-Proteins
TL;DR: A new approach is presented for docking peptides into flexible receptors, using a new molecular dynamics‐based method, optimized potential molecular dynamics (OPMD), which uses soft‐core potentials for the protein–peptide interactions and applies a new optimization scheme to the soft‐core potential.
Abstract: Molecular docking programs play an important role in drug development and many well-established methods exist. However, there are two situations for which the performance of most approaches is still not satisfactory, namely inclusion of receptor flexibility and docking of large, flexible ligands like peptides. In this publication a new approach is presented for docking peptides into flexible receptors. For this purpose a two step procedure was developed: first, the protein-peptide conformational space is scanned and approximate ligand poses are identified and second, the identified ligand poses are refined by a new molecular dynamics-based method, optimized potential molecular dynamics (OPMD). The OPMD approach uses soft-core potentials for the protein-peptide interactions and applies a new optimization scheme to the soft-core potential. Comparison with refinement results obtained by conventional molecular dynamics and a soft-core scaling approach shows significant improvements in the sampling capability for the OPMD method. Thus, the number of starting poses needed for successful refinement is much lower than for the other methods. The algorithm was evaluated on 15 protein-peptide complexes with 2-16mer peptides. Docking poses with peptide RMSD values <2.10 Å from the equilibrated experimental structures were obtained in all cases. For four systems docking into the unbound receptor structures was performed, leading to peptide RMSD values <2.12 Å. Using a specifically fitted scoring function in 11 of 15 cases the best scoring poses featured a peptide RMSD ≤ 2.10 Å.
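
To make the term "soft-core potential" in the OPMD abstract above concrete, here is a minimal sketch of one common soft-core Lennard-Jones form. It is an assumption chosen for illustration, not the functional form or optimization scheme used by OPMD; the function name and the parameters `epsilon`, `sigma`, and `alpha` are hypothetical.

```python
def soft_core_lj(r, epsilon=1.0, sigma=1.0, alpha=0.5):
    """Soft-core Lennard-Jones energy at separation r.

    The softness parameter alpha is added to (r / sigma)**6, so the energy
    stays finite at r = 0. With alpha = 0 the standard 12-6 form is
    recovered (and r = 0 then raises ZeroDivisionError, as expected).
    """
    s6 = (r / sigma) ** 6
    denom = alpha + s6
    return 4.0 * epsilon * (1.0 / denom**2 - 1.0 / denom)


print(soft_core_lj(0.0))             # 8.0
print(soft_core_lj(1.0, alpha=0.0))  # 0.0
```

With a finite energy at full overlap, atoms can pass through each other during refinement, which is why soft-core potentials help a peptide escape poses that a hard repulsive wall would trap it in.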

Journal ArticleDOI
01 May 2010-Proteins
TL;DR: It is found that sequence similarity as quantified by environment specific substitution scores can be used to significantly improve prediction and it was found that whole structure quality did not affect the quality of loop predictions.
Abstract: Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity as quantified by environment specific substitution scores can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these, and it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions.

Journal ArticleDOI
01 Jul 2010-Proteins
TL;DR: A system called 3DM is described that can automatically build an entire molecular class–specific information system (MCSIS) and implies that the availability of a large number of superfamily members with a known three‐dimensional structure is a requirement for 3DM to succeed well.
Abstract: Ten years of experience with molecular class-specific information systems (MCSIS) such as the hand-curated G protein-coupled receptor database (GPCRDB) or the semiautomatically generated nuclear receptor database has made clear that a wide variety of questions can be answered when protein-related data from many different origins can be flexibly combined. MCSISes revolve around a multiple sequence alignment (MSA) that includes "all" available sequences from the entire superfamily, and it has been shown on many occasions that the quality of these alignments is the most crucial aspect of the MCSIS approach. We describe here a system called 3DM that can automatically build an entire MCSIS. 3DM bases the MSA on a multiple structure alignment, which implies that the availability of a large number of superfamily members with a known three-dimensional structure is a requirement for 3DM to succeed well. Thirteen MCSISes were constructed and placed on the Internet for examination. These systems have been instrumental in a large series of research projects related to enzyme activity or the understanding and engineering of specificity, protein stability engineering, DNA-diagnostics, drug design, and so forth.

Journal ArticleDOI
01 Aug 2010-Proteins
TL;DR: The principles identified here provide a framework for the design of de novo proteins that will exhibit tight heme ligand binding and for the identification of the function of structural genomic target proteins with heme ligands.
Abstract: The characteristics of heme prosthetic groups and their binding sites have been analyzed in detail in a data set of nonhomologous heme proteins. Variations in the shape, volume, and chemical composition of the binding site, in the mode of heme binding and in the number and nature of heme-protein interactions are found to result in significantly different heme environments in proteins with different functions in biology. Differences are also seen in the properties of the apo states of the proteins. The apo states of proteins that bind heme permanently in their functional form show some disorder, ranging from local unfolding in the heme binding pocket to complete unfolding to give a random coil. In contrast, proteins that bind heme transiently are fully folded in their apo and holo states, presumably allowing both apo and holo forms to remain biologically active resisting aggregation or proteolysis. The principles identified here provide a framework for the design of de novo proteins that will exhibit tight heme ligand binding and for the identification of the function of structural genomic target proteins with heme ligands.

Journal ArticleDOI
15 Nov 2010-Proteins
TL;DR: This study assesses on a large scale the possibility of deriving self‐inhibitory peptides from protein domains with globular architectures and provides an elaborate framework for the in silico selection of candidate inhibitory molecules for protein–protein interactions.
Abstract: In this study, we assess on a large scale the possibility of deriving self-inhibitory peptides from protein domains with globular architectures. Such inhibitory peptides would inhibit interactions of their origin domain by mimicking its mode of binding to cognate partners, and could serve as promising leads for rational design of inhibitory drugs. For our large-scale analysis, we analyzed short linear segments that were cut out of protein interfaces in silico in complex structures of protein-protein docking Benchmark 3.0 and CAPRI targets from rounds 1-19. Our results suggest that more than 50% of these globular interactions are dominated by one short linear segment at the domain interface, which provides more than half of the original interaction energy. Importantly, in many cases the derived peptides show strong energetic preference for their original binding mode independently of the context of their original domain, as we demonstrate by extensive computational peptide docking experiments. As an in depth case study, we computationally design a candidate peptide to inhibit the EphB4-EphrinB2 interaction based on a short peptide derived from the G-H loop in EphrinB2. Altogether, we provide an elaborate framework for the in silico selection of candidate inhibitory molecules for protein-protein interactions. Such candidate molecules can be readily subjected to wet-laboratory experiments and provide highly promising starting points for subsequent drug design.

Journal ArticleDOI
15 Nov 2010-Proteins
TL;DR: The analysis has demonstrated the importance of biological information gathering prior to docking, which significantly increased the docking success rate, and of the refinement and rescoring stage that significantly improved the ranking of the rigid docking solutions.
Abstract: The CAPRI experiment (Critical Assessment of Predicted Interactions) simulates realistic and diverse docking challenges, each case having specific properties that may be exploited by docking algorithms. Motivated by the different CAPRI challenges, we developed and implemented a comprehensive suite of docking algorithms. These were incorporated into a dynamic docking protocol, consisting of four main stages: (1) Biological and bioinformatics research aiming to predict the binding site residues, to define distance constraints between interface atoms and to analyze the flexibility of molecules; (2) Rigid or flexible docking, performed by the PatchDock or FlexDock method, which utilizes the information gathered in the previous step. Symmetric complexes are predicted by the SymmDock method; (3) Flexible refinement and re-ranking of the rigid docking solution candidates, performed by FiberDock; and finally, (4) clustering and filtering the results based on energy funnels. We analyzed the performance of our docking protocol on a large benchmark and on recent CAPRI targets. The analysis has demonstrated the importance of biological information gathering prior to docking, which significantly increased the docking success rate, and of the refinement and re-scoring stage that significantly improved the ranking of the rigid docking solutions. Our failures were mostly a result of mishandling backbone flexibility, inaccurate homology modeling, or incorrect biological assumptions. Most of the methods are available at http://bioinfo3d.cs.tau.ac.il/.

Journal ArticleDOI
01 Jan 2010-Proteins
TL;DR: Two top performing approaches are described, in which all‐atom models of the AA2AR were generated by homology modeling followed by ligand guided backbone ensemble receptor optimization (LiBERO), which suggest that despite certain inaccuracies, the optimized homology models can be useful in the drug discovery process.
Abstract: Proteins of the G-protein coupled receptor (GPCR) family present numerous attractive targets for rational drug design, but also a formidable challenge for identification and conformational modeling of their 3D structure. A recently performed assessment of blind predictions of adenosine A2a receptor (AA2AR) structure in complex with ZM241385 (ZMA) antagonist provided a first example of unbiased evaluation of the current modeling algorithms on a GPCR target with approximately 30% sequence identity to the closest structural template. Several of the 29 groups participating in this assessment exercise (Michino et al., doi: 10.1038/nrd2877) successfully predicted the overall position of the ligand ZMA in the AA2AR ligand binding pocket; however, models from only three groups captured more than 40% of the ligand-receptor contacts. Here we describe two of these top performing approaches, in which all-atom models of the AA2AR were generated by homology modeling followed by ligand guided backbone ensemble receptor optimization (LiBERO). The resulting AA2AR-ZMA models, along with the best models from other groups, are assessed here for their virtual ligand screening (VLS) performance on a large set of GPCR ligands. We show that ligand guided optimization was critical for improvement of both ligand-receptor contacts and VLS performance as compared to the initial raw homology models. The best blindly predicted models performed on par with the crystal structure of AA2AR in selecting known antagonists from decoys, as well as from antagonists for other adenosine subtypes and AA2AR agonists. These results suggest that despite certain inaccuracies, the optimized homology models can be useful in the drug discovery process.

Journal ArticleDOI
01 Apr 2010-Proteins
TL;DR: This report summarizes the state‐of‐art knowledge about SQRs and highlights the questions that still remain unanswered and defines new structure‐based sequence fingerprints that support a subdivision of the SQR family into six groups.
Abstract: Sulfide:quinone oxidoreductases (SQR) are ubiquitous membrane-bound flavoproteins involved in sulfide detoxification, in sulfide-dependent energy conservation processes and potentially in the homeostasis of the neurotransmitter sulfide. The first two structures of SQRs, from the bacterium Aquifex aeolicus (Marcia et al., Proc Natl Acad Sci USA 2009; 106:9625-9630) and the archaeon Acidianus ambivalens (Brito et al., Biochemistry 2009; 48:5613-5622), were determined recently by X-ray crystallography, revealing unexpected differences in the active sites and in flavin adenine dinucleotide binding. Besides the reciprocal differences, they show a different conformation of the active site compared with another sulfide oxidizing enzyme, the flavocytochrome c-sulfide dehydrogenase (FCSD) from Allochromatium vinosum (protein data bank id: 1FCD). In addition to the new structural data, the number of available SQR-like protein sequences is continuously increasing (Pham et al., Microbiology 2008; 154:3112-3121) and the SQR activity of new members of this protein family was recently proven too (Chan et al., J Bacteriol 2009; 191:1026-1034). In the light of the new data, here we revisit the previously proposed contradictory SQR classification and we define new structure-based sequence fingerprints that support a subdivision of the SQR family into six groups. Our report summarizes the state-of-the-art knowledge about SQRs and highlights the questions that still remain unanswered. Despite two decades of work already done on these enzymes, new and most exciting discoveries can be expected in the future.

Journal ArticleDOI
15 Aug 2010-Proteins
TL;DR: Structural models based on bioinformatics, site‐directed mutagenesis, domain swapping, enzyme inhibition, and spectroscopy are proposed that help explain the nature of diterpene cyclase structure, function, and evolution.
Abstract: The structures and mechanism of action of many terpene cyclases are known, but no structures of diterpene cyclases have yet been reported. Here, we propose structural models based on bioinformatics, site-directed mutagenesis, domain swapping, enzyme inhibition, and spectroscopy that help explain the nature of diterpene cyclase structure, function, and evolution. Bacterial diterpene cyclases contain approximately 20 α-helices and the same conserved "QW" and DxDD motifs as in triterpene cyclases, indicating the presence of a βγ barrel structure. Plant diterpene cyclases have a similar catalytic motif and βγ-domain structure together with a third, α-domain, forming an αβγ structure, and in H⁺-initiated cyclases, there is an EDxxD-like Mg²⁺/diphosphate binding motif located in the γ-domain. The results support a new view of terpene cyclase structure and function and suggest evolution from ancient (βγ) bacterial triterpene cyclases to (βγ) bacterial and thence to (αβγ) plant diterpene cyclases.

Journal ArticleDOI
01 Nov 2010-Proteins
TL;DR: PRIME, an intermediate‐resolution protein model previously used in simulations of the aggregation of polyalanine and polyglutamine, is extended to the description of the geometry and energetics of peptides containing all 20 amino acid residues, called PRIME 20.
Abstract: We extend PRIME, an intermediate-resolution protein model previously used in simulations of the aggregation of polyalanine and polyglutamine, to the description of the geometry and energetics of peptides containing all twenty amino acid residues. The 20 amino acid side chains are classified into 14 groups according to their hydrophobicity, polarity, size, charge and potential for side chain hydrogen bonding. The parameters for extended PRIME, called PRIME 20, include hydrogen-bonding energies, side-chain interaction range and energy, and excluded volume. The parameters are obtained by applying a perceptron-learning algorithm and a modified stochastic learning algorithm that optimizes the energy gap between 711 known native states from the PDB and decoy structures generated by gapless threading. The number of independent pair-interaction parameters is chosen to be small enough to be physically meaningful yet large enough to give reasonably accurate results in discriminating decoys from native structures. The most physically meaningful results are obtained with 19 energy parameters.
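The perceptron-learning step mentioned in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the energy of a structure is linear in the parameters (a dot product between a weight vector and a vector of interaction counts); the function names, learning rate, and feature encoding are hypothetical and are not taken from PRIME 20 itself.

```python
def energy(weights, counts):
    """Linear energy model (an assumption): E = sum_i w_i * c_i."""
    return sum(w * c for w, c in zip(weights, counts))


def perceptron_fit(natives, decoy_sets, n_params, lr=0.1, epochs=100):
    """Learn weights so every decoy has a higher energy than its native.

    natives    : list of count vectors, one per native structure
    decoy_sets : list (same order) of lists of decoy count vectors
    """
    weights = [0.0] * n_params
    for _ in range(epochs):
        updated = False
        for native, decoys in zip(natives, decoy_sets):
            for decoy in decoys:
                # Feature difference; the energy gap is weights . diff
                diff = [d - n for d, n in zip(decoy, native)]
                gap = energy(weights, diff)
                if gap <= 0.0:
                    # Decoy is not above the native: standard perceptron update
                    weights = [w + lr * x for w, x in zip(weights, diff)]
                    updated = True
        if not updated:
            break  # all decoys are separated from their natives
    return weights
```

In this toy form, training stops as soon as every decoy lies above its native state in energy; maximizing the size of that gap, as the abstract describes, would need an additional margin or optimization step that is not shown here.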

Journal ArticleDOI
01 Dec 2010-Proteins
TL;DR: The question of universality of the reentrant condensation of proteins in solution induced by multivalent counterions is discussed, i.e., redissolution on adding further salts after phase separation, as recently discovered.
Abstract: The effective interactions and phase behavior of protein solutions under strong electrostatic coupling conditions are difficult to understand due to the complex charge pattern and irregular geometry of protein surfaces. This distinguishes them from related systems such as DNA or conventional colloids. In this work, we discuss the question of universality of the reentrant condensation (RC) of proteins in solution induced by multivalent counterions, i.e., redissolution on adding further salts after phase separation, as recently discovered (Zhang et al., Phys Rev Lett 2008; 101:148101). The discussion is based on a systematic investigation of five different proteins with different charge patterns and five different multivalent counterions. Zeta potential measurements confirm the effective charge inversion of proteins in the reentrant regime via binding of multivalent counterions, which is supported by Monte Carlo simulations. Charge inversion by trivalent cations requires an overall negative net charge of the protein. Statistical analysis of a representative set of protein sequences reveals that, in theory, this effect could be possible for about half of all proteins. Our results can be exploited for the control of the phase behavior of proteins, in particular facilitating protein crystallization. Proteins 2010. © 2010 Wiley-Liss, Inc.
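The sequence-level criterion in this abstract (an overall negative net charge) can be approximated with a very rough count of charged residues. The sketch below is an assumption-laden illustration, not the authors' statistical analysis: it counts Lys/Arg as +1 and Asp/Glu as -1 and ignores histidine, termini, and pH effects.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def approx_net_charge(sequence):
    """Crude net charge near neutral pH: (#K + #R) - (#D + #E)."""
    seq = sequence.upper()
    positives = sum(aa in POSITIVE for aa in seq)
    negatives = sum(aa in NEGATIVE for aa in seq)
    return positives - negatives


def fraction_negative(sequences):
    """Fraction of sequences whose approximate net charge is below zero."""
    if not sequences:
        return 0.0
    return sum(approx_net_charge(s) < 0 for s in sequences) / len(sequences)
```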

Journal ArticleDOI
01 Jan 2010-Proteins
TL;DR: It is shown that disordered regions frequently appear to be independent functional units, and judged by complete association to certain protein domains, may be evolutionarily conserved.
Abstract: Predicting regions of disorder has become of increasing interest when determining protein structure and function. With approximately 33% of eukaryotic proteins having significant disordered regions, and an increasing occurrence of disorder in higher organisms, an analysis of the importance of disorder from an evolutionary perspective was clearly warranted. Focusing on the human proteome, we have studied how abundant disorder is and its relevance to protein function and structure. We have shown that disordered regions frequently appear to be independent functional units, and judged by complete association to certain protein domains, may be evolutionarily conserved. Our work also supports previous analyses on association between disorder and alternate splicing and provides support for the modularity of disorder by showing that with respect to splicing events, disordered regions frequently appear to be spliced as whole units.

Journal ArticleDOI
01 Oct 2010-Proteins
TL;DR: The conventional two‐state folding model breaks down when there are DMG intermediates, a realization that has major implications for future experimental work on the mechanism of protein folding.
Abstract: New experimental results show that either gain or loss of close packing can be observed as a discrete step in protein folding or unfolding reactions. This finding poses a significant challenge to the conventional two-state model of protein folding. Results of interest involve dry molten globule (DMG) intermediates, an expanded form of the protein that lacks appreciable solvent. When an unfolding protein expands to the DMG state, side chains unlock and gain conformational entropy, while liquid-like van der Waals interactions persist. Four unrelated proteins are now known to form DMGs as the first step of unfolding, suggesting that such an intermediate may well be commonplace in both folding and unfolding. Data from the literature show that peptide amide protons are protected in the DMG, indicating that backbone structure is intact despite loss of side-chain close packing. Other complementary evidence shows that secondary structure formation provides a major source of compaction during folding. In our model, the major free-energy barrier separating unfolded from native states usually occurs during the transition between the unfolded state and the DMG. The absence of close packing at this barrier provides an explanation for why phi-values, derived from a Bronsted-Leffler plot, depend primarily on structure at the mutational site and not on specific side-chain interactions. The conventional two-state folding model breaks down when there are DMG intermediates, a realization that has major implications for future experimental work on the mechanism of protein folding.

Journal ArticleDOI
01 Apr 2010-Proteins
TL;DR: The new coarse graining model PRIMO/PRIMONA for proteins and nucleic acids is proposed, which combines one to several heavy atoms into coarse‐grained sites that are chosen to allow an analytical, high‐resolution reconstruction of all‐atom models based on molecular bonding geometry constraints.
Abstract: The new coarse graining model PRIMO/PRIMONA for proteins and nucleic acids is proposed. This model combines one to several heavy atoms into coarse-grained sites that are chosen to allow an analytical, high-resolution reconstruction of all-atom models based on molecular bonding geometry constraints. The accuracy of the proposed reconstruction method in terms of structure and energetics is tested and compared with other popular reconstruction methods for a variety of protein and nucleic acid test sets.

Journal ArticleDOI
01 Dec 2010-Proteins
TL;DR: This article develops a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure, and derives an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops.
Abstract: Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three dimensional structures of loops can provide essential information for understanding molecular mechanisms behind protein functions. In this article, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue-based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12.

Journal ArticleDOI
01 Jan 2010-Proteins
TL;DR: In this article, an ad hoc algorithm called OPRA (Optimal Protein-RNA Area) was proposed to predict RNA-binding areas on proteins, based on the most updated available set of nonredundant X-ray structures of protein-RNA complexes.
Abstract: Protein-RNA interactions are essential in living organisms and they are involved in very different and important cellular processes. Thus, understanding protein-RNA recognition at the molecular level is a key goal not only from a basic biological point of view but also for biotechnological and therapeutic purposes. On the basis of the most updated available set of nonredundant X-ray structures of protein-RNA complexes, we have computed protein-RNA interface propensities for ribonucleotides and amino acid residues. The results show several protein residues with high tendency to bind RNA, such as arginine, lysine, and histidine. However, we could not observe any clear preferences for protein binding among the different ribonucleotides. We applied these propensity values to predict RNA-binding areas on proteins, using an ad hoc algorithm called OPRA (Optimal Protein-RNA Area). First, for each protein residue, we derived a predictive score from its corresponding protein-RNA interface propensity weighted by its accessible surface area (ASA). Then, optimal patch energy scores were computed for each residue by adding up the individual scores of the neighboring surface residues. The resulting patch scores correlate well with the known RNA-binding sites on protein surfaces. The OPRA method has been benchmarked on a test set of 30 unbound proteins involved in protein-RNA complexes of known structure, where it is able to successfully predict RNA-binding sites on protein surfaces with around 80% positive predictive value. This can be useful for identifying potential RNA-binding sites on proteins, and can help to model protein-RNA interactions of biological and therapeutic interest. Proteins 2010. © 2009 Wiley-Liss, Inc.
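The two scoring steps described in this abstract (a per-residue score from propensity and ASA, then a patch score summed over neighboring surface residues) might be sketched as follows. This is not the OPRA implementation: the multiplicative weighting, the fixed distance cutoff, the surface threshold, and the residue dictionary layout are all assumptions made for illustration, and any propensity values passed in are placeholders supplied by the caller.

```python
import math


def residue_scores(residues, propensity):
    """Per-residue score: interface propensity weighted by ASA (assumed product)."""
    return [propensity.get(r["name"], 0.0) * r["asa"] for r in residues]


def patch_scores(residues, propensity, cutoff=10.0, min_asa=1.0):
    """Sum residue scores over surface residues within `cutoff` of each residue.

    residues : list of dicts with keys "name", "asa", and "xyz" (3-tuple)
    cutoff   : hypothetical neighbor distance; not a value from the paper
    min_asa  : hypothetical ASA threshold for calling a residue "surface"
    """
    scores = residue_scores(residues, propensity)
    surface = [i for i, r in enumerate(residues) if r["asa"] >= min_asa]
    patches = {}
    for i in surface:
        patches[i] = sum(
            scores[j]
            for j in surface
            if math.dist(residues[i]["xyz"], residues[j]["xyz"]) <= cutoff
        )
    return patches
```

Ranking residues by the returned patch score gives candidate binding areas; the abstract's "optimal" patch selection would require a further search step that this fixed-radius sketch omits.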

Journal ArticleDOI
15 Nov 2010-Proteins
TL;DR: This analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent potential of addressing the interface prediction challenge.
Abstract: Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function and inform mutagenesis studies and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall, however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent potential of addressing the interface prediction challenge.
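The recall and precision criterion quoted in this abstract can be made concrete with a short sketch. It assumes an interface is represented as a set of residue identifiers and uses the standard definitions of the two measures; the function names are hypothetical, and only the 50% threshold comes from the abstract.

```python
def interface_overlap(predicted, observed):
    """Return (recall, precision) for two sets of interface residue IDs."""
    predicted, observed = set(predicted), set(observed)
    common = len(predicted & observed)
    recall = common / len(observed) if observed else 0.0
    precision = common / len(predicted) if predicted else 0.0
    return recall, precision


def is_correct_interface(predicted, observed, threshold=0.5):
    """Apply the abstract's criterion: recall and precision both >= 50%."""
    recall, precision = interface_overlap(predicted, observed)
    return recall >= threshold and precision >= threshold
```

For example, a prediction of residues {1, 2, 3, 4} against an observed interface {3, 4, 5, 6, 7, 8} shares two residues, giving a precision of 0.5 but a recall of only one third, so it would not count as correct under this criterion.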

Journal ArticleDOI
01 Nov 2010-Proteins
TL;DR: The generated pathways are not minimum free energy pathways, but they are geometrically plausible pathways that maintain good covalent bond distances and angles, keep backbone dihedral angles in allowed Ramachandran regions, avoid eclipsed side‐chain torsion angles, avoid non‐bonded overlap, and maintain a set of hydrogen bonds and hydrophobic contacts.
Abstract: We describe a new method for rapidly generating stereochemically acceptable pathways in proteins. The method, called geometric targeting, is publicly available at the webserver http://pathways.asu.edu, and includes tools for visualization of the pathway and creating movie files for use in presentations. The user submits an initial structure and a target structure, and a pathway between the two input states is generated automatically. Besides visualization, the structural quality of the pathways makes them useful as input pathways into pathway refinement techniques and further computations. The approach in geometric targeting is to gradually change the system's RMSD relative to the target structure while enforcing a set of geometric constraints. The generated pathways are not minimum free energy pathways, but they are geometrically plausible pathways that maintain good covalent bond distances and angles, keep backbone dihedral angles in allowed Ramachandran regions, avoid eclipsed side-chain torsion angles, avoid non-bonded overlap, and maintain a set of hydrogen bonds and hydrophobic contacts. Resulting pathways for over 20 proteins featuring a wide variety of conformational changes are reported here, including the very large GroEL complex.