
Showing papers in "Proteins in 2012"


Journal ArticleDOI
01 Jul 2012-Proteins
TL;DR: QUARK, a novel program for template‐free protein structure prediction, successfully constructs 3D models with correct folds in one‐third of cases for short proteins of up to 100 residues, and in CASP9 it outperformed the second and third best servers based on the cumulative Z‐score of global distance test‐total scores in the FM category.
Abstract: Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. The QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one third of cases for short proteins of up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, the QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field.

844 citations
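
The replica-exchange Monte Carlo search mentioned in the QUARK abstract relies on periodically swapping conformations between replicas run at different temperatures. The sketch below shows only the standard textbook Metropolis swap criterion, not QUARK's actual implementation; the function names, the use of inverse temperatures (`betas`), and the adjacent-pair sweep are assumptions made for illustration.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    beta_* are inverse temperatures (1 / kT); energy_* are the current
    energies of the conformations held by each replica.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative delta means the swap is always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(betas, energies, rng=random):
    """Attempt one swap between each adjacent pair of replicas.

    Returns the energies in their new replica order (a stand-in for
    swapping full conformations in a real simulation).
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies
```

With this criterion, a low-energy conformation found at high temperature tends to migrate toward the colder replicas, while the hot replicas keep crossing energy barriers.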


Journal ArticleDOI
01 Feb 2012-Proteins
TL;DR: This study used DichroCalc to calculate the theoretical CD spectra of a nonredundant set of structures representing most proteins in the PDB, and applied a straightforward approach for predicting protein secondary structure content using these theoretical CD spectra as a reference set.
Abstract: Circular dichroism (CD) is a spectroscopic technique commonly used to investigate the structure of proteins. Major secondary structure types, alpha-helices and beta-strands, produce distinctive CD spectra. Thus, by comparing the CD spectrum of a protein of interest to a reference set consisting of CD spectra of proteins of known structure, predictive methods can estimate the secondary structure of the protein. Currently available methods, including K2D2, use such experimental CD reference sets, which are very small in size when compared to the number of tertiary structures available in the Protein Data Bank (PDB). Conversely, given a PDB structure, it is possible to predict a theoretical CD spectrum from it. The methodological framework for this calculation was established long ago, but a convenient implementation called DichroCalc was developed only recently. In this study, we set out to determine whether theoretically derived spectra could be used as a reference set for accurate CD-based predictions of secondary structure. We used DichroCalc to calculate the theoretical CD spectra of a nonredundant set of structures representing most proteins in the PDB, and applied a straightforward approach for predicting protein secondary structure content using these theoretical CD spectra as a reference set. We show that this method improves the predictions, particularly for the wavelength interval between 200 and 240 nm and for beta-strand content. We have implemented this method, called K2D3, in a publicly accessible web server at http://www.ogic.ca/projects/k2d3.

658 citations


Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: In MD simulations of 24 proteins chosen from the refinement category of recent Critical Assessment of Structure Prediction (CASP) experiments, it is found that in most cases, simulations initiated from homology models drift away from the native structure, suggesting that force field accuracy is the primary factor limiting MD‐based refinement.
Abstract: Accurate computational prediction of protein structure represents a longstanding challenge in molecular biology and structure-based drug design. Although homology modeling techniques are widely used to produce low-resolution models, refining these models to high resolution has proven difficult. With long enough simulations and sufficiently accurate force fields, molecular dynamics (MD) simulations should in principle allow such refinement, but efforts to refine homology models using MD have for the most part yielded disappointing results. It has thus far been unclear whether MD-based refinement is limited primarily by accessible simulation timescales, force field accuracy, or both. Here, we examine MD as a technique for homology model refinement using all-atom simulations, each at least 100 μs long (more than 100 times longer than previous refinement simulations), and a physics-based force field that was recently shown to successfully fold a structurally diverse set of fast-folding proteins. In MD simulations of 24 proteins chosen from the refinement category of recent Critical Assessment of Structure Prediction (CASP) experiments, we find that in most cases, simulations initiated from homology models drift away from the native structure. Comparison with simulations initiated from the native structure suggests that force field accuracy is the primary factor limiting MD-based refinement. This problem can be mitigated to some extent by restricting sampling to the neighborhood of the initial model, leading to structural improvement that, while limited, is roughly comparable to the leading alternative methods.

232 citations


Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: A Monte Carlo‐based computational method was developed with the aim to identify and optimize potential peptide hits from the E proteins, which have potential to disrupt the protein–protein interaction in the fusion process and may serve as starting points for the development of novel inhibitors for viral E proteins.
Abstract: The fusion process is known to be the initial step of viral infection, and hence targeting the entry process is a promising strategy for designing antiviral therapy. The self-inhibitory peptides derived from the enveloped (E) proteins function to inhibit the protein-protein interactions in the membrane fusion step mediated by the viral E protein. Thus, they have the potential to be developed into effective antiviral therapy. Herein, we have developed a Monte Carlo-based computational method with the aim of identifying and optimizing potential peptide hits from the E proteins. The stability of the peptides, which indicates their potential to bind in situ to the E proteins, was evaluated by two different scoring functions: the dipolar distance-scaled, finite, ideal-gas reference state and the residue-specific all-atom probability discriminatory function. The method was applied to α-helical Class I HIV-1 gp41, β-sheet Class II Dengue virus (DENV) type 2 E proteins, as well as Class III Herpes Simplex virus-1 (HSV-1) glycoprotein, an E protein with a mixture of α-helix and β-sheet structural fold. The peptide hits identified are in line with the druggable regions from which the self-inhibitory peptide inhibitors for the three classes of viral fusion proteins were derived. Several novel peptides were identified from either the hydrophobic regions or the functionally important regions on Class II DENV-2 E protein and Class III HSV-1 gB. They have potential to disrupt the protein-protein interaction in the fusion process and may serve as starting points for the development of novel inhibitors for viral E proteins.

137 citations


Journal ArticleDOI
01 Mar 2012-Proteins
TL;DR: This work characterizes the FKBP12 protein and shows that water molecules observed in crystal structures are less stable on average than bulk water as a consequence of the high degree of spatial localization, thereby resulting in a significant loss in entropy.
Abstract: Water plays an essential role in determining the structure and function of all biological systems. Recent methodological advances allow for an accurate and efficient estimation of the thermodynamic properties of water molecules at the surface of proteins. In this work, we characterize these thermodynamic properties and relate them to various structural and functional characteristics of the protein. We find that high-energy hydration sites often exist near protein motifs typically characterized as hydrophilic, such as backbone amide groups. We also find that waters around alpha helices and beta sheets tend to be less stable than waters around loops. Furthermore, we find no significant correlation between the hydration site free energy and the solvent accessible surface area of the site. In addition, we find that the distribution of high-energy hydration sites on the protein surface can be used to identify the location of binding sites and that binding sites of druggable targets tend to have a greater density of thermodynamically unstable hydration sites. Using this information, we characterize the FKBP12 protein and show good agreement between fragment screening hit rates from NMR spectroscopy and hydration site energetics. Finally, we show that water molecules observed in crystal structures are less stable on average than bulk water as a consequence of the high degree of spatial localization, thereby resulting in a significant loss in entropy. These findings should help to better understand the characteristics of waters at the surface of proteins and are expected to lead to insights that can guide structure-based drug design efforts. Proteins 2011. © 2012 Wiley Periodicals, Inc.

133 citations


Journal ArticleDOI
01 Feb 2012-Proteins
TL;DR: In this paper, the ENSEMBLE methodology has been used to analyze the structural properties of ensembles with different numbers of conformers, and to make recommendations about the experimental measurements that should be made for optimal ensemble modeling.
Abstract: Disordered states of proteins include the biologically functional intrinsically disordered proteins and the unfolded states of normally folded proteins. In recent years, ensemble-modeling strategies using various experimental measurements as restraints have emerged as powerful means for structurally characterizing disordered states. However, these methods are still in their infancy compared with the structural determination of folded proteins. Here, we have addressed several issues important to ensemble modeling using our ENSEMBLE methodology. First, we assessed how calculating ensembles containing different numbers of conformers affects their structural properties. We find that larger ensembles have very similar properties to smaller ensembles fit to the same experimental restraints, thus allowing a considerable speed improvement in our calculations. In addition, we analyzed the contributions of different experimental restraints to the structural properties of calculated ensembles, enabling us to make recommendations about the experimental measurements that should be made for optimal ensemble modeling. The effects of different restraints, most significantly from chemical shifts, paramagnetic relaxation enhancements and small-angle X-ray scattering, but also from other data, underscore the importance of utilizing multiple sources of experimental data. Finally, we validate our ENSEMBLE methodology using both cross-validation and synthetic experimental restraints calculated from simulated ensembles. Our results suggest that secondary structure and molecular size distribution can generally be modeled very accurately, whereas the accuracy of calculated tertiary structure is dependent on the number of distance restraints used.

119 citations


Journal ArticleDOI
01 Jul 2012-Proteins
TL;DR: A simple similarity measure for structural clustering based on atomic contacts—the fraction of common contacts—is presented and it is shown that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein–protein and protein–DNA complexes.
Abstract: Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts, the fraction of common contacts, and compare it with the most used similarity measure of the protein docking community, interface backbone RMSD. We show that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors.

107 citations
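
The fraction of common contacts described in the abstract above can be illustrated with plain set operations. This is a minimal sketch under stated assumptions, not the authors' code: the contact representation (tuples of chain and residue identifiers), the normalization by the first model's contact count, and the function name are all hypothetical choices made for illustration.

```python
def fraction_common_contacts(contacts_a, contacts_b):
    """Fraction of the contacts in model A that are also present in model B.

    Each contact is assumed to be a hashable tuple such as
    (chain_1, residue_1, chain_2, residue_2). Normalizing by the size of
    A makes the measure asymmetric: f(A, B) need not equal f(B, A).
    """
    a, b = set(contacts_a), set(contacts_b)
    if not a:
        return 0.0  # No contacts in A, so nothing can be shared.
    return len(a & b) / len(a)


# Two hypothetical docking models sharing two interface contacts.
model_a = {("A", 10, "B", 5), ("A", 11, "B", 5), ("A", 12, "B", 7), ("A", 13, "B", 8)}
model_b = {("A", 10, "B", 5), ("A", 11, "B", 5), ("A", 20, "B", 9)}

fcc_ab = fraction_common_contacts(model_a, model_b)
fcc_ba = fraction_common_contacts(model_b, model_a)
```

Because no structural superposition is needed, comparing two models costs only a set intersection, which is consistent with the speed advantage the abstract reports over RMSD-based clustering.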


Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: A new score called SP‐score is developed that fixes the cutoff distance at 4 Å and removes the size dependence using a normalization prefactor, and is expected to be useful for function prediction and comparing structures with or without domains defined.
Abstract: A structure alignment program aligns two structures by optimizing a scoring function that measures structural similarity. It is highly desirable that such a scoring function is independent of the sizes of the proteins being compared, so that the significance of alignments across different sizes of the aligned protein regions is comparable. Here, we developed a new score called SP-score that fixes the cutoff distance at 4 Å and removes the size dependence using a normalization prefactor. We further built a program called SPalign that optimizes SP-score for structure alignment. SPalign was applied to recognize proteins within the same structure fold and having the same function of DNA or RNA binding. For fold discrimination, SPalign improves sensitivity over TMalign for the chain-level comparison by 12% and over DALI for the domain-level comparison by 13% at the same specificity of 99.6%. The difference between TMalign and SPalign at the chain level is due to the inability of TMalign to detect single domain similarity between multidomain proteins. For recognizing nucleic acid binding proteins, SPalign consistently improves over TMalign by 12% and DALI by 31% in average value of Matthews correlation coefficients for four datasets. SPalign with default setting is 14% faster than TMalign. SPalign is expected to be useful for function prediction and comparing structures with or without domains defined. The source code for SPalign and the server are available at http://sparks.informatics.iupui.edu.

82 citations
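
The SPalign abstract reports its binding-protein results as Matthews correlation coefficients. For readers unfamiliar with the metric, the sketch below computes it from a binary confusion matrix using the standard formula; the function name and the convention of returning 0.0 when the denominator vanishes are illustrative assumptions, and nothing here reproduces the paper's datasets.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # Common convention when a row or column is empty.
    return (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, this coefficient stays informative when binders and non-binders are present in very different numbers.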


Journal ArticleDOI
01 Jan 2012-Proteins
TL;DR: BSP‐SLIM was tested on 71 ligand–protein complexes from the Astex diverse set and was used in virtual screening for six target proteins, which prioritized actives of 25% and 50% in the top 9% and 17% of the library on average, respectively, demonstrating the usefulness of the template‐based coarse‐grained algorithms in the low‐resolution ligand‐protein docking and drug‐screening.
Abstract: We developed BSP-SLIM, a new method for ligand-protein blind docking using low-resolution protein structures. For a given sequence, protein structures are first predicted by I-TASSER; putative ligand binding sites are transferred from holo-template structures which are analogous to the I-TASSER models; ligand-protein docking conformations are then constructed by shape and chemical match of the ligand with the negative image of binding pockets. BSP-SLIM was tested on 71 ligand-protein complexes from the Astex diverse set, where the protein structures were predicted by I-TASSER with an average RMSD of 2.92 Å on the binding residues. Using I-TASSER models, the median ligand RMSD of BSP-SLIM docking is 3.99 Å, which is 5.94 Å lower than that by AutoDock; the median binding-site error by BSP-SLIM is 1.77 Å, which is 6.23 Å lower than that by AutoDock and 3.43 Å lower than that by LIGSITE(CSC). Compared to the models using crystal protein structures, the median ligand RMSD by BSP-SLIM using I-TASSER models increases by 0.87 Å, while that by AutoDock increases by 8.41 Å; the median binding-site error by BSP-SLIM increases by 0.69 Å, while that by AutoDock and LIGSITE(CSC) increases by 7.31 Å and 1.41 Å, respectively. As case studies, BSP-SLIM was used in virtual screening for six target proteins, which prioritized actives of 25% and 50% in the top 9.2% and 17% of the library on average, respectively. These results demonstrate the usefulness of the template-based coarse-grained algorithms in the low-resolution ligand-protein docking and drug-screening. An on-line BSP-SLIM server is freely available at http://zhanglab.ccmb.med.umich.edu/BSP-SLIM.

80 citations


Journal ArticleDOI
01 Oct 2012-Proteins
TL;DR: A multibaric‐multithermal molecular dynamics (MD) simulation of a 10‐residue protein, chignolin, was performed and typical local‐minimum free‐energy conformations, folding and unfolding pathways were revealed.
Abstract: A multibaric-multithermal molecular dynamics (MD) simulation of a 10-residue protein, chignolin, was performed. An all-atom model with the Amber parm99SB force field was used for the protein and the TIP3P model was used for the explicit water molecules. This MD simulation covered wide ranges of temperature between 260 and 560 K and pressure between 0.1 and 600 MPa and sampled many conformations without getting trapped in local-minimum free-energy states. Folding events to the native β-hairpin structure occurred five times and unfolding events were observed four times. As the temperature and/or pressure increases, the fraction of folded chignolin decreases. The partial molar enthalpy change ΔH and partial molar volume change ΔV of unfolding were calculated as ΔH = 24.1 ± 4.9 kJ/mol and ΔV = -5.6 ± 1.5 cm³/mol, respectively. These values agree well with recent experimental results. By illustrating typical local-minimum free-energy conformations, folding and unfolding pathways were revealed. When chignolin unfolds from the β-hairpin structure, only the C terminus or both C and N termini open first. It may pass through an α-helix or 3₁₀-helix structure before finally unfolding to the extended structure. The difference in mechanism between temperature denaturation and pressure denaturation is also discussed. Temperature denaturation is caused by the protein being transferred to a higher entropy state, in which it moves around more and occupies a larger space. The reason for pressure denaturation is that water molecules approach the hydrophobic residues, which are not well hydrated in the folded state, and some hydrophobic contacts are broken.

80 citations


Journal ArticleDOI
01 Mar 2012-Proteins
TL;DR: The improved sampling of RASREC is essential in obtaining accurate structures over a benchmark set of 11 proteins in the 15‐25 kDa size range using chemical shifts, backbone RDCs and HN‐HN NOE data; in a number of cases the improved sampling methodology makes a larger contribution than incorporation of additional experimental data.
Abstract: Recent work has shown that NMR structures can be determined by integrating sparse NMR data with structure prediction methods such as Rosetta. The experimental data serve to guide the search for the lowest energy state towards the deep minimum at the native state which is frequently missed in Rosetta de novo structure calculations. However, as the protein size increases, sampling again becomes limiting; for example, the standard Rosetta protocol involving Monte Carlo fragment insertion starting from an extended chain fails to converge for proteins over 150 amino acids even with guidance from chemical shifts (CS-Rosetta) and other NMR data. The primary limitation of this protocol (that every folding trajectory is completely independent of every other) was recently overcome with the development of a new approach involving resolution-adapted structural recombination (RASREC). Here we describe the RASREC approach in detail and compare it to standard CS-Rosetta. We show that the improved sampling of RASREC is essential in obtaining accurate structures over a benchmark set of 11 proteins in the 15-25 kDa size range using chemical shifts, backbone RDCs and HN-HN NOE data; in a number of cases the improved sampling methodology makes a larger contribution than incorporation of additional experimental data. Experimental data are invaluable for guiding sampling to the vicinity of the global energy minimum, but for larger proteins, the standard Rosetta fold-from-extended-chain protocol does not converge on the native minimum even with experimental data and the more powerful RASREC approach is necessary to converge to accurate solutions.

Journal ArticleDOI
01 May 2012-Proteins
TL;DR: The predictions of ligand‐binding affinities from several methods based on end‐point molecular dynamics simulations and continuum solvation, that is, methods related to MM/PBSA (molecular mechanics combined with Poisson–Boltzmann and surface area solvation), are compared.
Abstract: We have compared the predictions of ligand-binding affinities from several methods based on end-point molecular dynamics simulations and continuum solvation, i.e. methods related to MM/PBSA (molecular mechanics combined with Poisson-Boltzmann and surface area solvation). Two continuum-solvation models were considered, viz. the Poisson-Boltzmann (PB) and generalised Born (GB) approaches. The non-electrostatic energies were also obtained in two different ways, viz. either from the sum of the bonded, van der Waals, non-polar solvation energies, and entropy terms (as in MM/PBSA), or from the scaled protein-ligand van der Waals interaction energy (as in the linear interaction energy approach, LIE). Three different approaches to calculate electrostatic energies were tested, viz. the sum of electrostatic interaction energies and polar solvation energies, obtained either from a single simulation of the complex or from three independent simulations of the complex, the free protein, and the free ligand, or the linear-response approximation (LRA). Moreover, we investigated the effect of scaling the electrostatic interactions by an effective internal dielectric constant of the protein (ε_int). All these methods were tested on the binding of seven biotin analogues to avidin and nine 3-amidinobenzyl-1H-indole-2-carboxamide inhibitors to factor Xa. For avidin, the best results were obtained with a combination of the LIE non-electrostatic energies with the MM+GB electrostatic energies from a single simulation, using ε_int = 4. For fXa, standard MM/GBSA, based on one simulation and using ε_int = 4-10, gave the best result. The optimum internal dielectric constant seems to be slightly higher with PB than with GB solvation. Proteins 2012. © 2012 Wiley-Liss, Inc.

Journal ArticleDOI
01 Feb 2012-Proteins
TL;DR: The most salient result of the NMR analysis is that phosphorylation in the PRR stabilizes a short α‐helix that runs from pSer235 till the very beginning of the microtubule‐binding region (Tau[Thr245‐Ser324] or MTBR of TauF4).
Abstract: Phosphorylation of the neuronal Tau protein is implicated in both the regulation of its physiological function of microtubule stabilization and its pathological propensity to aggregate into the fibers that characterize Alzheimer's diseased neurons. However, how specific phosphorylation events influence both aspects of Tau biology remains largely unknown. In this study, we address the structural impact of phosphorylation of the Tau protein by Nuclear Magnetic Resonance (NMR) spectroscopy on a functional fragment of Tau (Tau[Ser208-Ser324] = TauF4). TauF4 was phosphorylated by the proline-directed CDK2/CycA3 kinase on Thr231 (generating the AT180 epitope), Ser235, and equally on Thr212 and Thr217 in the Proline-rich region (Tau[Ser208-Gln244] or PRR). These modifications strongly decrease the capacity of TauF4 to polymerize tubulin into microtubules. While all the NMR parameters are consistent with a globally disordered Tau protein fragment, local clusters of structuration can be defined. The most salient result of our NMR analysis is that phosphorylation in the PRR stabilizes a short α-helix that runs from pSer235 till the very beginning of the microtubule-binding region (Tau[Thr245-Ser324] or MTBR of TauF4). Phosphorylation of Thr231/Ser235 creates an N-cap with a helix-stabilizing role, while phosphorylation of Thr212/Thr217 does not induce modification of the local transient secondary structure, showing that the stabilizing effect is sequence specific. Using paramagnetic relaxation experiments, we additionally show a transient interaction between the PRR and the MTBR, observed in both TauF4 and phospho-TauF4. Proteins 2011. © 2011 Wiley Periodicals, Inc.

Journal ArticleDOI
01 Jul 2012-Proteins
TL;DR: A novel computational multiple protein docking algorithm, Multi‐LZerD, builds models of multimeric complexes by effectively reusing pairwise docking predictions of component proteins, and was able to predict near‐native structures for multimeric complexes of various topologies.
Abstract: The tertiary structures of protein complexes provide crucial insight into the molecular mechanisms that regulate their functions and assembly. However, solving protein complex structures by experimental methods is often more difficult than solving single protein structures. Here, we have developed a novel computational multiple protein docking algorithm, Multi-LZerD, that builds models of multimeric complexes by effectively reusing pairwise docking predictions of component proteins. A genetic algorithm is applied to explore the conformational space, followed by a structure refinement procedure. A benchmark on eleven hetero-multimeric complexes resulted in near-native conformations for all but one of them (a root mean square deviation smaller than 2.5 Å). We also show that our method copes with unbound docking cases well, outperforming the methodology that can be directly compared with our approach. Multi-LZerD was able to predict near-native structures for multimeric complexes of various topologies.

Journal ArticleDOI
01 Jul 2012-Proteins
TL;DR: The method utilizes the z‐score value of the distance measure in the feature vector space to estimate the relative contribution among the k‐nearest neighbors for prediction of the discrete and continuous solvent accessibility.
Abstract: We present a method to predict the solvent accessibility of proteins which is based on a nearest neighbor method applied to the sequence profiles. Using the method, continuous real-value prediction as well as two-state and three-state discrete predictions can be obtained. The method utilizes the z-score value of the distance measure in the feature vector space to estimate the relative contribution among the k-nearest neighbors for prediction of the discrete and continuous solvent accessibility. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a cutoff of 25% sequence identity. Using optimal parameters, prediction accuracies (for discrete predictions) of 78.38% (two-state prediction with the threshold of 25%) and 65.1% (three-state prediction with the thresholds of 9 and 36%), and a Pearson correlation coefficient (between the predicted and true RSAs for continuous prediction) of 0.676, are achieved. An independent benchmark test was performed with the CASP8 targets, where we find that the proposed method outperforms existing methods. The prediction accuracies are 80.89% (for two-state prediction with the threshold of 25%) and 67.58% (three-state prediction), and the Pearson correlation coefficient is 0.727 (for continuous prediction) with a mean absolute error of 0.148. We have also investigated the effect of increasing database sizes on the prediction accuracy, where additional improvement in the accuracy is observed as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/. Proteins 2012; © 2012 Wiley Periodicals, Inc.
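
The abstract above says that each nearest neighbor's contribution is weighted using the z-score of its distance in feature space, but it does not give the weighting formula. The sketch below is therefore only one plausible reading, not the SANN method: the weight `max(0, -z)`, the Euclidean distance, the toy feature vectors, and the function name `predict_rsa` are all assumptions introduced for illustration.

```python
import math
import statistics


def predict_rsa(query, database, k=3):
    """Predict relative solvent accessibility (RSA) by weighted k-NN.

    database is a list of (feature_vector, rsa) pairs. Each neighbor is
    weighted by how far below the mean distance it lies, measured in
    standard deviations (an assumed weighting, see the text above).
    """
    dists = [(math.dist(query, feats), rsa) for feats, rsa in database]
    all_d = [d for d, _ in dists]
    mean, std = statistics.fmean(all_d), statistics.pstdev(all_d)
    neighbors = sorted(dists)[:k]

    if std == 0:
        # All distances are equal, so fall back to an unweighted mean.
        return statistics.fmean([rsa for _, rsa in neighbors])

    # Unusually close neighbors have strongly negative z-scores.
    weights = [max(0.0, -(d - mean) / std) for d, _ in neighbors]
    total = sum(weights)
    if total == 0:
        return statistics.fmean([rsa for _, rsa in neighbors])
    return sum(w * rsa for w, (_, rsa) in zip(weights, neighbors)) / total


# Toy database: 2D feature vectors with known RSA values (made up).
toy_db = [((0.0, 0.0), 0.2), ((10.0, 0.0), 0.8), ((10.0, 10.0), 0.9), ((0.0, 10.0), 0.7)]
prediction = predict_rsa((0.0, 0.0), toy_db, k=3)
```

In this toy case the query coincides with the first database entry, so that entry is the only neighbor with a below-average distance and it dominates the prediction.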

Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: A large‐scale test carried out in a blind fashion in CASP9 shows that ULR structures are improved over initial template‐based models by refinement in more than 70% of the successfully detected ULRs.
Abstract: Contemporary template-based modeling techniques allow applications of modeling methods to vast biological problems. However, they tend to fail to provide accurate structures for less-conserved local regions in sequence even when the overall structure can be modeled reliably. We call these regions unreliable local regions (ULRs). Accurate modeling of ULRs is of enormous value because they are frequently involved in functional specificity. In this article, we introduce a new method for modeling ULRs in template-based models by employing a sophisticated loop modeling technique. Combined with our previous study on protein termini, the method is applicable to refinement of both loop and terminus ULRs. A large-scale test carried out in a blind fashion in CASP9 (the 9th Critical Assessment of techniques for protein structure prediction) shows that ULR structures are improved over initial template-based models by refinement in more than 70% of the successfully detected ULRs. It is also notable that successful modeling of several long ULRs over 12 residues is achieved. Overall, the current results show that a careful application of loop and terminus modeling can be a promising tool for model refinement in template-based modeling.

Journal ArticleDOI
01 Mar 2012-Proteins
TL;DR: A new score term is described that explicitly detects and penalizes the formation of hydrophobic patches during computational protein design, making it possible to design protein surfaces that include hydrophobic amino acids at naturally occurring frequencies but do not have large hydrophobic patches.
Abstract: De novo protein design requires the identification of amino-acid sequences that favor the target-folded conformation and are soluble in water. One strategy for promoting solubility is to disallow hydrophobic residues on the protein surface during design. However, naturally occurring proteins often have hydrophobic amino acids on their surface that contribute to protein stability via the partial burial of hydrophobic surface area or play a key role in the formation of protein-protein interactions. A less restrictive approach for surface design that is used by the modeling program Rosetta is to parameterize the energy function so that the number of hydrophobic amino acids designed on the protein surface is similar to what is observed in naturally occurring monomeric proteins. Previous studies with Rosetta have shown that this limits surface hydrophobics to the naturally occurring frequency (∼28%), but that it does not prevent the formation of hydrophobic patches that are considerably larger than those observed in naturally occurring proteins. Here, we describe a new score term that explicitly detects and penalizes the formation of hydrophobic patches during computational protein design. With the new term, we are able to design protein surfaces that include hydrophobic amino acids at naturally occurring frequencies, but do not have large hydrophobic patches. By adjusting the strength of the new score term, the emphasis of surface redesigns can be switched between maintaining solubility and maximizing folding free energy.

Journal ArticleDOI
01 Apr 2012-Proteins
TL;DR: An alignment free local surface comparison method for predicting a ligand molecule which binds to a query protein, named Patch‐Surfer, which represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness.
Abstract: Functional elucidation of proteins is one of the essential tasks in biology. Function of a protein, specifically, small ligand molecules that bind to a protein, can be predicted by finding similar local surface regions in binding sites of known proteins. Here, we developed an alignment free local surface comparison method for predicting a ligand molecule which binds to a query protein. The algorithm, named Patch-Surfer, represents a binding pocket as a combination of segmented surface patches, each of which is characterized by its geometrical shape, the electrostatic potential, the hydrophobicity, and the concaveness. Representing a pocket by a set of patches is effective to absorb difference of global pocket shape while capturing local similarity of pockets. The shape and the physicochemical properties of surface patches are represented using the 3D Zernike descriptor, which is a series expansion of mathematical 3D function. Two pockets are compared using a modified weighted bipartite matching algorithm, which matches similar patches from the two pockets. Patch-Surfer was benchmarked on three datasets, which consist in total of 390 proteins that bind to one of 21 ligands. Patch-Surfer showed superior performance to existing methods including a global pocket comparison method, Pocket-Surfer, which we have previously introduced. Particularly, as intended, the accuracy showed large improvement for flexible ligand molecules, which bind to pockets in different conformations.
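As a rough illustration of the pocket comparison step described in this abstract, the sketch below matches the patches of two pockets one-to-one so that the total descriptor distance is minimal. It is not Patch-Surfer's modified weighted bipartite matching: the brute-force search, the Euclidean distance, the function name match_patches, and the toy two-component descriptor vectors are all assumptions made for illustration.

```python
from itertools import permutations
from math import dist

def match_patches(pocket_a, pocket_b):
    """Return (total_distance, pairs) for the cheapest one-to-one patch matching.

    Each pocket is a list of descriptor vectors (hypothetical stand-ins for
    3D Zernike descriptors). Brute force, so only suitable for tiny pockets.
    """
    # Always assign patches of the smaller pocket to patches of the larger one.
    swapped = len(pocket_a) > len(pocket_b)
    small, large = (pocket_b, pocket_a) if swapped else (pocket_a, pocket_b)
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            # Report pairs as (index in pocket_a, index in pocket_b).
            best_pairs = [(j, i) if swapped else (i, j) for i, j in enumerate(perm)]
    return best_cost, best_pairs

# Toy descriptors: pocket_b has one extra patch that should stay unmatched.
pocket_a = [(0.0, 1.0), (1.0, 0.0)]
pocket_b = [(1.0, 0.1), (0.0, 0.9), (5.0, 5.0)]
cost, pairs = match_patches(pocket_a, pocket_b)
```

Because unmatched patches in the larger pocket are simply ignored, two pockets of different overall shape can still score well when their local patches agree, which is the property the abstract attributes to a patch-based representation.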

Journal ArticleDOI
01 Apr 2012-Proteins
TL;DR: The results indicate that the high amyloidogenicity of aromatic amino acids is a function of hydrophobicity, β‐sheet propensity, and planar geometry and not the ability to form stabilizing or directing π–π bonds.
Abstract: Aromatic amino acids strongly promote cross-β amyloid formation; whether the amyloidogenicity of aromatic residues is due to high hydrophobicity and β-sheet propensity or formation of stabilizing π-π interactions has been debated. To clarify the role of aromatic residues on amyloid formation, the islet amyloid polypeptide 20-29 fragment [IAPP(20-29)], which contains a single aromatic residue (Phe 23), was adopted as a model. The side chain of residue 23 does not self-associate in cross-β fibrils of IAPP(20-29) (Nielsen et al., Angew Chem Int Ed 2009;48:2118-2121), allowing investigation of the amyloidogenicity of aromatic amino acids in a context where direct π-π interactions do not occur. We prepared variants of IAPP(20-29) in which Tyr, Leu, Phe, pentafluorophenylalanine (F5-Phe), Trp, cyclohexylalanine (Cha), α-naphthylalanine (1-Nap), or β-naphthylalanine (2-Nap) (in order of increasing peptide hydrophobicity) were incorporated at position 23 (SNNXGAILSS-NH2), and the kinetic and thermodynamic effects of these mutations on cross-β self-assembly were assessed. The Tyr, Leu, and Trp 23 variants failed to readily self-assemble at concentrations up to 1.5 mM, while the Cha 23 mutant fibrillized with attenuated kinetics and similar thermodynamic stability relative to the wild-type Phe 23 peptide. Conversely, the F5-Phe, 1-Nap, and 2-Nap 23 variants self-assembled at enhanced rates, forming fibrils with greater thermodynamic stability than the wild-type peptide. These results indicate that the high amyloidogenicity of aromatic amino acids is a function of hydrophobicity, β-sheet propensity, and planar geometry and not the ability to form stabilizing or directing π-π bonds.

Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: A pose‐dependent NMA is presented, which avoids the need to sample multiple eigenvectors and it offers a promising alternative to combinatorial cross‐docking in protein docking calculations.
Abstract: Modeling conformational changes in protein docking calculations is challenging. To make the calculations tractable, most current docking algorithms typically treat proteins as rigid bodies and use soft scoring functions that implicitly accommodate some degree of flexibility. Alternatively, ensembles of structures generated from molecular dynamics (MD) may be cross-docked. However, such combinatorial approaches can produce many thousands or even millions of docking poses, and require fast and sensitive scoring functions to distinguish them. Here, we present a novel approach called "EigenHex," which is based on normal mode analyses (NMAs) of a simple elastic network model of protein flexibility. We initially assume that the proteins to be docked are rigid, and we begin by performing conventional soft docking using the Hex polar Fourier correlation algorithm. We then apply a pose-dependent NMA to each of the top 1000 rigid body docking solutions, and we sample and re-score multiple perturbed docking conformations generated from linear combinations of up to 20 eigenvectors using a multi-threaded particle swarm optimization algorithm. When applied to the 63 "rigid body" targets of the Protein Docking Benchmark version 2.0, our results show that sampling and re-scoring from just one to three eigenvectors gives a modest but consistent improvement for these targets. Thus, pose-dependent NMA avoids the need to sample multiple eigenvectors and it offers a promising alternative to combinatorial cross-docking.
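The idea of generating perturbed conformations from linear combinations of eigenvectors can be sketched as follows. This is a minimal illustration, not the EigenHex implementation: the function name perturb, the flat coordinate layout, and the toy modes are hypothetical, and no elastic network model or particle swarm optimizer is included. In a real workflow the amplitudes would be the variables that the search tunes.

```python
def perturb(coords, modes, amplitudes):
    """Displace coordinates along a linear combination of normal modes.

    coords:     flat list [x1, y1, z1, x2, ...]
    modes:      list of eigenvectors, each the same length as coords
    amplitudes: one coefficient per mode (the quantities a search would tune)
    """
    if len(modes) != len(amplitudes):
        raise ValueError("need one amplitude per mode")
    moved = list(coords)
    for mode, amp in zip(modes, amplitudes):
        if len(mode) != len(coords):
            raise ValueError("mode length must match coordinate length")
        for k, component in enumerate(mode):
            moved[k] += amp * component
    return moved

# Toy example: two atoms and two hypothetical modes.
coords = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
modes = [
    [1.0, 0.0, 0.0, -1.0, 0.0, 0.0],  # atoms move in opposite directions along x
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],   # both atoms shift along y
]
perturbed = perturb(coords, modes, [0.5, 2.0])
```

With only one to three modes, as in the result reported in the abstract, the search space is a handful of amplitudes per docking pose rather than a full set of atomic coordinates.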

Journal ArticleDOI
01 Nov 2012-Proteins
TL;DR: A “structural gate,” formed between helices 71 and 92 on the ribosomal large subunit, is described, which restricts tRNA motion and promotes proofreading of the codon–anticodon.
Abstract: The ribosome catalyzes peptidyl transfer reactions at the growing nascent polypeptide chain. Here, we present a structural mechanism for selecting cognate over near-cognate A/T transfer RNA (tRNA). In part, the structural basis for the fidelity of translation relies on accommodation to filter cognate from near-cognate tRNAs. To examine the assembly of tRNAs within the ribonucleic–riboprotein complex, we conducted a series of all-atom molecular dynamics (MD) simulations of the entire solvated 70S Escherichia coli ribosome, along with its associated cofactors, proteins, and messenger RNA (mRNA). We measured the motion of the A/T state of tRNA between initial binding and full accommodation. The mechanism of rejection was investigated. Using novel in-house algorithms, we determined trajectory pathways. Despite the large intersubunit cavity, the available space is limited by the presence of the tRNA, which is equally large. This article describes a “structural gate,” formed between helices 71 and 92 on the ribosomal large subunit, which restricts tRNA motion. The gate and the interacting protein, L14, of the 50S ribosome act as steric filters in two consecutive substeps during accommodation, each requiring: (1) sufficient energy contained in the hybrid tRNA kink and (2) sufficient energy in the Watson–Crick base pairing of the codon–anticodon. We show that these barriers act to filter out near-cognate tRNA and promote proofreading of the codon–anticodon. Since proofreading is essential for understanding the fidelity of translation, our model for the dynamics of this process has substantial biomedical implications. Proteins 2012. © 2012 Wiley Periodicals, Inc.

Journal ArticleDOI
01 Jan 2012-Proteins
TL;DR: A new machine learning approach to the problem where an energy function for each rotamer in a structure is computed additively over pairs of contacting atoms, with a significant improvement in speed and a modest, but statistically significant, improvement in accuracy.
Abstract: Accurate protein side-chain conformation prediction is crucial for protein modeling and existing methods for the task are widely used; however, faster and more accurate methods are still required. Here we present a new machine learning approach to the problem where an energy function for each rotamer in a structure is computed additively over pairs of contacting atoms. A family of 156 neural networks indexed by amino acid and contacting atom types is used to compute these rotamer energies as a function of atomic contact distances. Although direct energy targets are not available for training, the neural networks can still be optimized by converting the energies to probabilities and optimizing these probabilities using Markov Chain Monte Carlo methods. The resulting predictor SIDEpro makes predictions by initially setting the rotamer probabilities for each residue from a backbone dependent rotamer library, then iteratively updating these probabilities using the trained neural networks. After convergences of the probabilities, the side-chains are set to the highest probability rotamer. Finally, a post-processing clash reduction step is applied to the models. SIDEpro represents a significant improvement in speed and a modest, but statistically significant, improvement in accuracy when compared to the state-of-the-art for rapid side-chain prediction method SCWRL4 on the following datasets: (1) 379 protein test set of SCWRL4; (2) 94 proteins from CASP9; (3) a set of 7 large protein-only complexes; and (4) a ribosome with and without the RNA. Using the SCWRL4 test set, SIDEpro's accuracy (χ1 86.14%, χ1+2 74.15%) is slightly better than SCWRL4-FRM (χ1 85.43%, χ1+2 73.47%) and it is 7.0 times faster. On the same test set SIDEpro is clearly more accurate than SCWRL4-RRM (χ1 84.15%, χ1+2 71.24%) and 2.4 times faster. 
Evaluation on the additional test sets yields similar accuracy results, with SIDEpro being slightly more accurate than SCWRL4-FRM and clearly more accurate than SCWRL4-RRM; however, the gap in CPU time is much more significant when the methods are applied to large protein complexes. SIDEpro is part of the SCRATCH suite of predictors and is available from: http://scratch.proteomics.ics.uci.edu/.
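The step of converting rotamer energies to probabilities and then choosing the highest-probability rotamer can be sketched as below. The abstract does not state the conversion used, so the Boltzmann-style softmax, the temperature parameter, and the function names are assumptions for illustration only; the neural networks and the iterative update are not modeled.

```python
from math import exp

def rotamer_probabilities(energies, temperature=1.0):
    """Map rotamer energies to probabilities with a Boltzmann-style softmax."""
    lowest = min(energies)  # shift by the minimum for numerical stability
    weights = [exp(-(e - lowest) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def pick_rotamer(energies):
    """Index of the highest-probability rotamer (equivalently, the lowest energy)."""
    probs = rotamer_probabilities(energies)
    return max(range(len(probs)), key=probs.__getitem__)

energies = [2.0, 0.5, 1.0]  # hypothetical energies for three rotamers
probs = rotamer_probabilities(energies)
best = pick_rotamer(energies)
```

Working with probabilities rather than raw energies gives a normalized quantity per residue, which is what makes it possible to train without direct energy targets, as the abstract notes.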

Journal ArticleDOI
01 Mar 2012-Proteins
TL;DR: It is suggested that adoption of a DOC‐bound structural state for IpaD primes the Shigella TTSA for contact with host cells, and the data presented here and in the studies leading up to this work provide the foundation for developing a model of the first step in Shigella TTS activation.
Abstract: Type III secretion (TTS) is an essential virulence factor for Shigella flexneri, the causative agent of shigellosis. The Shigella TTS apparatus (TTSA) is an elegant nano-machine that is composed of a basal body, an external needle to deliver effectors into human cells, and a needle tip complex that controls secretion activation. IpaD is at the tip of the nascent TTSA needle where it controls the first step of TTS activation. The bile salt deoxycholate (DOC) binds to IpaD to induce recruitment of the translocator protein IpaB into the maturing tip complex. We recently used spectroscopic analyses to show that IpaD undergoes a structural rearrangement that accompanies binding to DOC. Here, we report a crystal structure of IpaD with DOC bound and test the importance of the residues that make up the DOC binding pocket on IpaD function. IpaD binds DOC at the interface between helices α3 and α7, with concomitant movement in the orientation of helix α7 relative to its position in unbound IpaD. When the IpaD residues involved in DOC binding are mutated, some are found to lead to altered invasion and secretion phenotypes. These findings suggest that adoption of a DOC-bound structural state for IpaD primes the Shigella TTSA for contact with host cells. The data presented here and in the studies leading up to this work provide the foundation for developing a model of the first step in Shigella TTS activation.

Journal ArticleDOI
01 Jul 2012-Proteins
TL;DR: An extended protein–RNA docking benchmark composed of 71 test cases in which the coordinates of the interacting protein and RNA molecules are available from experimental structures, plus an additional set of 35 cases in which at least one of the interacting subunits is modeled by homology.
Abstract: We present here an extended protein-RNA docking benchmark composed of 71 test cases in which the coordinates of the interacting protein and RNA molecules are available from experimental structures, plus an additional set of 35 cases in which at least one of the interacting subunits is modeled by homology. All cases in the experimental set have available unbound protein structure, and include five cases with available unbound RNA structure, four cases with a pseudo-unbound RNA structure, and 62 cases with the bound RNA form. The additional set of modeling cases comprises five unbound-model, eight model-unbound, 19 model-bound, and three model-model protein-RNA cases. The benchmark covers all major functional categories and contains cases with different degrees of difficulty for docking, as far as protein and RNA flexibility is concerned. The main objective of this benchmark is to foster the development of protein-RNA docking algorithms and to contribute to the better understanding and prediction of protein-RNA interactions. The benchmark is freely available at http://life.bsc.es/pid/protein-rna-benchmark.

Journal ArticleDOI
01 Oct 2012-Proteins
TL;DR: A novel way of identifying essential proteins which are known for their critical role in mediating cellular processes and constructing protein complexes is proposed and analyzed using a protein ranking algorithm (ProRank).
Abstract: Detecting protein complexes from protein-protein interaction (PPI) networks is becoming a difficult challenge in computational biology. There is ample evidence that many disease mechanisms involve protein complexes, and being able to predict these complexes is important to the characterization of the relevant disease for diagnostic and treatment purposes. This article introduces a novel method for detecting protein complexes from PPI networks by using a protein ranking algorithm (ProRank). ProRank quantifies the importance of each protein based on the interaction structure and the evolutionary relationships between proteins in the network. A novel way of identifying essential proteins, which are known for their critical role in mediating cellular processes and constructing protein complexes, is proposed and analyzed. We evaluate the performance of ProRank using two PPI networks on two reference sets of protein complexes created from the Munich Information Center for Protein Sequence, containing 81 and 162 known complexes, respectively. We compare the performance of ProRank to some of the well-known protein complex prediction methods (ClusterONE, CMC, CFinder, MCL, MCode and Core) in terms of precision and recall. We show that ProRank predicts more complexes correctly at a competitive level of precision and recall. The level of accuracy achieved using ProRank in comparison to other recent methods for detecting protein complexes is a strong argument in favor of the proposed method.
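To make the evaluation in terms of precision and recall concrete, the sketch below scores a set of predicted complexes against a reference set. The abstract does not specify how a predicted complex is matched to a reference complex, so the overlap score |A ∩ B|² / (|A|·|B|), the 0.25 threshold, and the function names are assumptions chosen for illustration, not ProRank's documented protocol.

```python
def overlap(a, b):
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two non-empty protein sets."""
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))

def precision_recall(predicted, reference, threshold=0.25):
    """Precision and recall of predicted complexes against a reference set.

    A predicted complex counts as correct if it overlaps some reference
    complex at or above the threshold; recall is computed symmetrically.
    """
    matched_pred = sum(
        any(overlap(p, r) >= threshold for r in reference) for p in predicted
    )
    matched_ref = sum(
        any(overlap(r, p) >= threshold for p in predicted) for r in reference
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ref / len(reference) if reference else 0.0
    return precision, recall

# Toy data: one of two predictions matches one of three reference complexes.
predicted = [{"A", "B", "C"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"E", "F"}, {"G", "H"}]
precision, recall = precision_recall(predicted, reference)
```

In this toy case the precision is 1/2 and the recall is 1/3, which shows why the two numbers have to be reported together: a method can raise one by predicting fewer or more complexes at the expense of the other.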

Journal ArticleDOI
01 Jun 2012-Proteins
TL;DR: The positional specificity of TM0077 was investigated using 4‐nitrophenyl‐β‐D‐xylopyranoside monoacetates as substrates in a β‐xylosidase‐coupled assay, and structures with covalently bound phenylmethylsulfonyl fluoride and paraoxon confirmed that both inhibitors bind covalently to the catalytic serine (Ser188).
Abstract: TM0077 from Thermotoga maritima is a member of the carbohydrate esterase family 7 and is active on a variety of acetylated compounds, including cephalosporin C. TM0077 esterase activity is confined to short-chain acyl esters (C2-C3), and is optimal around 100°C and pH 7.5. The positional specificity of TM0077 was investigated using 4-nitrophenyl-β-D-xylopyranoside monoacetates as substrates in a β-xylosidase-coupled assay. TM0077 hydrolyzes acetate at positions 2, 3, and 4 with equal efficiency. No activity was detected on xylan or acetylated xylan, which implies that TM0077 is an acetyl esterase and not an acetyl xylan esterase as currently annotated. Selenomethionine-substituted and native structures of TM0077 were determined at 2.1 and 2.5 Å resolution, respectively, revealing a classic α/β-hydrolase fold. TM0077 assembles into a doughnut-shaped hexamer with small tunnels on either side leading to an inner cavity, which contains the six catalytic centers. Structures of TM0077 with covalently bound phenylmethylsulfonyl fluoride and paraoxon were determined to 2.4 and 2.1 Å, respectively, and confirmed that both inhibitors bind covalently to the catalytic serine (Ser188). Upon binding of inhibitor, the catalytic serine adopts an altered conformation, as observed in other esterases and lipases, which supports a previously proposed catalytic mechanism in which Ser hydroxyl rotation prevents reversal of the reaction and allows access of a water molecule for completion of the reaction.

Journal ArticleDOI
01 May 2012-Proteins
TL;DR: An unexpected finding about the hexapeptide‐HPO₄²⁻ complex is that the side chain ε‐amino group of the lysine occurs in its deprotonated form, which appears to bind HPO₄²⁻ via an N···H‐O‐P hydrogen bond.
Abstract: The hexapeptide Ser-Gly-Ala-Gly-Lys-Thr has been synthesized and characterized. It was designed as a minimal soluble peptide that would be likely to have the phosphate-binding properties observed in the P-loops of proteins that bind the β-phosphate of GTP or ATP. The β-phosphate in such proteins is bound by a combination of the side chain ε-amino group of the lysine residue plus the concavity formed by successive main chain peptide NH groups called a nest, which is favored by the glycines. The hexapeptide is shown to bind HPO₄²⁻ strongly at neutral pH. The affinities of the various ionized species of phosphate and hexapeptide are analyzed, showing that they increase with pH. It is likely the main chain NH groups of the hexapeptide bind phosphate in much the same way as the corresponding P-loop atoms bind the phosphate ligand in proteins. Most proteinaceous P-loops are situated at the N-termini of α-helices, and this observation has frequently been considered a key aspect of these binding sites. Such a hexapeptide in isolation seems unlikely to form an α-helix, an expectation in accord with the CD spectra examined; this suggests that being at the N-terminus of an α-helix is not essential for phosphate binding. An unexpected finding about the hexapeptide-HPO₄²⁻ complex is that the side chain ε-amino group of the lysine occurs in its deprotonated form, which appears to bind HPO₄²⁻ via an N···H-O-P hydrogen bond.

Journal ArticleDOI
01 Jan 2012-Proteins
TL;DR: The program PresCont, which predicts in a robust manner amino acids that constitute protein‐protein interfaces (PPIs), reaches state‐of‐the‐art classification quality on the basis of only four residue properties that can be readily deduced from the 3D structure of an individual protein and a multiple sequence alignment (MSA) composed of homologs.
Abstract: An important task of computational biology is to identify those parts of a polypeptide chain, which are involved in interactions with other proteins. For this purpose, we have developed the program PresCont, which predicts in a robust manner amino acids that constitute protein-protein interfaces (PPIs). PresCont reaches state-of-the-art classification quality on the basis of only four residue properties that can be readily deduced from the 3D structure of an individual protein and a multiple sequence alignment (MSA) composed of homologs. The core of PresCont is a support vector machine, which assesses solvent-accessible surface area, hydrophobicity, conservation, and the local environment of each amino acid on the protein surface. For training and performance testing, we compiled three nonoverlapping datasets consisting of permanently formed or transient complexes, respectively. A comparison with SPPIDER, ProMate, and meta-PPISP showed that PresCont compares favorably with these highly sophisticated programs, and that its prediction quality is less dependent on the type of protein complex being considered. This balance is due to a mutual compensation of classification weaknesses observed for individual properties: For PPIs of permanent complexes, solvent-accessible surface and hydrophobicity contribute most to classification quality, for PPIs of transient complexes, the assessment of the local environment is most significant. Moreover, we show that for permanent complexes a segmentation of PPIs into core and rim residues has only a moderate influence on prediction quality. PresCont is available as a web service at http://www-bioinf.uni-regensburg.de/.

Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: It is demonstrated that the SPIDER scoring function ranks native and native‐like poses above geometrical decoys and that it outperforms the popular ZRANK scoring function.
Abstract: Accurate prediction of the structure of protein–protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multibody pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grain representation of a protein–protein complex where each residue is represented by its side chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms protein–protein complexes into a residue contact network, or an undirected graph in which residues are the nodes, connected by edges. This treatment forms a family of interfacial graphs representing a dataset of protein–protein complexes. We then employ a frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein–protein interfaces. The geometrical parameters and frequency of occurrence of each “native” pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using the standard “ZDOCK” benchmark dataset, which was not used in the development of SPIDER. We demonstrate that the SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it outperforms the popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of the CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein–protein docking methods. Proteins 2012; © 2012 Wiley Periodicals, Inc.

Journal ArticleDOI
01 Aug 2012-Proteins
TL;DR: It is postulated that targeting aromatic recognition interfaces by tryptophan could be a useful approach for inhibiting the formation of amyloids.
Abstract: Amyloid formation is associated with several human diseases, including Alzheimer's disease (AD), Parkinson's disease, and Type 2 Diabetes, yet no disease-modifying therapeutics are available for them. Because of the structural similarities between the amyloid species characterizing these diseases (despite the lack of amino acid homology), it is believed that there might be a common mechanism of toxicity for these conditions. Thus, inhibition of amyloid formation could be a promising disease-modifying therapeutic strategy for them. Aromatic residues have been identified as crucial in formation and stabilization of amyloid structures. This finding was corroborated by high-resolution structural studies, theoretical analysis, and molecular dynamics simulations. Amongst the aromatic entities, tryptophan was found to possess the most amyloidogenic potential. We therefore postulate that targeting aromatic recognition interfaces by tryptophan could be a useful approach for inhibiting the formation of amyloids. Quinones are known to inhibit cellular metabolic pathways and to have anti-cancer, anti-viral, and anti-bacterial properties, and were shown to inhibit aggregation of several amyloidogenic proteins in vitro. We have previously described two quinone-tryptophan hybrids which are capable of inhibiting amyloid-beta, the protein associated with AD pathology, both in vitro and in vivo. Here we tested their generic properties and their ability to inhibit other amyloidogenic proteins including α-synuclein, islet amyloid polypeptide, lysozyme, calcitonin, and insulin. Both compounds showed efficient inhibition of all five proteins examined both by ThT fluorescence analysis and by electron microscope imaging. If verified in vivo, these small molecules could serve as leads for developing generic anti-amyloid drugs.