Showing papers in "Proteins in 1995"


Journal ArticleDOI
01 Mar 1995-Proteins
TL;DR: The work unifies several previously proposed ideas concerning the mechanism of protein folding and delimits the regions of validity of these ideas under different thermodynamic conditions.
Abstract: The understanding, and even the description, of protein folding is impeded by the complexity of the process. Much of this complexity can be described and understood by taking a statistical approach to the energetics of protein conformation, that is, to the energy landscape. The statistical energy landscape approach explains when and why unique behaviors, such as specific folding pathways, occur in some proteins and more generally explains the distinction between folding processes common to all sequences and those peculiar to individual sequences. This approach also gives new, quantitative insights into the interpretation of experiments and simulations of protein folding thermodynamics and kinetics. Specifically, the picture provides simple explanations for folding as a two-state first-order phase transition, for the origin of metastable collapsed unfolded states and for the curved Arrhenius plots observed in both laboratory experiments and discrete lattice simulations. The relation of these quantitative ideas to folding pathways, to uniexponential vs. multiexponential behavior in protein folding experiments and to the effect of mutations on folding is also discussed. The success of energy landscape ideas in protein structure prediction is also described. The use of the energy landscape approach for analyzing data is illustrated with a quantitative analysis of some recent simulations, and a qualitative analysis of experiments on the folding of three proteins. The work unifies several previously proposed ideas concerning the mechanism of protein folding and delimits the regions of validity of these ideas under different thermodynamic conditions. © 1995 Wiley-Liss, Inc.

2,437 citations


Journal ArticleDOI
01 Dec 1995-Proteins
TL;DR: An automatic algorithm STRIDE for protein secondary structure assignment from atomic coordinates based on the combined use of hydrogen bond energy and statistically derived backbone torsional angle information is developed.
Abstract: We have developed an automatic algorithm STRIDE for protein secondary structure assignment from atomic coordinates based on the combined use of hydrogen bond energy and statistically derived backbone torsional angle information. Parameters of the pattern recognition procedure were optimized using designations provided by the crystallographers as a standard-of-truth. Comparison to the currently most widely used technique DSSP by Kabsch and Sander (Biopolymers 22:2577-2637, 1983) shows that STRIDE and DSSP assign secondary structural states in 58 and 31% of 226 protein chains in our data sample, respectively, in greater agreement with the specific residue-by-residue definitions provided by the discoverers of the structures while in 11% of the chains, the assignments are the same. STRIDE delineates every 11th helix and every 32nd strand more in accord with published assignments.

2,390 citations


Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints, have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures.
Abstract: We evaluate 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. The models have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures. The largest errors occur in the regions that were not aligned correctly or where the template structures are not similar to the correct structure. These regions correspond predominantly to exposed loops, insertions of any length, and non-conserved side chains. When a template structure with more than 40% sequence identity to the target protein is available, the model is likely to have about 90% of the mainchain atoms modeled with an rms deviation from the X-ray structure of approximately 1 Å, in large part because the templates are likely to be that similar to the X-ray structure of the target. This rms deviation is comparable to the overall differences between refined NMR and X-ray crystallography structures of the same protein.

1,128 citations



Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein, which must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution.
Abstract: We present an analysis of 10 blind predictions prepared for a recent conference, "Critical Assessment of Techniques for Protein Structure Prediction." The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small "register shifts" of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the "cores" of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the "fold" of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution.

457 citations


Journal ArticleDOI
01 Apr 1995-Proteins
TL;DR: A method has been developed that makes allowance for taking into account the coupling effect among different amino acid components of a protein by a covariance matrix and a theorem is presented and proved in Appendix A that is instructive for understanding the novel method at a deeper level.
Abstract: The development of prediction methods based on statistical theory generally consists of two parts: one is focused on the exploration of new algorithms, and the other on the improvement of a training database. The current study is devoted to improving the prediction of protein structural classes from both of the two aspects. To explore a new algorithm, a method has been developed that makes allowance for taking into account the coupling effect among different amino acid components of a protein by a covariance matrix. To improve the training database, the selection of proteins is carried out so that they have (1) as many non-homologous structures as possible, and (2) a good quality of structure. Thus, 129 representative proteins are selected. They are classified into 30 alpha, 30 beta, 30 alpha + beta, 30 alpha/beta, and 9 zeta (irregular) proteins according to a new criterion that better reflects the feature of the structural classes concerned. The average accuracy of prediction by the current method for the 4 x 30 regular proteins is 99.2%, and that for 64 independent testing proteins not included in the training database is 95.3%. To further validate its efficiency, a jackknife analysis has been performed for the current method as well as the previous ones, and the results are also much in favor of the current method. To complete the mathematical basis, a theorem is presented and proved in Appendix A that is instructive for understanding the novel method at a deeper level.

432 citations
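
The covariance-matrix idea in the abstract above can be illustrated with a minimal sketch, assuming that each structural class is summarized by the mean and covariance of its training compositions and that a query protein is assigned to the class with the smallest squared Mahalanobis distance. The function names and the use of a pseudo-inverse (to cope with a covariance matrix that may be singular because compositions sum to one) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def fit_class_stats(compositions_by_class):
    """Return {label: (mean vector, pseudo-inverse of covariance)} per class."""
    stats = {}
    for label, rows in compositions_by_class.items():
        X = np.asarray(rows, dtype=float)
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)  # couples the amino acid components
        stats[label] = (mean, np.linalg.pinv(cov))
    return stats

def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance of composition x from a class mean."""
    d = np.asarray(x, dtype=float) - mean
    return float(d @ inv_cov @ d)

def predict_class(x, stats):
    """Assign x to the class with the smallest squared Mahalanobis distance."""
    return min(stats, key=lambda label: mahalanobis_sq(x, *stats[label]))
```

With an identity covariance this reduces to a plain Euclidean nearest-mean rule, so the covariance term is what carries the coupling between components.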


Journal ArticleDOI
01 Aug 1995-Proteins
TL;DR: It is demonstrated that a single universal mathematical function can be used to represent the partial molar heat capacity of the native and unfolded states of proteins in solution in terms of the molecular weight, the polar and apolar solvent accessible surface areas, and the total area buried from the solvent.
Abstract: The heat capacity plays a major role in the determination of the energetics of protein folding and molecular recognition. As such, a better understanding of this thermodynamic parameter and its structural origin will provide new insights for the development of better molecular design strategies. In this paper we have analyzed the absolute heat capacity of proteins in different conformations. The results of these studies indicate that three major terms account for the absolute heat capacity of a protein: (1) one term that depends only on the primary or covalent structure of a protein and contains contributions from vibrational frequencies arising from the stretching and bending modes of each valence bond and internal rotations; (2) a term that contains the contributions of noncovalent interactions arising from secondary and tertiary structure; and (3) a term that contains the contributions of hydration. For a typical globular protein in solution the bulk of the heat capacity at 25 °C is given by the covalent structure term (close to 85% of the total). The hydration term contributes about 15 and 40% to the total heat capacity of the native and unfolded states, respectively. The contribution of non-covalent structure to the total heat capacity of the native state is positive but very small and does not amount to more than 3% at 25 °C. The change in heat capacity upon unfolding is primarily given by the increase in the hydration term (about 95%) and to a much lesser extent by the loss of noncovalent interactions (up to approximately 5%). (ABSTRACT TRUNCATED AT 250 WORDS)

387 citations


Journal ArticleDOI
01 Sep 1995-Proteins
TL;DR: It is concluded that the present parameter set, which permits different coordination geometries and ligand exchange for the zinc ion, can be employed effectively for both solution and protein simulations of zinc‐containing systems.
Abstract: Force field parameters that use a combination of Lennard-Jones and electrostatic interactions are developed for divalent zinc and tested in solution and protein simulations. It is shown that the parameter set gives free energies of solution in good agreement with experiment. Molecular dynamics simulations of carboxypeptidase A and carbonic anhydrase are performed with these zinc parameters and the CHARMM 22 beta all-atom parameter set. The structural results are as accurate as those obtained in published simulations that use specifically bonded models for the zinc ion and the AMBER force field. The inclusion of longer-range electrostatic interactions by use of the Extended Electrostatics model is found to improve the equilibrium conformation of the active site. It is concluded that the present parameter set, which permits different coordination geometries and ligand exchange for the zinc ion, can be employed effectively for both solution and protein simulations of zinc-containing systems. © 1995 Wiley-Liss, Inc.

343 citations
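
A nonbonded model of the kind described in the abstract above, combining Lennard-Jones and electrostatic terms, can be sketched for a single atom pair as follows. This is a minimal sketch under stated assumptions: the Lennard-Jones term is written in the well-depth and minimum-distance form, one common convention; the Coulomb constant is an approximate unit conversion; and every parameter value passed in is hypothetical, not one of the paper's zinc parameters.

```python
# Approximate Coulomb conversion constant in kcal*Angstrom/(mol*e^2); assumed value.
COULOMB_KCAL = 332.06

def lj_energy(r, eps, rmin):
    """Lennard-Jones energy with well depth eps (> 0) and minimum at r = rmin."""
    ratio6 = (rmin / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)

def coulomb_energy(r, q1, q2, dielectric=1.0):
    """Coulomb energy of two point charges (in units of e) at distance r."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)

def nonbonded_energy(r, eps, rmin, q1, q2):
    """Pair energy of a nonbonded model: Lennard-Jones plus electrostatics."""
    return lj_energy(r, eps, rmin) + coulomb_energy(r, q1, q2)
```

Because no explicit bonds tie the ion to its ligands in such a model, the coordination geometry is free to change during a simulation, which is the property the abstract highlights.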


Journal ArticleDOI
01 Jun 1995-Proteins
TL;DR: The LINUS algorithm, a hierarchic procedure to predict the fold of a protein from its amino acid sequence alone, was applied to large, overlapping fragments from a diverse test set of 7 X‐ray‐elucidated proteins, with encouraging results.
Abstract: We describe LINUS, a hierarchic procedure to predict the fold of a protein from its amino acid sequence alone. The algorithm, which has been implemented in a computer program, was applied to large, overlapping fragments from a diverse test set of 7 X-ray-elucidated proteins, with encouraging results. For all proteins but one, the overall fragment topology is well predicted, including both secondary and supersecondary structure. The algorithm was also applied to a molecule of unknown conformation, groES, in which X-ray structure determination is presently ongoing. LINUS is an acronym for Local Independently Nucleated Units of Structure. The procedure ascends the folding hierarchy in discrete stages, with concomitant accretion of structure at each step. The chain is represented by simplified geometry and folds under the influence of a primitive energy function. The only accurately described energetic quantity in this work is hard sphere repulsion--the principal force involved in organizing protein conformation [Richards, F. M. Ann. Rev. Biophys. Bioeng. 6:151-176, 1977]. Among other applications, the method is a natural tool for use in the human genome initiative.

272 citations


Journal ArticleDOI
01 Aug 1995-Proteins
TL;DR: The crystal structures of the complexes of CDK2 with a weakly specific CDK inhibitor, N6‐(Δ2‐isopentenyl)adenine, and a strongly specific inhibitor, olomoucine will be useful in directing the search for the next generation inhibitors with improved properties.
Abstract: Cyclin-dependent kinases (CDKs) are conserved regulators of the eukaryotic cell cycle with different isoforms controlling specific phases of the cell cycle. Mitogenic or growth inhibitory signals are mediated, respectively, by activation or inhibition of CDKs which phosphorylate proteins associated with the cell cycle. The central role of CDKs in cell cycle regulation makes them a potential new target for inhibitory molecules with anti-proliferative and/or anti-neoplastic effects. We describe the crystal structures of the complexes of CDK2 with a weakly specific CDK inhibitor, N6-(Δ2-isopentenyl)adenine, and a strongly specific inhibitor, olomoucine. Both inhibitors are adenine derivatives and bind in the adenine binding pocket of CDK2, but in an unexpected and different orientation from the adenine of the authentic ligand ATP. The N6-benzyl substituent in olomoucine binds outside the conserved binding pocket and is most likely responsible for its specificity. The structural information from the CDK2-olomoucine complex will be useful in directing the search for the next generation inhibitors with improved properties. © 1995 Wiley-Liss, Inc.

268 citations


Journal ArticleDOI
01 Dec 1995-Proteins
TL;DR: Packing contacts are crystal artifacts, yet they make use of the same forces that govern specific recognition in protein‐protein complexes and oligomeric proteins, and provide examples of a nonspecific protein‐protein interaction which can be compared to biologically relevant ones.
Abstract: Packing contacts are crystal artifacts, yet they make use of the same forces that govern specific recognition in protein-protein complexes and oligomeric proteins. They provide examples of a nonspecific protein-protein interaction which can be compared to biologically relevant ones. We evaluate the number and size of pairwise interfaces in 152 crystal forms where the asymmetric unit contains a monomeric protein. In those crystal forms that have no element of 2-fold symmetry, we find that molecules form 8 to 10 pairwise interfaces. The total area of the surface buried on each molecule is large, up to 4400 Å². Pairwise interfaces bury 200-1200 Å², like interfaces generated at random in a computer simulation, and less than interfaces in protease-inhibitor or antigen-antibody complexes, which bury 1500 Å² or more. Thus, specific contacts occurring in such complexes extend over a larger surface than nonspecific ones. In crystal forms with 2-fold symmetry, pairwise interfaces are fewer and larger on average than in the absence of 2-fold symmetry. Some bury 1500-2500 Å², like interfaces in oligomeric proteins, and create "crystal oligomers" which may have formed in the solution before crystallizing.

Journal ArticleDOI
01 Jun 1995-Proteins
TL;DR: There is a correlation between the directionality in the packing interactions of non‐H‐bonded β‐ and γ‐branched residue pairs, the handedness of the observed enantiomers of chiral β‐branched side chains, and the handedness of the twist of β‐sheet.
Abstract: Cross-strand pair correlations are calculated for residue pairs in anti-parallel beta-sheet for two cases: pairs whose backbone atoms are hydrogen bonded together (H-bonded site) and pairs which are not (non-H-bonded site). The statistics show that this distinction is important. When glycine is located on the edge of a sheet, it shows a 3:1 preference for the H-bonded site. The strongest observed correlations are for pairs of disulfide-bonded cystines, many of which adopt a close-packed conformation with each cystine in a spiral conformation of opposite chirality to its partner. It is likely that these pairs are a signature for the family of small, cystine-rich proteins. Most other strong positive and negative correlations involve charged and polar residues. It appears that electrostatic compatibility is the strongest factor affecting pair correlation. Significant correlations are observed for beta- and gamma-branched residues in the non-H-bonded site. An examination of the structures shows a directionality in side chain packing. There is a correlation between (1) the directionality in the packing interactions of non-H-bonded beta- and gamma-branched residue pairs, (2) the handedness of the observed enantiomers of chiral beta-branched side chains, and (3) the handedness of the twist of beta-sheet. These findings have implications for the formation of beta-sheets during protein folding and the mechanism by which the sheet becomes twisted.

Journal ArticleDOI
01 Dec 1995-Proteins
TL;DR: A normal mode analysis of the closed form of dimeric citrate synthase suggests that low‐frequency normal modes may become useful for determining a first approximation of the conformational path between the closed and open forms of these proteins.
Abstract: A normal mode analysis of the closed form of dimeric citrate synthase has been performed. The largest-amplitude collective motion predicted by this method compares well with the crystallographically observed hinge-bending motion. Such a result supports those obtained previously in the case of hinge-bending motions of smaller systems, such as lysozyme or hexokinase. Taken together, all these results suggest that low-frequency normal modes may become useful for determining a first approximation of the conformational path between the closed and open forms of these proteins. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: It is found that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet, and threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence.
Abstract: This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure.

Journal ArticleDOI
01 Oct 1995-Proteins
TL;DR: The structural features stabilized by many random sequences are typical of globular proteins while the features rarely observed in proteins are those which are stabilized by only a minor part of the random sequences.
Abstract: A theoretical study has shown that the occurrence of various structural elements in stable folds of random copolymers is exponentially dependent on the own energy of the element. A similar occurrence-on-energy dependence is observed in globular proteins from the level of amino acid conformations to the level of overall architectures. Thus, the structural features stabilized by many random sequences are typical of globular proteins while the features rarely observed in proteins are those which are stabilized by only a minor part of the random sequences. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 May 1995-Proteins
TL;DR: The analysis of the thermolysin trajectories indeed revealed a large rigid body hinge‐bending motion of the N‐terminal and C‐terminal domains, similar to the motion hypothesized from the crystal structure comparisons.
Abstract: Comparisons of the crystal structures of thermolysin and the thermolysin-like protease produced by B. cereus have recently led to the hypothesis that neutral proteases undergo a hinge-bending motion. We have investigated this hypothesis by analyzing molecular dynamics simulations of thermolysin in vacuum and water, using the essential dynamics method. This method is able to extract large concerted atomic motions of biological importance from a molecular dynamics trajectory. The analysis of the thermolysin trajectories indeed revealed a large rigid body hinge-bending motion of the N-terminal and C-terminal domains, similar to the motion hypothesized from the crystal structure comparisons. In addition, it appeared that the essential dynamics properties derived from the vacuum simulation were similar to those derived from the solvent simulation.

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: Five models were built by the ICM method for the Comparative Modeling section of the Meeting on the Critical Assessment of Techniques for Protein Structure Prediction; comparison with the NMR or crystallographic solutions reveals a high proportion of correctly predicted side chains, but the loops were not correctly predicted.
Abstract: Five models have been built by the ICM method for the Comparative Modeling section of the Meeting on the Critical Assessment of Techniques for Protein Structure Prediction. The targets have homologous proteins with known three-dimensional structure with sequence identity ranging from 25 to 77%. After alignment of the target sequence with the related three-dimensional structure, the modeling procedure consists of two subproblems: side-chain prediction and loop prediction. The ICM method approaches these problems with the following steps: (1) a starting model is created based on the homologous structure with the conserved portion fixed and the nonconserved portion having standard covalent geometry and free torsion angles; (2) the Biased Probability Monte Carlo (BPMC) procedure is applied to search the subspaces of either all the nonconservative side-chain torsion angles or torsion angles in a loop backbone and surrounding side chains. A special algorithm was designed to generate low-energy loop deformations. The BPMC procedure globally optimizes the energy function consisting of ECEPP/3 and solvation energy terms. Comparison of the predictions with the NMR or crystallographic solutions reveals a high proportion of correctly predicted side chains. The loops were not correctly predicted because imprinted distortions of the backbone increased the energy of the near-native conformation and thus made the solution unrecognizable. Interestingly, the energy terms were found to be reliable and the sampling of conformational space sufficient. The implications of this finding for the strategies of future comparative modeling are discussed.

Journal ArticleDOI
01 Oct 1995-Proteins
TL;DR: A comparison of a normal mode analysis and principal component analysis of a 200‐ps molecular dynamics trajectory of bovine pancreatic trypsin inhibitor in vacuum has been made in order to further elucidate the harmonic and anharmonic aspects in the dynamics of proteins.
Abstract: A comparison of a normal mode analysis and principal component analysis of a 200-ps molecular dynamics trajectory of bovine pancreatic trypsin inhibitor in vacuum has been made in order to further elucidate the harmonic and anharmonic aspects in the dynamics of proteins. An anharmonicity factor is defined which measures the degree of anharmonicity in the modes, be they principal modes or normal modes, and it is shown that the principal mode system naturally divides into anharmonic modes with peak frequencies below 80 cm⁻¹, and harmonic modes with frequencies above this value. In general the larger the mean-square fluctuation of a principal mode, the greater the degree of anharmonicity in its motion. The anharmonic modes represent only 12% of the total number of variables, but account for 98% of the total mean-square fluctuation. The transitional nature of the anharmonic motion is demonstrated. The results strongly suggest that in a large subspace, the free energy surface, as probed by the simulation, is approximated by a multi-dimensional parabola which is just a rescaled version of the parabola corresponding to the harmonic approximation to the conformational energy surface at a single minimum. After 200 ps, the rescaling factor, termed the "normal mode rescaling factor," has apparently converged to a value whereby the mean-square fluctuation within the subspace is about twice that predicted by the normal mode analysis.
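
The principal component analysis referred to in this abstract can be sketched as a diagonalization of the coordinate covariance matrix of a trajectory. This is a minimal sketch, assuming the trajectory is supplied as an array of shape (frames, 3N) whose frames have already been superposed to remove overall translation and rotation; the function names are hypothetical, and the paper's anharmonicity factor is not reproduced because the abstract does not define it.

```python
import numpy as np

def principal_modes(traj):
    """Return eigenvalues (descending) and eigenvectors of the coordinate covariance."""
    X = np.asarray(traj, dtype=float)
    X = X - X.mean(axis=0)                   # fluctuations about the mean structure
    cov = X.T @ X / (X.shape[0] - 1)         # (3N, 3N) covariance matrix
    evals, evecs = np.linalg.eigh(cov)       # symmetric matrix, ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def fluctuation_fraction(evals, n_modes):
    """Fraction of the total mean-square fluctuation carried by the first n_modes."""
    return float(evals[:n_modes].sum() / evals.sum())
```

Calling `fluctuation_fraction` on the leading modes gives the kind of figure quoted in the abstract, where a small share of the variables accounts for most of the mean-square fluctuation.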

Journal ArticleDOI
01 Aug 1995-Proteins
TL;DR: The data presented here support the hypothesis that a conserved tyrosine (Y492) located on the flat and more hydrophilic surface of the CBD is essential for the functionality and suggest that the more hydrophobic surface is not directly involved in the CBD function.
Abstract: Cellobiohydrolase I (CBHI) is the major cellulase of Trichoderma reesei. The enzyme contains a discrete cellulose-binding domain (CBD), which increases its binding and activity on crystalline cellulose. We studied cellulase-cellulose interactions using site-directed mutagenesis on the basis of the three-dimensional structure of the CBD of CBHI. Three mutant proteins which have earlier been produced in Saccharomyces cerevisiae were expressed in the native host organism. The data presented here support the hypothesis that a conserved tyrosine (Y492) located on the flat and more hydrophilic surface of the CBD is essential for the functionality. The data also suggest that the more hydrophobic surface is not directly involved in the CBD function. The pH dependence of the adsorption revealed that electrostatic repulsion between the bound proteins may also control the adsorption. The binding of CBHI to cellulose was significantly affected by high ionic strength suggesting that the interaction with cellulose includes a hydrophobic effect. High ionic strength increased the activity of the isolated core and of mutant proteins on crystalline cellulose, indicating that once productively bound, the enzymes are capable of solubilizing cellulose even with a mutagenized or with no CBD.

Journal ArticleDOI
01 Jul 1995-Proteins
TL;DR: The authors investigated the role of strand s4A in the formation of serpin-proteinase complexes and in serpin inhibitor activity through homology modeling of wild type inhibitor, mutant substrate, and latent serpins.
Abstract: The mechanism of formation and the structures of serpin-inhibitor complexes are not completely understood, despite detailed knowledge of the structures of a number of cleaved and uncleaved inhibitor, noninhibitor, and latent serpins. It has been proposed from comparison of inhibitor and noninhibitor serpins in the cleaved and uncleaved forms that insertion of strand s4A into preexisting beta-sheet A is a requirement for serpin inhibitor activity. We have investigated the role of this strand in formation of serpin-proteinase complexes and in serpin inhibitor activity through homology modeling of wild type inhibitor, mutant substrate, and latent serpins, and of putative serpin-proteinase complexes. These models explain the high stability of the complexes and provide an understanding of substrate behavior in serpins with point mutations in s4A and of latency in plasminogen activator inhibitor I.

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: This assessment shows that where sequence identity between the target and the template structure is high, comparative molecular modeling is highly successful; on the other hand, automated modeling techniques and sophisticated energy minimization methods fail to improve upon the starting structures when the sequence identity is low.
Abstract: In spite of the tremendous increase in the rate at which protein structures are being determined, there is still an enormous gap between the numbers of known DNA-derived sequences and the numbers of three-dimensional structures. In order to shed light on the biological functions of the molecules, researchers often resort to comparative molecular modeling. Earlier work has shown that when the sequence alignment is in error, then the comparative model is guaranteed to be wrong. In addition, loops, the sites of insertions and deletions in families of homologous proteins, are exceedingly difficult to model. Thus, many of the current problems in comparative molecular modeling are minor versions of the global protein folding problem. In order to assess objectively the current state of comparative molecular modeling, 13 groups submitted blind predictions of seven different proteins of undisclosed tertiary structure. This assessment shows that where sequence identity between the target and the template structure is high (> 70%), comparative molecular modeling is highly successful. On the other hand, automated modeling techniques and sophisticated energy minimization methods fail to improve upon the starting structures when the sequence identity is low (∼30%). Based on these results it appears that insertions and deletions are still major problems. Successfully deducing the correct sequence alignment when the local similarity is low is still difficult. We suggest some minimal testing of submitted coordinates that should be required of authors before papers on comparative molecular modeling are accepted for publication in journals. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: This study focuses on replacing side chains as a subtask of model building by homology by choosing position‐specific rather than generalized rotamers and by sorting the residues that have to be modelled as a function of their freedom in rotamer space.
Abstract: In this study we concentrate on replacing side chains as a subtask of model building by homology. Two problems arise. How to determine potential low energy rotamers? And how to avoid the combinatorial explosion that results from the combination of many residues for which multiple good rotamers are predicted? We attempt to solve these problems by choosing position-specific rather than generalized rotamers and by sorting the residues that have to be modelled as a function of their freedom in rotamer space. The practical advantages of our method are the quality of the models for cases of high backbone similarity, the small amount of human intervention needed, and the fact that the method automatically estimates the reliability with which each residue has been modeled. Other methods described in this issue are probably more suitable if large backbone rearrangements or loop insertions and deletions need to be modeled. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 Jul 1995-Proteins
TL;DR: A new measure of structural similarity, ρ, is introduced; it is based on RMSD but is independent of the sizes of the molecules involved or of any other special properties of the molecules, and should be helpful in judging success in NMR structure determination and protein folding modeling.
Abstract: Protein structures are routinely compared by their root-mean-square deviation (RMSD) in atomic coordinates after optimal rigid body superposition. What is not so clear is the significance of different RMSD values, particularly above the customary arbitrary cutoff for obvious similarity of 2-3 Å. Our earlier work argued for an intrinsic cutoff for protein similarity that varied with the number of residues in the polypeptide chains being compared. Here we introduce a new measure, ρ, of structural similarity based on RMSD that is independent of the sizes of the molecules involved, or of any other special properties of molecules. When ρ is less than 0.4-0.5, protein structures are visually recognized to be obviously similar, but the mathematically pleasing intrinsic cutoff of ρ < 1.0 corresponds to overall similarity in folding motif at a level not usually recognized until smoothing of the polypeptide chain path makes it striking. When the structures are scaled to unit radius of gyration and equal principal moments of inertia, the comparisons are even more universal, since they are no longer obscured by differences in overall size and ellipticity. With increasing chain length, the distribution of ρ for pairs of random structures is skewed to higher values, but the value for the best 1% of the comparisons rises only slowly with the number of residues. This level is close to an intrinsic cutoff between similar and dissimilar comparisons, namely the maximal scaled ρ possible for the two structures to be more similar to each other than one is to the other's mirror image. The intrinsic cutoff is independent of the number of residues or points being compared. For proteins having fewer than 100 residues, the 1% ρ falls below the intrinsic cutoff, so that for very small proteins, geometrically significant similarity can often occur by chance. We believe these ideas will be helpful in judging success in NMR structure determination and protein folding modeling.
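
The size normalization mentioned in the abstract above (scaling a structure to unit radius of gyration) can be illustrated with a minimal sketch. The abstract does not give the definition of ρ, so it is not reproduced here, and the equalization of principal moments is also left out. The function names and the equal weighting of all points are assumptions made for illustration only.

```python
import math

def centroid(coords):
    """Unweighted mean position of a list of (x, y, z) points."""
    n = len(coords)
    return (
        sum(p[0] for p in coords) / n,
        sum(p[1] for p in coords) / n,
        sum(p[2] for p in coords) / n,
    )

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid.

    Assumption: every point carries equal weight (no atomic masses).
    """
    cx, cy, cz = centroid(coords)
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / len(coords)
    return math.sqrt(mean_sq)

def scale_to_unit_rg(coords):
    """Center the points on the origin and rescale so that Rg equals 1."""
    cx, cy, cz = centroid(coords)
    rg = radius_of_gyration(coords)
    if rg == 0:
        raise ValueError("all points coincide; radius of gyration is zero")
    return [((x - cx) / rg, (y - cy) / rg, (z - cz) / rg) for x, y, z in coords]
```

After this scaling, two structures of different overall size can be compared on a common length scale; the superposition and the ρ measure itself would still have to follow the paper.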

Journal ArticleDOI
Ulf Ryde
01 Jan 1995-Proteins
TL;DR: The results show that it is essential to allow for bond stretching degrees of freedom in molecular dynamics simulations to get a correct description of the dynamics of the metal coordination sphere; bond length constraints may restrict the accessible part of the phase space and therefore lead to qualitatively erroneous results.
Abstract: A detailed parameterization is presented of a zinc ion with one histidine and two cysteinate ligands, together with one or two water, hydroxide, aldehyde, alcohol, or alkoxide ligands. The parameterization is tailored for the active site of alcohol dehydrogenase and is obtained entirely from quantum chemical computations. The force-field reproduces excellently the geometry of quantum chemically optimized zinc complexes as well as the crystallographic geometry of the active site of alcohol dehydrogenase and small organic structures. The parameterization is used in molecular dynamics simulations and molecular mechanical energy minimizations of alcohol dehydrogenase with a four- or five-coordinate catalytic zinc ion. The active-site zinc ion seems to prefer four-coordination over five-coordination by at least 36 kJ/mol. The only stable binding site of a fifth ligand at the active-site zinc ion is opposite to the normal substrate site, in a narrow cavity behind the zinc ion. Only molecules of the size of water or smaller may occupy this site. There are large fluctuations in the geometry of the zinc coordination sphere. A four-coordinate water molecule alternates frequently (every 7 ps) between the substrate site and the fifth binding site and even two five-coordinate water molecules may interchange ligation sites without prior dissociation. Ligand exchange at the zinc ion probably proceeds by a dissociative mechanism. (ABSTRACT TRUNCATED AT 250 WORDS)

Journal ArticleDOI
01 Jun 1995-Proteins
TL;DR: Using the Wisconsin GCG sequence analysis programs, it is demonstrated that the cysteine-rich regions of INSR and EGFR conform to the structural motif found in the tumor necrosis factor receptor (TNFR) family.
Abstract: The insulin receptor (INSR) and epidermal growth factor receptor (EGFR) are representatives of two structurally related subfamilies of tyrosine kinase receptors. Using the Wisconsin GCG sequence analysis programs, we have demonstrated that the cysteine-rich regions of INSR and EGFR conform to the structural motif found in the tumor necrosis factor receptor (TNFR) family. The study also revealed that these regions were not composed of simple repeats of eight cysteine residues as previously proposed and that the second Cys-rich region of EGFR contained one fewer TNFR repeat than the first. The sequence alignments identified two cysteine residues in INSR that could be responsible for the additional disulfide bonds known to be involved in dimer formation. The published data on the alignments for the fibronectin type III repeat region of the INSR together with previous cysteine mutagenesis studies indicated that there were two disulfide bonds linking the α and β chains of the INSR, but only one α-β linkage in the insulin-like growth factor 1 receptor (IGF-1R). Database searches and sequence alignments showed that the TNFR motif is also found in the cysteine-rich repeats of laminins and the noncatalytic domains of furin-like proteases. If the starting position of the repeat is altered, the characteristic laminin repeat of eight cysteine residues can be shown to consist of a TNFR-like motif fused to the last half of an EGF-like repeat. The overlapping regions of these two motifs are known to have identical disulfide bonding patterns and similar protein folds. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: The prediction experiment reveals that fold recognition has become a powerful tool in structural biology and for the first time, in a public blind test, the unknown structures of proteins have been predicted ahead of experiment to an accuracy approaching molecular detail.
Abstract: The prediction experiment reveals that fold recognition has become a powerful tool in structural biology. We applied our fold recognition technique to 13 target sequences. In two cases, replication terminating protein and prosequence of subtilisin, the predicted structures are very similar to the experimentally determined folds. For the first time, in a public blind test, the unknown structures of proteins have been predicted ahead of experiment to an accuracy approaching molecular detail. In two other cases the approximate folds have been predicted correctly. According to the assessors there were 12 recognizable folds among the target proteins. In our postprediction analysis we find that in 7 cases our fold recognition technique is successful. In several of the remaining cases the predicted folds have interesting features in common with the experimental results. We present our procedure, discuss the results, and comment on several fundamental and technical problems encountered in fold recognition. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system.
Abstract: Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.
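
The "three-state prediction accuracy" quoted above is commonly computed as the fraction of residues whose predicted state matches the observed one. The sketch below assumes the usual one-letter state labels (H for helix, E for strand, C for everything else); the function name and the example strings are illustrative and are not taken from PHD.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state.

    Both arguments are equal-length strings over the assumed three-state
    alphabet 'H' (helix), 'E' (strand), 'C' (other).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# One mismatch in eight residues.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```

A value of 0.72 from this function would correspond to the 72% figure in the abstract, provided the same state definitions are used.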

Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: Analysis of the results of the recent protein structure prediction experiment for the method shows that it achieved a high level of success, and inspection of the threading alignments for the (αβ)8 barrels provides clues as to how fold recognition by threading works, in that these folds are recognized by parts rather than as a whole.
Abstract: Analysis of the results of the recent protein structure prediction experiment for our method shows that we achieved a high level of success. Of the 18 available prediction targets of known structure, the assessors have identified 11 chains which either entirely match a previously known fold, or which partially match a substantial region of a known fold. Of these 11 chains, we made predictions for 9, and correctly assigned the folds in 5 cases. We have also identified a further 2 chains which also partially match known folds, and both of these were correctly predicted. The success rate for our method under blind testing is therefore 7 out of 11 chains. A further 2 folds could have easily been recognized but failed due to either overzealous filtering of potential matches, or to simple human error on our part. One of the two targets for which we did not submit a prediction, prosubtilisin, would not have been recognized by our usual criteria, but even in this case, it is possible that a correct prediction could have been made by considering a combination of pairwise energy and solvation energy Z-scores. Inspection of the threading alignments for the (αβ)8 barrels provides clues as to how fold recognition by threading works, in that these folds are recognized by parts rather than as a whole. The prospects for developing sequence threading technology further are discussed.
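
An energy Z-score of the kind mentioned above expresses how far one fold's threading energy lies from the mean over a library of alternative folds, in units of the standard deviation. The following is a minimal sketch under stated assumptions: the abstract does not say how the background distribution is built or how the pairwise and solvation scores are combined, so the function name, the use of the population standard deviation, and the example numbers are all hypothetical.

```python
import statistics

def z_score(target_energy, background_energies):
    """Standard score of one energy against a background set of energies.

    Assumption: the population standard deviation of the background is used.
    A strongly negative value means the target energy is unusually low.
    """
    mean = statistics.mean(background_energies)
    sd = statistics.pstdev(background_energies)
    if sd == 0:
        raise ValueError("background energies have zero spread")
    return (target_energy - mean) / sd

# Hypothetical energies for a sequence threaded onto five library folds.
library = [1.0, 2.0, 3.0, 4.0, 5.0]
print(z_score(1.0, library))
```

Separate Z-scores could be computed this way for the pairwise and the solvation terms; how to weigh them against each other is not specified in the abstract.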

Journal ArticleDOI
01 Mar 1995-Proteins
TL;DR: Four methods of driving the unfolding of a protein in simulation are compared, using hen egg white lysozyme as a test system: high temperature, high pressure, a gradual increase in the mean radius of the protein imposed by a penalty function, and weak coupling of the temperature difference between radially outward and radially inward moving atoms to an external temperature bath.
Abstract: Four methods are compared to drive the unfolding of a protein: (1) high temperature (T-run), (2) high pressure (P-run), (3) by imposing a gradual increase in the mean radius of the protein using a penalty function added to the physical interaction function (F-run, radial force driven unfolding), and (4) by weak coupling of the difference between the temperature of the radially outward moving atoms and the radially inward moving atoms to an external temperature bath (K-run, kinetic energy driven unfolding). The characteristic features of the four unfolding pathways are analyzed in order to detect distortions due to the size or the type of the applied perturbation, as well as the features that are common to all of them. Hen egg white lysozyme is used as a test system. The simulations are analyzed and compared to experimental data like 1H-NMR amide proton exchange-folding competition, heat capacity, and compressibility measurements. © 1995 Wiley-Liss, Inc.

Journal ArticleDOI
01 Sep 1995-Proteins
TL;DR: Using energy minimization and cluster analysis, a 1020 ps molecular dynamics trajectory of solvated bovine pancreatic trypsin inhibitor is analyzed, indicating that this trajectory has not been shown to have completely sampled the conformational substates available to it.
Abstract: Using energy minimization and cluster analysis, we have analyzed a 1020 ps molecular dynamics trajectory of solvated bovine pancreatic trypsin inhibitor. Elucidation of conformational substates in this way both illustrates the degree of conformational convergence in the simulation and reduces the structural data to a tractable subset. The relative movement of structures upon energy minimization was used to estimate the sizes of features on the protein potential energy surface. The structures were analyzed using their pairwise root-mean-square Cα deviations, which gave a global measure of conformational changes that would not be apparent by monitoring single degrees of freedom. At time scales of 0.1 ps, energy minimization detected sharp transitions between energy minima separated by 0.1 Å rms deviation. Larger conformational clusters containing these smaller minima and separated by 0.25 Å were seen at 1 ps time scales. Both of these small features of the conformational landscape were characterized by movements in loop regions associated with small, correlated backbone dihedral angle shifts. On a nanosecond time scale, the main features of the protein energy landscape were clusters separated by over 0.7 Å rms deviation, with only seven of these substates visited over the 1 ns trajectory. These substates, discernible both before and after energy minimization, differ mainly in a monotonic pivot of the loop residues 11-18 over the course of the simulation. This loop contains lysine 17, which specifically binds to trypsin in the active site. The trajectory did not return to previously visited clusters, indicating that this trajectory has not been shown to have completely sampled the conformational substates available to it. Because the apparent convergence to a single region of conformation space depends on both the time scale of observation and the size of the conformational features examined, convergence must be operationally defined within the context of the simulation.
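
The idea of grouping trajectory frames by pairwise rms deviation, and then asking whether the trajectory ever returns to an earlier cluster, can be sketched as follows. The abstract does not state which clustering algorithm was used; the single-linkage grouping at a fixed cutoff below is a stand-in chosen for brevity, and the function names and the toy distance matrix are hypothetical.

```python
def cluster_by_cutoff(dist, cutoff):
    """Single-linkage clustering of frames from a symmetric distance matrix.

    Frames i and j are placed in the same cluster whenever a chain of
    pairwise distances <= cutoff connects them. Returns one integer label
    per frame, numbered in order of first appearance along the trajectory.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        if root not in seen:
            seen[root] = len(seen)
        labels.append(seen[root])
    return labels

def revisits_earlier_cluster(labels):
    """True if the trajectory leaves a cluster and later comes back to it."""
    visited = set()
    previous = None
    for label in labels:
        if label != previous and label in visited:
            return True
        visited.add(label)
        previous = label
    return False

# Toy pairwise rms deviations (in Å) for three frames.
rmsd = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
labels = cluster_by_cutoff(rmsd, cutoff=0.7)
print(labels, revisits_earlier_cluster(labels))
```

Changing the cutoff changes which features count as distinct clusters, which is one way to see why the abstract says convergence has to be defined relative to the size of the features examined.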