Showing papers in "Proteins in 2001"


Journal ArticleDOI
15 May 2001-Proteins
TL;DR: A remarkable improvement in prediction quality was observed by using the pseudo‐amino acid composition; the concept, together with its mathematical framework and biochemical implications, may also have a notable impact on improving the prediction quality of other protein features.
Abstract: The cellular attributes of a protein, such as which compartment of a cell it belongs to and how it is associated with the lipid bilayer of an organelle, are closely correlated with its biological functions. The success of the Human Genome Project and the rapid increase in the number of protein sequences entering data banks have stimulated a challenging frontier: how to develop a fast and accurate method to predict the cellular attributes of a protein based on its amino acid sequence? The existing algorithms for predicting these attributes were all based on the amino acid composition, in which no sequence-order effect was taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns for protein sequences is extremely large, which has posed a formidable difficulty for realizing this goal. To deal with this difficulty, the pseudo-amino acid composition is introduced. It is a combination of a set of discrete sequence correlation factors and the 20 components of the conventional amino acid composition. A remarkable improvement in prediction quality has been observed by using the pseudo-amino acid composition. The success rates of prediction thus obtained are so far the highest for the same classification schemes and the same data sets. It has not escaped our notice that the concept of pseudo-amino acid composition, as well as its mathematical framework and biochemical implications, may also have a notable impact on improving the prediction quality of other protein features.

1,731 citations
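
The abstract above describes the pseudo-amino acid composition only in words: the 20 conventional composition components plus a set of sequence correlation factors. The following is a minimal sketch of that idea, not the authors' exact formulation. The correlation function (mean squared difference of a single residue property), the parameters `lam` and `weight`, and the `toy_prop` scale are all assumptions introduced here for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_aa_composition(seq, prop, lam=3, weight=0.05):
    """Return a (20 + lam)-component vector for a protein sequence.

    seq    : string over the 20 standard one-letter residue codes
    prop   : dict mapping residue -> numeric property (hypothetical scale)
    lam    : number of sequence-order correlation tiers (assumed parameter)
    weight : relative weight of the correlation factors (assumed parameter)
    """
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")

    # Conventional amino acid composition: 20 occurrence frequencies.
    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    # Tier-k correlation factor: mean squared property difference
    # between residues that are k positions apart in the chain.
    thetas = [
        sum((prop[seq[i]] - prop[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]

    # Normalize so that all 20 + lam components sum to one.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Hypothetical property scale, used only to make the sketch runnable.
toy_prop = {a: i / 19 for i, a in enumerate(AMINO_ACIDS)}
vec = pseudo_aa_composition("MKTAYIAKQRQISFVKSHFSRQ", toy_prop)
```

The first 20 components carry the order-independent composition, while the trailing `lam` components are the only place where sequence order enters the descriptor.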


Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: The Swiss Protein database of sequences exhibits significantly higher amounts of both low‐complexity and predicted‐to‐be‐disordered segments as compared to a non‐redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.
Abstract: Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins. Compared to sequences from ordered proteins, these variously characterized intrinsically disordered segments and proteins, and also a collection of low-complexity sequences, typically have obviously higher levels of protein-specific subsets of the following amino acids: R, K, E, P, and S, and lower levels of subsets of the following: C, W, Y, I, and V. The Swiss Protein database of sequences exhibits significantly higher amounts of both low-complexity and predicted-to-be-disordered segments as compared to a non-redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.

1,658 citations
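
The abstract above measures sequence complexity with Shannon's entropy. A minimal sketch of that measure over a sliding window follows; the window length and the function names are assumptions made here, not values taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue distribution in a segment."""
    n = len(segment)
    counts = Counter(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(seq, window=45):
    """Entropy of every window of the given length along the sequence.

    The default window length is an assumed value for illustration.
    """
    return [
        shannon_entropy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]
```

A segment made of a single residue type has entropy 0, and one that uses all 20 amino acids equally reaches the maximum of log2(20), about 4.32 bits, so low-complexity segments sit at the low end of this scale.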


Journal ArticleDOI
01 Sep 2001-Proteins
TL;DR: This review considers the meaning of the protein dielectric constants and the ways to determine their optimal values and introduces a discriminative benchmark that only includes residues whose pKa values are shifted significantly from their values in water.
Abstract: Implicit models for evaluation of electrostatic energies in proteins include dielectric constants that represent the effect of the protein environment. Unfortunately, the results obtained by such models are very sensitive to the value used for the dielectric constant. Furthermore, the factors that determine the optimal value of these constants are far from being obvious. This review considers the meaning of the protein dielectric constants and the ways to determine their optimal values. It is pointed out that typical benchmarks for validation of electrostatic models cannot discriminate between consistent and inconsistent models. In particular, the observed pKa values of surface groups can be reproduced correctly by models with entirely incorrect physical features. Thus, we introduce a discriminative benchmark that only includes residues whose pKa values are shifted significantly from their values in water. We also use the semimacroscopic version of the protein dipole Langevin dipole (PDLD/S) formulation to generate a series of models that move gradually from microscopic to fully macroscopic models. These include the linear response version of the PDLD/S models, Poisson-Boltzmann (PB)-type models, and Tanford-Kirkwood (TK)-type models. Using our different models and the discriminative benchmark, we show that the protein dielectric constant, εp, is not a universal constant but simply a parameter that depends on the model used. It is also shown, in agreement with our previous works, that εp represents the factors that are not considered explicitly. The use of a discriminative benchmark appears to help not only in identifying nonphysical models but also in analyzing effects that are not reproduced in an accurate way by consistent models. These include the effect of water penetration and the effect of the protein reorganization. Finally, we show that the optimal dielectric constant for self-energies is not the optimal constant for charge-charge interactions. Proteins 2001;44:400–417. © 2001 Wiley-Liss, Inc.

876 citations


Journal ArticleDOI
01 Aug 2001-Proteins
TL;DR: This novel computational procedure is approximately a million times faster than molecular dynamics simulations and captures the essential conformational flexibility of the protein main and side‐chains from analysis of a single, static three‐dimensional structure.
Abstract: Techniques from graph theory are applied to analyze the bond networks in proteins and identify the flexible and rigid regions. The bond network consists of distance constraints defined by the covalent and hydrogen bonds and salt bridges in the protein, identified by geometric and energetic criteria. We use an algorithm that counts the degrees of freedom within this constraint network and that identifies all the rigid and flexible substructures in the protein, including overconstrained regions (with more crosslinking bonds than are needed to rigidify the region) and underconstrained or flexible regions, in which dihedral bond rotations can occur. The number of extra constraints or remaining degrees of bond-rotational freedom within a substructure quantifies its relative rigidity/flexibility and provides a flexibility index for each bond in the structure. This novel computational procedure, first used in the analysis of glassy materials, is approximately a million times faster than molecular dynamics simulations and captures the essential conformational flexibility of the protein main and side-chains from analysis of a single, static three-dimensional structure. This approach is demonstrated by comparison with experimental measures of flexibility for three proteins in which hinge and loop motion are essential for biological function: HIV protease, adenylate kinase, and dihydrofolate reductase.

739 citations


Journal ArticleDOI
01 Aug 2001-Proteins
TL;DR: It is proposed that each subdomain forms a β‐strand and each crystalline domain a two‐layered β‐sandwich, and it is suggested that the β‐sheets may be parallel, rather than antiparallel, as has been assumed up to now.
Abstract: The amino acid sequence of the heavy chain of Bombyx mori silk fibroin was derived from the gene sequence. The 5,263-residue (391-kDa) polypeptide chain comprises 12 low-complexity "crystalline" domains made up of Gly-X repeats and covering 94% of the sequence; X is Ala in 65%, Ser in 23%, and Tyr in 9% of the repeats. The remainder includes a nonrepetitive 151-residue header sequence, 11 nearly identical copies of a 43-residue spacer sequence, and a 58-residue C-terminal sequence. The header sequence is homologous to the N-terminal sequence of other fibroins with a completely different crystalline region. In Bombyx mori, each crystalline domain is made up of subdomains of approximately 70 residues, which in most cases begin with repeats of the GAGAGS hexapeptide and terminate with the GAAS tetrapeptide. Within the subdomains, the Gly-X alternation is strict, which strongly supports the classic Pauling-Corey model, in which beta-sheets pack on each other in alternating layers of Gly/Gly and X/X contacts. When fitting the actual sequence to that model, we propose that each subdomain forms a beta-strand and each crystalline domain a two-layered beta-sandwich, and we suggest that the beta-sheets may be parallel, rather than antiparallel, as has been assumed up to now.

616 citations


Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: Fourteen models were constructed and analyzed for the comparative modeling section of Critical Assessment of Techniques for Protein Structure Prediction (CASP4), and there now is a convergence of algorithms for comparative modeling and fold recognition, particularly in the region of remote homology.
Abstract: Fourteen models were constructed and analyzed for the comparative modeling section of Critical Assessment of Techniques for Protein Structure Prediction (CASP4). Sequence identity between each target and the best possible parent(s) ranged between 55 and 13%, and the root-mean-square deviation between model and target was from 0.8 to 17.9 Å. In the fold recognition section, 10 of the 11 remote homologues were recognized. The modeling protocols are a combination of automated computer algorithms, 3D-JIGSAW (for comparative modeling) and 3D-PSSM (for fold recognition), with human intervention at certain critical stages. In particular, intervention is required to check superfamily assignment, best possible parents from which to model, sequence alignments to those parents and take-off regions for modeling variable regions. There now is a convergence of algorithms for comparative modeling and fold recognition, particularly in the region of remote homology.

601 citations


Journal ArticleDOI
01 May 2001-Proteins
TL;DR: The residue composition at the interfaces, in entire proteins and in whole genomes correlates well, indicating the statistical strength of the data set, and contacts between pairs of hydrophobic and polar residues were unfavorable, and the charged residues tended to pair subject to charge complementarity, in agreement with previous reports.
Abstract: We used a nonredundant set of 621 protein-protein interfaces of known high-resolution structure to derive residue composition and residue-residue contact preferences. The residue composition at the interfaces, in entire proteins and in whole genomes correlates well, indicating the statistical strength of the data set. Differences between amino acid distributions were observed for interfaces with buried surface area of less than 1,000 Å² versus interfaces with area of more than 5,000 Å². Hydrophobic residues were abundant in large interfaces while polar residues were more abundant in small interfaces. The largest residue-residue preferences at the interface were recorded for interactions between pairs of large hydrophobic residues, such as Trp and Leu, and the smallest preferences for pairs of small residues, such as Gly and Ala. On average, contacts between pairs of hydrophobic and polar residues were unfavorable, and the charged residues tended to pair subject to charge complementarity, in agreement with previous reports. A bootstrap procedure, lacking from previous studies, was used for error estimation. It showed that the statistical errors in the set of pairing preferences are generally small; the average standard error is approximately 0.2, i.e., about 8% of the average value of the pairwise index (2.9). However, for a few pairs (e.g., Ser-Ser and Glu-Asp) the standard error is larger in magnitude than the pairing index, which makes it impossible to tell whether contact formation is favorable or unfavorable. The results are interpreted using physicochemical factors and their implications for the energetics of complex formation and for protein docking are discussed. Proteins 2001;43:89-102.

405 citations
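
The abstract above uses a bootstrap procedure to estimate the standard error of each pairing preference. A generic sketch of bootstrap error estimation follows. It assumes the quantity of interest can be recomputed from a resampled list of per-interface values, which may differ from how the authors resampled their data set; the function name and parameters are hypothetical.

```python
import random
import statistics


def bootstrap_standard_error(samples, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement.

    samples     : list of observations (e.g. hypothetical per-interface values)
    statistic   : function mapping a list of observations to a number
    n_resamples : number of bootstrap replicates (assumed default)
    """
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the replicate estimates approximates the standard error.
    return statistics.stdev(estimates)
```

When the returned standard error exceeds the magnitude of the statistic itself, as the abstract reports for pairs such as Ser-Ser, the sign of the preference cannot be resolved from the data.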


Journal ArticleDOI
15 Feb 2001-Proteins
TL;DR: In this article, the authors studied the energy landscape of the peptide Ace-GEWTYDDATKTFTVTE-Nme, taken from the C-terminal fragment (41-56) of protein G, in explicit aqueous solution by a highly parallel replica-exchange approach that combines molecular dynamics trajectories with a temperature exchange Monte Carlo process.
Abstract: We studied the energy landscape of the peptide Ace-GEWTYDDATKTFTVTE-Nme, taken from the C-terminal fragment (41-56) of protein G, in explicit aqueous solution by a highly parallel replica-exchange approach that combines molecular dynamics trajectories with a temperature exchange Monte Carlo process. The combined trajectories in T and configurational space allow a replica to overcome a free energy barrier present at one temperature by increasing T, changing configurations, and cooling in a self-regulated manner, thus allowing sampling of broad regions of configurational space in short (nanoseconds) time scales. The free energy landscape of this system over a wide range of temperatures shows that the system preferentially adopts a beta hairpin structure. However, the peptide also samples other stable ensembles where the peptide adopts helices and helix-turn-helix states, among others. The helical states become increasingly stable at low temperatures, but are slightly less stable than the beta turn ensemble. The energy landscape is rugged at low T, where substates are separated by large energy barriers. These barriers disappear at higher T (approximately 330 K), where the system preferentially adopts a "molten globule" state with structures similar to the beta hairpin.

342 citations
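
The abstract above combines molecular dynamics with a temperature-exchange Monte Carlo step. The sketch below shows the standard Metropolis acceptance test commonly used for swapping two replicas in replica-exchange simulations. The abstract does not spell out the criterion, so this form, the energy units, and the function names are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); units are an assumption


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of exchanging the configurations of two replicas.

    energy_i, energy_j : potential energies of the current configurations
    temp_i, temp_j     : temperatures (K) at which the replicas are running
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Accept with probability min(1, exp(delta)).
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the exchange is accepted in this trial."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)
```

A swap that hands the lower-energy configuration to the colder replica is always accepted; the reverse is accepted only occasionally, which is how a configuration can heat up, cross a barrier, and cool down again.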


Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: The results show that interface conservation is higher than expected by chance and usually statistically significant at the 5% level or better.
Abstract: Evolutionary information derived from the large number of available protein sequences and structures could powerfully guide both analysis and prediction of protein-protein interfaces. To test the relevance of this information, we assess the conservation of residues at protein-protein interfaces compared with other residues on the protein surface. Six homodimer families are analyzed: alkaline phosphatase, enolase, glutathione S-transferase, copper-zinc superoxide dismutase, Streptomyces subtilisin inhibitor, and triose phosphate isomerase. For each family, random simulation is used to calculate the probability (P value) that the level of conservation observed at the interface occurred by chance. The results show that interface conservation is higher than expected by chance and usually statistically significant at the 5% level or better. The effect on the P values of using different definitions of the interface and of excluding active site residues is discussed.

335 citations
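
The abstract above uses random simulation to obtain the probability that the observed interface conservation arose by chance. A simplified sketch follows: it draws random sets of surface residues of the same size as the interface and counts how often they are at least as conserved. The paper's actual sampling scheme is not described in the abstract, so the uniform sampling, the conservation scores, and all names here are assumptions.

```python
import random


def interface_conservation_p_value(scores, interface_idx, n_trials=10000, seed=0):
    """Empirical P value for interface conservation.

    scores        : conservation score of every surface residue (hypothetical)
    interface_idx : indices into `scores` of the interface residues
    n_trials      : number of random residue sets to draw (assumed default)
    """
    rng = random.Random(seed)
    k = len(interface_idx)
    observed = sum(scores[i] for i in interface_idx) / k

    hits = 0
    for _ in range(n_trials):
        # Random set of k surface residues, drawn without replacement.
        sample = rng.sample(range(len(scores)), k)
        if sum(scores[i] for i in sample) / k >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_trials + 1)
```

A returned value below 0.05 corresponds to the "significant at the 5% level" wording used in the abstract.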


Journal ArticleDOI
01 May 2001-Proteins
TL;DR: A ligand–protein inverse‐docking approach is introduced for finding potential protein targets of a small molecule by computer‐automated docking search of a protein cavity database developed from protein structures in the Protein Data Bank.
Abstract: Ligand-protein docking has been developed and used in facilitating new drug discoveries. In this approach, docking single or multiple small molecules to a receptor site is attempted to find putative ligands. A number of studies have shown that docking algorithms are capable of finding ligands and binding conformations at a receptor site close to experimentally determined structures. These algorithms are expected to be equally applicable to the identification of multiple proteins to which a small molecule can bind or weakly bind. We introduce a ligand-protein inverse-docking approach for finding potential protein targets of a small molecule by the computer-automated docking search of a protein cavity database. This database is developed from protein structures in the Protein Data Bank (PDB). Docking is conducted with a procedure involving multiple-conformer shape-matching alignment of a molecule to a cavity followed by molecular-mechanics torsion optimization and energy minimization on both the molecule and the protein residues at the binding region. Scoring is conducted by the evaluation of molecular-mechanics energy and, when applicable, by the further analysis of binding competitiveness against other ligands that bind to the same receptor site in at least one PDB entry. Testing results on two therapeutic agents, 4H-tamoxifen and vitamin E, showed that 50% of the computer-identified potential protein targets were implicated or confirmed by experiments. The application of this approach may facilitate the prediction of unknown and secondary therapeutic target proteins and those related to the side effects and toxicity of a drug or drug candidate. Proteins 2001;43:217-226.

328 citations


Journal ArticleDOI
15 Aug 2001-Proteins
TL;DR: The main strength of the network predictor lies in the fact that neighbor lists and solvent exposure are relatively insensitive to structural changes accompanying complex formation, and it performs equally well with bound or unbound structures of the proteins.
Abstract: Protein-protein interaction sites are predicted from a neural network with sequence profiles of neighboring residues and solvent exposure as input. The network was trained on 615 pairs of nonhomologous complex-forming proteins. Tested on a different set of 129 pairs of nonhomologous complex-forming proteins, 70% of the 11,004 predicted interface residues are actually located in the interfaces. These 7,732 correctly predicted residues account for 65% of the 11,805 residues making up the 129 interfaces. The main strength of the network predictor lies in the fact that neighbor lists and solvent exposure are relatively insensitive to structural changes accompanying complex formation. As such, it performs equally well with bound or unbound structures of the proteins. For a set of 35 test proteins, when the input was calculated from the bound and unbound structures, the correct fractions of the predicted interface residues were 69 and 70%, respectively.
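
The two percentages quoted in the abstract above follow directly from the three residue counts it reports. The short check below reproduces them; the terms "precision" and "recall" are labels chosen here and do not appear in the abstract.

```python
def precision_recall(true_positives, predicted_positives, actual_positives):
    """Fraction of predictions that are correct, and fraction of targets found."""
    precision = true_positives / predicted_positives
    recall = true_positives / actual_positives
    return precision, recall


# Counts taken from the abstract: 7,732 correct out of 11,004 predicted
# interface residues, against 11,805 residues in the 129 true interfaces.
precision, recall = precision_recall(7732, 11004, 11805)
```

Rounded to two decimals these are 0.70 and 0.65, matching the 70% and 65% figures stated in the abstract.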

Journal ArticleDOI
15 Aug 2001-Proteins
TL;DR: This atomic pairwise interaction potential has better selectivity, especially for near‐native structures, and can be used to select near‐native folds generated by structure prediction algorithms as well as for protein structure refinement.
Abstract: A heavy atom distance-dependent knowledge-based pairwise potential has been developed. This statistical potential is first evaluated and optimized with the native structure z-scores from gapless threading. The potential is then used to recognize the native and near-native structures from both published decoy test sets, as well as decoys obtained from our group's protein structure prediction program. In the gapless threading test, there is an average z-score improvement of 4 units in the optimized atomic potential over the residue-based quasichemical potential. Examination of the z-scores for individual pairwise distance shells indicates that the specificity for the native protein structure is greatest at pairwise distances of 3.5-6.5 Å, i.e., in the first solvation shell. On applying the current atomic potential to test sets obtained from the web, composed of native protein and decoy structures, the current generation of the potential performs better than residue-based potentials as well as the other published atomic potentials in the task of selecting native and near-native structures. This newly developed potential is also applied to structures of varying quality generated by our group's protein structure prediction program. The current atomic potential tends to pick lower RMSD structures than do residue-based contact potentials. In particular, this atomic pairwise interaction potential has better selectivity especially for near-native structures. As such, it can be used to select near-native folds generated by structure prediction algorithms as well as for protein structure refinement.
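
The abstract above evaluates the potential by the z-score of the native structure against threading decoys. A minimal sketch of such a z-score follows. The sign convention (a larger value means the native energy lies further below the decoy mean) and the use of the population standard deviation are assumptions made here, since the abstract does not define them.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Number of decoy standard deviations by which the native energy
    lies below the mean decoy energy (assumed sign convention)."""
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    return (mean - native_energy) / spread
```

Under this convention, an "improvement of 4 units" would mean the native structure is separated from the decoy distribution by four additional standard deviations.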

Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: Rosetta ab initio protein structure predictions in CASP4 were considerably more consistent and more accurate than previous ab initio structure predictions and suggest that Rosetta may soon be able to contribute to the interpretation of genome sequence information.
Abstract: Rosetta ab initio protein structure predictions in CASP4 were considerably more consistent and more accurate than previous ab initio structure predictions. Large segments were correctly predicted (>50 residues superimposed within an RMSD of 6.5 Å) for 16 of the 21 domains under 300 residues for which models were submitted. Models with the global fold largely correct were produced for several targets with new folds, and for several difficult fold recognition targets, the Rosetta models were more accurate than those produced with traditional fold recognition models. These promising results suggest that Rosetta may soon be able to contribute to the interpretation of genome sequence information.

Journal ArticleDOI
01 May 2001-Proteins
TL;DR: The initial validation of a new rapid approach to molecular docking, developed for prioritizing combinatorial libraries, is presented; in nearly 90% of the 103 test cases it docks the ligand to within 2.0 Å of the observed binding mode.
Abstract: The prioritization of the screening of combinatorial libraries is an extremely important task for the rapid identification of tight binding ligands and ultimately pharmaceutical compounds. When structural information for the target is available, molecular docking is an approach that can be used for prioritization. Here, we present the initial validation of a new rapid approach to molecular docking developed for prioritizing combinatorial libraries. The algorithm is tested on 103 individual cases from the Protein Data Bank and in nearly 90% of these cases docks the ligand to within 2.0 Å of the observed binding mode. Because the mean CPU time is <5 s/mol, this approach can process hundreds of thousands of compounds per week. Furthermore, if a somewhat less thorough search is performed, the search time drops to 1 s/mol, thus allowing millions of compounds to be docked per week and tested for potential activity. Proteins 2001;43:113-124.

Journal ArticleDOI
01 Jul 2001-Proteins
TL;DR: In this communication, a simple, rigorous derivation is provided to prove that the Bayes decision rule introduced recently for protein structural class prediction is completely the same as the earlier component‐coupled algorithm.
Abstract: It has been quite clear that the success rate for predicting protein structural class can be improved significantly by using the algorithms that incorporate the coupling effect among different amino acid components of a protein. However, there is still a lot of confusion in understanding the relationship of these advanced algorithms, such as the least Mahalanobis distance algorithm, the component-coupled algorithm, and the Bayes decision rule. In this communication, a simple, rigorous derivation is provided to prove that the Bayes decision rule introduced recently for protein structural class prediction is completely the same as the earlier component-coupled algorithm. Meanwhile, it is also very clear from the derivative equations that the least Mahalanobis distance algorithm is an approximation of the component-coupled algorithm, also named as the covariant-discriminant algorithm introduced by Chou and Elrod in protein subcellular location prediction (Protein Engineering, 1999; 12:107-118). Clarification of the confusion will help use these powerful algorithms effectively and correctly interpret the results obtained by them, so as to conduce to the further development not only in the structural prediction area, but in some other relevant areas in protein science as well.
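
The relationship stated in the abstract above can be illustrated with a short worked derivation. This is a sketch under stated assumptions, not the paper's own derivation: each class ξ is modeled as a multivariate Gaussian with mean vector and covariance matrix estimated from its training set, all classes have equal prior probability, and the symbols below are chosen here for illustration.

```latex
% Squared Mahalanobis distance of a query X from class \xi
D_M^2(X, \bar{X}^{\xi}) = (X - \bar{X}^{\xi})^{T} \, C_{\xi}^{-1} \, (X - \bar{X}^{\xi})

% Gaussian class-conditional density in n dimensions
p(X \mid \xi) = (2\pi)^{-n/2} \, |C_{\xi}|^{-1/2}
                \exp\!\left[ -\tfrac{1}{2} D_M^2(X, \bar{X}^{\xi}) \right]

% Taking -2 \ln of the density
-2 \ln p(X \mid \xi) = D_M^2(X, \bar{X}^{\xi}) + \ln |C_{\xi}| + n \ln 2\pi

% With equal priors, the Bayes rule (maximize p) is therefore equivalent to
\xi^{*} = \arg\min_{\xi} \left[ D_M^2(X, \bar{X}^{\xi}) + \ln |C_{\xi}| \right]
```

The last line is a discriminant of the form distance plus log-determinant. Dropping the ln|C_ξ| term leaves the least Mahalanobis distance rule, which agrees with the full rule only when the determinants are equal across classes; this is the sense in which one is an approximation of the other.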

Journal ArticleDOI
J. Read, V.J. Winter, C.M. Eszes, Richard B. Sessions, R.L. Brady
01 May 2001-Proteins
TL;DR: The close similarity of these crystal structures suggests the distinctive activity of these enzyme isoforms is likely to result directly from variation of charged surface residues peripheral to the active site, a hypothesis supported by electrostatic calculations based on each structure.
Abstract: Lactate dehydrogenase (LDH) interconverts pyruvate and lactate with concomitant interconversion of NADH and NAD+. Although crystal structures of a variety of LDH have previously been described, a notable absence has been any of the three known human forms of this glycolytic enzyme. We have now determined the crystal structures of two isoforms of human LDH: the M form, predominantly found in muscle, and the H form, found mainly in cardiac muscle. Both structures have been crystallized as ternary complexes in the presence of the NADH cofactor and oxamate, a substrate-like inhibitor. Although each of these isoforms has different kinetic properties, the domain structure, subunit association, and active-site regions are indistinguishable between the two structures. The pKa that governs the KM for pyruvate for the two isozymes is found to differ by about 0.94 pH units, consistent with variation in pKa of the active-site histidine. The close similarity of these crystal structures suggests the distinctive activity of these enzyme isoforms is likely to result directly from variation of charged surface residues peripheral to the active site, a hypothesis supported by electrostatic calculations based on each structure. Proteins 2001;43:175-185.

Journal ArticleDOI
01 May 2001-Proteins
TL;DR: The START domain superfamily is a rare case of the adaptation of a protein fold with a conserved ligand‐binding mode for both a broad variety of catalytic activities and noncatalytic regulatory functions.
Abstract: With a protein structure comparison, an iterative database search with sequence profiles, and a multiple-alignment analysis, we show that two domains with the helix-grip fold, the star-related lipid-transfer (START) domain of the MLN64 protein and the birch allergen, are homologous. They define a large, previously underappreciated superfamily that we call the START superfamily. In addition to the classical START domains that are primarily involved in eukaryotic signaling mediated by lipid binding and the birch antigen family that consists of plant proteins implicated in stress/pathogen response, the START superfamily includes bacterial polyketide cyclases/aromatases (e.g., TcmN and WhiE VI) and two families of previously uncharacterized proteins. The identification of this domain provides a structural prediction of an important class of enzymes involved in polyketide antibiotic synthesis and allows the prediction of their active site. It is predicted that all START domains contain a similar ligand-binding pocket. Modifications of this pocket determine the ligand-binding specificity and may also be the basis for at least two distinct enzymatic activities, those of a cyclase/aromatase and an RNase. Thus, the START domain superfamily is a rare case of the adaptation of a protein fold with a conserved ligand-binding mode for both a broad variety of catalytic activities and noncatalytic regulatory functions. Proteins 2001;43:134–144. © 2001 Wiley-Liss, Inc.

Journal ArticleDOI
01 Jun 2001-Proteins
TL;DR: This work presents an FDPB‐based pKa calculation method in which the hydrogen‐bond network is globally optimized for every single protonation state used, which gives a significant improvement in the accuracy of calculated pKa values, especially for buried residues.
Abstract: pKa calculation methods that are based on finite difference solutions to the Poisson-Boltzmann equation (FDPB) require that energy calculations be performed for a large number of different protonation states of the protein. Normally, the differences between these protonation states are modeled by changing the charges on a few atoms; sometimes the differences are modeled by adding or removing hydrogens, and in a few cases the positions of these hydrogens are optimized locally. We present an FDPB-based pKa calculation method in which the hydrogen-bond network is globally optimized for every single protonation state used. This global optimization gives a significant improvement in the accuracy of calculated pKa values, especially for buried residues. It is also shown that large errors in calculated pKa values are often due to structural artifacts induced by crystal packing. Optimization of the force fields and parameters used in pKa calculations should therefore be performed with X-ray structures that are corrected for crystal artifacts.

Journal ArticleDOI
01 Sep 2001-Proteins
TL;DR: It is proposed that engineering optimized specific electrostatic interactions to avoid electrostatic repulsion would reduce the type I disordered state, driving the molten globule (MG) → native (N) transition, whereas overcoming the type II disordered state also calls for a stronger hydrophobic core, leading to the denatured → MG → N pathway.
Abstract: Traditionally, molecular disorder has been viewed as local or global instability. Molecules or regions displaying disorder have been considered inherently unstructured. The term has been routinely applied to cases for which no atomic coordinates can be derived from crystallized molecules. Yet, even when it appears that the molecules are disordered, prevailing conformations exist, with population times higher than those of all alternate conformations. Disordered molecules are the outcome of rugged energy landscapes away from the native state around the bottom of the funnel. Ruggedness has a biological function, creating a distribution of structured conformers that bind via conformational selection, driving association and multimolecular complex formation, whether chain-linked in folding or unlinked in binding. We classify disordered molecules into two types. The first type possesses a hydrophobic core. Here, even if the native conformation is unstable, it still has a large enough population time, enabling its experimental detection. In the second type, no such hydrophobic core exists. Hence, the native conformations of molecules belonging to this category have shorter population times, hindering their experimental detection. Although there is a continuum of distribution of hydrophobic cores in proteins, an empirical, statistically based hydrophobicity function may be used as a guideline for distinguishing the two disordered molecule types. Furthermore, the two types relate to steps in the protein folding reaction. With respect to protein design, this leads us to propose that engineering-optimized specific electrostatic interactions to avoid electrostatic repulsion would reduce the type I disordered state, driving the molten globule (MG) → native (N) state. In contrast, for overcoming the type II disordered state, in addition to specific interactions, a stronger hydrophobic core is also indicated, leading to the denatured → MG → N state.

Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: It is expected that in the near future, the performance difference between humans and machines will continue to narrow and that fully automated structure prediction will become an effective companion and complement to experimental structural genomics.
Abstract: We present the results of the fully automated CAFASP3 experiment, which was carried out in parallel with CASP5, using the same set of prediction targets. CAFASP participation is restricted to fully automatic structure prediction servers. The servers' performance is evaluated by using previously announced, objective, reproducible and fully automated evaluation methods. More than 60 servers participated in CAFASP3, covering all categories of structure prediction. As in the previous CAFASP2 experiment, it was possible to identify a group of 5-10 top performing independent servers. This group of top performing independent servers produced relatively accurate models for all the 32 "Homology Modeling" targets, and for up to 43% of the 30 "Fold Recognition" targets. One of the most important results of CAFASP3 was the realization of the value of all the independent servers as a group, as evidenced by the superior performance of "meta-predictors" (defined here as predictors that make use of the output of other CAFASP servers). The performance of the best automated meta-predictors was roughly 30% higher than that of the best independent server. More significantly, the performance of the best automated meta-predictors was comparable with that of the best 5-10 human CASP predictors. This result shows that significant progress has been achieved in automatic structure prediction and has important implications to the prospects of automated structure modeling in the context of structural genomics. Proteins 2003;53:503-516.

Journal ArticleDOI
15 Aug 2001-Proteins
TL;DR: The ability of the intermediate‐resolution model developed in this work to accurately mimic realistic peptide behavior is demonstrated, suggesting that simulations of very long times are possible with this model.
Abstract: An intermediate-resolution model of small, homogeneous peptides is introduced, and discontinuous molecular dynamics simulation is applied to study secondary structure formation. Physically, each model residue consists of a detailed three-bead backbone and a simplified single-bead side-chain. Excluded volume and hydrogen bond interactions are constructed with discontinuous (i.e., hard-sphere and square-well) potentials. Simulation results show that the backbone motion of the model is limited to realistic regions of Φ–Ψ conformational space. Model polyalanine chains undergo a locally cooperative transition to form α-helices that are stabilized by backbone hydrogen bonding, while model polyglycine chains tend to adopt nonhelical structures. When side-chain size is increased beyond a critical diameter, steric interactions prevent formation of long α-helices. These trends in helicity as a function of residue type have been well documented by experimental, theoretical, and simulation studies and demonstrate the ability of the intermediate-resolution model developed in this work to accurately mimic realistic peptide behavior. The efficient algorithm used permits observation of the complete helix–coil transition within 15 min on a single-processor workstation, suggesting that simulations of very long times are possible with this model. Proteins 2001;44:344–360. © 2001 Wiley-Liss, Inc.
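
The discontinuous (hard-sphere and square-well) potentials mentioned in this abstract can be illustrated with a minimal sketch. The function name and the parameters `sigma`, `lam`, and `epsilon` are assumptions chosen for illustration; they are not taken from the paper, and the paper's actual interaction parameters are not given here.

```python
import math

def square_well(r, sigma, lam, epsilon):
    """Generic square-well pair potential (illustrative, not the paper's model).

    r       -- distance between two bead centers
    sigma   -- hard-sphere diameter (assumed parameter name)
    lam     -- well width as a multiple of sigma, lam > 1 (assumed)
    epsilon -- well depth, a positive energy (assumed)
    """
    if r < sigma:
        return math.inf      # hard-sphere overlap is forbidden
    if r < lam * sigma:
        return -epsilon      # inside the attractive well
    return 0.0               # beyond the well: no interaction

# Setting epsilon to 0 reduces the function to a pure hard-sphere potential.
```

Because the energy changes only at r = sigma and r = lam * sigma, a simulation using such potentials can advance from one collision or well-crossing event to the next instead of integrating forces at fixed time steps.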

Journal ArticleDOI
01 May 2001-Proteins
TL;DR: Analysis of relevant spectroscopic data leads to the conclusions that two binding sites are involved in BSA–3HF interaction, and the interaction is slightly positively cooperative in nature with a similar binding constant.
Abstract: Recent studies have shown that various synthetic as well as therapeutically active naturally occurring flavonols possess novel luminescence properties that can potentially serve as highly sensitive monitors of their microenvironments in biologically relevant systems. We report a study on the interactions of bovine serum albumin (BSA) with the model flavonol 3-hydroxyflavone (3HF), using the excited-state proton-transfer (ESPT) luminescence of 3HF as a probe. Upon addition of BSA to the flavonoid solutions, we observe remarkable changes in the absorption, ESPT fluorescence emission and excitation profiles as well as anisotropy (r) values. Complexation of 3HF with protein results in a pronounced shift (20 nm) of the ESPT emission maximum of the probe (from λ_max(em) = 513 nm to λ_max(em) = 533 nm) accompanied by a significant increase in fluorescence intensity. The spectral data also suggest that, in addition to ESPT, the protein environment induces proton abstraction from 3HF leading to formation of anionic species in the ground state. Fairly high values of anisotropy are observed in the presence of BSA for the tautomer (r = 0.25) as well as anion (r = 0.35) species of 3HF, implying that both the species are located in motion-restricted environments of BSA molecules. Analysis of relevant spectroscopic data leads to the conclusions that two binding sites are involved in BSA-3HF interaction, and the interaction is slightly positively cooperative in nature with a similar binding constant of 1.1-1.3 × 10⁵ M⁻¹ for both these sites. Proteins 2001;43:75-81.


Journal ArticleDOI
01 Aug 2001-Proteins
TL;DR: A pattern‐matching approach was used to search for possible hinge‐bending motifs in the TM helices of other ion channel proteins, which uncovered a conserved Gly‐x‐Pro motif in TM helix D5 of CLC channels.
Abstract: A number of ion channels contain transmembrane (TM) alpha-helices that contain proline-induced molecular hinges. These TM helices include the channel-forming peptide alamethicin (Alm), the S6 helix from voltage-gated potassium (Kv) channels, and the D5 helix from voltage-gated chloride (CLC) channels. For both Alm and KvS6, experimental data implicate hinge-bending motions of the helix in an aspect of channel gating. We have compared the hinge-bending motions of these TM helices in bilayer-like environments by multi-nanosecond MD simulations in an attempt to describe motions of these helices that may underlie possible modes of channel gating. Alm is an alpha-helical channel-forming peptide, which contains a central kink associated with a Gly-x-x-Pro motif in its sequence. Simulations of Alm in a TM orientation for 10 ns in an octane slab indicate that the Gly-x-x-Pro motif acts as a molecular hinge. The S6 helix from Shaker Kv channels contains a Pro-Val-Pro motif. Modeling studies and recent experimental data suggest that the KvS6 helix may be kinked in the vicinity of this motif. Simulations (10 ns) of an isolated KvS6 helix in an octane slab and in a POPC bilayer reveal hinge-bending motions. A pattern-matching approach was used to search for possible hinge-bending motifs in the TM helices of other ion channel proteins. This uncovered a conserved Gly-x-Pro motif in TM helix D5 of CLC channels. MD simulations of a model of hCLC1-D5 spanning an octane slab suggest that this channel also contains a TM helix that undergoes hinge-bending motion. In conclusion, our simulations suggest a model in which hinge-bending motions of TM helices may play a functional role in the gating mechanisms of several different families of ion channels.
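
The pattern-matching step described in this abstract can be sketched as a simple regular-expression scan. This is only an illustration under stated assumptions: the helper `find_motifs`, the pattern names, and the toy sequence in the usage note are hypothetical, and the authors' actual search procedure is not described in the abstract.

```python
import re

# Lookahead patterns so that overlapping occurrences are also reported.
# "." stands for the variable residue "x" in the motif notation.
GXP = re.compile(r"(?=(G.P))")     # Gly-x-Pro
GXXP = re.compile(r"(?=(G..P))")   # Gly-x-x-Pro
PVP = re.compile(r"(?=(PVP))")     # Pro-Val-Pro

def find_motifs(sequence, pattern):
    """Return (0-based start index, matched residues) for every motif hit.

    `sequence` is a one-letter amino acid string; hypothetical helper,
    not the method used in the paper.
    """
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

For a made-up sequence such as `"AGLAPVGAPG"`, `find_motifs(seq, GXXP)` reports the hit starting at the first glycine and `find_motifs(seq, GXP)` the hit starting at the second. In practice the scan would be restricted to predicted TM helix segments, which this sketch does not attempt.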

Journal ArticleDOI
01 Feb 2001-Proteins
TL;DR: Domain architecture analysis shows that GGDEF is typically present in multidomain proteins containing regulatory domains of signaling pathways or protein–protein interaction modules, and evolutionary tree analysis indicates that the GGDEF/cyclase superfamily forms a large diversified cluster of orthologous proteins present in bacteria, archaea, and eukaryotes.
Abstract: The GGDEF domain is detected in many prokaryotic proteins, most of which are of unknown function. Several bacteria carry 12-22 different GGDEF homologues in their genomes. Conducting extensive profile-based searches, we detect statistically supported sequence similarity between GGDEF domain and adenylyl cyclase catalytic domain. From this homology, we deduce that the prokaryotic GGDEF domain is a regulatory enzyme involved in nucleotide cyclization, with the fold similar to that of the eukaryotic cyclase catalytic domain. This prediction correlates with the functional information available on two GGDEF-containing proteins, namely diguanylate cyclase and phosphodiesterase A of Acetobacter xylinum, both of which regulate the turnover of cyclic diguanosine monophosphate. Domain architecture analysis shows that GGDEF is typically present in multidomain proteins containing regulatory domains of signaling pathways or protein-protein interaction modules. Evolutionary tree analysis indicates that GGDEF/cyclase superfamily forms a large diversified cluster of orthologous proteins present in bacteria, archaea, and eukaryotes. Proteins 2001;42:210-216.

Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: With the rapid increase in CASP participation and in the number of submitted predictions, special emphasis is placed on methods allowing reliable pre‐classification of submissions and on techniques useful in automated evaluation of predictions.
Abstract: The Livermore Prediction Center conducted the target collection and prediction submission processes for Critical Assessment of Protein Structure Prediction (CASP4) and Critical Assessment of Fully Automated Structure Prediction Methods (CAFASP2). We have also evaluated all the submitted predictions using criteria and methods developed during the course of three previous CASP experiments and preparation for CASP4. We present an overview of the implemented system. Particular attention is paid to newly developed evaluation techniques and data presentation schemes. With the rapid increase in CASP participation and in the number of submitted predictions, special emphasis is placed on methods allowing reliable pre-classification of submissions and on techniques useful in automated evaluation of predictions. We also present an overview of our website, including target structures, predictions, and their evaluations (http://predictioncenter.llnl.gov).

Journal ArticleDOI
01 Sep 2001-Proteins
TL;DR: The capability of simulating protein dynamics on and beyond the few hundred ps timescale with a demonstrably accurate quantum mechanical model will bring new opportunities to extend the understanding of a range of basic processes in biology such as molecular recognition and enzyme catalysis.
Abstract: Protein structure and dynamics are the keys to a wide range of problems in biology. In principle, both can be fully understood by using quantum mechanics as the ultimate tool to unveil the molecular interactions involved. Indeed, quantum mechanics of atoms and molecules have come to play a central role in chemistry and physics. In practice, however, direct application of quantum mechanics to protein systems has been prohibited by the large molecular size of proteins. As a consequence, there is no general quantum mechanical treatment that not only exceeds the accuracy of state-of-the-art empirical models for proteins but also maintains the efficiency needed for extensive sampling in the conformational space, a requirement mandated by the complexity of protein systems. Here we show that, given recent developments in methods, a general quantum mechanical-based treatment can be constructed. We report a molecular dynamics simulation of a protein, crambin, in solution for 350 ps in which we combine a semiempirical quantum-mechanical description of the entire protein with a description of the surrounding solvent, and solvent-protein interactions based on a molecular mechanics force field. Comparison with a recent very high-resolution crystal structure of crambin (Jelsch et al., Proc Natl Acad Sci USA 2000;102:2246-2251) shows that geometrical detail is better reproduced in this simulation than when several alternate molecular mechanics force fields are used to describe the entire system of protein and solvent, even though the structure is no less flexible. Individual atomic charges deviate in both directions from "canonical" values, and some charge transfer is found between the N and C-termini. The capability of simulating protein dynamics on and beyond the few hundred ps timescale with a demonstrably accurate quantum mechanical model will bring new opportunities to extend our understanding of a range of basic processes in biology such as molecular recognition and enzyme catalysis.

Journal ArticleDOI
01 Dec 2001-Proteins
TL;DR: The structure of Ves v 5 allows a detailed analysis of the epitopes that may participate in antigenic cross‐reactivity, findings that are useful for the development of a vaccine for treatment of insect allergy.
Abstract: Ves v 5 is one of three major allergens found in yellow-jacket venom: phospholipase A1 (Ves v 1), hyaluronidase (Ves v 2), and antigen 5 (Ves v 5). Ves v 5 is related by high amino acid sequence identity to pathogenesis-related proteins including proteins from mammals, reptiles, insects, fungi, and plants. The crystal structure of Ves v 5 has been solved and refined to a resolution of 1.9 Å. The majority of residues conserved between the pathogenesis-related proteins can be rationalized in terms of hydrogen bonding patterns and hydrophobic interactions defining an α-β-α sandwich core structure. A small number of consensus residues are solvent exposed (including two adjacent histidines) and located in an elongated cavity that forms a putative active site. The site has no structural resemblance to previously characterized enzymes. Homologous antigen 5's from a large number of different yellow jackets, hornets, and paper wasps are known and patients show varying extents of cross-reactivity to the related antigen 5's. The structure of Ves v 5 allows a detailed analysis of the epitopes that may participate in antigenic cross-reactivity, findings that are useful for the development of a vaccine for treatment of insect allergy.

Journal ArticleDOI
01 Jan 2001-Proteins
TL;DR: There is clearly much more developmental work required before predictions with the accuracy of a good homology model, or even a good fold recognition model, can be made with use of this kind of approach.
Abstract: The results of applying a fragment-based protein tertiary structure prediction method to the prediction of 8 CASP4 targets are described. The method is based on the assembly of supersecondary structural fragments taken from highly resolved protein structures using a simulated annealing algorithm. Despite the significant degree of success in this case, there is clearly much more developmental work required before predictions with the accuracy of a good homology model, or even a good fold recognition model, can be made with use of this kind of approach.

Journal ArticleDOI
01 Feb 2001-Proteins
TL;DR: The observed binding mode explains the affinities of a series of structural analogs of galanthamine and provides a rational basis for structure‐based drug design of synthetic derivatives with improved pharmacological properties.
Abstract: The 3D structure of a complex of the anti-Alzheimer drug galanthamine with Torpedo californica acetylcholinesterase is reported. Galanthamine, a tertiary alkaloid extracted from several species of Amaryllidaceae, is so far the only drug that shows a dual activity, being both an acetylcholinesterase inhibitor and an allosteric potentiator of the nicotinic response induced by acetylcholine and competitive agonists. The X-ray structure, at 2.5 Å resolution, shows an unexpected orientation of the ligand within the active site, as well as unusual protein-ligand interactions. The inhibitor binds at the base of the active site gorge, interacting with both the acyl-binding pocket and the principal quaternary ammonium-binding site. However, the tertiary amine group of galanthamine does not directly interact with Trp84. A docking study using the program AUTODOCK correctly predicts the orientation of galanthamine in the active site. The docked lowest-energy structure has a root mean square deviation of 0.5 Å with respect to the corresponding crystal structure of the complex. The observed binding mode explains the affinities of a series of structural analogs of galanthamine and provides a rational basis for structure-based drug design of synthetic derivatives with improved pharmacological properties. Proteins 2001;42:182-191.
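
The root mean square deviation quoted in this abstract compares a docked ligand pose with the crystallographic one. A minimal sketch of that calculation follows, assuming both poses are already in the same receptor frame with atoms listed in the same order, so no superposition is performed. The function name and the coordinates in the usage note are illustrative and do not come from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists.

    Each argument is a sequence of (x, y, z) tuples in the same units
    (e.g. angstroms). No fitting or alignment is applied.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

For example, two atoms each displaced by one unit along z, as in `rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])`, give a deviation of one unit.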