Showing papers in "Proteins in 2000"


Journal ArticleDOI
15 Nov 2000-Proteins
TL;DR: Analysis of amino acid sequences, based on the normalized net charge and mean hydrophobicity, has been applied to two sets of proteins and shows that “natively unfolded” proteins are specifically localized within a unique region of charge‐hydrophobicity phase space.
Abstract: "Natively unfolded" proteins occupy a unique niche within the protein kingdom in that they lack ordered structure under conditions of neutral pH in vitro. Analysis of amino acid sequences, based on the normalized net charge and mean hydrophobicity, has been applied to two sets of proteins: small globular folded proteins and "natively unfolded" ones. The results show that "natively unfolded" proteins are specifically localized within a unique region of charge-hydrophobicity phase space and indicate that a combination of low overall hydrophobicity and large net charge represents a unique structural feature of "natively unfolded" proteins.

2,029 citations
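
The two coordinates named in the abstract above, mean hydrophobicity and normalized net charge, can be sketched for a single sequence as follows. This is a minimal illustration, not the authors' procedure: the Kyte-Doolittle hydropathy scale, its rescaling to the 0 to 1 range, and the rule that K/R count as +1 and D/E as -1 at neutral pH are all assumptions made here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean hydropathy, rescaled so that -4.5 maps to 0 and +4.5 maps to 1."""
    scaled = [(KD[aa] + 4.5) / 9.0 for aa in seq]
    return sum(scaled) / len(scaled)


def mean_net_charge(seq):
    """Absolute net charge per residue (K/R = +1, D/E = -1; an assumption)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def charge_hydrophobicity_point(seq):
    """Return (mean hydrophobicity, mean net charge) for one sequence."""
    seq = seq.upper()
    return mean_hydrophobicity(seq), mean_net_charge(seq)
```

Plotting these two values for many sequences gives the kind of charge-hydrophobicity map the abstract describes; where any dividing line falls is not stated in the abstract and is not assumed here.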


Journal ArticleDOI
15 Aug 2000-Proteins
TL;DR: The resulting library shows significant differences from previous ones, differences validated by considering the likelihood of systematic misfitting of models to electron density maps and by plotting changes in rotamer frequency with B‐factor.
Abstract: All published rotamer libraries contain some rotamers that exhibit impossible internal atomic overlaps if built in ideal geometry with all hydrogen atoms. Removal of uncertain residues (mainly those with B-factors >40 or van der Waals overlaps >0.4 Å) greatly improves the clustering of rotamer populations. Asn, Gln, or His side chains additionally benefit from flipping of their planar terminal groups when required by atomic overlaps or H-bonding. Sensitivity to skew and to the boundaries of χ angle bins is avoided by using modes rather than traditional mean values. Rotamer definitions are listed both as the modal values and in a preferred version that maximizes common atoms between related rotamers. The resulting library shows significant differences from previous ones, differences validated by considering the likelihood of systematic misfitting of models to electron density maps and by plotting changes in rotamer frequency with B-factor. Few rotamers now show atomic overlaps in ideal geometry; those overlaps are relatively small and can be understood in terms of bond angle distortions compensated by favorable interactions. The new library covers 94.5% of examples in the highest quality protein data with 153 rotamers and can make a significant contribution to improving the accuracy of new structures. Proteins 2000;40:389-408. © 2000 Wiley-Liss, Inc.

1,045 citations
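
The abstract above reports using modes rather than means to describe rotamer populations. The sketch below illustrates why that matters for a skewed distribution. It is only a toy: the 5° histogram bins, the sample angles, and the function names are assumptions for illustration, and the simple arithmetic mean is valid only because the sample does not straddle the ±180° wrap-around.

```python
from collections import Counter


def modal_angle(angles_deg, bin_width=5):
    """Return the centre of the most populated histogram bin (a simple mode)."""
    bins = Counter(int(a // bin_width) for a in angles_deg)
    # Break ties deterministically in favour of the lower bin index.
    best_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return best_bin * bin_width + bin_width / 2


def mean_angle(angles_deg):
    """Plain arithmetic mean; only safe away from the +/-180 degree boundary."""
    return sum(angles_deg) / len(angles_deg)


# Hypothetical chi1 sample: a tight cluster near -64 degrees plus a skewed tail.
chi1 = [-66, -65, -64, -63, -62, -40, -20]

mode_value = modal_angle(chi1)   # stays inside the main cluster
mean_value = mean_angle(chi1)    # dragged towards the tail
```

With this sample the mode remains within the main cluster while the mean is pulled several degrees towards the outliers, which is the sensitivity to skew the abstract mentions.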


Journal ArticleDOI
15 Aug 2000-Proteins
TL;DR: The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences, is shown to provide a range of accuracy from 70.5% to 76.4%.
Abstract: The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences, is shown to provide a range of accuracy from 70.5% to 76.4%. The best accuracy of 76.4% (standard deviation 8.4%), is 3.1% (Q3) and 4.4% (SOV2) better than the PHD algorithm run on the same set of 406 sequence non-redundant proteins that were not used to train either method. Residues predicted by the new method with a confidence value of 5 or greater, have an average Q3 accuracy of 84%, and cover 68% of the residues. Relative solvent accessibility based on a two state model, for 25, 5, and 0% accessibility are predicted at 76.2, 79.8, and 86.6% accuracy respectively. The source of the improvements obtained from training with different representations of the same alignment data are described in detail. The new Jnet prediction method resulting from this study is available in the Jpred secondary structure prediction server, and as a stand-alone computer program from: http://barton.ebi.ac.uk/. Proteins 2000;40:502-511. © 2000 Wiley-Liss, Inc.

817 citations
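
The Q3 figures quoted above are per-residue accuracies over three states, and the confidence-filtered figure pairs an accuracy with a coverage. A minimal sketch of both quantities follows. The state letters H/E/C, the integer confidence values, and the function names are assumptions for illustration; this is not Jnet's own code.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state prediction matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q3_at_confidence(predicted, observed, confidence, threshold=5):
    """Return (Q3 on residues with confidence >= threshold, % of residues covered)."""
    kept = [i for i, c in enumerate(confidence) if c >= threshold]
    if not kept:
        return 0.0, 0.0
    correct = sum(predicted[i] == observed[i] for i in kept)
    accuracy = 100.0 * correct / len(kept)
    coverage = 100.0 * len(kept) / len(predicted)
    return accuracy, coverage
```

For example, `q3("HHEC", "HHCC")` scores three of four residues as correct, and restricting to residues with confidence 5 or greater trades coverage for accuracy in the way the abstract reports.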


Journal ArticleDOI
01 Oct 2000-Proteins
TL;DR: A critical view of the theoretical and practical bases for the practice of assigning a potential function to a protein on the basis of sequence similarity to proteins whose function has been experimentally investigated is presented.
Abstract: The widening gap between known protein sequences and their functions has led to the practice of assigning a potential function to a protein on the basis of sequence similarity to proteins whose function has been experimentally investigated. We present here a critical view of the theoretical and practical bases for this approach. The results obtained by analyzing a significant number of true sequence similarities, derived directly from structural alignments, point to the complexity of function prediction. Different aspects of protein function, including (i) enzymatic function classification, (ii) functional annotations in the form of key words, (iii) classes of cellular function, and (iv) conservation of binding sites can only be reliably transferred between similar sequences to a modest degree. The reason for this difficulty is a combination of the unavoidable database inaccuracies and the plasticity of protein function. In addition, analysis of the relationship between sequence and functional descriptions defines an empirical limit for pairwise-based functional annotations, namely, the three first digits of the six numbers used as descriptors of protein folds in the FSSP database can be predicted at an average level as low as 7.5% sequence identity, two of the four EC digits at 15% identity, half of the SWISS-PROT key words related to protein function would require 20% identity, and the prediction of half of the residues in the binding site can be made at the 30% sequence identity level.

760 citations


Journal ArticleDOI
01 May 2000-Proteins
TL;DR: A new computational method of docking pairs of proteins by using spherical polar Fourier correlations to accelerate the search for candidate low‐energy conformations, augmented by a rigorous but “soft” model of electrostatic complementarity.
Abstract: We present a new computational method of docking pairs of proteins by using spherical polar Fourier correlations to accelerate the search for candidate low-energy conformations. Interaction energies are estimated using a hydrophobic excluded volume model derived from the notion of "overlapping surface skins," augmented by a rigorous but "soft" model of electrostatic complementarity. This approach has several advantages over former three-dimensional grid-based fast Fourier transform (FFT) docking correlation methods even though there is no analogue to the FFT in a spherical polar representation. For example, a complete search over all six rigid-body degrees of freedom can be performed by rotating and translating only the initial expansion coefficients, many unfeasible orientations may be eliminated rapidly using only low-resolution terms, and the correlations are easily localized around known binding epitopes when this knowledge is available. Typical execution times on a single processor workstation range from 2 hours for a global search (5 × 10^8 trial orientations) to a few minutes for a local search (over 6 × 10^7 orientations). The method is illustrated with several domain dimer and enzyme-inhibitor complexes and 20 large antibody-antigen complexes, using both the bound and (when available) unbound subunits. The correct conformation of the complex is frequently identified when docking bound subunits, and a good docking orientation is ranked within the top 20 in 11 out of 18 cases when starting from unbound subunits. Proteins 2000;39:178-194.

586 citations


Journal ArticleDOI
01 Oct 2000-Proteins
TL;DR: Being a fast method, the RTB approach can be useful for normal mode analyses of large systems, paving the way for further developments and applications in contexts for which the normal modes are needed frequently, as for example during molecular dynamics calculations.
Abstract: Normal mode analyses of proteins of various sizes, ranging from 46 (crambin) up to 858 residues (dimeric citrate synthase), were performed by using standard approaches, as well as a recently proposed method that rests on the hypothesis that low-frequency normal modes of proteins can be described as pure rigid-body motions of blocks of consecutive amino-acid residues. Such a hypothesis is strongly supported by our results, because we show that the latter method, named RTB, yields very accurate approximations for the low-frequency normal modes of all proteins considered. Moreover, the quality of the normal modes thus obtained depends very little on the way the polypeptidic chain is split into blocks. Notably, with six amino-acids per block, the normal modes are almost as accurate as with a single amino-acid per block. In this case, for a protein of n residues and N atoms, the RTB method requires the diagonalization of an n × n matrix, whereas standard procedures require the diagonalization of a 3N × 3N matrix. Being a fast method, our approach can be useful for normal mode analyses of large systems, paving the way for further developments and applications in contexts for which the normal modes are needed frequently, as for example during molecular dynamics calculations.

496 citations
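
The matrix sizes quoted in the RTB abstract above follow from counting degrees of freedom: each rigid block contributes three translations and three rotations, whereas the standard approach carries three coordinates per atom. The sketch below only does that bookkeeping. The function names and the example protein size are hypothetical, and no diagonalization is attempted.

```python
import math


def standard_matrix_dim(n_atoms):
    """Dimension of the full Cartesian Hessian: three coordinates per atom."""
    return 3 * n_atoms


def rtb_matrix_dim(n_residues, residues_per_block=6):
    """Dimension of the block-projected matrix: six rigid-body degrees per block."""
    n_blocks = math.ceil(n_residues / residues_per_block)
    return 6 * n_blocks


# Hypothetical protein: 600 residues and 9000 atoms (illustrative numbers only).
full_dim = standard_matrix_dim(9000)
block_dim = rtb_matrix_dim(600, residues_per_block=6)
```

With six residues per block the projected dimension equals the number of residues, which matches the n × n size stated in the abstract.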


Journal ArticleDOI
01 Nov 2000-Proteins
TL;DR: Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases, illustrating how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information.
Abstract: Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self-alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam-A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information. Proteins 2000;41:224–237. © 2000 Wiley-Liss, Inc.

312 citations


Journal ArticleDOI
01 Jun 2000-Proteins
TL;DR: A new computationally efficient and automated “soft docking” algorithm is described to assist the prediction of the mode of binding between two proteins, using the three‐dimensional structures of the unbound molecules.
Abstract: A new computationally efficient and automated “soft docking” algorithm is described to assist the prediction of the mode of binding between two proteins, using the three-dimensional structures of the unbound molecules. The method is implemented in a software package called BiGGER (Bimolecular Complex Generation with Global Evaluation and Ranking) and works in two sequential steps: first, the complete 6-dimensional binding space of the two molecules is systematically searched. A population of candidate protein-protein docked geometries is thus generated and selected on the basis of the geometric complementarity and amino acid pairwise affinities between the two molecular surfaces. Most of the conformational changes observed during protein association are treated in an implicit way and test results are equally satisfactory, regardless of starting from the bound or the unbound forms of known structures of the interacting proteins. In contrast to other methods, the entire molecular surfaces are searched during the simulation, using absolutely no additional information regarding the binding sites. In a second step, an interaction scoring function is used to rank the putative docked structures. The function incorporates interaction terms that are thought to be relevant to the stabilization of protein complexes. These include: geometric complementarity of the surfaces, explicit electrostatic interactions, desolvation energy, and pairwise propensities of the amino acid side chains to contact across the molecular interface. The relative functional contribution of each of these interaction terms to the global scoring function has been empirically adjusted through a neural network optimizer using a learning set of 25 protein-protein complexes of known crystallographic structures.
In 22 out of 25 protein-protein complexes tested, near-native docked geometries were found with Cα RMS deviations ≤ 4.0 Å from the experimental structures, of which 14 were found within the 20 top ranking solutions. The program works on widely available personal computers and takes 2 to 8 hours of CPU time to run any of the docking tests herein presented. Finally, the value and limitations of the method for the study of macromolecular interactions, not yet revealed by experimental techniques, are discussed. Proteins 2000;39:372–384. © 2000 Wiley-Liss, Inc.

308 citations


Journal ArticleDOI
01 Jun 2000-Proteins
TL;DR: A number of studies have addressed the question of which are the critical residues at protein‐binding sites, but although the total number of mutations was large, the number of protein interfaces was small, with some of the interfaces closely related.
Abstract: A number of studies have addressed the question of which are the critical residues at protein-binding sites. These studies examined either a single or a few protein–protein interfaces. The most extensive study to date has been an analysis of alanine-scanning mutagenesis. However, although the total number of mutations was large, the number of protein interfaces was small, with some of the interfaces closely related. Here we show that although overall binding sites are hydrophobic, they are studded with specific, conserved polar residues at specific locations, possibly serving as energy “hot spots.” Our results confirm and generalize the alanine-scanning data analysis, despite its limited size. Previously Trp, Arg, and Tyr were shown to constitute energetic hot spots. These were rationalized by their polar interactions and by their surrounding rings of hydrophobic residues. However, there was no compelling reason as to why specifically these residues were conserved. Here we show that other polar residues are similarly conserved. These conserved residues have been detected consistently in all interface families that we have examined. Our results are based on an extensive examination of residues which are in contact across protein interfaces. We utilize all clustered interface families with at least five members and with sequence similarity between the members in the range of 20–90%. There are 11 such clustered interface families, comprising a total of 97 crystal structures. Our three-dimensional superpositioning analysis of the occurrences of matched residues in each of the families identifies conserved residues at spatially similar environments. Additionally, in enzyme inhibitors, we observe that residues are more conserved at the interfaces than at other locations. 
On the other hand, antibody–protein interfaces have similar surface conservation as compared to their corresponding linear sequence alignment, consistent with the suggestion that evolution has optimized protein interfaces for function. Proteins 2000;39:331–342. © 2000 Wiley-Liss, Inc.

307 citations


Journal ArticleDOI
15 May 2000-Proteins
TL;DR: It is found that there is no correlation between backbone movement of a residue upon ligand binding and the flexibility of its side chain, which is relevant to reduction of search space in docking algorithms by inclusion of side‐chain flexibility for a limited number of binding pocket residues.
Abstract: Ligand binding may involve a wide range of structural changes in the receptor protein, from hinge movement of entire domains to small side-chain rearrangements in the binding pocket residues. The analysis of side chain flexibility gives insights valuable to improve docking algorithms and can provide an index of amino-acid side-chain flexibility potentially useful in molecular biology and protein engineering studies. In this study we analyzed side-chain rearrangements upon ligand binding. We constructed two non-redundant databases (980 and 353 entries) of "paired" protein structures in complexed (holo-protein) and uncomplexed (apo-protein) forms from the PDB macromolecular structural database. The number and identity of binding pocket residues that undergo side-chain conformational changes were determined. We show that, in general, only a small number of residues in the pocket undergo such changes (e.g., ~85% of cases show changes in three residues or less). The flexibility scale has the following order: Lys > Arg, Gln, Met > Glu, Ile, Leu > Asn, Thr, Val, Tyr, Ser, His, Asp > Cys, Trp, Phe; thus, Lys side chains in binding pockets flex 25 times more often than do the Phe side chains. Normalizing for the number of flexible dihedral bonds in each amino acid attenuates the scale somewhat; however, the clear trend of large, polar amino acids being more flexible in the pocket than aromatic ones remains. We found no correlation between backbone movement of a residue upon ligand binding and the flexibility of its side chain. These results are relevant to (1) reduction of search space in docking algorithms by inclusion of side-chain flexibility for a limited number of binding pocket residues; and (2) utilization of the amino acid flexibility scale in protein engineering studies to alter the flexibility of binding pockets. Proteins 2000;39:261-268. © 2000 Wiley-Liss, Inc.

298 citations


Journal ArticleDOI
15 Aug 2000-Proteins
TL;DR: Overall this study invites attention to the robustness of the average properties controlled by the low frequency motions, which are invariably reproduced in all approaches, and the utility and efficiency of the ANM, the computational time cost of which is of the order of “minutes” (real time) as opposed to “days” for MD simulations.
Abstract: The dynamics of α-amylase inhibitors has been investigated using molecular dynamics (MD) simulations and two analytical approaches, the Gaussian network model (GNM) and anisotropic network model (ANM). MD simulations use a full atomic approach with empirical force fields, while the analytical approaches are based on a coarse-grained single-site-per-residue model with a single-parameter harmonic potential between sufficiently close (r < 7 Å) residue pairs. The major difference between the GNM and the ANM is that no directional preferences can be obtained in the GNM, all residue fluctuations being theoretically isotropic, while ANM does incorporate directional preferences. The dominant modes of motions are identified by (i) the singular value decomposition (SVD) of the MD trajectory matrices, and (ii) the similarity transformation of the Kirchhoff matrices of inter-residue contacts in the GNM or ANM. The mean-square fluctuations of individual residues and the cross-correlations between domain movements retain the same characteristics in all approaches, although the dispersion of modes and detailed amplitudes of motion obtained in the ANM conform more closely with MD results. The major weakness of the analytical approaches appears, on the other hand, to be their inadequacy to account for the anharmonic motions or multimeric transitions driven by the slowest collective mode observed in MD. Such motions usually suffer, however, from MD sampling inefficiencies, and multiple independent runs should be tested before making conclusions about their validity and detailed mechanisms.
Overall this study invites attention to (i) the robustness of the average properties (mean-square fluctuations, cross-correlations) controlled by the low frequency motions, which are invariably reproduced in all approaches, and (ii) the utility and efficiency of the ANM, the computational time cost of which is of the order of "minutes" (real time), as opposed to "days" for MD simulations. Proteins 2000;40:512-524.
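
The Kirchhoff matrix of inter-residue contacts mentioned in the abstract above can be sketched directly from one coordinate per residue and the 7 Å cutoff. The following is a minimal illustration under stated assumptions: one site per residue supplied as (x, y, z) tuples, the hypothetical function name `kirchhoff`, and made-up coordinates. It stops at building the matrix; extracting modes or fluctuations would additionally require an eigendecomposition or pseudo-inverse, which is not shown.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Build the contact (Kirchhoff) matrix for one site per residue.

    Off-diagonal entries are -1 for pairs closer than the cutoff and 0 otherwise;
    each diagonal entry is the number of contacts of that residue.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical three-residue chain with 3.8 Angstrom spacing along x.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_gamma = kirchhoff(toy_coords)
```

In the toy chain the two end residues are 7.6 Å apart and therefore not in contact, so only consecutive residues are connected. Every row sums to zero, which is the defining property of this kind of connectivity matrix.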

Journal ArticleDOI
15 Nov 2000-Proteins
TL;DR: By using an unsupervised cluster analyzer, a local structural alphabet composed of 16 folding patterns of five consecutive Cα (“protein blocks”) is identified and the dependence that exists between successive blocks is explicitly taken into account.
Abstract: By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (α/β protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling. Proteins 2000;41:271-287.

Journal ArticleDOI
01 Jan 2000-Proteins
TL;DR: The history of the linear extrapolation method is traced, how the method is used to measure protein stability is reviewed, and some of the other important uses are discussed.
Abstract: The two most common methods of measuring the conformational stability of a protein are differential scanning calorimetry and an analysis of solvent denaturation curves by using the linear extrapolation method. In this article, we trace the history of the linear extrapolation method, review how the method is used to measure protein stability, and then discuss some of the other important uses.

Journal ArticleDOI
01 Jun 2000-Proteins
TL;DR: Using optimized hydropathy analysis of proteins in several, diverse proteomes, it is shown that organisms of the three domains of life—Eukarya, Eubacteria, and Archaea—have similar proportions of α‐helical membrane proteins within their genomes and that these are matched by the complexity of the aqueous components.
Abstract: One may speculate that higher organisms require a proportionately greater abundance of membrane proteins within their genomes in order to furnish the requirements of differentiated cell types, compartmentalization, and intercellular signalling. With the recent availability of several complete prokaryotic genome sequences and sufficient progress in many eukaryotic genome sequencing projects, we seek to test this hypothesis. Using optimized hydropathy analysis of proteins in several, diverse proteomes, we show that organisms of the three domains of life (Eukarya, Eubacteria, and Archaea) have similar proportions of α-helical membrane proteins within their genomes and that these are matched by the complexity of the aqueous components. Proteins 2000;39:417-420.

Journal ArticleDOI
01 Oct 2000-Proteins
TL;DR: Scores calculated from intermolecular contacts of proteins in the crystalline state are used to differentiate monomeric and homodimeric proteins, by classification into two categories separated by a cut‐off score value.
Abstract: Scores calculated from intermolecular contacts of proteins in the crystalline state are used to differentiate monomeric and homodimeric proteins, by classification into two categories separated by a cut-off score value. The generalized classification error is estimated by using bootstrap re-sampling on a nonredundant set of 172 water-soluble proteins whose prevalent quaternary state in solution is known to be either monomeric or homodimeric. A statistical potential, based on atom-pair frequencies across interfaces observed with homodimers, is found to yield an error rate of 12.5%. This indicates a small but significant improvement over the measure of solvent accessible surface area buried in the contact interface, which achieves an error rate of 15.4%. A further modification of the latter parameter relating the two most extensive contacts of the crystal results in an even lower error rate of 11.1%.
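
The cut-off classification and bootstrap error estimate described above can be sketched as follows. This is a rough illustration, not the paper's protocol: it assumes that higher scores indicate homodimers, chooses the cut-off that minimizes training error on each resample, and scores the residues left out of that resample (an out-of-bag scheme chosen here for simplicity). All function names and the toy data are hypothetical.

```python
import random


def error_rate(scores, is_dimer, cutoff):
    """Fraction misclassified when scores above the cutoff are called dimers."""
    wrong = sum((s > cutoff) != d for s, d in zip(scores, is_dimer))
    return wrong / len(scores)


def best_cutoff(scores, is_dimer):
    """Pick the observed score that gives the lowest training error."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda c: error_rate(scores, is_dimer, c))


def bootstrap_error(scores, is_dimer, n_rounds=200, seed=0):
    """Average error on left-out proteins over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    errors = []
    for _ in range(n_rounds):
        picked = [rng.randrange(n) for _ in range(n)]
        left_out = sorted(set(range(n)) - set(picked))
        if not left_out:
            continue  # nothing to test on in this round
        cut = best_cutoff([scores[i] for i in picked],
                          [is_dimer[i] for i in picked])
        errors.append(error_rate([scores[i] for i in left_out],
                                 [is_dimer[i] for i in left_out], cut))
    if not errors:
        raise ValueError("no bootstrap round left any protein out")
    return sum(errors) / len(errors)


# Toy data: four hypothetical monomers followed by four hypothetical homodimers.
toy_scores = [1, 2, 3, 4, 10, 11, 12, 13]
toy_labels = [False] * 4 + [True] * 4
toy_estimate = bootstrap_error(toy_scores, toy_labels)
```

Estimating the error on proteins that were not used to choose the cut-off is what makes the figure a generalization estimate rather than a training error.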

Journal ArticleDOI
01 Jul 2000-Proteins
TL;DR: The quality of the longest SCOP‐query/SCOP‐hit alignment via an intermediate sequence is examined, and it is found that ISS produced longer alignments than PSI‐BLAST searches alone, of nearly comparable per‐residue quality.
Abstract: Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/ SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program. 
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling.
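
The per-residue alignment quality discussed above compares the residue pairs of a sequence alignment with those of a reference structure alignment. A minimal sketch of that comparison is given below. It assumes gapped alignment rows of equal length with "-" as the gap character and uses the number of reference pairs as the denominator; the abstract does not specify these details, and the function names are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (index in A, index in B) residue pairs an alignment matches."""
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test_alignment, reference_alignment):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    test = aligned_pairs(*test_alignment)
    reference = aligned_pairs(*reference_alignment)
    return len(test & reference) / len(reference)
```

For instance, `fraction_correct(("AC-DE", "ACD-E"), ("ACDE", "ACDE"))` keeps three of the four reference pairs, while an alignment shifted by one residue keeps none.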

Journal ArticleDOI
15 Aug 2000-Proteins
TL;DR: The application of fold recognition methods in order to produce a model of the HCV E2 protein and experimental evidence is provided to show that CD81 recognition by E2 is isolate or strain specific and possibly mediated by the second hypervariable region (HVR2) of E2.
Abstract: Several experimental studies on hepatitis C virus (HCV) have suggested the envelope glycoprotein E2 as a key antigen for an effective vaccine against the virus. Knowledge of its structure, therefore, would present a significant step forward in the fight against this disease. This paper reports the application of fold recognition methods in order to produce a model of the HCV E2 protein. Such investigation highlighted the envelope protein E of Tick Borne Encephalitis virus as a possible template for building a model of HCV E2. Mapping of experimental data onto the model allowed the prediction of a composite interaction site between E2 and its proposed cellular receptor CD81, as well as a heparin binding domain. In addition, experimental evidence is provided to show that CD81 recognition by E2 is isolate or strain specific and possibly mediated by the second hypervariable region (HVR2) of E2. Finally, the studies have also allowed a rough model for the quaternary structure of the envelope glycoproteins E1 and E2 complex to be proposed. Proteins 2000;40:355-366.

Journal ArticleDOI
01 Jun 2000-Proteins
TL;DR: It is found in the present case that the contribution from the non‐polar states to the protein‐ligand binding energy is rather small, but it is clearly expected that this term is not negligible in cases where the protein provides preorganized environment to stabilize the residual charges of the ligand.
Abstract: Several strategies for evaluation of the protein-ligand binding free energies are examined. Particular emphasis is placed on the Linear Response Approximation (LRA) (Lee et. al., Prot Eng 1992;5:215-228) and the Linear Interaction Energy (LIE) method (Aqvist et. al., Prot Eng 1994;7:385-391). The performance of the Protein Dipoles Langevin Dipoles (PDLD) method and its semi-microscopic version (the PDLD/S method) is also considered. The examination is done by using these methods in the evaluating of the binding free energies of neutral C2-symmetric cyclic urea-based molecules to Human Immunodeficiency Virus (HIV) protease. Our starting point is the introduction of a thermodynamic cycle that decomposes the total binding free energy to electrostatic and non-electrostatic contributions. This cycle is closely related to the cycle introduced in our original LRA study (Lee et. al., Prot Eng 1992;5:215-228). The electrostatic contribution is evaluated within the LRA formulation by averaging the protein-ligand (and/or solvent-ligand) electrostatic energy over trajectories that are propagated on the potentials of both the polar and non-polar (where all residual charges are set to zero) states of the ligand. This average involves a scaling factor of 0.5 for the contributions from each state and this factor is being used in both the LRA and LIE methods. The difference is, however, that the LIE method neglects the contribution from trajectories over the potential of the non-polar state. This approximation is entirely valid in studies of ligands in water but not necessarily in active sites of proteins. It is found in the present case that the contribution from the non-polar states to the protein-ligand binding energy is rather small. Nevertheless, it is clearly expected that this term is not negligible in cases where the protein provides preorganized environment to stabilize the residual charges of the ligand. 
This contribution can be particularly important in cases of charged ligands. The analysis of the non-electrostatic term is much more complex. It is concluded that within the LRA method one has to complete the relevant thermodynamic cycle by evaluating the binding free energy of the "non-polar" ligand, l;, where all the residual charges are set to zero. It is shown that the LIE term, which involves the scaling of the van der Waals interaction by a constant β (usually on the order of 0.15 to 0.25), corresponds to this part of the cycle. In order to elucidate the nature of this non-electrostatic term and the origin of the scaling constant β, it is important to evaluate explicitly the different contributions to the binding energy of the non-polar ligand, ΔG(bind,l;). Since this cannot be done at present (for relatively large ligands) by rigorous free energy perturbation approaches, we evaluate ΔG(bind,l;) by the PDLD approach, augmented by microscopic calculations of the change in configurational entropy upon binding. This evaluation takes into account the van der Waals, hydrophobic, water penetration and entropic contributions, which are the most important free energy contributions that make up the total ΔG(bind,l;). The sum of these contributions is scaled by a factor θ and it is argued that obtaining a quantitative balance between these contributions should result in θ = 1. By doing so we should have a reliable estimate of the value of the LIE β and a way to understand its origin. The present approach gives θ values between 0.5 and 0.73, depending on the approximation used. This is encouraging but still not satisfying. Nevertheless, one might be able to use our PDLD approach to estimate the change of the LIE θ between different protein active sites. 
It is pointed out that the LIE method is quite similar to our original approach where the electrostatic term was evaluated by the LRA method and the non-electrostatic term by the PDLD method (with its vdw, solvation,
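
The difference between the LRA and LIE electrostatic terms described in this abstract can be illustrated with a minimal sketch. The function names and all energy values below are hypothetical and chosen only for illustration; the sketch assumes each list holds ligand-environment electrostatic energies (kcal/mol) sampled from trajectories run on the polar or non-polar ligand potential.

```python
from statistics import mean

def lra_electrostatic(u_polar, u_nonpolar):
    """LRA: average the electrostatic energy over both states, each scaled by 0.5."""
    return 0.5 * (mean(u_polar) + mean(u_nonpolar))

def lie_electrostatic(u_polar):
    """LIE: same 0.5 factor, but the non-polar-state average is neglected."""
    return 0.5 * mean(u_polar)

# Hypothetical sampled energies (kcal/mol), not taken from the paper.
bound_polar = [-20.1, -19.5, -20.4]      # ligand in protein, polar-state trajectory
bound_nonpolar = [-1.2, -0.8, -1.0]      # ligand in protein, non-polar-state trajectory
water_polar = [-15.0, -15.6, -14.4]      # ligand in water, polar-state trajectory
water_nonpolar = [0.1, -0.1, 0.0]        # ligand in water, non-polar-state trajectory

# Electrostatic contribution to binding = (protein) - (water)
dd_lra = lra_electrostatic(bound_polar, bound_nonpolar) - lra_electrostatic(water_polar, water_nonpolar)
dd_lie = lie_electrostatic(bound_polar) - lie_electrostatic(water_polar)
print(f"LRA: {dd_lra:.2f} kcal/mol, LIE: {dd_lie:.2f} kcal/mol")
```

In this toy data the two estimates differ only through the non-polar-state average in the protein, which is the term the abstract expects to matter when the site is preorganized.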

Journal ArticleDOI
15 Aug 2000-Proteins
TL;DR: The computer system PROSPECT performs protein fold recognition using the threading method and allows a user to incorporate constraints about a target protein, e.g., disulfide bonds, active sites, and NOE distance restraints, into the threading process.
Abstract: The computer system PROSPECT for protein fold recognition using the threading method is described and evaluated in this article. For a given target protein sequence and a template structure, PROSPECT guarantees to find a globally optimal threading alignment between the two. The scoring function for a threading alignment employed in PROSPECT consists of four additive terms: i) a mutation term, ii) a singleton fitness term, iii) a pairwise-contact potential term, and iv) alignment gap penalties. The current version of PROSPECT considers pair contacts only between core (α-helix or β-strand) residues and alignment gaps only in loop regions. PROSPECT finds a globally optimal threading efficiently when pairwise contacts are considered only between residues that are spatially close (7 Å or less between the Cβ atoms in the current implementation). On a test set consisting of 137 pairs of target-template proteins, each pair being from the same superfamily and having sequence identity < 30%, PROSPECT recognizes 69% of the templates correctly and aligns 66% of the structurally alignable residues correctly. These numbers may be compared with the 55% fold recognition and 64% alignment accuracy for the same test set using only scoring terms i), ii), and iv), indicating the significant contribution from the contact term. The fold recognition and alignment accuracy are further improved to 72% and 74%, respectively, when the secondary structure information predicted by the PHD program is used in scoring. PROSPECT also allows a user to incorporate constraints about a target protein, e.g., disulfide bonds, active sites, and NOE distance restraints, into the threading process. The system rigorously finds a globally optimal threading under the specified constraints. Test results have shown that the constraints can further improve the performance of PROSPECT. Proteins 2000;40:343-354. © 2000 Wiley-Liss, Inc.
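
The four additive scoring terms listed in this abstract can be sketched as follows. This is only an illustration of how such a score might be summed for one fixed alignment; it is not PROSPECT's code, it does not search for the optimal threading, and every name, table, and number is hypothetical.

```python
def threading_score(alignment, target_seq, mutation, singleton, pair,
                    contacts, gap_lengths, gap_cost=1.0):
    """Sum four additive terms for one fixed alignment.

    alignment   : dict mapping template position -> target residue index
    mutation    : mutation[pos][aa], cost of placing residue aa at template pos
    singleton   : singleton[pos][aa], fitness of aa in the local environment of pos
    pair        : pair[frozenset((aa1, aa2))], pairwise-contact energy
    contacts    : template position pairs assumed to be core residues in contact
    gap_lengths : lengths of alignment gaps (assumed to lie in loop regions)
    """
    e_mut = sum(mutation[pos][target_seq[i]] for pos, i in alignment.items())
    e_single = sum(singleton[pos][target_seq[i]] for pos, i in alignment.items())
    e_pair = sum(
        pair[frozenset((target_seq[alignment[p]], target_seq[alignment[q]]))]
        for p, q in contacts
        if p in alignment and q in alignment
    )
    e_gap = gap_cost * sum(gap_lengths)
    return e_mut + e_single + e_pair + e_gap

# Toy two-position template with made-up energies.
score = threading_score(
    alignment={0: 0, 1: 1},
    target_seq="AL",
    mutation={0: {"A": -1.0, "L": 0.5}, 1: {"A": 0.2, "L": -0.5}},
    singleton={0: {"A": -0.2, "L": 0.1}, 1: {"A": 0.0, "L": -0.3}},
    pair={frozenset(("A", "L")): -0.7},
    contacts=[(0, 1)],
    gap_lengths=[2],
    gap_cost=0.5,
)
print(round(score, 3))
```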

Journal ArticleDOI
01 Oct 2000-Proteins
TL;DR: It is shown that it is impossible to find a pair potential with the above flexible form that recognizes all native folds, and a potential that rates correctly a subset of the decoy structures was constructed and optimized.
Abstract: The results of an optimization of a folding potential are reported. The complete energy function is modeled as a sum of pairwise interactions with a flexible functional form. The relevant distance between two amino acids (2–9 Å) is divided into 13 intervals, and the energy of each interval is optimized independently. We show, in accord with a previous publication (Tobi et al., Proteins 2000;40:71–85), that it is impossible to find a pair potential with the above flexible form that recognizes all native folds. Nevertheless, a potential that rates correctly a subset of the decoy structures was constructed and optimized. The resulting potential is compared with a distance-dependent statistical potential of Bahar and Jernigan. It is further tested against decoy structures that were created in Levitt's group. On average, the new potential places native shapes lower in energy and provides higher Z scores than other potentials. Proteins 2000;41:40–46. © 2000 Wiley-Liss, Inc.
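
A distance-binned pair potential of the kind described in this abstract, together with a Z score for a native structure against decoys, can be sketched as below. The abstract does not say how the 13 intervals are spaced, so equal widths are an assumption here; the energy table, the decoy energies, and the sign convention of the Z score (positive when the native is lower in energy) are also illustrative choices.

```python
from statistics import mean, pstdev

N_BINS = 13
R_MIN, R_MAX = 2.0, 9.0  # angstroms, from the abstract

def bin_index(r):
    """Index of the distance interval containing r, or None if out of range.
    Equal-width intervals are an assumption of this sketch."""
    if r < R_MIN or r >= R_MAX:
        return None
    return min(int((r - R_MIN) / (R_MAX - R_MIN) * N_BINS), N_BINS - 1)

def pair_energy(pairs, table):
    """Sum of pairwise energies; pairs holds (type_a, type_b, distance) tuples."""
    total = 0.0
    for a, b, r in pairs:
        k = bin_index(r)
        if k is not None:
            total += table[tuple(sorted((a, b)))][k]
    return total

def z_score(native_energy, decoy_energies):
    """Gap between the mean decoy energy and the native, in decoy standard deviations."""
    return (mean(decoy_energies) - native_energy) / pstdev(decoy_energies)

# Hypothetical one-entry table: every A-L interval has energy -1.0.
table = {("A", "L"): [-1.0] * N_BINS}
native_pairs = [("A", "L", 4.0), ("L", "A", 8.5), ("A", "L", 12.0)]  # last pair is out of range
e_native = pair_energy(native_pairs, table)
z = z_score(e_native, [-1.0, 0.0, -1.0, 0.0])  # made-up decoy energies
print(e_native, z)
```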

Journal ArticleDOI
01 Jun 2000-Proteins
TL;DR: This work investigates the stability of three different ensembles of the 36‐mer villin headpiece subdomain, the native, a compact folding intermediate, and the random coil, finding the native ensemble to be ≈26 kcal/mol more stable than the folding intermediate and ≈39 kcal/mol more stable than the random coil ensemble.
Abstract: We investigated the stability of three different ensembles of the 36-mer villin headpiece subdomain, the native, a compact folding intermediate, and the random coil. Structures were taken from a 1-μs molecular dynamics folding simulation and a 100-ns control simulation on the native structure. Our approach for each conformation is to first determine the solute internal energy from the molecular mechanics potential and then to add the change resulting from solvation (ΔG(solv)). Explicit water was used to run the simulation, and a continuum model was used to estimate ΔG(solv) with the finite difference Poisson-Boltzmann model accounting for the polarization part and a linearly surface area-dependent term for the non-polar part. We leave out the solute vibrational entropy from these values but demonstrate that there is no statistical difference among the native, folding intermediate, and random coil ensembles. We find the native ensemble to be approximately 26 kcal/mol more stable than the folding intermediate and approximately 39 kcal/mol more stable than the random coil ensemble. With an experimental estimate for the free energy of denaturation equal to 3 kcal/mol, we approximate the non-native degeneracy to lie between 10^16 and 10^25. We also present a possible scheme for the mechanism of folding, first-order exponential decay of a putative transition state, with an estimate for the t(1/2) of folding of approximately 1 μs.

Journal ArticleDOI
01 Sep 2000-Proteins
TL;DR: The recently solved first crystal structure of the vertebrate‐type ferredoxin, the truncated adrenodoxin Adx(4‐108), is discussed, that offers the unique opportunity for better understanding of the structure‐function relationships and stabilization of this protein, as well as of the molecular architecture of [2Fe‐2S] ferredoxins in general.
Abstract: Adrenodoxin is an iron-sulfur protein that belongs to the broad family of the [2Fe-2S]-type ferredoxins found in plants, animals and bacteria. Its primary function as a soluble electron carrier between the NADPH-dependent adrenodoxin reductase and several cytochromes P450 makes it an irreplaceable component of the steroid hormones biosynthesis in the adrenal mitochondria of vertebrates. This review intends to summarize current knowledge about structure, function, and biochemical behavior of this electron transferring protein. We discuss the recently solved first crystal structure of the vertebrate-type ferredoxin, the truncated adrenodoxin Adx(4-108), that offers the unique opportunity for better understanding of the structure-function relationships and stabilization of this protein, as well as of the molecular architecture of [2Fe-2S] ferredoxins in general. The aim of this review is also to discuss molecular requirements for the formation of the electron transfer complex. Essential comparison between bacterial putidaredoxin and mammalian adrenodoxin will be provided. These proteins have similar tertiary structure, but show remarkable specificity for interactions only with their own cognate cytochrome P450. The discussion will be largely centered on the protein-protein recognition and kinetics of adrenodoxin dependent reactions. Proteins 2000;40:590–612. © 2000 Wiley-Liss, Inc.

Journal ArticleDOI
01 Jan 2000-Proteins
TL;DR: A structure‐based thermodynamic stability analysis of non‐structurally homologous proteins for which high resolution structures of their complexes with specific ligands are available indicates that for all 16 proteins considered, the binding sites have a dual character and are characterized by the presence of regions with very low structural stability and regions with high stability.
Abstract: During the course of biological function, proteins interact with other proteins, ligands, substrates, inhibitors, etc. These interactions occur at precisely defined locations within the protein but their effects are sometimes propagated to distal regions, triggering highly specific responses. These effects can be used as signals directed to activate or inhibit other sites, modulate interactions with other molecules, and/or establish inter-molecular communication networks. During the past decade, it has become evident that the energy of stabilization of the protein structure is not evenly distributed throughout the molecule and that, under native conditions, proteins lack global cooperativity and are characterized by the occurrence of multiple independent local unfolding events. From a biological point of view, it is important to assess if this uneven distribution reflects specific functional requirements. For example, are binding sites more likely to be found in well structured regions, unstable regions, or mixed regions? In this article, we have addressed these questions by performing a structure-based thermodynamic stability analysis of non-structurally homologous proteins for which high resolution structures of their complexes with specific ligands are available. The results of these studies indicate that for all 16 proteins considered, the binding sites have a dual character and are characterized by the presence of regions with very low structural stability and regions with high stability. In many cases the low stability regions are loops that become stable and cover a significant portion of low molecular weight ligands upon binding. For enzymes, catalytic residues are usually, but not always, located in regions with high structural stability. It is shown that this arrangement provides significant advantages for the optimization of binding affinity of small ligands. 
In allosteric enzymes, low stability regions in the regulatory site are shown to play a crucial role in the transmission of information to the catalytic site.

Journal ArticleDOI
01 Jan 2000-Proteins
TL;DR: A novel method of heavy‐atom analysis was used to overcome difficulties in interpretation of extremely anisotropic diffraction and provide an empirical determination of the structure of tropomyosin at 7Å resolution.
Abstract: Tropomyosin is a 400 Å-long coiled coil that polymerizes to form a continuous filament that associates with actin in muscle and numerous non-muscle cells. Tropomyosin and troponin together form a calcium-sensitive switch that is responsible for thin-filament regulation of striated muscle. Subtle structural features of the molecule, including non-canonical aspects of its coiled-coil motif, undoubtedly influence its association with f-actin and its role in thin filament regulation. Previously, careful inspection of native diffraction intensities was sufficient to construct a model of tropomyosin at 9 Å resolution in a spermine-induced crystal form that diffracts anisotropically to 4 Å resolution. Single isomorphous replacement (SIR) phasing has now provided an empirical determination of the structure at 7 Å resolution. A novel method of heavy-atom analysis was used to overcome difficulties in interpretation of extremely anisotropic diffraction. The packing arrangement of the molecules in the crystal, and important aspects of the tropomyosin geometry such as non-uniformities of the pitch and variable bending and radius of the coiled coil are evident. Proteins 2000;38:49–59. © 2000 Wiley-Liss, Inc.

Journal ArticleDOI
15 Feb 2000-Proteins
TL;DR: Analysis of the kinetics of water escape in terms of a survival time correlation function shows a power law behavior in time that can be interpreted in terms of a broad distribution of energy barriers, relative to kBT, for water exchange.
Abstract: The kinetics of water penetration and escape in cytochrome c (cyt c) is studied by molecular dynamics (MD) simulations at various temperatures. Water molecules that penetrate the protein interior during the course of an MD simulation are identified by monitoring the number of water molecules in the first coordination shell (within 3.5 Å) of each water molecule in the system. Water molecules in the interior of cyt c have 0–3 water molecules in their first hydration shell and this coordination number persists for extended periods of time. At T = 300 K we identify over 200 events in which water molecules penetrate the protein and reside inside for at least 5 picoseconds (ps) within a 1.5 nanosecond (ns) time period. Twenty-seven (27) water molecules reside for at least 300 ps, 17 water molecules reside in the protein interior for times longer than 500 ps, and two interior water molecules do not escape; at T = 360 K one water molecule does not escape; at 430 K all water molecules exchange. Some of the internal water molecules show mean square displacements (MSD) of 1 Å² characteristic of structural waters. Others show MSD as large as 12 Å², suggesting that some of these water molecules occupy transient cavities and diffuse extensively within the protein. Motions of protein-bound water molecules are rotationally hindered, but show large librations. Analysis of the kinetics of water escape in terms of a survival time correlation function shows a power law behavior in time that can be interpreted in terms of a broad distribution of energy barriers, relative to kBT, for water exchange. At T = 300 K the estimated roughness of the activation energy distribution is 4–10 kJ/mol (2–4 kBT). Activation enthalpies for water escape are 6–23 kJ/mol. The difference in activation entropies between fast exchanging (0.01 ns) and slow exchanging (0.1–1 ns) water molecules is −27 J/K/mol. Dunitz (Science 1997;264:670) 
has estimated the maximum entropy loss of a water molecule due to binding to be 28 J/K/mol. Therefore, our results suggest that the entropy of interior water molecules is similar to entropy of bulk water. Proteins 2000;38:261–272. Published 2000 Wiley-Liss, Inc.
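
The coordination-number criterion described in this abstract (counting water neighbors within 3.5 Å and flagging waters with 0–3 neighbors) can be sketched for a single snapshot as follows. The function names and coordinates are hypothetical, periodic boundaries are ignored, and a real analysis would also require the low coordination number to persist over time, which this sketch does not check.

```python
import math

def water_coordination(oxygens, cutoff=3.5):
    """Number of other water oxygens within `cutoff` angstroms of each water oxygen."""
    counts = []
    for i, a in enumerate(oxygens):
        n = sum(1 for j, b in enumerate(oxygens)
                if j != i and math.dist(a, b) <= cutoff)
        counts.append(n)
    return counts

def interior_candidates(oxygens, max_neighbors=3, cutoff=3.5):
    """Indices of waters with at most `max_neighbors` water neighbors in this snapshot."""
    counts = water_coordination(oxygens, cutoff)
    return [i for i, n in enumerate(counts) if n <= max_neighbors]

# Toy snapshot: five mutually close waters plus one isolated water.
snapshot = [
    (0.0, 0.0, 0.0), (2.4, 0.0, 0.0), (0.0, 2.4, 0.0),
    (0.0, 0.0, 2.4), (1.2, 1.2, 1.2), (20.0, 20.0, 20.0),
]
counts = water_coordination(snapshot)
candidates = interior_candidates(snapshot)
print(counts, candidates)
```

A low count alone cannot distinguish a buried water from one that is merely isolated in a sparse toy system, so in practice the criterion is meaningful only for a fully solvated simulation box.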

Journal ArticleDOI
01 Sep 2000-Proteins
TL;DR: In this paper, an analytical model with experimental parameters from chymotrypsin inhibitor 2 was used to elucidate the relationship among several different van't Hoff enthalpies used in calorimetric analyses.
Abstract: The experimental calorimetric two-state criterion requires the van't Hoff enthalpy ΔH(vH) around the folding/unfolding transition midpoint to be equal or very close to the calorimetric enthalpy ΔH(cal) of the entire transition. We use an analytical model with experimental parameters from chymotrypsin inhibitor 2 to elucidate the relationship among several different van't Hoff enthalpies used in calorimetric analyses. Under reasonable assumptions, the implications of these ΔH(vH)'s being approximately equal to ΔH(cal) are equivalent: Enthalpic variations among denatured conformations in real proteins are much narrower than some previous lattice-model estimates, suggesting that the energy landscape theory "folding to glass transition temperature ratio" T(f)/T(g) may exceed 6.0 for real calorimetrically two-state proteins. Several popular three-dimensional lattice protein models, with different numbers of residue types in their alphabets, are found to fall short of the high experimental standard for being calorimetrically two-state. Some models postulate a multiple-conformation native state with substantial pre-denaturational energetic fluctuations well below the unfolding transition temperature, or predict a significant post-denaturational continuous conformational expansion of the denatured ensemble at temperatures well above the transition point, or both. These scenarios either disagree with experiments on protein size and dynamics, or are inconsistent with conventional interpretation of calorimetric data. However, when empirical linear baseline subtractions are employed, the resulting ΔH(vH)/ΔH(cal)'s for some models can be increased to values closer to unity, and baseline subtractions are found to correspond roughly to an operational definition of native-state conformational diversity. These results necessitate a re-assessment of theoretical models and experimental interpretations.
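
The calorimetric two-state criterion can be checked numerically for an idealized model. The sketch below assumes a strict two-state transition with a temperature-independent unfolding enthalpy and uses one common van't Hoff definition, ΔH(vH) = 2·Tm·sqrt(R·Cp(Tm)); the abstract compares several definitions, and this is not claimed to be the authors' analytical model. The values of `dH` and `Tm` are hypothetical, not the chymotrypsin inhibitor 2 parameters.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)

def excess_heat_capacity(T, dH, Tm):
    """Excess heat capacity of an ideal two-state transition with constant dH.
    K is the unfolding equilibrium constant, equal to 1 at Tm."""
    K = math.exp(-dH / R * (1.0 / T - 1.0 / Tm))
    return dH ** 2 / (R * T ** 2) * K / (1.0 + K) ** 2

def vant_hoff_enthalpy(cp_at_tm, Tm):
    """One common van't Hoff estimate from the heat capacity at the midpoint."""
    return 2.0 * Tm * math.sqrt(R * cp_at_tm)

dH, Tm = 300.0, 337.0  # hypothetical enthalpy (kJ/mol) and midpoint temperature (K)
dH_vH = vant_hoff_enthalpy(excess_heat_capacity(Tm, dH, Tm), Tm)
print(dH_vH / dH)  # ratio of van't Hoff to calorimetric enthalpy
```

For this ideal case the ratio is unity by construction, since Cp(Tm) = ΔH²/(4·R·Tm²); models with broad enthalpy distributions in either state would give a ratio below one.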

Journal ArticleDOI
01 Feb 2000-Proteins
TL;DR: It is demonstrated that 60% correctness is the upper limit for a 4‐type class prediction from amino acid composition alone for an unknown query protein, based on the normality assumption and the Bayes decision rule for minimum error.
Abstract: Proteins of known structures are usually classified into four structural classes: all-α, all-β, α+β, and α/β type of proteins. A number of methods for predicting the structural class of a protein based on its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle, and yields much better predicted results in comparison with the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest reported accuracies by this method are near 100%, but the lowest one is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the four protein folding classes are governed by the normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein. The apparent relatively high accuracy level (more than 90%) attained in the previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins. Proteins 2000;38:165–175. © 2000 Wiley-Liss, Inc.
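
The Bayes decision rule under a normality assumption, as named in this abstract, assigns a query to the class with the largest prior-weighted Gaussian density. The sketch below reduces the problem to two composition features so the covariance inverse can be written in closed form; a real amino acid composition vector has more components and would need a general matrix inverse. The class parameters are invented for illustration and are not fitted to any data set.

```python
import math

def log_gaussian_2d(x, mean, cov):
    """Log density of a 2-D normal distribution with a full 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance using the closed-form inverse of a 2x2 matrix
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2.0 * math.pi)

def classify(x, classes):
    """Bayes rule for minimum error: pick the class maximizing log prior + log likelihood.
    classes maps name -> (prior, mean, covariance)."""
    return max(
        classes,
        key=lambda k: math.log(classes[k][0]) + log_gaussian_2d(x, classes[k][1], classes[k][2]),
    )

# Hypothetical class models over two made-up composition fractions.
classes = {
    "all-alpha": (0.5, (0.12, 0.04), ((0.001, 0.0), (0.0, 0.001))),
    "all-beta": (0.5, (0.05, 0.09), ((0.001, 0.0), (0.0, 0.001))),
}
label = classify((0.11, 0.05), classes)
print(label)
```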

Journal ArticleDOI
01 Mar 2000-Proteins
TL;DR: The results indicate that salt bridges and their networks may have an important role in resisting deformation/unfolding of the protein structure at high temperatures, particularly in critical regions such as around the active site.
Abstract: Here we seek to understand the higher frequency of occurrence of salt bridges in proteins from thermophiles as compared to their mesophile homologs. We focus on glutamate dehydrogenase, owing to the availability of high resolution thermophilic (from Pyrococcus furiosus) and mesophilic (from Clostridium symbiosum) protein structures, the large protein size and the large difference in melting temperatures. We investigate the location, statistics and electrostatic strengths of salt bridges and of their networks within corresponding monomers of the thermophilic and mesophilic enzymes. We find that many of the extra salt bridges which are present in the thermophilic glutamate dehydrogenase monomer but absent in the mesophilic enzyme, form around the active site of the protein. Furthermore, salt bridges in the thermostable glutamate dehydrogenase cluster within the hydrophobic folding units of the monomer, rather than between them. Computation of the electrostatic contribution of salt bridge energies by solving the Poisson equation in a continuum solvent medium, shows that the salt bridges in Pyrococcus furiosus glutamate dehydrogenase are highly stabilizing. In contrast, the salt bridges in the mesophilic Clostridium symbiosum glutamate dehydrogenase are only marginally stabilizing. This is largely the outcome of the difference in the protein environment around the salt bridges in the two proteins. The presence of a larger number of charges, and hence, of salt bridges contributes to an electrostatically more favorable protein energy term. Our results indicate that salt bridges and their networks may have an important role in resisting deformation/unfolding of the protein structure at high temperatures, particularly in critical regions such as around the active site.

Journal ArticleDOI
01 Jul 2000-Proteins
TL;DR: It is proposed that many proteins (in particular, thermophilic proteins and “complex” protein systems) are designed (by evolution) to have significant kinetic stability when confronted with the destabilizing effect of irreversible alterations.
Abstract: In vitro thermal denaturation experiments suggest that, because of the possibility of irreversible alterations, thermodynamic stability (i.e., a positive value for the unfolding Gibbs energy) does not guarantee that a protein will remain in the native state during a given timescale. Furthermore, irreversible alterations are more likely to occur in vivo than in vitro because (a) some irreversible processes (e.g., aggregation, "undesirable" interactions with other macromolecular components, and proteolysis) are expected to be fast in the "crowded" cellular environment and (b) in many cases, the relevant timescale in vivo (probably related to the half-life for protein degradation) is expected to be longer than the timescale of the usual in vitro experiments (of the order of minutes). We propose, therefore, that many proteins (in particular, thermophilic proteins and "complex" protein systems) are designed (by evolution) to have significant kinetic stability when confronted with the destabilizing effect of irreversible alterations. We show that, as long as these alterations occur mainly from non-native states (a Lumry-Eyring scenario), the required kinetic stability may be achieved through the design of a sufficiently high activation barrier for unfolding, which we define as the Gibbs energy barrier that separates the native state from the non-native ensemble (unfolded, partially folded, and misfolded states) in the following generalized Lumry-Eyring model: Native State ⇌ Non-Native Ensemble → Irreversibly Denatured Protein. 
Finally, using familial amyloid polyneuropathy (FAP) as an illustrative example, we discuss the relation between stability and amyloid fibril formation in terms of the above viewpoint, which leads us to the two following tentative suggestions: (a) the hot spot defined by the FAP-associated amyloidogenic mutations of transthyretin reflects the structure of the transition state for unfolding and (b) substances that decrease the in vitro rate of transthyretin unfolding could also be inhibitors of amyloid fibril formation.
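
The link between the unfolding activation barrier and kinetic stability can be illustrated with a rough estimate. The sketch assumes the limiting Lumry-Eyring case in which the irreversible step is much faster than refolding, so the native state decays with the unfolding rate constant, and it uses the Eyring transition-state prefactor kB·T/h, which is an assumption of this example (the appropriate prefactor for protein unfolding is not given in the abstract). Barrier values are hypothetical.

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)

def unfolding_rate(barrier_kj_per_mol, T=310.0):
    """Eyring-type rate constant (1/s) for crossing the unfolding barrier."""
    return KB * T / H * math.exp(-barrier_kj_per_mol * 1e3 / (R * T))

def native_half_life(barrier_kj_per_mol, T=310.0):
    """Half-life (s) of the native state when every unfolding event is lost irreversibly."""
    return math.log(2.0) / unfolding_rate(barrier_kj_per_mol, T)

for barrier in (90.0, 100.0, 110.0):  # hypothetical barriers in kJ/mol
    print(barrier, native_half_life(barrier))
```

Because the dependence is exponential, each additional 10 kJ/mol of barrier at 310 K lengthens the estimated half-life by a factor of about exp(10000 / (R·T)), roughly fifty-fold.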

Journal ArticleDOI
01 Nov 2000-Proteins
TL;DR: A new program named “DARWIN” has been developed to perform docking calculations with proteins and other biological molecules, which uses the Genetic Algorithm to optimize the molecule's conformation and orientation under the selective pressure of minimizing the potential energy of the complex.
Abstract: A new program named "DARWIN" has been developed to perform docking calculations with proteins and other biological molecules. The program uses the Genetic Algorithm to optimize the molecule's conformation and orientation under the selective pressure of minimizing the potential energy of the complex. A unique feature of DARWIN is that it communicates with the molecular mechanics program CHARMM to make the energy calculations. A second important feature is its parallel interface, which allows simultaneous use of multiple stand-alone copies of CHARMM to rapidly evaluate large numbers of potential solutions. This permits an "accuracy first" approach to docking, which avoids many of the common assumptions and shortcuts often made to reduce computation time. The method was applied to three protein-carbohydrate complexes: the crystallographically determined structures of Concanavalin A and Fab Se155-4; and a model structure for Fab ME36.1. Conformations close to the crystal structures were obtained with this approach, but some "false positive" solutions were also selected. Many of these could be eliminated by introducing different methods for simulating solvent effects. An effective screening method for docking a database of compounds to a single target enzyme using DARWIN is also presented.
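
The genetic-algorithm search described in this abstract can be sketched in miniature. This is not DARWIN's implementation: the energy function is a toy quadratic standing in for the external CHARMM evaluation, the pose is reduced to three arbitrary parameters, and the selection, crossover, and mutation settings are assumptions chosen for brevity.

```python
import random

def toy_energy(pose):
    """Stand-in for an external energy call; minimum at a made-up target pose."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))

def genetic_search(energy, n_params=3, pop_size=30, generations=40, seed=0):
    """Minimize `energy` with truncation selection, one-point crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_params)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=energy)
        history.append(energy(pop[0]))
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)
            child = a[:cut] + b[cut:]               # one-point crossover (new list)
            i = rng.randrange(n_params)
            child[i] += rng.gauss(0.0, 0.3)         # mutate a single parameter
            children.append(child)
        pop = parents + children
    pop.sort(key=energy)
    history.append(energy(pop[0]))
    return pop[0], history

best_pose, history = genetic_search(toy_energy)
print(best_pose, history[-1])
```

Since each individual's energy is evaluated independently, the evaluations within a generation could be distributed across workers, which is the role the abstract assigns to its parallel interface.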