Showing papers in "Proteins in 2008"


Journal ArticleDOI
15 Nov 2008-Proteins
TL;DR: The results suggest that PROPKA 2.0 provides a good description of the protein–ligand interactions that have an important effect on the pKa values of titratable groups, thereby permitting fast and accurate determination of the protonation states of key residues and ligand functional groups within the binding or active site of a protein.
Abstract: The PROPKA method for the prediction of the pKa values of ionizable residues in proteins is extended to include the effect of non-proteinaceous ligands on protein pKa values as well as predict the change in pKa values of ionizable groups on the ligand itself. This new version of PROPKA (PROPKA 2.0) is, as much as possible, developed by adapting the empirical rules underlying PROPKA 1.0 to ligand functional groups. Thus, the speed of PROPKA is retained, so that the pKa values of all ionizable groups are computed in a matter of seconds for most proteins. This adaptation is validated by comparing PROPKA 2.0 predictions to experimental data for 26 protein-ligand complexes including trypsin, thrombin, three pepsins, HIV-1 protease, chymotrypsin, xylanase, hydroxynitrile lyase, and dihydrofolate reductase. For trypsin and thrombin, large protonation state changes (|n| > 0.5) have been observed experimentally for 4 out of 14 ligand complexes. PROPKA 2.0 and Klebe's PEOE approach (Czodrowski P et al. J Mol Biol 2007;367:1347-1356) both identify three of the four large protonation state changes. The protonation state changes due to plasmepsin II, cathepsin D and endothiapepsin binding to pepstatin are predicted to within 0.4 proton units at pH 6.5 and 7.0, respectively. The PROPKA 2.0 results indicate that structural changes due to ligand binding contribute significantly to the proton uptake/release, as do residues far away from the binding site, primarily due to the change in the local environment of a particular residue and hence the change in the local hydrogen bonding network. Overall the results suggest that PROPKA 2.0 provides a good description of the protein-ligand interactions that have an important effect on the pKa values of titratable groups, thereby permitting fast and accurate determination of the protonation states of key residues and ligand functional groups within the binding or active site of a protein.

971 citations


Journal ArticleDOI
01 Apr 2008-Proteins
TL;DR: In a comparison to five well‐established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models.
Abstract: In protein structure prediction, a considerable number of alternative models are usually produced, from which the final model subsequently has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets including a molecular dynamics simulation decoy set as well as on a comprehensive data set totaling 22,420 models from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.

919 citations


Journal ArticleDOI
15 Feb 2008-Proteins
TL;DR: It is demonstrated how such ensembles of predictors can be designed in‐house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering.
Abstract: Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.

436 citations


Journal ArticleDOI
01 Aug 2008-Proteins
TL;DR: A new threading algorithm MUSTER is developed by extending the previous sequence profile–profile alignment method, PPA, which shows a better performance than using the conventional optimization methods based on the PROSUP database.
Abstract: We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value 0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 × 10^-13, which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.

377 citations
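
Note on the TM-score used in the MUSTER abstract above: the abstract does not define it, so the following is a minimal sketch of the standard TM-score definition, under the assumption that this is the measure meant. The function name and arguments are hypothetical. The sketch scores one given superposition only; the full TM-score is the maximum of this quantity over all superpositions, which is not implemented here.

```python
def tm_score_fixed_superposition(distances, target_length):
    """Score one fixed superposition with the TM-score formula.

    distances: distances (in angstroms) between aligned residue pairs
               after superposition (hypothetical input format).
    target_length: number of residues in the target protein.
    """
    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        # Simplification for this sketch: very short targets are not handled.
        raise ValueError("target too short for this simplified d0 formula")
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect superposition of all 100 residues, and one covering only half.
full = tm_score_fixed_superposition([0.0] * 100, 100)
half = tm_score_fixed_superposition([0.0] * 50, 100)
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, a short but precise alignment cannot reach a high score, which is why a threshold such as TM-score >0.5 is read as a statement about overall topology.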


Journal ArticleDOI
01 May 2008-Proteins
TL;DR: The results suggest that selecting expression constructs for crystal trials based primarily on expression solubility is insufficient and instead, AnSEC scoring as a measure of protein polydispersity was found to be predictive of ultimate structure determination success and essential for identifying appropriate boundaries for truncation series.
Abstract: Successful protein expression, purification, and crystallization for challenging targets typically requires evaluation of a multitude of expression constructs. Often many iterations of truncations and point mutations are required to identify a suitable derivative for recombinant expression. Making and characterizing these variants is a significant barrier to success. We have developed a rapid and efficient cloning process and combined it with a protein microscreening approach to characterize protein suitability for structural studies. The Polymerase Incomplete Primer Extension (PIPE) cloning method was used to rapidly clone 448 protein targets and then to generate 2143 truncations from 96 targets with minimal effort. Proteins were expressed, purified, and characterized via a microscreening protocol, which incorporates protein quantification, liquid chromatography mass spectrometry and analytical size exclusion chromatography (AnSEC) to evaluate suitability of the protein products for X-ray crystallography. The results suggest that selecting expression constructs for crystal trials based primarily on expression solubility is insufficient. Instead, AnSEC scoring as a measure of protein polydispersity was found to be predictive of ultimate structure determination success and essential for identifying appropriate boundaries for truncation series. Overall structure determination success was increased by at least 38% by applying this combined PIPE cloning and microscreening approach to recalcitrant targets.

282 citations


Journal ArticleDOI
01 Apr 2008-Proteins
TL;DR: The prediction performance of SVM models developed in this study is better than the existing methods on the same datasets and a web server ‘Pprint’ was developed for predicting RNA binding residues in a protein sequence which is freely available.
Abstract: RNA-binding proteins (RBPs) play key roles in post-transcriptional control of gene expression, which, along with transcriptional regulation, is a major way to regulate patterns of gene expression during development. Thus, the identification and prediction of RNA binding sites is an important step in comprehensive understanding of how RBPs control organism development. Combining evolutionary information and support vector machine (SVM), we have developed an improved method for predicting RNA binding sites or RNA interacting residues in a protein sequence. The prediction models developed in this study have been trained and tested on 86 RNA binding protein chains and evaluated using the fivefold cross-validation technique. First, an SVM model was developed that achieved a maximum Matthews correlation coefficient (MCC) of 0.31. The MCC improved from 0.31 to 0.45 when multiple sequence alignment in the form of PSSM profiles was used as input to the SVM, which is better than the maximum MCC achieved by previous methods (0.41) on the same dataset. In addition, SVM models were also developed on an alternative dataset that contained 107 RBP chains. Utilizing PSSM as input information to the SVM, the training/testing on this alternate dataset achieved a maximum MCC of 0.32. In conclusion, the prediction performance of SVM models developed in this study is better than the existing methods on the same datasets. A web server 'Pprint' was also developed for predicting RNA binding residues in a protein sequence, which is freely available at http://www.imtech.res.in/raghava/pprint/.

270 citations
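
Note on the MCC values quoted in the Pprint abstract above: the Matthews correlation coefficient is computed from the four confusion-matrix counts of a binary classifier (here, binding versus non-binding residues). The sketch below shows the standard formula. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical per-residue counts, chosen only to illustrate class imbalance.
example_mcc = matthews_corrcoef(tp=60, tn=800, fp=90, fn=50)
```

MCC ranges from -1 to 1 and stays informative when one class dominates, which is typical for binding-site prediction because most residues do not contact RNA.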


Journal ArticleDOI
15 Nov 2008-Proteins
TL;DR: This update of Benchmark 3.0 includes 40 new test cases, representing a 48% increase from Benchmark 2.0, and will facilitate the development of new algorithms that require a large number of training examples.
Abstract: We present version 3.0 of our publicly available protein-protein docking benchmark. This update includes 40 new test cases, representing a 48% increase from Benchmark 2.0. For all of the new cases, the crystal structures of both binding partners are available. As with Benchmark 2.0, Structural Classification of Proteins (Murzin et al., J Mol Biol 1995;247:536-540) was used to remove redundant test cases. The 124 unbound-unbound test cases in Benchmark 3.0 are classified into 88 rigid-body cases, 19 medium-difficulty cases, and 17 difficult cases, based on the degree of conformational change at the interface upon complex formation. In addition to providing the community with more test cases for evaluating docking methods, the expansion of Benchmark 3.0 will facilitate the development of new algorithms that require a large number of training examples. Benchmark 3.0 is available to the public at http://zlab.bu.edu/benchmark.

264 citations
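
Note on the counts in the Benchmark 3.0 abstract above: the figures can be cross-checked with simple arithmetic. The short sketch below uses only numbers quoted in the abstract. The size of Benchmark 2.0 is inferred here by subtraction, on the assumption that all earlier cases were carried over.

```python
# Counts quoted in the abstract.
new_cases = 40
rigid_body, medium, difficult = 88, 19, 17

# The three difficulty classes should add up to the stated 124 cases.
total_v3 = rigid_body + medium + difficult

# Inferred, not stated in the abstract: assumes every Benchmark 2.0
# case was retained in Benchmark 3.0.
total_v2 = total_v3 - new_cases

percent_increase = 100.0 * new_cases / total_v2
print(total_v3, total_v2, round(percent_increase, 1))  # prints: 124 84 47.6
```

Under that assumption, the 47.6% result rounds to the 48% increase stated in the abstract.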


Journal ArticleDOI
01 Aug 2008-Proteins
TL;DR: This work extracts orientation‐dependent interactions from protein structures by treating each polar atom as a dipole with a direction, and reveals that the orientation preference between hydrogen‐bonded atoms is not enough to account for the structural specificity of proteins.
Abstract: Proteins fold into unique three-dimensional structures by specific, orientation-dependent interactions between amino acid residues. Here, we extract orientation-dependent interactions from protein structures by treating each polar atom as a dipole with a direction. The resulting statistical energy function successfully refolds 13 out of 16 fully unfolded secondary-structure terminal regions of 10-23 amino acid residues in 15 small proteins. Dissecting the orientation-dependent energy function reveals that the orientation preference between hydrogen-bonded atoms is not enough to account for the structural specificity of proteins. The result has significant implications for the theoretical and experimental searches for specific interactions involved in protein folding and molecular recognition between proteins and other biologically active molecules.

241 citations


Journal ArticleDOI
01 Aug 2008-Proteins
TL;DR: A distance‐dependent knowledge‐based scoring function to predict protein–protein interactions and the binding scores predicted by ITScore‐PP correlated well with the experimentally determined binding affinities, yielding a correlation coefficient of R = 0.71.
Abstract: Using an efficient iterative method, we have developed a distance-dependent knowledge-based scoring function to predict protein-protein interactions. The function, referred to as ITScore-PP, was derived using the crystal structures of a training set of 851 protein-protein dimeric complexes containing true biological interfaces. The key idea of the iterative method for deriving ITScore-PP is to improve the interatomic pair potentials by iteration, until the pair potentials can distinguish true binding modes from decoy modes for the protein-protein complexes in the training set. The iterative method circumvents the challenging reference state problem in deriving knowledge-based potentials. The derived scoring function was used to evaluate the ligand orientations generated by ZDOCK 2.1 and the native ligand structures on a diverse set of 91 protein-protein complexes. For the bound test cases, ITScore-PP yielded a success rate of 98.9% if the top 10 ranked orientations were considered. For the more realistic unbound test cases, the corresponding success rate was 40.7%. Furthermore, for faster orientational sampling purpose, several residue-level knowledge-based scoring functions were also derived following the similar iterative procedure. Among them, the scoring function that uses the side-chain center of mass (SCM) to represent a residue, referred to as ITScore-PP(SCM), showed the best performance and yielded success rates of 71.4% and 30.8% for the bound and unbound cases, respectively, when the top 10 orientations were considered. ITScore-PP was further tested using two other published protein-protein docking decoy sets, the ZDOCK decoy set and the RosettaDock decoy set. In addition to binding mode prediction, the binding scores predicted by ITScore-PP also correlated well with the experimentally determined binding affinities, yielding a correlation coefficient of R = 0.71 on a test set of 74 protein-protein complexes with known affinities. 
ITScore-PP is computationally efficient. The average run time for ITScore-PP was about 0.03 second per orientation (including optimization) on a personal computer with 3.2 GHz Pentium IV CPU and 3.0 GB RAM. The computational speed of ITScore-PP(SCM) is about an order of magnitude faster than that of ITScore-PP. ITScore-PP and/or ITScore-PP(SCM) can be combined with efficient protein docking software to study protein-protein recognition.

233 citations


Journal ArticleDOI
01 Mar 2008-Proteins
TL;DR: The method employs the Elastic Network Model, which is very efficient and was validated against a large data set of proteins, and can be used in applications such as flexible protein–protein and protein–ligand docking, flexible docking of protein structures into cryo‐EM maps, and refinement of low‐resolution EM structures.
Abstract: Proteins are highly flexible molecules. Prediction of molecular flexibility aids in the comprehension and prediction of protein function and in providing details of functional mechanisms. The ability to predict the locations, directions, and extent of molecular movements can assist in fitting atomic resolution structures to low-resolution EM density maps and in predicting the complex structures of interacting molecules (docking). There are several types of molecular movements. In this work, we focus on the prediction of hinge movements. Given a single protein structure, the method automatically divides it into the rigid parts and the hinge regions connecting them. The method employs the Elastic Network Model, which is very efficient and was validated against a large data set of proteins. The output can be used in applications such as flexible protein-protein and protein-ligand docking, flexible docking of protein structures into cryo-EM maps, and refinement of low-resolution EM structures. The web server of HingeProt provides convenient visualization of the results and is available with two mirror sites at http://www.prc.boun.edu.tr/appserv/prc/HingeProt3 and http://bioinfo3d.cs.tau.ac.il/HingeProt/.

219 citations


Journal ArticleDOI
01 Nov 2008-Proteins
TL;DR: The background and the principles of existing flexible protein–protein docking methods are described, focusing on the algorithms and their rationale.
Abstract: Treating flexibility in molecular docking is a major challenge in cell biology research. Here we describe the background and the principles of existing flexible protein-protein docking methods, focusing on the algorithms and their rationale. We describe how protein flexibility is treated in different stages of the docking process: in the preprocessing stage, rigid and flexible parts are identified and their possible conformations are modeled. This preprocessing provides information for the subsequent docking and refinement stages. In the docking stage, an ensemble of pre-generated conformations or the identified rigid domains may be docked separately. In the refinement stage, small-scale movements of the backbone and side-chains are modeled and the binding orientation is improved by rigid-body adjustments. For clarity of presentation, we divide the different methods into categories. This should allow the reader to focus on the most suitable method for a particular docking problem.

Journal ArticleDOI
15 Aug 2008-Proteins
TL;DR: This work proposes an algorithm for detecting gene–disease associations based on the human protein–protein interaction network, known gene–disease associations, protein sequence, and protein functional information at the molecular level, and provides evidence that, despite the noise/incompleteness of experimental data and unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted simultaneously.
Abstract: One of the most important tasks of modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored and algorithms for candidate gene prioritization are gaining in their usefulness. Here, we propose an algorithm for detecting gene-disease associations based on the human protein-protein interaction network, known gene-disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred, is supervised: first, we mapped each gene/protein onto the spaces of disease and functional terms based on distance to all annotated proteins in the protein interaction network. We also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility. We then trained support vector machines to detect gene-disease associations for a number of terms in Disease Ontology and provided evidence that, despite the noise/incompleteness of experimental data and unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted on simultaneously.

Journal ArticleDOI
01 Dec 2008-Proteins
TL;DR: A thermodynamic assessment of how each surface group on proteins contributes to the overall hydration and osmolation is provided, and it is found that the major solvation effects on protein side‐chains originate from the osmolytes, and that the hydration mostly depends on the size of the side‐chain.
Abstract: Protein stability and solubility depend strongly on the presence of osmolytes, because of the protein preference to be solvated by either water or osmolyte. It has traditionally been assumed that only this relative preference can be measured, and that the individual solvation contributions of water and osmolyte are inaccessible. However, it is possible to determine hydration and osmolyte solvation (osmolation) separately using Kirkwood-Buff theory, and this fact has recently been utilized by several researchers. Here, we provide a thermodynamic assessment of how each surface group on proteins contributes to the overall hydration and osmolation. Our analysis is based on transfer free energy measurements with model-compounds that were previously demonstrated to allow for a very successful prediction of osmolyte-dependent protein stability. When combined with Kirkwood-Buff theory, the Transfer Model provides a space-resolved solvation pattern of the peptide unit, amino acids, and the folding/unfolding equilibrium of proteins in the presence of osmolytes. We find that the major solvation effects on protein side-chains originate from the osmolytes, and that the hydration mostly depends on the size of the side-chain. The peptide backbone unit displays a much more variable hydration in the different osmolyte solutions. Interestingly, the presence of sucrose leads to simultaneous accumulation of both the sugar and water in the vicinity of peptide groups, resulting from a saccharide accumulation that is less than the accumulation of water, a net preferential exclusion. Only the denaturing osmolyte, urea, obeys the classical solvent exchange mechanism in which the preferential interaction with the peptide unit excludes water.

Journal ArticleDOI
01 Apr 2008-Proteins
TL;DR: It is shown that the transient‐complex theory is predictive of electrostatic rate enhancement and can help parameterize PB calculations and prediction is improved when the nonlinear PB equation is used.
Abstract: The association of two proteins is bounded by the rate at which they, via diffusion, find each other while in appropriate relative orientations. Orientational constraints restrict this rate to approximately $10^{5}$-$10^{6}$ M$^{-1}$ s$^{-1}$. Proteins with higher association rates generally have complementary electrostatic surfaces; proteins with lower association rates generally are slowed down by conformational changes upon complex formation. Previous studies (Zhou, Biophys J 1997;73:2441-2445) have shown that electrostatic enhancement of the diffusion-limited association rate can be accurately modeled by $k_{D} = k_{D0} \exp(-\langle U_{el} \rangle^{\star}/k_{B}T)$, where $k_{D}$ and $k_{D0}$ are the rates in the presence and absence of electrostatic interactions, respectively, $\langle U_{el} \rangle^{\star}$ is the average electrostatic interaction energy in a "transient-complex" ensemble, and $k_{B}T$ is the thermal energy. The transient-complex ensemble separates the bound state from the unbound state. Predictions of the transient-complex theory on four protein complexes were found to agree well with the experiment when the electrostatic interaction energy was calculated with the linearized Poisson-Boltzmann (PB) equation (Alsallaq and Zhou, Structure 2007;15:215-224). Here we show that the agreement is further improved when the nonlinear PB equation is used. These predictions are obtained with the dielectric boundary defined as the protein van der Waals surface. When the dielectric boundary is instead specified as the molecular surface, electrostatic interactions in the transient complex become repulsive and are thus predicted to retard association. Together these results demonstrate that the transient-complex theory is predictive of electrostatic rate enhancement and can help parameterize PB calculations.
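
Note on the rate expression in the abstract above: the relation between the basal rate and the electrostatically enhanced rate is a single Boltzmann factor, so it can be evaluated directly. The sketch below is illustrative only. The function name, the choice of kcal/mol as the energy unit, and the default temperature are assumptions, and computing the average interaction energy itself (the Poisson-Boltzmann step) is outside its scope.

```python
import math

# Molar gas constant in kcal/(mol K); plays the role of k_B when energies
# are expressed per mole (unit choice is an assumption of this sketch).
K_B_KCAL_PER_MOL_K = 0.0019872


def association_rate(k_d0, u_el_kcal_per_mol, temperature_k=298.15):
    """Return k_D = k_D0 * exp(-<U_el>* / (k_B T)).

    k_d0: basal rate without electrostatic interactions (M^-1 s^-1).
    u_el_kcal_per_mol: average electrostatic interaction energy in the
        transient-complex ensemble; negative values are attractive.
    """
    kbt = K_B_KCAL_PER_MOL_K * temperature_k
    return k_d0 * math.exp(-u_el_kcal_per_mol / kbt)


# An attractive energy of exactly one k_B T raises the rate by a factor of e.
one_kbt = K_B_KCAL_PER_MOL_K * 300.0
enhancement = association_rate(1.0, -one_kbt, temperature_k=300.0)
```

The sign convention matches the abstract: attractive (negative) energies enhance association, while repulsive (positive) energies, as obtained with the molecular-surface dielectric boundary, retard it.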

Journal ArticleDOI
01 Aug 2008-Proteins
TL;DR: The presented data unequivocally suggest that predecessor genes of mammalian heme peroxidases have segregated very early in evolution, showing that even in certain prokaryotic organisms, genes encoding putative antimicrobial enzymes are found providing a group of bacteria with an evolutionary advantage over the others.
Abstract: The authors have reconstructed the phylogenetic relationships of the main evolutionary lines of mammalian heme-containing peroxidases. The sequences of intensively investigated human myeloperoxidase, eosinophil peroxidase, and lactoperoxidase, which participate in host defence against infections, were aligned together with newly found open reading frames coding for highly similar putative peroxidase domains in all kingdoms of life. The evolutionary relationships were reconstructed using neighbor-joining, maximum parsimony, and maximum likelihood methods. It is demonstrated that this enzyme superfamily obeys the rules of the birth-and-death model of multigene family evolution and contains proteins with a variety of functions that could be grouped in seven subfamilies. On the basis of occurrence and the fact that two main enzymatic activities are related to these metalloproteins, they propose the name peroxidase–cyclooxygenase superfamily for this widely spread group of heme-containing oxidoreductases. Well known structure–function relationships in mammalian peroxidases formed the basis for the critical inspection of all subfamilies. The presented data unequivocally suggest that predecessor genes of mammalian heme peroxidases have segregated very early in evolution. Before organisms developed an acquired immunity, their antimicrobial defence depended on enzymes that were recruited upon pathogen invasion and could produce antimicrobial reaction products. Thus, these peroxidatic heme proteins evolved into important components in the innate immune defence system. This work shows that even in certain prokaryotic organisms, genes encoding putative antimicrobial enzymes are found, providing a group of bacteria with an evolutionary advantage over the others. Proteins 2008. © 2008 Wiley-Liss, Inc.

Journal ArticleDOI
15 Feb 2008-Proteins
TL;DR: A fast and accurate protocol, LoopBuilder, for the prediction of loop conformations in proteins, which includes extensive sampling of backbone conformations, side chain addition, the use of a statistical potential to select a subset of these conformations, and an energy minimization and ranking with an all‐atom force field.
Abstract: We describe a fast and accurate protocol, LoopBuilder, for the prediction of loop conformations in proteins. The procedure includes extensive sampling of backbone conformations, side chain addition, the use of a statistical potential to select a subset of these conformations, and, finally, an energy minimization and ranking with an all-atom force field. We find that the Direct Tweak algorithm used in the previously developed LOOPY program is successful in generating an ensemble of conformations that on average are closer to the native conformation than those generated by other methods. An important feature of Direct Tweak is that it checks for interactions between the loop and the rest of the protein during the loop closure process. DFIRE is found to be a particularly effective statistical potential that can bias conformation space toward conformations that are close to the native structure. Its application as a filter prior to a full molecular mechanics energy minimization both improves prediction accuracy and offers a significant savings in computer time. Final scoring is based on the OPLS/SBG-NP force field implemented in the PLOP program. The approach is also shown to be quite successful in predicting loop conformations for cases where the native side chain conformations are assumed to be unknown, suggesting that it will prove effective in real homology modeling applications. Proteins 2008. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
15 Feb 2008-Proteins
TL;DR: A new model of rabbit 15S‐LOX1 is proposed that should provide new insight into the catalytic mechanism involving induced conformational change of the binding pocket, and may also be helpful for the structure‐based design of LOX inhibitors.
Abstract: Lipoxygenases (LOXs) are a family of nonheme iron dioxygenases that catalyze the regioselective and stereospecific hydroperoxidation of polyunsaturated fatty acids, and are involved in a variety of inflammatory diseases and cancers. The crystal structure of rabbit 15S-LOX1 that was reported by Gillmor et al. in 1997 has played key roles for understanding the properties of mammalian LOXs. In this structure, three segments, including 12 residues in the superficial α2 helix, are absent and have usually been described as “disordered.” By reinterpreting the original crystallographic data we were able to elucidate two different conformations of the molecule, both having well ordered α2 helices. Surprisingly, one molecule contained an inhibitor and the other did not, thereby adopting a closed and an open form, respectively. They differed in the conformation of the segments that were absent in the original structure, which is highlighted by a 12 Å movement of α2. Consequently, they showed a difference in the size and shape of the substrate-binding cavity. The new model should provide new insight into the catalytic mechanism involving induced conformational change of the binding pocket. It may also be helpful for the structure-based design of LOX inhibitors. Proteins 2008. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
01 Jul 2008-Proteins
TL;DR: The results imply that the adsorption of Aβ to anionic lipids, which could become exposed to the outer membrane leaflet by cell injury, may serve as an in vivo mechanism of templated aggregation and drive the pathogenesis of AD.
Abstract: The lipid membrane has been shown to mediate the fibrillogenesis and toxicity of Alzheimer's disease (AD) amyloid-β (Aβ) peptide. Electrostatic interactions between Aβ40 and the phospholipid headgroup have been found to control the association and insertion of monomeric Aβ into lipid monolayers, where Aβ exhibited enhanced interactions with charged lipids compared with zwitterionic lipids. To elucidate the molecular-scale structural details of Aβ-membrane association, we have used complementary X-ray and neutron scattering techniques (grazing-incidence X-ray diffraction, X-ray reflectivity, and neutron reflectivity) in this study to investigate in situ the association of Aβ with lipid monolayers composed of either the anionic lipid 1,2-dipalmitoyl-sn-glycero-3-[phospho-rac-(1-glycerol)] (DPPG), the zwitterionic lipid 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), or the cationic lipid 1,2-dipalmitoyl 3-trimethylammonium propane (DPTAP) at the air-buffer interface. We found that the anionic lipid DPPG uniquely induced crystalline ordering of Aβ at the membrane surface that closely mimicked the β-sheet structure in fibrils, revealing an intriguing templated ordering effect of DPPG on Aβ. Furthermore, incubating Aβ with lipid vesicles containing the anionic lipid 1-palmitoyl-2-oleoyl-sn-glycero-3-[phospho-rac-(1-glycerol)] (POPG) induced the formation of amyloid fibrils, confirming that the templated ordering of Aβ at the membrane surface seeded fibril formation. This study provides a detailed molecular-scale characterization of the early structural fluctuation and assembly events that may trigger the misfolding and aggregation of Aβ in vivo. Our results imply that the adsorption of Aβ to anionic lipids, which could become exposed to the outer membrane leaflet by cell injury, may serve as an in vivo mechanism of templated aggregation and drive the pathogenesis of AD.

Journal ArticleDOI
Lee Sael, Bin Li, David La, Yi Fang, Karthik Ramani, Raif M. Rustamov, Daisuke Kihara
01 Sep 2008-Proteins
TL;DR: This work introduces a global surface shape representation based on three‐dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions and open up new possibilities for large‐scale global and local protein surface shape comparison.
Abstract: Characterization and identification of similar tertiary structures of proteins provide rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, a search against a few thousand structures takes less than a minute. To investigate the agreement between the surface representation defined by the 3D Zernike descriptor and the conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, the 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibilities for large-scale global and local protein surface shape comparison.
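
Because each structure is reduced to a fixed-length descriptor vector, a database search becomes a nearest-neighbour lookup over vectors. The sketch below illustrates that idea only. The Euclidean metric, the function names, and the three-component toy vectors are assumptions for illustration; the abstract does not state the distance measure, and real 3D Zernike descriptors have many more components.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, database, k=5):
    """Return the names of the k database entries closest to the query."""
    ranked = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical toy descriptors, not taken from any real protein.
toy_db = {
    "protA": [0.90, 0.10, 0.30],
    "protB": [0.20, 0.80, 0.50],
    "protC": [0.85, 0.15, 0.35],
}
top_two = nearest([0.88, 0.12, 0.31], toy_db, k=2)
```

Comparing short vectors costs far less than a structural alignment, which is consistent with the abstract's report that a search over a few thousand structures takes under a minute.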

Journal ArticleDOI
01 May 2008-Proteins
TL;DR: Although β‐propellers are classified into six separate structural groups by the SCOP and CATH databases, most of them group together in a cluster map of all‐β folds generated by sequence similarity, because of numerous pairwise matches, which supports a common origin.
Abstract: β-Propellers are toroidal folds, in which repeated, four-stranded β-meanders are arranged in a circular and slightly tilted fashion, like the blades of a propeller. They are found in all domains of life, with a strong preponderance among eukaryotes. Propellers show considerable sequence diversity and are classified into six separate structural groups by the SCOP and CATH databases. Despite this diversity, they often show similarities across groups, not only in structure but also in sequence, raising the possibility of a common origin. In agreement with this hypothesis, most propellers group together in a cluster map of all-β folds generated by sequence similarity, because of numerous pairwise matches, many of which are individually nonsignificant. In total, 45 of 60 propellers in the SCOP25 database, covering four SCOP folds, are clustered in this group, and analysis with sensitive sequence comparison methods shows that they are similar at a level indicative of homology. Two mechanisms appear to contribute to the evolution of β-propellers: amplification from single blades and subsequent functional differentiation. The observation of propellers with nearly identical blades in genomic sequences shows that these mechanisms are still operating today.

Journal ArticleDOI
01 Jul 2008-Proteins
TL;DR: This study demonstrates the effective combination of independently developed docking protocols (ZDOCK/ZRANK and RosettaDock), indicating that using diverse search and scoring functions can improve protein docking results.
Abstract: To determine the structures of protein-protein interactions, protein docking is a valuable tool that complements experimental methods to characterize protein complexes. Although protein docking can often produce a near-native solution within a set of global docking predictions, there are sometimes predictions that require refinement to elucidate correct contacts and conformation. Previously, we developed the ZRANK algorithm to rerank initial docking predictions from ZDOCK, a docking program developed by our lab. In this study, we have applied the ZRANK algorithm toward refinement of protein docking models in conjunction with the protein docking program RosettaDock. This was performed by reranking global docking predictions from ZDOCK, performing local side chain and rigid-body refinement using RosettaDock, and selecting the refined model based on ZRANK score. For comparison, we examined using RosettaDock score instead of ZRANK score, and a larger perturbation size for the RosettaDock search, and determined that the larger RosettaDock perturbation size with ZRANK scoring was optimal. This method was validated on a protein-protein docking benchmark. For refining docking benchmark predictions from the newest ZDOCK version, this led to improved structures of top-ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. Finally, we optimized the ZRANK energy function using refined models, which provides a significant improvement over the original ZRANK energy function. Using this optimized function and the refinement protocol, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. This shows the effective combination of independently developed docking protocols (ZDOCK/ZRANK, and RosettaDock), indicating that using diverse search and scoring functions can improve protein docking results.

Journal ArticleDOI
15 May 2008-Proteins
TL;DR: The RMSD of the present approach improves if one considers only strongly shifted pK(A)(exp), in contrast to the other methods under these conditions, and the method allows pK(A) values to be interpreted in terms of pH‐dependent hydrogen bonding patterns and salt bridge geometries.
Abstract: pK(A) in proteins are determined by electrostatic energy computations using a small number of optimized protein conformations derived from crystal structures. In these protein conformations hydrogen positions and geometries of salt bridges on the protein surface were determined self-consistently with the protonation pattern at three pHs (low, ambient, and high). Considering salt bridges at protein surfaces is most relevant, since they open at low and high pH. In the absence of these conformational changes, computed pK(A)(comp) of acidic (basic) groups in salt bridges underestimate (overestimate) experimental pK(A)(exp), dramatically. The pK(A)(comp) for 15 different proteins with 185 known pK(A)(exp) yield an RMSD of 1.12, comparable with two other methods. One of these methods is fully empirical with many adjustable parameters. The other is also based on electrostatic energy computations using many non-optimized side chain conformers but employs larger dielectric constants at short distances of charge pairs that diminish their electrostatic interactions. These empirical corrections that account implicitly for additional conformational flexibility were needed to describe the energetics of salt bridges appropriately. This is not needed in the present approach. The RMSD of the present approach improves if one considers only strongly shifted pK(A)(exp) in contrast to the other methods under these conditions. Our method allows interpreting pK(A)(comp) in terms of pH dependent hydrogen bonding pattern and salt bridge geometries. A web service is provided to perform pK(A) computations.
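
Two quantities in this abstract can be illustrated with standard formulas: how a pK(A) translates into a protonation probability at a given pH, and how an RMSD between computed and experimental pK(A) values is obtained. The sketch below uses the textbook Henderson-Hasselbalch relation for an isolated titratable group. It is not the electrostatic energy computation described in the abstract, and the function names and sample numbers are hypothetical.

```python
import math


def protonated_fraction(ph, pka):
    """Fraction of an isolated titratable group that is protonated at a given pH
    (Henderson-Hasselbalch relation)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def rmsd(computed, experimental):
    """Root-mean-square deviation between paired computed and experimental values."""
    squared = [(c - e) ** 2 for c, e in zip(computed, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values for illustration only.
half = protonated_fraction(ph=4.0, pka=4.0)        # group is half protonated at pH = pKa
deviation = rmsd([4.0, 6.5], [4.5, 6.0])           # both values are off by 0.5 units
```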

Journal ArticleDOI
15 Feb 2008-Proteins
TL;DR: For docking searches that minimize the influence of local side chain conformational changes, inclusion of global flexibility can significantly improve the agreement of the near‐native docking solutions with the corresponding experimental structures.
Abstract: Protein-protein association can frequently involve significant backbone conformational changes of the protein partners. A computationally rapid method has been developed that approximately accounts for global conformational changes during systematic protein-protein docking starting from many thousands of start configurations. The approach employs precalculated collective degrees of freedom as additional variables during protein-protein docking minimization. The global collective degrees of freedom are obtained from normal mode analysis using a Gaussian network model for the protein. Systematic docking searches were performed on 10 test systems that differed in the degree of conformational change associated with complex formation and in the degree of overlap between observed conformational changes and precalculated flexible degrees of freedom. The results indicate that, for docking searches that minimize the influence of local side chain conformational changes, inclusion of global flexibility can significantly improve the agreement of the near-native docking solutions with the corresponding experimental structures. For docking of unbound protein partners, an improved ranking of near-native docking solutions was observed in several cases. This was achieved at a very modest (approximately 2-fold) increase in computational demands compared to rigid docking. For several test cases the number of docking solutions close to experiment was also significantly enhanced upon inclusion of soft collective degrees of freedom. This result indicates that inclusion of global flexibility can facilitate in silico protein-protein association such that a greater number of different start configurations results in favorable complex formation.
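
The abstract states that the collective degrees of freedom come from normal mode analysis with a Gaussian network model. A minimal sketch of the first step of such a model, building the connectivity (Kirchhoff) matrix from residue coordinates, is given below. The 7 Å cutoff, the function name, and the toy coordinates are assumptions for illustration; the abstract gives no parameters. The modes themselves would come from an eigendecomposition of this matrix, which is omitted here.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build a Gaussian-network-style connectivity matrix.

    Off-diagonal entries are -1 for residue pairs within `cutoff` (in the same
    units as the coordinates) and 0 otherwise; each diagonal entry is the
    number of contacts of that residue, so every row sums to zero.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Three hypothetical residue positions on a line, 5 Å apart.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
gamma = kirchhoff_matrix(toy_coords)
```

The low-frequency eigenvectors of such a matrix describe the softest collective motions, which is why they are a natural choice for the small number of extra variables added to the docking minimization.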

Journal ArticleDOI
01 May 2008-Proteins
TL;DR: The results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.
Abstract: It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. 
Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

Journal ArticleDOI
01 Oct 2008-Proteins
TL;DR: MolAxis is a novel algorithm that uses state‐of‐the‐art computational geometry techniques to approximate and scan a useful subset of the outer medial axis, thereby reducing the dimension of the problem and consequently rendering the algorithm extremely efficient.
Abstract: Channels and cavities play important roles in macromolecular functions, serving as access/exit routes for substrates/products, cofactor and drug binding, catalytic sites, and ligand/protein. In addition, channels formed by transmembrane (TM) proteins serve as transporters and ion channels. MolAxis is a new sensitive and fast tool for the identification and classification of channels and cavities of various sizes and shapes in macromolecules. MolAxis constructs corridors, which are pathways that represent probable routes taken by small molecules passing through channels. The outer medial axis of the molecule is the collection of points that have more than one closest atom. It is composed of two-dimensional surface patches and can be seen as a skeleton of the complement of the molecule. We have implemented in MolAxis a novel algorithm that uses state-of-the-art computational geometry techniques to approximate and scan a useful subset of the outer medial axis, thereby reducing the dimension of the problem and consequently rendering the algorithm extremely efficient. MolAxis is designed to identify channels that connect buried cavities to the outside of macromolecules and to identify TM channels in proteins. We apply MolAxis to enzyme cavities and TM proteins. We further utilize MolAxis to monitor channel dimensions along Molecular Dynamics trajectories of a human Cytochrome P450. MolAxis constructs high quality corridors for snapshots at picosecond time-scale intervals substantiating the gating mechanism in the 2e substrate access channel. We compare our results with previous tools in terms of accuracy, performance and underlying theoretical guarantees of finding the desired pathways. MolAxis is available on line as a web-server and as a standalone easy-to-use program (http://bioinfo3d.cs.tau.ac.il/MolAxis/). Proteins 2008. © 2008 Wiley-Liss, Inc.

Journal ArticleDOI
15 May 2008-Proteins
TL;DR: This study identified residues that have significant contributions to binding with six substrates using molecular dynamics simulations and Molecular Mechanics Generalized Born Surface Area calculations and defined an empirical parameter called free energy/variability (FV) value, which was shown to identify single resistant mutations with an accuracy of 88%.
Abstract: HIV-1 protease has been an important drug target for the antiretroviral treatment of HIV infection. The efficacy of protease drugs is impaired by the rapid emergence of resistant virus strains. Understanding the molecular basis and evaluating the potency of an inhibitor to combat resistance are no doubt important in AIDS therapy. In this study, we first identified residues that have significant contributions to binding with six substrates using molecular dynamics simulations and Molecular Mechanics Generalized Born Surface Area calculations. Among the critical residues, Asp25, Gly27, Ala28, Asp29, and Gly49 are well conserved, with which the potent drugs should form strong interactions. We then calculated the contribution of each residue to binding with eight FDA approved drugs. We analyzed the conservation of each protease residue and also compared the interaction between the HIV protease and individual residues of the drugs and substrates. Our analyses showed that resistant mutations usually occur at less conserved residues forming more favorable interactions with drugs than with substrates. To quantitatively integrate the binding free energy and conservation information, we defined an empirical parameter called free energy/variability (FV) value, which is the product of the contribution of a single residue to the binding free energy and the sequence variability at that position. As a validation, the FV value was shown to identify single resistant mutations with an accuracy of 88%. Finally, we evaluated the potency of a newly approved drug, darunavir, to combat resistance and predicted that darunavir is more potent than amprenavir but may be susceptible to mutations on Val32 and Ile84.
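
The FV value is defined in the abstract as the product of a residue's contribution to the binding free energy and the sequence variability at that position. A minimal sketch of that arithmetic follows. The residue labels, the energy and variability numbers, and the ranking by absolute value are all hypothetical; the abstract does not give units, scales, or a decision threshold.

```python
def fv_value(energy_contribution, variability):
    """Free energy/variability (FV) value: per-residue binding free energy
    contribution multiplied by the sequence variability at that position."""
    return energy_contribution * variability


# Hypothetical per-residue data: (binding free energy contribution, variability).
toy_residues = {
    "res_conserved": (-3.0, 0.0),   # strong interaction, no sequence variation
    "res_variable": (-1.5, 0.4),    # weaker interaction, variable position
}

fv = {name: fv_value(e, v) for name, (e, v) in toy_residues.items()}

# Rank positions by the magnitude of FV (ranking criterion is an assumption).
ranked = sorted(fv, key=lambda name: abs(fv[name]), reverse=True)
```

In this toy case the fully conserved position gets an FV of zero regardless of how strongly it binds, which matches the abstract's observation that resistance mutations tend to occur at less conserved residues.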

Journal ArticleDOI
01 May 2008-Proteins
TL;DR: The principal finding is that, in general, the half‐life of a protein does not depend on the presence of degradation signals within its sequence, even of ubiquitination sites, but correlates mainly with the length of its polypeptide chain and with various measures of structural disorder.
Abstract: Targeted turnover of proteins is a key element in the regulation of practically all basic cellular processes. The underlying physicochemical and/or sequential signals, however, are not fully understood. This issue is particularly pertinent in light of the recent recognition that intrinsically unstructured/disordered proteins, common in eukaryotic cells, are extremely susceptible to proteolytic degradation in vitro. The in vivo half-lives of proteins were determined recently in a high-throughput study encompassing the entire yeast proteome; here we examine whether these half-lives correlate with the presence of classical degradation motifs (PEST region, destruction-box, KEN-box, or the N-terminal residue) or with various physicochemical characteristics, such as the size of the protein, the degree of structural disorder, or the presence of low-complexity regions. Our principal finding is that, in general, the half-life of a protein does not depend on the presence of degradation signals within its sequence, even of ubiquitination sites, but correlates mainly with the length of its polypeptide chain and with various measures of structural disorder. Two distinct modes of involvement of disorder in degradation are proposed. Susceptibility to degradation of longer proteins, containing larger numbers of residues in conformational disorder, suggests an extensive function, whereby the effect of disorder can be ascribed to its mere physical presence. However, after normalization for protein length, the only signal that correlates with half-life is disorder, which indicates that it also acts in an intensive manner, that is, as a specific signal, perhaps in conjunction with the recognition of classical degradation motifs. The significance of correlation is rather low; thus protein degradation is not determined by a single characteristic, but is a multi-factorial process that shows large protein-to-protein variations. 
Protein disorder, nevertheless, plays a key signalling role in many cases. Proteins 2008. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
15 Nov 2008-Proteins
TL;DR: The H3‐rules are revised and an improved classification scheme for CDR‐H3 structure modeling is proposed and the concept of “antibody druggability” is discussed, which can be applied as an indicator of antibody evaluation during drug discovery.
Abstract: Among the six complementarity-determining regions (CDRs) in the variable domains of an antibody, the third CDR of the heavy chain (CDR-H3), which lies in the center of the antigen-binding site, plays a particularly important role in antigen recognition. CDR-H3 shows significant variability in its length, sequence, and structure. Although difficult, model building of this segment is the most critical step in antibody modeling. Since our first proposal of the "H3-rules," which classify CDR-H3 structure based on amino acid sequence, the number of experimentally determined antibody structures has increased. Here, we revise these H3-rules and propose an improved classification scheme for CDR-H3 structure modeling. In addition, we determine the common features of CDR-H3 in antibody drugs as well as discuss the concept of "antibody druggability," which can be applied as an indicator of antibody evaluation during drug discovery.

Journal ArticleDOI
01 Jun 2008-Proteins
TL;DR: The proposed method is robust enough to detect local similarity among active sites of different sizes, to discriminate between protein subfamilies and to recover the known targets of promiscuous ligands by virtual screening.
Abstract: A novel method to measure distances between druggable protein cavities is presented. Starting from user-defined ligand binding sites, eight topological and physicochemical properties are projected from cavity-lining protein residues to an 80 triangle-discretised sphere placed at the centre of the binding site, thus defining a cavity fingerprint. Representing binding site properties onto a discretised sphere presents many advantages: (i) a normalised distance between binding sites of different sizes may be easily derived by summing up the normalised differences between the 8 computed descriptors; (ii) a structural alignment of two proteins is simply done by systematically rotating/translating one mobile sphere around one immobile reference; (iii) a certain degree of fuzziness in the comparison is reached by projecting global amino acid properties (e.g., charge, size, functional groups count, distance to the site centre) independently of local rotameric/tautomeric states of cavity-lining residues. The method was implemented in a new program (SiteAlign) and tested in a number of various scenarios: measuring the distance between 376 related active site pairs, computing the cross-similarity of members of a protein family, predicting the targets of ligands with various promiscuity levels. The proposed method is robust enough to detect local similarity among active sites of different sizes, to discriminate between protein subfamilies and to recover the known targets of promiscuous ligands by virtual screening.

Journal ArticleDOI
01 Nov 2008-Proteins
TL;DR: Empirical scoring functions to calculate binding affinities of protein–ligand complexes have been calibrated based on experimental structure and affinity data collected from public and industrial sources and superior performance is observed in many cases, but the results also illustrate the need for further improvements.
Abstract: Empirical scoring functions to calculate binding affinities of protein–ligand complexes have been calibrated based on experimental structure and affinity data collected from public and industrial sources. Public data were taken from the AffinDB database, whereas access to industrial data was gained through the Scoring Function Consortium (SFC), a collaborative effort with various pharmaceutical companies and the Cambridge Crystallographic Data Center. More than 850 complexes were obtained by the data collection procedure and subsequently used to set up different training sets for the parameterization of new scoring functions. Over 60 different descriptors were evaluated for all complexes, including terms accounting for interactions with and among aromatic ring systems as well as many surface-dependent terms. After exploratory correlation and regression analyses, stepwise variable selection procedures and systematic searches, the most suitable descriptors were chosen as variables to calibrate regression functions by means of multiple linear regression or partial least squares analysis. Eight different functions are presented herein. Cross-validated r² (Q²) values of up to 0.72 and standard errors (sPRESS) generally below 1.15 pKi units suggest highly predictive functions. Extensive unbiased validation was carried out by testing the functions on large data sets from the PDBbind database as used by Wang et al. (J Chem Inf Comput Sci 2004;44:2114–2125) in a comparative analysis of other scoring functions. Superior performance of the SFCscore functions is observed in many cases, but the results also illustrate the need for further improvements. Proteins 2008. © 2008 Wiley-Liss, Inc.
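
A cross-validated r² (Q²) is commonly computed as 1 − PRESS / SS, where PRESS is the sum of squared prediction errors for held-out complexes and SS is the total sum of squares of the observed affinities. The sketch below shows that calculation under simplifying assumptions: leave-one-out cross-validation and a single descriptor fitted by ordinary least squares. The abstract describes multi-descriptor linear regression and partial least squares models and does not state the cross-validation scheme, so the function names, scheme, and data here are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated r^2: 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out value.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and affinities (not from the paper).
q2_exact = loo_q2([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
q2_noisy = loo_q2([1.0, 2.0, 3.0, 4.0], [2.0, 4.5, 5.5, 8.0])
```

Unlike the ordinary r², Q² penalizes models that fit the training data well but predict held-out complexes poorly, which is why it is reported as evidence of predictive power.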