Protein function prediction
About: Protein function prediction is a(n) research topic. Over the lifetime, 1435 publication(s) have been published within this topic receiving 68780 citation(s).
TL;DR: A stand-alone I-TASSER Suite that can be used for off-line protein structure and function prediction and three complementary algorithms to enhance function inferences are developed, the consensus of which is derived by COACH4 using support vector machines.
Abstract: The lowest free-energy conformations are identified by structure clustering. A second round of assembly simulation is conducted, starting from the centroid models, to remove steric clashes and refine global topology. Final atomic structure models are constructed from the low-energy conformations by a two-step atomic-level energy minimization approach. The correctness of the global model is assessed by the confidence score, which is based on the significance of threading alignments and the density of structure clustering; the residue-level local quality of the structural models and B factor of the target protein are evaluated by a newly developed method, ResQ, built on the variation of modeling simulations and the uncertainty of homologous alignments through support vector regression training. For function annotation, the structure models with the highest confidence scores are matched against the BioLiP5 database of ligand-protein interactions to detect homologous function templates. Functional insights on ligand-binding site (LBS), Enzyme Commission (EC) and Gene Ontology (GO) are deduced from the functional templates. We developed three complementary algorithms (COFACTOR, TM-SITE and S-SITE) to enhance function inferences, the consensus of which is derived by COACH4 using support vector machines. Detailed instructions for installation, implementation and result interpretation of the Suite can be found in the Supplementary Methods and Supplementary Tables 1 and 2. The I-TASSER Suite pipeline was tested in recent communitywide structure and function prediction experiments, including CASP10 (ref. 1) and CAMEO2. Overall, I-TASSER generated the correct fold with a template modeling score (TM-score) >0.5 for 10 out of 36 “New Fold” (NF) targets in the CASP10, which have no homologous templates in the Protein Data Bank (PDB). Of the 110 template-based modeling targets, 92 had a TM-score >0.5, and 89 had the templates drawn closer to the native with an average r.m.s. deviation improvement of 1.05 Å in the same threadingaligned regions6. In CAMEO, COACH generated LBS predictions for 4,271 targets with an average accuracy 0.86, which was 20% higher than that of the second-best method in the experiment. Here we illustrate I-TASSER Suite–based structure and function modeling using six examples (Fig. 1b–g) from the communitywide blind tests1,2. R0006 and R0007 are two NF targets from CASP10, and I-TASSER constructed models of correct fold with a TM-score of 0.62 for both targets (Fig. 1b,c). An illustration of local quality estimation by ResQ is shown for T0652, which has an average error 0.75 Å compared to the actual deviation of the model from the native (Fig. 1h). The four LBS prediction examples (Fig. 1d–g) are from CASP10 (ref. 1) and CAMEO2; COACH generated ligand models all with a ligand r.m.s. deviation below 2 Å. COACH also correctly assigned the threeand fourdigit EC numbers to the enzyme targets C0050 and C0046 (Supplementary Table 3). In summary, we developed a stand-alone I-TASSER Suite that can be used for off-line protein structure and function prediction. The I-TASSER Suite: protein structure and function prediction
Abstract: A computational method system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A′ and B′ have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A′ and B. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
TL;DR: This Viewpoint begins by describing the essential features of the methods, the accuracy of the models, and their application to the prediction and understanding of protein function, both for single proteins and on the scale of whole genomes.
Abstract: Genome sequencing projects are producing linear amino acid sequences, but full understanding of the biological role of these proteins will require knowledge of their structure and function. Although experimental structure determination methods are providing high-resolution structure information about a subset of the proteins, computational structure prediction methods will provide valuable information for the large fraction of sequences whose structures will not be determined experimentally. The first class of protein structure prediction methods, including threading and comparative modeling, rely on detectable similarity spanning most of the modeled sequence and at least one known structure. The second class of methods, de novo or ab initio methods, predict the structure from sequence alone, without relying on similarity at the fold level between the modeled sequence and any of the known structures. In this Viewpoint, we begin by describing the essential features of the methods, the accuracy of the models, and their application to the prediction and understanding of protein function, both for single proteins and on the scale of whole genomes. We then discuss the important role that protein structure prediction methods play in the growing worldwide effort in structural genomics.
TL;DR: Structural descriptors for protein functional sites are crucial for unlocking the secrets in both the sequence and structural-genomics projects.
Abstract: The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets in both the sequence and structural-genomics projects.
TL;DR: This approach correctly predicts a functional category for 72% of the 1,393 characterized proteins with at least one partner of known function, and has been applied to predict functions for 364 previously uncharacterized proteins.
Abstract: A global analysis of 2,709 published interactions between proteins of the yeast Saccharomyces cerevisiae has been performed, enabling the establishment of a single large network of 2,358 interactions among 1,548 proteins. Proteins of known function and cellular location tend to cluster together, with 63% of the interactions occurring between proteins with a common functional assignment and 76% occurring between proteins found in the same subcellular compartment. Possible functions can be assigned to a protein based on the known functions of its interacting partners. This approach correctly predicts a functional category for 72% of the 1,393 characterized proteins with at least one partner of known function, and has been applied to predict functions for 364 previously uncharacterized proteins.