Journal ArticleDOI

I-TASSER: a unified platform for automated protein structure and function prediction

25 Mar 2010-Nature Protocols (Nature Publishing Group)-Vol. 5, Iss: 4, pp 725-738
TL;DR: The iterative threading assembly refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm.
Abstract: The iterative threading assembly refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional (3D) atomic models from multiple threading alignments and iterative structural assembly simulations. The function of the protein is then inferred by structurally matching the 3D models with other known proteins. The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of the accuracy of the predictions is provided based on the confidence score of the modeling. This protocol provides new insights and guidelines for designing online server systems for state-of-the-art protein structure and function prediction. The server is available at http://zhanglab.ccmb.med.umich.edu/I-TASSER.


Citations
Journal ArticleDOI
15 Jul 2021-Nature
TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture, even in cases in which no similar structure is known.
Abstract: Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort (refs. 1–4), the structures of around 100,000 unique proteins have been determined (ref. 5), but this represents a small fraction of the billions of known protein sequences (refs. 6,7). Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the ‘protein folding problem’ (ref. 8), has been an important open research problem for more than 50 years (ref. 9). Despite recent progress (refs. 10–14), existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) (ref. 15), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

10,601 citations

Journal ArticleDOI
TL;DR: An updated protocol for Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants for a user's protein sequence.
Abstract: Phyre2 is a web-based tool for predicting and analyzing protein structure and function. Phyre2 uses advanced remote homology detection methods to build 3D models, predict ligand binding sites, and analyze amino acid variants in a protein sequence. Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

7,941 citations

Journal ArticleDOI
TL;DR: A stand-alone I-TASSER Suite that can be used for off-line protein structure and function prediction and three complementary algorithms to enhance function inferences are developed, the consensus of which is derived by COACH using support vector machines.
Abstract: The lowest free-energy conformations are identified by structure clustering. A second round of assembly simulation is conducted, starting from the centroid models, to remove steric clashes and refine global topology. Final atomic structure models are constructed from the low-energy conformations by a two-step atomic-level energy minimization approach. The correctness of the global model is assessed by the confidence score, which is based on the significance of threading alignments and the density of structure clustering; the residue-level local quality of the structural models and B factor of the target protein are evaluated by a newly developed method, ResQ, built on the variation of modeling simulations and the uncertainty of homologous alignments through support vector regression training. For function annotation, the structure models with the highest confidence scores are matched against the BioLiP (ref. 5) database of ligand-protein interactions to detect homologous function templates. Functional insights on ligand-binding site (LBS), Enzyme Commission (EC) and Gene Ontology (GO) are deduced from the functional templates. We developed three complementary algorithms (COFACTOR, TM-SITE and S-SITE) to enhance function inferences, the consensus of which is derived by COACH (ref. 4) using support vector machines. Detailed instructions for installation, implementation and result interpretation of the Suite can be found in the Supplementary Methods and Supplementary Tables 1 and 2. The I-TASSER Suite pipeline was tested in recent communitywide structure and function prediction experiments, including CASP10 (ref. 1) and CAMEO (ref. 2). Overall, I-TASSER generated the correct fold with a template modeling score (TM-score) >0.5 for 10 out of 36 “New Fold” (NF) targets in CASP10, which have no homologous templates in the Protein Data Bank (PDB). Of the 110 template-based modeling targets, 92 had a TM-score >0.5, and 89 had the templates drawn closer to the native with an average r.m.s. deviation improvement of 1.05 Å in the same threading-aligned regions (ref. 6). In CAMEO, COACH generated LBS predictions for 4,271 targets with an average accuracy of 0.86, which was 20% higher than that of the second-best method in the experiment. Here we illustrate I-TASSER Suite–based structure and function modeling using six examples (Fig. 1b–g) from the communitywide blind tests (refs. 1,2). R0006 and R0007 are two NF targets from CASP10, and I-TASSER constructed models of correct fold with a TM-score of 0.62 for both targets (Fig. 1b,c). An illustration of local quality estimation by ResQ is shown for T0652, which has an average error of 0.75 Å compared to the actual deviation of the model from the native (Fig. 1h). The four LBS prediction examples (Fig. 1d–g) are from CASP10 (ref. 1) and CAMEO (ref. 2); COACH generated ligand models all with a ligand r.m.s. deviation below 2 Å. COACH also correctly assigned the three- and four-digit EC numbers to the enzyme targets C0050 and C0046 (Supplementary Table 3). In summary, we developed a stand-alone I-TASSER Suite that can be used for off-line protein structure and function prediction.
Title: The I-TASSER Suite: protein structure and function prediction
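
The abstract above uses TM-score >0.5 as its criterion for a correct fold. As a rough illustration only, the sketch below evaluates the commonly cited TM-score formula for one fixed superposition. It is not the I-TASSER Suite code: the function names `tm_d0`, `tm_score` and `is_correct_fold` are hypothetical, the real TM-score additionally searches over superpositions to maximize the value, and chains shorter than 22 residues are rejected here simply to keep the distance scale positive.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score formula."""
    if target_length < 22:
        # Assumption of this sketch: short chains are not handled.
        raise ValueError("sketch only covers targets of 22 or more residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE given superposition (no search over superpositions).

    distances: distances in Å between aligned model/native residue pairs
    target_length: number of residues in the target protein
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def is_correct_fold(score):
    """Threshold quoted in the abstract: TM-score > 0.5."""
    return score > 0.5


if __name__ == "__main__":
    # Toy input: 100-residue target, every aligned pair 2 Å apart.
    score = tm_score([2.0] * 100, 100)
    print(round(score, 3), is_correct_fold(score))
```

Because the sum is divided by the target length rather than the number of aligned pairs, unaligned residues lower the score, which is why the measure can be compared across targets of different sizes.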

4,693 citations

Journal ArticleDOI
TL;DR: The new version of the MPI Bioinformatics Toolkit is introduced, focusing on improved features for the comprehensive analysis of proteins, as well as on promoting teaching.

1,757 citations

Journal ArticleDOI
TL;DR: Focuses have been made on the introduction of new methods for atomic-level structure refinement, local structure quality estimation and biological function annotations, which are designed to address the requirements from the user community and to increase the accuracy of modeling predictions.
Abstract: The I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER) is an online resource for automated protein structure prediction and structure-based function annotation. In I-TASSER, structural templates are first recognized from the PDB using multiple threading alignment approaches. Full-length structure models are then constructed by iterative fragment assembly simulations. The functional insights are finally derived by matching the predicted structure models with known proteins in the function databases. Although the server has been widely used for various biological and biomedical investigations, numerous comments and suggestions have been reported from the user community. In this article, we summarize recent developments on the I-TASSER server, which were designed to address the requirements from the user community and to increase the accuracy of modeling predictions. Focuses have been made on the introduction of new methods for atomic-level structure refinement, local structure quality estimation and biological function annotations. We expect that these new developments will improve the quality of the I-TASSER server and further facilitate its use by the community for high-resolution structure and function prediction.

1,698 citations

References
Journal ArticleDOI
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
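
The idea of combining alignments into a position-specific score matrix can be illustrated with a toy example. The sketch below is not PSI-BLAST's actual procedure: it assumes a uniform background frequency, add-one pseudocounts and no sequence weighting, and the names `build_pssm` and `score_sequence` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from equal-length sequences.

    Assumptions of this sketch: uniform background, simple pseudocounts,
    gap or unknown characters are ignored when counting.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the per-position log-odds scores of an ungapped sequence."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the matrix length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACD"]
    matrix = build_pssm(toy_alignment)
    print(score_sequence(matrix, "ACD") > score_sequence(matrix, "WWW"))
```

Residues that are conserved in a column receive positive scores and unseen residues receive negative ones, which is what makes a search with such a matrix more sensitive to weak similarities than a search with a single sequence.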

70,111 citations


"I-TASSER: a unified platform for au..." refers background in this paper

  • ..., evolutionarily related homologous templates are identified by sequence or sequence profile comparison...


Journal ArticleDOI
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations


"I-TASSER: a unified platform for au..." refers background in this paper

  • ..., a library of 26,045 nonredundant entries with known GO term...


Journal ArticleDOI
TL;DR: Describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information and plans for the future development of the resource.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations


"I-TASSER: a unified platform for au..." refers background in this paper

  • ..., with a solved protein structure in the Protein Data Bank (PDB) librar...


Journal ArticleDOI
TL;DR: A comparative protein modelling method designed to find the most probable structure for a sequence given its alignment with related structures, which is automated and illustrated by the modelling of trypsin from two other serine proteinases.

12,386 citations


"I-TASSER: a unified platform for au..." refers background in this paper

  • ...It needs to be mentioned that despite extensive benchmark test...


Journal ArticleDOI
TL;DR: A two-stage neural network has been used to predict protein secondary structure based on the position-specific scoring matrices generated by PSI-BLAST and achieved an average Q3 score of between 76.5% and 78.3%, depending on the precise definition of observed secondary structure used, which is the highest published score for any method to date.
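
The Q3 score quoted above is the percentage of residues whose three-state secondary structure (helix, strand or coil) is predicted correctly. A minimal sketch of that calculation follows; the function name `q3_score` and the one-letter state strings (H, E, C) are illustrative assumptions, not output from the method described.

```python
def q3_score(predicted, observed):
    """Percentage of positions where predicted and observed states agree.

    predicted, observed: equal-length strings over the states H, E and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy 10-residue example with two mismatches (positions 4 and 10).
    print(q3_score("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```

The dependence on "the precise definition of observed secondary structure" arises because the observed string must first be reduced to three states, and different reduction schemes give different reference strings.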

5,512 citations


"I-TASSER: a unified platform for au..." refers methods in this paper

  • ...A sequence profile is then created based on multiple alignment of the sequence homologs, which is also used to predict the secondary structure using PSIPRE...
