Journal Article

Template-based protein structure modeling using the RaptorX web server

01 Aug 2012, Nature Protocols (Nature Publishing Group), Vol. 7, Issue 8, pp. 1511-1522
TL;DR: This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling.
Abstract: A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX ( http://raptorx.uchicago.edu/ ) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ∼35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ∼6,000 sequences submitted by ∼1,600 users from around the world.
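The abstract credits part of RaptorX's alignment quality to a probabilistic-consistency algorithm. The sketch below is not RaptorX's code. It illustrates the general consistency transformation known from ProbCons-style aligners. The posterior that residue i of sequence x aligns to residue j of sequence y is re-estimated by averaging over every sequence z in the set S: P'(x, y) = (1/|S|) Σ_z P(x, z) · P(z, y), with P(x, x) taken as the identity. The function names, sequence labels and posterior values are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def consistency_transform(
    lengths: Dict[str, int],
    post: Dict[Tuple[str, str], Matrix],
    x: str,
    y: str,
) -> Matrix:
    """Re-estimate P(x_i ~ y_j) as (1/|S|) * sum over z in S of P(x, z) @ P(z, y).

    post[(a, b)][i][j] is a (hypothetical) pairwise posterior that residue i
    of sequence a aligns to residue j of sequence b. P(a, a) is the identity
    and P(b, a) is the transpose of P(a, b), as in ProbCons-style consistency.
    """

    def pair(a: str, b: str) -> Matrix:
        if a == b:
            return identity(lengths[a])
        if (a, b) in post:
            return post[(a, b)]
        return transpose(post[(b, a)])

    total = [[0.0] * lengths[y] for _ in range(lengths[x])]
    for z in lengths:  # every sequence in S, including x and y themselves
        term = matmul(pair(x, z), pair(z, y))
        for i, row in enumerate(term):
            for j, value in enumerate(row):
                total[i][j] += value
    return [[value / len(lengths) for value in row] for row in total]
```

For instance, with one-residue sequences t, a and b, a weak direct posterior P(t, a) = 0.5 and full agreement P(t, b) = P(a, b) = 1.0, the transformed value for t~a becomes (0.5 + 0.5 + 1.0) / 3 ≈ 0.67. The third sequence reinforces a pairing that both it and the direct comparison support.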


Citations
Journal Article
TL;DR: An updated protocol for Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants for a user's protein sequence.
Abstract: Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2 . A typical structure prediction will be returned between 30 min and 2 h after submission.

7,941 citations

Journal Article
TL;DR: An update to the SWISS-MODEL server is presented, which includes the implementation of a new modelling engine, ProMod3, and the introduction of a new local model quality estimation method, QMEANDisCo.
Abstract: Homology modelling has matured into an important technique in structural biology, significantly contributing to narrowing the gap between known protein sequences and experimentally determined structures. Fully automated workflows and servers simplify and streamline the homology modelling process, also allowing users without specific computational expertise to generate reliable protein models and have easy access to modelling results, their visualization and interpretation. Here, we present an update to the SWISS-MODEL server, which pioneered the field of automated modelling 25 years ago and has been continuously developed since. Recently, its functionality has been extended to the modelling of homo- and heteromeric complexes. Starting from the amino acid sequences of the interacting proteins, both the stoichiometry and the overall structure of the complex are inferred by homology modelling. Other major improvements include the implementation of a new modelling engine, ProMod3, and the introduction of a new local model quality estimation method, QMEANDisCo. SWISS-MODEL is freely available at https://swissmodel.expasy.org.

7,022 citations

Journal Article
TL;DR: The latest version of the SWISS-MODEL expert system for protein structure modelling is described, which makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models.
Abstract: Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at http://swissmodel.expasy.org/.

4,235 citations

Journal Article
TL;DR: A new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks that greatly outperforms existing methods and leads to much more accurate contact-assisted folding.
Abstract: Motivation: Protein contacts contain key information for the understanding of protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented at the time. Availability: http://raptorx.uchicago.edu/ContactMap/
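The abstract reports top L and top L/10 long-range contact accuracy and counts models with TMscore above 0.5 or 0.6. The sketch below shows how these quantities are commonly computed, assuming the usual CASP conventions: a contact is a residue pair whose Cβ atoms lie within 8 Å, and long-range means a sequence separation of at least 24 residues. The TM-score helper scores one fixed superposition only. The reference TM-score program also searches for the superposition that maximizes the score, so this helper gives a lower bound. Function names and inputs are illustrative, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (i, j) residue indices with i < j


def top_lk_long_range_precision(
    probs: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    k: int = 1,
    min_sep: int = 24,
) -> float:
    """Fraction of the top L/k predicted long-range pairs that are true contacts.

    Assumes the common CASP convention: a long-range pair has j - i >= 24, and
    true contacts (C-beta distance < 8 angstroms) were computed beforehand.
    Unfilled slots (fewer than L/k predictions) count as misses.
    """
    long_range = [(p, pair) for pair, p in probs.items() if pair[1] - pair[0] >= min_sep]
    long_range.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, length // k)
    hits = sum(1 for _, pair in long_range[:n_top] if pair in true_contacts)
    return hits / n_top


def tm_score_fixed_superposition(distances: List[float], target_length: int) -> float:
    """TM-score of one given superposition, normalized by target length.

    distances[i] is the distance (angstroms) between the i-th aligned residue
    pair after superposition. The reference TM-score program maximizes this
    sum over superpositions; this sketch does not, so it gives a lower bound.
    """
    if target_length < 22:
        raise ValueError("short chains need the clamped d0 of the reference programs")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

With k=1 the first function gives the "top L" figure, and with k=10 it gives the "top L/10" figure quoted above.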

779 citations

Journal Article
Liisa M. Pelttari, Sofia Khan, Mikko Vuorela, Johanna I. Kiiski, Sara Vilske, Viivi Nevanlinna, Salla Ranta, Johanna Schleutker, Robert Winqvist, Anne Kallioniemi, Thilo Dörk, Natalia Bogdanova, Jonine Figueroa, Paul D.P. Pharoah, Marjanka K. Schmidt, Alison M. Dunning, Montserrat Garcia-Closas, Manjeet K. Bolla, Joe Dennis, Kyriaki Michailidou, Qin Wang, John L. Hopper, Melissa C. Southey, Efraim H. Rosenberg, Peter A. Fasching, Matthias W. Beckmann, Julian Peto, Isabel dos-Santos-Silva, Elinor J. Sawyer, Ian Tomlinson, Barbara Burwinkel, Harald Surowy, Pascal Guénel, Thérèse Truong, Stig E. Bojesen, Børge G. Nordestgaard, Javier Benitez, Anna González-Neira, Susan L. Neuhausen, Hoda Anton-Culver, Hermann Brenner, Volker Arndt, Alfons Meindl, Rita K. Schmutzler, Hiltrud Brauch, Thomas Brüning, Annika Lindblom, Sara Margolin, Arto Mannermaa, Jaana M. Hartikainen, Georgia Chenevix-Trench, kConFab, AOCS Investigators, Laurien Van Dyck, Hilde Janssen, Jenny Chang-Claude, Anja Rudolph, Paolo Radice, Paolo Peterlongo, Emily Hallberg, Janet E. Olson, Graham G. Giles, Roger L. Milne, Christopher A. Haiman, Fredrick Schumacher, Jacques Simard, Martine Dumont, Vessela N. Kristensen, Anne Lise Børresen-Dale, Wei Zheng, Alicia Beeghly-Fadiel, Mervi Grip, Irene L. Andrulis, Gord Glendon, Peter Devilee, Caroline Seynaeve, Maartje J. Hooning, Margriet Collée, Angela Cox, Simon S. Cross, Mitul Shah, Robert Luben, Ute Hamann, Diana Torres, Anna Jakubowska, Jan Lubinski, Fergus J. Couch, Drakoulis Yannoukakos, Nick Orr, Anthony J. Swerdlow, Hatef Darabi, Jingmei Li, Kamila Czene, Per Hall, Douglas F. Easton, Johanna Mattson, Carl Blomqvist, Kristiina Aittomäki, Heli Nevanlinna
05 May 2016, PLOS ONE
TL;DR: It is suggested that loss-of-function mutations in RAD51B are rare, but common variation at the RAD51B region is significantly associated with familial breast cancer risk.
Abstract: Common variation on 14q24.1, close to RAD51B, has been associated with breast cancer: rs999737 and rs2588809 with the risk of female breast cancer and rs1314913 with the risk of male breast cancer. The aim of this study was to investigate the role of RAD51B variants in breast cancer predisposition, particularly in the context of familial breast cancer in Finland. We sequenced the coding region of RAD51B in 168 Finnish breast cancer patients from the Helsinki region to identify possible recurrent founder mutations. In addition, we studied the known rs999737, rs2588809, and rs1314913 SNPs and RAD51B haplotypes in 44,791 breast cancer cases and 43,583 controls from 40 studies participating in the Breast Cancer Association Consortium (BCAC) that were genotyped on a custom chip (iCOGS). We identified one putatively pathogenic missense mutation, c.541C>T, among the Finnish cancer patients and subsequently genotyped the mutation in additional breast cancer cases (n = 5259) and population controls (n = 3586) from Finland and Belarus. No significant association with breast cancer risk was seen in the meta-analysis of the Finnish datasets or in the large BCAC dataset. The association with the previously identified risk variants rs999737, rs2588809, and rs1314913 was replicated among all breast cancer cases and also among familial cases in the BCAC dataset. The most significant association was observed for the haplotype carrying the risk alleles of all three SNPs, both among all cases (odds ratio (OR): 1.15, 95% confidence interval (CI): 1.11-1.19, P = 8.88 × 10⁻¹⁶) and among familial cases (OR: 1.24, 95% CI: 1.16-1.32, P = 6.19 × 10⁻¹¹), compared to the haplotype with the respective protective alleles. Our results suggest that loss-of-function mutations in RAD51B are rare, but common variation at the RAD51B region is significantly associated with familial breast cancer risk.
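The abstract reports odds ratios with 95% confidence intervals and P values. As a generic illustration only, the sketch below computes an unadjusted odds ratio from a 2×2 table of hypothetical counts, with a Woolf (log-scale) confidence interval and a two-sided Wald P value. The published estimates come from the study's own haplotype analysis across many BCAC studies, and this sketch does not reproduce them.

```python
import math
from typing import Tuple


def odds_ratio_woolf(
    a: int, b: int, c: int, d: int, z: float = 1.959964
) -> Tuple[float, float, float, float]:
    """Odds ratio, Woolf 95% CI and two-sided Wald P value for a 2x2 table.

    Layout (hypothetical counts):
                  carrier   non-carrier
        cases        a           b
        controls     c           d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))  # normal approximation
    return math.exp(log_or), lower, upper, p_value
```

With all four cells equal, the odds ratio is exactly 1 and the P value is 1. The interval widens as the counts shrink, because the standard error grows with 1/a + 1/b + 1/c + 1/d.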

715 citations


Cites methods from "Template-based protein structure mo..."

  • ...Secondary structure prediction was done with RaptorX [27] and protein-protein interaction with PredictProtein [28]....


  • ...According to RaptorX secondary structure prediction software the arginine in position 181 is located in beta-sheet with the likelihood of 83.4% but, as determined by PredictProtein, the amino acid is not predicted to directly participate in protein-protein interactions....


References
Journal Article
TL;DR: Describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations


"Template-based protein structure mo..." refers background or methods in this paper

  • ...The numbered labels indicate the location of the following screen features: (1) tabs for switching between the three-state and eight-state prediction; (2) hovering over a residue will give detailed statistics on the secondary-state distribution; (3) the status: a current running time of the job; (4) a download link for the prediction results; and (5) a color-code legend for secondary structure diagram....


  • ...The numbered labels indicate the location of the following screen features: (1) a drop-down menu for switching between alternative alignments; (2) the alignment between target sequence and template; (3) indication of the status: a current running time of the job; (4) a link for download of the prediction result; and (5) a legend indicating the alignment color coding....


  • ...The numbered labels indicate the location of the following screen features: (1) the rank of currently selected model; (2) the quality score of the model; (3) the PDB IDs for the set templates used for modeling; (4) a drop-down menu for selecting alternative structure models; (5) tabs for switching between structure prediction, function annotation and BLAST output; (6) interactive viewer displaying the currently selected model structure; (7) menu for controlling the interactive viewer; (8) alignment used for structure modeling; (9) indication of the status: a current running time of the job; (10) download links for prediction results; and (11) a user guide for the interactive structure viewer....

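The captions above mention switching between three-state and eight-state secondary-structure predictions, and the citing study quotes a per-residue probability (83.4%) for a β-strand assignment. Below is a minimal sketch of how 8-state probabilities can be collapsed into 3-state ones. It assumes per-residue probabilities are keyed by DSSP-style letters, with C standing in for coil, and it then reads off the most likely state. The mapping shown is one common convention. Tools differ (some map G and I to coil), and this is not necessarily the mapping RaptorX uses.

```python
from typing import Dict, Tuple

# One common DSSP 8-to-3 reduction (conventions differ between tools):
# helices H, G, I -> H; strands E, B -> E; turns, bends and coil -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E", "T": "C", "S": "C", "C": "C"}


def reduce_to_three_state(probs8: Dict[str, float]) -> Dict[str, float]:
    """Sum one residue's 8-state probabilities into H/E/C probabilities."""
    probs3 = {"H": 0.0, "E": 0.0, "C": 0.0}
    for state, p in probs8.items():
        probs3[EIGHT_TO_THREE[state]] += p  # unknown letters raise KeyError
    return probs3


def most_likely(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-probability state and its probability."""
    state = max(probs, key=probs.get)
    return state, probs[state]
```

Because the 3-state values are plain sums, a residue whose strand probability is split between E and B can still receive a confident E call in the 3-state view.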

Journal Article
TL;DR: AutoDock4 incorporates limited flexibility in the receptor and its utility in analysis of covalently bound ligands is reported, using both a grid‐based docking method and a modification of the flexible sidechain technique.
Abstract: We describe the testing and release of AutoDock4 and the accompanying graphical user interface AutoDockTools. AutoDock4 incorporates limited flexibility in the receptor. Several tests are reported here, including a redocking experiment with 188 diverse ligand-protein complexes and a cross-docking experiment using flexible sidechains in 87 HIV protease complexes. We also report its utility in analysis of covalently bound ligands, using both a grid-based docking method and a modification of the flexible sidechain technique.

15,616 citations

Journal Article
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
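The Pfam update attributes HMMER3's extra sensitivity partly to routine use of the forward algorithm. That algorithm sums the probability of a sequence over all state paths instead of keeping only the single best (Viterbi) path. The toy sketch below contrasts the two on a hypothetical plain HMM. Real profile HMMs in HMMER have match, insert and delete states and are far more elaborate, and none of these function names come from HMMER.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Natural log that maps a zero probability to -inf."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) used by the forward recursion."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_and_viterbi(
    obs: str,
    states: List[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> Tuple[float, float]:
    """Return (log P(obs) summed over all paths, log P(obs, best single path))."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    fwd = {s: _log(start[s]) + _log(emit[s][obs[0]]) for s in states}
    vit = dict(fwd)
    for symbol in obs[1:]:
        # Forward: sum over predecessor states (log-sum-exp in log space).
        fwd = {
            s: logsumexp([fwd[r] + _log(trans[r][s]) for r in states]) + _log(emit[s][symbol])
            for s in states
        }
        # Viterbi: keep only the best predecessor state.
        vit = {
            s: max(vit[r] + _log(trans[r][s]) for r in states) + _log(emit[s][symbol])
            for s in states
        }
    return logsumexp(list(fwd.values())), max(vit.values())
```

The forward score adds up every path, so it is never smaller than the Viterbi score. Consider states A and B with start probabilities 0.5 each, self-transition 0.9, and emissions A: x 0.8, y 0.2 and B: x 0.3, y 0.7. For the sequence "xy", the total probability over all four paths is 0.1975, while the best single path (B, B) has probability 0.0945.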

14,075 citations

Journal Article
TL;DR: Pfam is a widely used database of protein families, containing 14,831 manually curated entries in release 27.0; this article describes the changes made since the previous (2012) Pfam update article.
Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations

Journal Article
TL;DR: A new method for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed compared with the most commonly used alternatives; it builds on the progressive alignment approach but avoids the most serious pitfalls caused by that algorithm's greedy nature.

6,727 citations