
Showing papers on "De novo protein structure prediction" published in 2015


Journal Article
TL;DR: A new meta-predictor (MetaPSICOV) is designed which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment.
Abstract: Motivation: Recent developments of statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long-range network of hydrogen bonds, including correctly assigning the donor and acceptor residues. Results: Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long-range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long-range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV. Availability and implementation: MetaPSICOV is available as a free web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV. Contact: d.t.jones@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

355 citations
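
The "top-L long-range precision" quoted in the abstract above can be illustrated with a short sketch. This is not code from MetaPSICOV. The function name, the input layout, and the minimum sequence separation of 24 residues for "long-range" are assumptions chosen for illustration, and benchmarks differ in how they define a true contact.

```python
def top_l_long_range_precision(predicted, true_contacts, seq_len,
                               min_sep=24, frac=1.0):
    """Precision of the top (frac * L) long-range predicted contacts.

    predicted:     iterable of (i, j, score) residue-index pairs.
    true_contacts: set of (i, j) pairs with i < j observed in the native structure.
    seq_len:       sequence length L.
    min_sep:       assumed minimum |i - j| for a pair to count as long-range.
    frac:          1.0 for top-L, 0.1 for top-L/10, and so on.
    """
    # Keep only long-range pairs, normalised so that i < j.
    long_range = [(min(i, j), max(i, j), score)
                  for i, j, score in predicted
                  if abs(i - j) >= min_sep]
    # Highest-scoring predictions first.
    long_range.sort(key=lambda item: item[2], reverse=True)
    top = long_range[:max(1, int(seq_len * frac))]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    return hits / len(top)


if __name__ == "__main__":
    # Toy data with a small min_sep so the example stays readable.
    preds = [(0, 5, 0.9), (1, 6, 0.8), (2, 3, 0.99), (0, 7, 0.1)]
    native = {(0, 5), (0, 7)}
    print(top_l_long_range_precision(preds, native, seq_len=2, min_sep=3))
```

Here the score is divided by the number of predictions actually evaluated. Some evaluations divide by the nominal cutoff instead, which gives a lower value when fewer than L long-range pairs are predicted.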


Journal Article
01 Aug 2015-Proteins
TL;DR: This paper presents an ab initio protein folding method to build three‐dimensional models using predicted contacts and secondary structures that improves the quality and accuracy of structural models and in particular generates better β‐sheets than other algorithms.
Abstract: Predicted protein residue–residue contacts can be used to build three-dimensional models and consequently to predict protein folds from scratch. A considerable amount of effort is currently being spent to improve contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method to build three-dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input for a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and detect secondary structure patterns, which are then used to reconstruct final structural models. This unique two-stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and in particular generates better β-sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM-score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM-score of the best models reconstructed by our method is 0.59, 5.5% higher than the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/. Proteins 2015; 83:1436–1449. © 2015 Wiley Periodicals, Inc.

166 citations
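
Several entries on this page report results as TM-scores. As a rough illustration of that metric, the sketch below evaluates the standard TM-score sum for one fixed superposition. It is not taken from CONFOLD or any of the listed tools. The function names and the 0.5 Å floor on d0 for very short chains are assumptions, and a full TM-score calculation also searches for the superposition that maximises the score, which is omitted here.

```python
def tm_d0(target_len, floor=0.5):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The floor for short chains is an assumption made so the sketch
    stays well defined when L is close to or below 15.
    """
    if target_len <= 15:
        return floor
    return max(floor, 1.24 * (target_len - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_from_distances(distances, target_len):
    """TM-score for one fixed superposition.

    distances:  distances in angstroms between aligned residue pairs
                (model vs. native) after superposition.
    target_len: length of the native (target) chain, used for normalisation.
    """
    d0 = tm_d0(target_len)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_len


if __name__ == "__main__":
    # A perfect model scores 1.0; unaligned residues simply contribute nothing.
    print(tm_score_from_distances([0.0] * 100, target_len=100))
    print(tm_score_from_distances([0.0] * 50, target_len=100))
```

Because the sum is normalised by the target length rather than the number of aligned residues, a model that covers only half of the chain perfectly cannot score above 0.5.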


Journal Article
01 Nov 2015-Methods
TL;DR: This study demonstrates that a distinct crosslinker length exists for which the information content for de novo protein structure prediction is maximized.

45 citations


Journal Article
22 Apr 2015-PLOS ONE
TL;DR: It is demonstrated that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used; these findings were used to develop a novel method, Flib.
Abstract: Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated, in particular the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources.

36 citations
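
The precision and coverage figures mentioned in the Flib abstract can be illustrated with a small sketch. It assumes the commonly used definitions: precision is the fraction of fragments in a library that are "good", and coverage is the fraction of target residues spanned by at least one good fragment. The 1.5 Å RMSD cutoff for a good fragment, the function name, and the input layout are assumptions for illustration and are not taken from the Flib source code.

```python
def library_precision_coverage(fragments, target_len, rmsd_cutoff=1.5):
    """Precision and coverage of a fragment library for one target.

    fragments:   iterable of (start, length, rmsd_to_native), where start is
                 the 0-based target position the fragment is assigned to.
    target_len:  number of residues in the target.
    rmsd_cutoff: assumed RMSD threshold (angstroms) below which a fragment
                 counts as good.
    """
    total = 0
    good = 0
    covered = set()
    for start, length, rmsd in fragments:
        total += 1
        if rmsd < rmsd_cutoff:
            good += 1
            # Every position spanned by a good fragment counts as covered.
            covered.update(range(start, start + length))
    precision = good / total if total else 0.0
    coverage = len(covered) / target_len if target_len else 0.0
    return precision, coverage


if __name__ == "__main__":
    library = [(0, 3, 1.0), (2, 3, 2.0), (5, 2, 0.5)]
    print(library_precision_coverage(library, target_len=10))
```

The two numbers pull in different directions: adding many fragments per position tends to raise coverage while lowering precision, which is why the abstract reports both.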


Journal Article
01 Mar 2015-Proteins
TL;DR: The quality of CASP10 models throughout the prediction pipeline was analyzed to understand BCL::Fold's ability to sample the native topology, identify native‐like models by scoring and/or clustering approaches, and the authors' ability to add loop regions and side chains to initial SSE‐only models.
Abstract: During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs), omitting more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation for CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology, identify native-like models by scoring and/or clustering approaches, and our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies with a GDT_TS score > 33% for 12 of 18 and with a topology score > 0.8 for 11 of 18 test cases de novo. Despite the sampling success of BCL::Fold, significant challenges still exist in the clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. It was also observed that for some β-strand proteins, model refinement failed because β-strands were not properly aligned to form hydrogen bonds, removing otherwise accurate models from the pool. Further, BCL::Fold frequently samples non-natural topologies that require loop regions to pass through the center of the protein. Proteins 2015; 83:547–563. © 2015 Wiley Periodicals, Inc.

7 citations


Dissertation
01 Jan 2015
TL;DR: Flib is developed and shown to generate fragment libraries with higher precision and coverage than two other methods, and the dissertation investigates whether the biological process of cotranslational protein folding can be used to improve de novo protein structure prediction.
Abstract: Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during fragment library generation. Using this information, we developed Flib and showed that it generates fragment libraries with higher precision and coverage than two other methods. We explored co-evolution to identify pairs of residues that are in contact, which were then used to improve model generation. We performed a comparative analysis of nine methods in terms of their precision and their usefulness to de novo structure prediction. Our results show that metaPSICOV stage 2 produces the most accurate predictions and that metaPSICOV stage 1 generates the best modelling results. In general, contact predictors are good at identifying contacts between β-strands and bad at identifying contacts between α-helices. We also show that the ratio of satisfied predicted contacts can be used to assess whether correct models were generated for a given target. We also investigated whether the biological process of cotranslational protein folding, the notion that proteins fold as they are being synthesized, can be used to improve de novo protein structure prediction. Our tool for this investigation is SAINT2. SAINT2 differs from conventional fragment-assembly approaches as it is able to perform predictions sequentially from N to C-terminus, starting with a small peptide that is extended as the simulation progresses (SAINT2 Cotranslational). SAINT2 is also able to generate decoys in a standard non-sequential fashion (SAINT2 In Vitro). We compared SAINT2 Cotranslational to SAINT2 In Vitro and showed that SAINT2 Cotranslational generally produces better answers, generating an individual decoy between 1.5 and 2.5 times faster than SAINT2 In Vitro. Our results suggest that biologically inspired structure prediction can improve search heuristics and final model quality.

2 citations
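
The dissertation abstract above mentions using the ratio of satisfied predicted contacts to judge whether a correct model was generated. A minimal sketch of that ratio follows. The 8 Å distance cutoff, the use of one representative atom per residue, and the function name are assumptions for illustration; the abstract does not state the exact criteria used.

```python
import math


def satisfied_contact_ratio(coords, predicted_contacts, cutoff=8.0):
    """Fraction of predicted contacts that are satisfied in a model.

    coords:             list of (x, y, z) tuples, one representative atom
                        per residue (for example C-beta), indexed by residue.
    predicted_contacts: iterable of (i, j) residue-index pairs.
    cutoff:             assumed distance in angstroms below which a pair
                        counts as being in contact.
    """
    contacts = list(predicted_contacts)
    if not contacts:
        return 0.0
    satisfied = sum(
        math.dist(coords[i], coords[j]) < cutoff for i, j in contacts
    )
    return satisfied / len(contacts)


if __name__ == "__main__":
    # Three residues on a line: 0 and 1 are close, 0 and 2 are far apart.
    model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(satisfied_contact_ratio(model, [(0, 1), (0, 2)]))
```

A ratio computed this way needs no native structure, which is what makes it usable as a model-quality indicator when the true fold is unknown.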