scispace - formally typeset

Showing papers on "De novo protein structure prediction published in 2016"


Journal Article
TL;DR: It is demonstrated that previously unappreciated hydrogen bonds occur within proteins between the amide proton and carbonyl oxygen of the same residue, and inclusion in computational force fields would improve models of protein folding, function, and dysfunction.
Abstract: Within polypeptides, C5 hydrogen bonds form between the amide proton and carbonyl oxygen of the same residue. This intraresidue interaction stabilizes β-sheets in particular and is widespread throughout structurally characterized proteins. Current limitations in de novo protein structure prediction and design suggest an incomplete understanding of the interactions that govern protein folding. Here we demonstrate that previously unappreciated hydrogen bonds occur within proteins between the amide proton and carbonyl oxygen of the same residue. Quantum calculations, infrared spectroscopy, and nuclear magnetic resonance spectroscopy show that these interactions share hallmark features of canonical hydrogen bonds. Biophysical analyses demonstrate that selective attenuation or enhancement of these C5 hydrogen bonds affects the stability of synthetic β-sheets. These interactions are common, affecting approximately 5% of all residues and 94% of proteins, and their cumulative impact provides several kilocalories per mole of conformational stability to a typical protein. C5 hydrogen bonds especially stabilize the flat β-sheets of the amyloid state, which is linked with Alzheimer's disease and other neurodegenerative disorders. Inclusion of these interactions in computational force fields would improve models of protein folding, function, and dysfunction.

84 citations


Journal Article
TL;DR: A novel generative, probabilistic model is developed that simultaneously captures local structural preferences of the backbone and side-chain conformational space of polypeptide chains in a united-residue representation, and performs experimentally motivated conditional conformational sampling that minimizes a composite physics- and knowledge-based energy function for de novo protein structure prediction.
Abstract: Motivation: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named ‘foldons’ through the process of sequential stabilization. Alongside, the latest developments on the computational side based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often randomly sample protein conformational space as opposed to the experimentally suggested stepwise sampling. Results: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. Availability and Implementation: Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/. Contact: chengji@missouri.edu Supplementary information: Supplementary data are available at Bioinformatics online.

49 citations
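The abstract says UniCon3D's sampler minimizes a composite energy function but does not spell out the search procedure. As a generic illustration only, assuming a Metropolis-criterion simulated-annealing search (not necessarily what UniCon3D does), the acceptance rule shared by many Monte Carlo conformational samplers can be sketched as below. The names `anneal`, `toy_energy` and `toy_move` are hypothetical, and the 1-D rugged function stands in for a real protein energy.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, state, propose, t_start=2.0, t_end=0.05, steps=2000, seed=0):
    """Simulated annealing over a generic state (hypothetical helper).
    Returns the lowest-energy state seen and its energy."""
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(1, steps - 1))
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, t, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(x):
    """Stand-in rugged landscape: a parabola with sinusoidal local minima."""
    return (x - 1.0) ** 2 + 0.5 * math.sin(8 * x)


def toy_move(x, rng):
    """Stand-in move set: a small random perturbation."""
    return x + rng.uniform(-0.3, 0.3)


best_x, best_e = anneal(toy_energy, 4.0, toy_move)
```

Tracking the best state separately from the current one means an uphill move accepted late in the run never loses the lowest energy found so far.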


Journal Article
TL;DR: A novel fragment-library-construction algorithm, LRFragLib, is introduced to improve the detection of near-native low-homology fragments of 7-10 residues using a multi-stage, flexible selection protocol; it has computational efficiency comparable to the fastest existing techniques with substantially reduced memory usage.
Abstract: Motivation: The quality of the fragment library determines the efficiency of fragment assembly, an approach that is widely used in most de novo protein-structure prediction algorithms. Conventional fragment libraries are constructed mainly based on the identities of amino acids, sometimes facilitated by predicted information including dihedral angles and secondary structures. However, it remains challenging to identify near-native fragment structures with low sequence homology. Results: We introduce a novel fragment-library-construction algorithm, LRFragLib, to improve the detection of near-native low-homology fragments of 7-10 residues, using a multi-stage, flexible selection protocol. Based on logistic regression scoring models, LRFragLib outperforms existing techniques by achieving a significantly higher precision and a comparable coverage on recent CASP protein sets in sampling near-native structures. The method also has a comparable computational efficiency to the fastest existing techniques with substantially reduced memory usage. Availability and implementation: The source code is available for download at http://166.111.152.91/Downloads.html. Contact: hgong@tsinghua.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.

11 citations
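Fragment assembly, which LRFragLib's libraries feed, builds models by repeatedly overwriting a window of backbone torsions with those of a library fragment. A minimal sketch of that insertion move, assuming the backbone is represented as a list of (phi, psi) angles in degrees; this is not LRFragLib's code, and `insert_fragment` and the toy angle values are hypothetical.

```python
from typing import List, Tuple

Torsions = List[Tuple[float, float]]  # (phi, psi) per residue, in degrees


def insert_fragment(backbone: Torsions, fragment: Torsions, start: int) -> Torsions:
    """Return a copy of `backbone` whose torsions at positions
    start .. start + len(fragment) - 1 are replaced by the fragment's torsions."""
    end = start + len(fragment)
    if start < 0 or end > len(backbone):
        raise ValueError("fragment does not fit at this position")
    new_backbone = list(backbone)  # leave the input model untouched
    new_backbone[start:end] = fragment
    return new_backbone


# Toy usage: a 12-residue extended chain and a 7-residue helical fragment.
extended = [(-120.0, 130.0)] * 12
helix_fragment = [(-57.0, -47.0)] * 7
model = insert_fragment(extended, helix_fragment, start=3)
```

In a full protocol the move would be followed by rebuilding coordinates and scoring, with acceptance decided by a criterion such as Metropolis; those steps are omitted here.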


Journal Article
TL;DR: FRAGSION is lightning-fast, consuming only a few seconds of CPU time to generate a fragment library for a protein of typical length (300 residues); it can generate dynamic-size fragments of any length (even the whole protein sequence) and offers ways to handle noise in predicted secondary structure during fragment sampling.
Abstract: MOTIVATION: Speed, accuracy and robustness of building a protein fragment library have important implications in de novo protein structure prediction, since fragment-based methods are one of the most successful approaches in template-free modeling (FM). The majority of existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility of locating longer fragments due to the limited sizes of databases. Also, it is difficult to alleviate the effect of noisy sequence-based predicted features, such as secondary structures, on fragment quality. RESULTS: Here, we present FRAGSION, a database-free method to efficiently generate a protein fragment library by sampling from an Input-Output Hidden Markov Model. FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only a few seconds of CPU time to generate a fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even for the whole protein sequence); and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On an FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FRAGSION provides advantages over the state-of-the-art fragment picking protocol of the ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable performance in fragment quality. AVAILABILITY AND IMPLEMENTATION: Source code and executable versions of FRAGSION for Linux and MacOS are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/. They are bundled with a manual and example data. CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11 citations


Book Chapter
01 Jan 2016
TL;DR: The main variants of evolutionary algorithms are presented, and it is shown through the example of EdaRose how population-based sampling methods can be applied to protein structure prediction to effectively search the conformational space while maintaining a balance between exploration and exploitation.
Abstract: Population-based sampling methods, such as evolutionary algorithms, are generally applied to solve large optimization problems with rugged energy landscapes. By monitoring the population and enabling communication, these techniques allow one to control and guide the sampling process. Due to these features, population-based sampling methods can be applied to a wide spectrum of problems. With a huge search space and a funnel-like energy landscape composed of multiple local attraction basins, protein structure prediction falls in the range of problems on which population-based sampling methods typically perform well. Here, we present the main variants of evolutionary algorithms, and show through the example of EdaRose how population-based sampling methods can be applied to protein structure prediction to effectively search the conformational space while maintaining a balance between exploration and exploitation.

8 citations
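The chapter's point about population-based search balancing exploration and exploitation can be made concrete with one simple scheme, an estimation-of-distribution algorithm (EDA) with an independent Gaussian per coordinate. This is a generic sketch, not EdaRose's implementation: `gaussian_eda` and `rugged_energy` are hypothetical, and the toy landscape (a quadratic funnel with cosine ripples, global minimum 0.0 at the origin) stands in for a protein energy function.

```python
import math
import random
import statistics


def rugged_energy(x):
    """Toy rugged landscape: quadratic funnel plus cosine ripples.
    Its global minimum is 0.0 at the origin."""
    return sum(xi * xi + 2.0 * (1.0 - math.cos(3.0 * xi)) for xi in x)


def gaussian_eda(energy, dim=4, pop_size=60, n_select=15, generations=40, seed=1):
    """Univariate Gaussian EDA (hypothetical helper): select the best
    individuals, fit one Gaussian per coordinate, resample.
    Returns (best, best_energy, initial_best_energy)."""
    rng = random.Random(seed)
    # Broad uniform initial population (exploration).
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    best = min(population, key=energy)
    initial_best_energy = energy(best)
    for _ in range(generations):
        # Truncation selection of the lowest-energy individuals (exploitation).
        elite = sorted(population, key=energy)[:n_select]
        means = [statistics.fmean([ind[d] for ind in elite]) for d in range(dim)]
        # Floor the spread so sampling never collapses completely.
        sds = [max(statistics.pstdev([ind[d] for ind in elite]), 1e-3) for d in range(dim)]
        offspring = [[rng.gauss(means[d], sds[d]) for d in range(dim)]
                     for _ in range(pop_size - n_select)]
        population = elite + offspring  # elitism: keep the selected set
        best = min(population + [best], key=energy)
    return best, energy(best), initial_best_energy


best_x, best_e, start_e = gaussian_eda(rugged_energy)
```

Monitoring the population (here, the fitted means and spreads) is what lets such methods steer sampling, as the chapter describes; richer EDAs model dependencies between variables instead of treating each coordinate independently.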


Journal Article
TL;DR: This article briefly reviews the background and research history of the protein folding problem and introduces progress in protein folding prediction research from four aspects: protein folding process prediction (protein folding simulation), folding-process-related parameter prediction, protein folding result prediction (protein structure prediction), and folding-result-related parameter prediction.
Abstract: Protein folding is the process by which a protein molecule transforms from a linear polymer of peptides into a three-dimensional native structure with a specific biological function. The protein folding problem has now been studied for more than 50 years and has become a broad and active research field. To answer the 58th question raised by Science in 2005, in this article we briefly review the background and research history of the protein folding problem, and introduce progress in protein folding prediction research from four aspects: protein folding process prediction (protein folding simulation), folding-process-related parameter prediction, protein folding result prediction (protein structure prediction), and folding-result-related parameter prediction. Studies of the protein folding problem began in the 1960s, with efforts to resolve the paradox that a protein can form its native 3D structure in only several seconds, whereas the time scale estimated under a thermodynamic ergodic hypothesis would be longer than the age of the universe. Computer simulation is an important approach for studying protein folding. Protein models can be classified into 3 categories: lattice, off-lattice and all-atom models. Current knowledge about the protein folding mechanism is based on the concept of a folding funnel on a free-energy landscape, and the current view is that the folding mechanism is not unique across the whole protein universe and that there may be a continuum between the two extremes of the hierarchical folding and nucleation folding scenarios. The hardware for protein folding simulation has become more powerful; distributed systems (e.g., Folding@home), special-purpose machines (e.g., ANTON), and GPU-based platforms have been developed for protein folding simulation. Meanwhile, folding simulation software has been continuously enhanced.
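The paradox described above is usually illustrated with Levinthal-style arithmetic. The sketch below uses illustrative assumptions that are not taken from the abstract: 100 residues, 3 backbone states per residue, and 10^13 conformations sampled per second.

```python
import math

# Illustrative assumptions (not from the abstract).
n_residues = 100
states_per_residue = 3
sampling_rate_per_s = 1e13

n_conformations = states_per_residue ** n_residues       # about 5.2e47
search_time_s = n_conformations / sampling_rate_per_s     # about 5.2e34 s
seconds_per_year = 365.25 * 24 * 3600
search_time_years = search_time_s / seconds_per_year      # about 1.6e27 years
age_of_universe_years = 1.38e10

ratio = search_time_years / age_of_universe_years
print(f"exhaustive search: {search_time_years:.1e} years "
      f"(~10^{math.floor(math.log10(ratio))} times the age of the universe)")
```

Even with generous sampling rates, exhaustive search is hopeless, which is why folding must be guided (the funnel picture) rather than random.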
An important issue in protein folding simulation is overcoming local energy barriers to find the global energy minimum; several approaches, such as replica exchange, multi-scale modeling and Modeling Employing Limited Data (MELD), were developed to tackle this issue, and human intelligence involvement (e.g., the “Foldit” game) is another interesting effort. During the past two decades, the capability of protein folding simulation has risen continuously. Currently, folding simulations of proteins with dozens of amino acids can reach millisecond time scales, and effective folding simulation is possible for proteins of around 100 amino acids. The targets of protein folding simulation have been greatly expanded and now include both in vitro and in vivo folding, such as co-translational folding, chaperone-assisted folding, small-molecule-induced folding and metal-coupled folding. Folding rate and folding type are two important parameters related to the protein folding process, and they can now be predicted by statistical and machine-learning approaches based on different levels of structural features, such as the topological properties of tertiary structure, the contents of secondary structure and the amino acid frequencies of primary structure. The result of a protein folding process is the formation of a protein structure. According to the hierarchy of structural organization, the protein structure prediction problem includes secondary, tertiary and quaternary structure prediction. Secondary structure prediction algorithms have now gone through five generations, and the current accuracy is about 80% for three-class prediction. Tertiary structure prediction approaches fall mainly into two categories, template-based modeling and free modeling, with the former having higher accuracy and the latter a larger application scope.
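The ~80% three-class accuracy mentioned above is conventionally reported as Q3, the fraction of residues whose predicted helix/strand/coil label matches the observed one. A minimal sketch, assuming labels have already been reduced to H, E and C (the reduction from DSSP's eight states is not shown); `q3_accuracy` and the example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state secondary structure label
    (H = helix, E = strand, C = coil) matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must be the same length")
    allowed = set("HEC")
    if not set(predicted) <= allowed or not set(observed) <= allowed:
        raise ValueError("labels must be H, E or C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


pred = "CCHHHHHCCEEEECC"
obs = "CCHHHHCCCEEEEEC"
score = q3_accuracy(pred, obs)  # 13 of 15 residues agree
```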
Quaternary structure prediction includes the prediction of complex structures and of the likelihood of protein-protein interaction, and these predictions can be based on protein 3D structure or on amino acid sequence alone. Structure-related parameter prediction has also attracted research interest, including the prediction of protein structural classes, secondary structure contents, disordered regions, solvent-accessible surface regions and contacting amino acid pairs at protein-protein interaction interfaces. Finally, several directions worth noting for the future of protein folding research are suggested: the coupling between protein folding and binding, the fusion of protein folding research with systems biology, and the application of deep-learning techniques to protein folding prediction.

2 citations


Proceedings Article
01 Dec 2016
TL;DR: A novel learning-to-rank method (RRCRank) is presented for predicting protein residue-residue contacts; benchmarks show that it can take advantage of both machine-learning and correlated-mutation approaches and can provide state-of-the-art performance.
Abstract: Protein residue-residue contacts dictate the topology of protein structure and play an important role in structural biology, especially in de novo protein structure prediction. Accurate prediction of residue contacts could improve the performance of de novo protein structure prediction methods. In this study, a novel method based on learning-to-rank (RRCRank) is presented to predict protein residue-residue contacts. The proposed method formulates contact prediction as a ranking problem. First, the contact probabilities of residue pairs are predicted by ensemble machine-learning classifiers and correlated-mutation approaches. Then, the method integrates the complementary outputs of the machine-learning and correlated-mutation approaches and uses a learning-to-rank algorithm to rank residue pairs by their probability of being in contact. Benchmarked on the CASP11 dataset, the proposed method achieves improved performance for all three categories of contacts (short-range, medium-range and long-range), which shows that the learning-to-rank approach can take advantage of both machine-learning and correlated-mutation approaches and can provide state-of-the-art performance.

1 citation
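Ranked contact predictions like RRCRank's are commonly evaluated by the precision of the top-ranked pairs within each sequence-separation category. The sketch below assumes CASP-style ranges (short 6-11, medium 12-23, long 24 or more residues apart) and a top-L/k cutoff; it evaluates a ranking and does not implement the learning-to-rank step itself. `precision_top_k` and the toy scores are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j

# Assumed CASP-style sequence-separation ranges |i - j|
# (upper bound None means unbounded).
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def precision_top_k(scores: Dict[Pair, float], true_contacts: Set[Pair],
                    seq_len: int, category: str, divisor: int = 5) -> float:
    """Precision of the top seq_len // divisor predicted pairs in one range.
    If fewer pairs are available, precision is taken over those present."""
    lo, hi = RANGES[category]

    def in_range(pair: Pair) -> bool:
        sep = abs(pair[1] - pair[0])
        return sep >= lo and (hi is None or sep <= hi)

    # Rank candidate pairs in this range by predicted score, highest first.
    ranked = sorted((p for p in scores if in_range(p)),
                    key=lambda p: scores[p], reverse=True)
    k = max(1, seq_len // divisor)
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(p in true_contacts for p in top) / len(top)


scores = {(0, 30): 0.9, (1, 29): 0.8, (2, 35): 0.7, (3, 33): 0.1, (5, 12): 0.95}
true_contacts = {(0, 30), (2, 35)}
long_precision = precision_top_k(scores, true_contacts, seq_len=40,
                                 category="long", divisor=10)
```

Here the top 4 long-range pairs contain 2 true contacts, giving a precision of 0.5; the high-scoring pair (5, 12) is only counted in the short-range category.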


22 May 2016
TL;DR: This work aims to improve residue-residue contact predictions by improving the underlying mathematical models in a Bayesian framework to reduce random and systematic errors inherent in contact prediction to make protein de novo structure prediction widely applicable.
Abstract: An understanding of protein tertiary structure is important for both basic and translational research, for example to understand molecular mechanisms, engineer new or optimized catalysts, or formulate new cures. Protein tertiary structures are typically determined experimentally, a time-consuming process with average costs in the hundreds of thousands of US dollars for a single protein structure. Consequently, there is much interest in using computational methods to drive down the cost of obtaining new structures. While great successes have been made in transferring structural information from already structurally solved homologous proteins, the sensitivity improvements of methods for detecting homologous proteins have plateaued in recent years, and homology-based protein structure prediction is ultimately limited by the availability of a suitable template that must be determined experimentally. De novo protein structure prediction could theoretically use physical models to determine the native conformation of a protein without prior structural information, but in practice such approaches are limited by the computational cost of evaluating expensive energy functions for many different points in an enormous search space. An old idea in protein bioinformatics is to use the compensatory mutations observed due to the evolutionary pressure of maintaining a protein fold to predict which residue pairs in a protein are interacting in the folded structure. If such interactions can be reliably predicted, they can be used to constrain the search space of de novo protein structure prediction sufficiently that the lowest-energy conformation can be found. Through recent improvements in the accuracy of such residue-residue interaction predictors, protein domain structures of typical size could be predicted in a blinded experiment for the first time in 2011.
However, the new class of methods is still limited in its applicability: the methods are sensitive to false-positive predictions of interactions and can only provide reliable predictions with low false-positive rates for protein families that have a high number of homologous sequences. This work aims to improve residue-residue contact predictions by improving the underlying mathematical models in a Bayesian framework. By explicitly modelling noise effects inherent in the underlying data and including priors that reflect the nature of residue-residue interactions, an attempt is made to reduce the random and systematic errors inherent in contact prediction and so make de novo protein structure prediction widely applicable.
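The "old idea" of detecting compensatory mutations is classically scored with the mutual information (MI) between two columns of a multiple sequence alignment: columns that co-vary score high, independent columns score near zero. This is a baseline sketch, not the Bayesian model this work proposes; it applies no gap handling, pseudocounts, sequence weighting or correction for indirect couplings, which is part of why raw MI yields the false positives described above. `column_mutual_information` and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j,
    estimated from raw frequencies."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 co-vary (A with K, R with D),
# column 2 varies independently, column 3 is conserved.
msa = ["AKLE", "AKVE", "RDLE", "RDVE"]
mi_covarying = column_mutual_information(msa, 0, 1)
mi_independent = column_mutual_information(msa, 0, 2)
```

The perfectly co-varying pair reaches ln 2 (the entropy of a two-letter column), while the independent and conserved pairs score zero.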

Journal Article
TL;DR: The protein structure prediction (PSP) problem is one of the hardest problems in computational biology and technical research; to reduce its complexity, Dill proposed the HP model, which subsequently became a major approach to the PSP problem.
Abstract: The protein structure prediction (PSP) problem is one of the hardest problems in computational biology and technical research. To reduce the complexity of the problem, Dill proposed the HP model, which subsequently became a major approach to the PSP problem.
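In Dill's HP model each residue is either hydrophobic (H) or polar (P), a conformation is a self-avoiding walk on a lattice, and the energy is minus the number of H-H pairs that are lattice neighbours but not consecutive in the chain. A minimal 2-D square-lattice sketch, assuming an absolute-direction encoding of the walk (U/D/L/R, one letter per bond), which is a convention chosen here; `hp_energy` is hypothetical.

```python
from typing import Dict, Tuple

MOVES: Dict[str, Tuple[int, int]] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence: str, moves: str) -> int:
    """Energy of an HP chain on the 2-D square lattice: -1 for every pair of
    H residues that are lattice neighbours but not sequence neighbours."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    position: Dict[Tuple[int, int], int] = {c: k for k, c in enumerate(coords)}
    energy = 0
    for k, (x, y) in enumerate(coords):
        if sequence[k] != "H":
            continue
        for dx, dy in MOVES.values():
            other = position.get((x + dx, y + dy))
            # other > k + 1 skips the bonded neighbour and counts each contact once.
            if other is not None and other > k + 1 and sequence[other] == "H":
                energy -= 1
    return energy


folded = hp_energy("HPPH", "RUL")    # U-shaped: the two H ends touch
extended = hp_energy("HPPH", "RRR")  # straight chain: no H-H contacts
```

Structure prediction in this model then becomes a search for the self-avoiding walk with the lowest energy.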