Open Access | Posted Content

3D Protein Structure Predicted from Sequence

TLDR
It is shown that co-variation of residue pairs, observed in a large protein family, provides sufficient information to determine 3D protein structure, which opens the door to a comprehensive survey of protein 3D structures, including many not currently accessible to the experimental methods of structural genomics.
Abstract
The evolutionary trajectory of a protein through sequence space is constrained by function and three-dimensional (3D) structure. Residues in spatial proximity tend to co-evolve, yet attempts to invert the evolutionary record to identify these constraints and use them to computationally fold proteins have so far been unsuccessful. Here, we show that co-variation of residue pairs, observed in a large protein family, provides sufficient information to determine 3D protein structure. Using a data-constrained maximum entropy model of the multiple sequence alignment, we identify pairs of statistically coupled residue positions which are expected to be close in the protein fold, termed contacts inferred from evolutionary information (EICs). To assess the amount of information about the protein fold contained in these coupled pairs, we evaluate the accuracy of predicted 3D structures for proteins of 50-260 residues, from 15 diverse protein families, including a G-protein coupled receptor. These structure predictions are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The resulting low Cα-RMSD error range of 2.7-5.1 Å, over at least 75% of the protein, indicates the potential for predicting essentially correct 3D structures for the thousands of protein families that have no known structure, provided they include a sufficiently large number of divergent sample sequences. With the current enormous growth in sequence information based on new sequencing technology, this opens the door to a comprehensive survey of protein 3D structures, including many not currently accessible to the experimental methods of structural genomics. This advance has potential applications in many biological contexts, such as synthetic biology, identification of functional sites in proteins and interpretation of the functional impact of genetic variants.
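To make the idea of residue-pair co-variation concrete, the sketch below scores every pair of columns in a toy multiple sequence alignment by mutual information. This is an illustrative assumption, not the method of the abstract: mutual information is a simpler, purely pairwise statistic, whereas the abstract describes a data-constrained maximum entropy model of the whole alignment. The function names and the four-sequence alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log(p_ab / (p_a * p_b))
    return mi


def rank_pairs(alignment):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(alignment[0])
    scored = [
        ((i, j), column_mutual_information(alignment, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 3 co-vary (A with D, K with E),
# column 1 is fully conserved, and column 2 varies independently.
toy_alignment = ["AGLD", "AGVD", "KGLE", "KGVE"]

top_pair, top_score = rank_pairs(toy_alignment)[0]
print(top_pair, round(top_score, 3))
```

In this toy case only the co-varying pair of columns receives a non-zero score. On real alignments a pairwise statistic like this also picks up indirect, transitive correlations, which is one motivation for fitting a global model instead.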


Citations
Journal Article

Direct-coupling analysis of residue coevolution captures native contacts across many protein families

TL;DR: The findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
Journal Article

Toward optimal fragment generations for ab initio protein structure assembly

TL;DR: A gapless‐threading method to generate position‐specific structure fragments is developed and it is found that the optimal fragment length for structural assembly is around 10, and at least 100 fragments at each location are needed to achieve optimal structure assembly.
Journal Article

From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction.

TL;DR: The Hopfield-Potts model is introduced, inspired by the statistical physics of disordered systems, and it is shown how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA.
Journal Article

Mimicking the folding pathway to improve homology-free protein structure prediction

Abstract: Since the demonstration that the sequence of a protein encodes its structure, the prediction of structure from sequence remains an outstanding problem that impacts numerous scientific disciplines, including many genome projects. By iteratively fixing secondary structure assignments of residues during Monte Carlo simulations of folding, our coarse-grained model without information concerning homology or explicit side chains can outperform current homology-based secondary structure prediction methods for many proteins. The computationally rapid algorithm using only single (φ,ψ) dihedral angle moves also generates tertiary structures of accuracy comparable with existing all-atom methods for many small proteins, particularly those with low homology. Hence, given appropriate search strategies and scoring functions, reduced representations can be used for accurately predicting secondary structure and providing 3D structures, thereby increasing the size of proteins approachable by homology-free methods and the accuracy of template methods that depend on a high-quality input secondary structure.
Journal Article

[Chaotic artificial bee colony algorithm: a new approach to the problem of minimization of energy of the 3D protein structure].

TL;DR: The Chaotic Artificial Bee Colony (CABC) algorithm is introduced and applied to 3D protein structure prediction, and the results demonstrate that the proposed algorithm provides an effective and high-performance method for protein structure prediction.
References
Journal Article

The Pfam protein families database

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Journal Article

I-TASSER: a unified platform for automated protein structure and function prediction

TL;DR: The iterative threading assembly refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm.
Journal Article

Weak pairwise correlations imply strongly correlated network states in a neural population.

TL;DR: It is shown, in the vertebrate retina, that weak correlations between pairs of neurons coexist with strongly collective behaviour in the responses of ten or more neurons, and it is found that this collective behaviour is described quantitatively by models that capture the observed pairwise correlations but assume no higher-order interactions.
Journal Article

Scoring function for automated assessment of protein structure template quality

TL;DR: A new scoring function, the template modeling score (TM‐score), is developed to assess the quality of protein structure templates and predicted full‐length models by extending the approaches used in Global Distance Test (GDT) and MaxSub; the results suggest that the TM‐score is a useful complement to the fully automated assessment of protein structure predictions.