Protein 3D structure computed from evolutionary sequence variation.

doi:10.1371/JOURNAL.PONE.0028766

Open Access, Journal Article

Debora S. Marks, +6 more

- 07 Dec 2011 -

PLOS ONE

- Vol. 6, Iss: 12

TLDR

The strength of residue pair couplings inferred from a maximum entropy model of the sequence alignment is found to be an excellent predictor of residue-residue proximity in folded structures, and the top-scoring couplings are sufficiently accurate and well-distributed to define the 3D protein fold.

Abstract:

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
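
The coupling inference described in the abstract can be illustrated with a small sketch. The Python code below is not the EVfold implementation. It assumes a mean-field approximation to the maximum entropy model, in which couplings are taken as the negative inverse of a pseudocount-regularized covariance matrix of the alignment, and it ranks column pairs by the Frobenius norm of each coupling block, which is a simplified stand-in score. The function names, the pseudocount value, and the toy alignment are hypothetical choices made for illustration, and sequence reweighting is omitted.

```python
import numpy as np


def toy_alignment(n_seq=500, n_col=5, q=4, seed=0):
    """Hypothetical toy alignment: random integer-coded columns,
    with column 2 forced to copy column 0 (a perfectly coupled pair)."""
    rng = np.random.default_rng(seed)
    msa = rng.integers(0, q, size=(n_seq, n_col))
    msa[:, 2] = msa[:, 0]
    return msa


def coupling_scores(msa, q, pseudocount=0.5):
    """Mean-field coupling scores for an integer-coded alignment.

    msa: (n_seq, n_col) array with states in 0..q-1.
    Returns an (n_col, n_col) symmetric matrix of pair scores.
    The pseudocount of 0.5 is an assumed, illustrative value.
    """
    n_seq, n_col = msa.shape
    x = np.eye(q)[msa]  # one-hot encoding, shape (n_seq, n_col, q)

    # Single-site and pair frequencies, mixed with a uniform pseudocount.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = np.einsum("nia,njb->iajb", x, x) / n_seq
    fij = (1 - pseudocount) * fij + pseudocount / q**2
    for i in range(n_col):
        # Within one column, two different states never co-occur.
        fij[i, :, i, :] = np.diag(fi[i])

    # Connected correlations; drop the last state of every column so the
    # matrix is invertible (a gauge choice).
    c = fij - fi[:, :, None, None] * fi[None, None, :, :]
    c = c[:, : q - 1, :, : q - 1].reshape(n_col * (q - 1), n_col * (q - 1))

    # Mean-field approximation: couplings are minus the inverse covariance.
    e = -np.linalg.inv(c).reshape(n_col, q - 1, n_col, q - 1)

    # Simplified score: Frobenius norm of each (q-1) x (q-1) coupling block.
    scores = np.sqrt((e**2).sum(axis=(1, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores


def top_pairs(scores, k=3):
    """Return the k highest-scoring column pairs (i < j), best first."""
    iu, ju = np.triu_indices(scores.shape[0], k=1)
    order = np.argsort(scores[iu, ju])[::-1][:k]
    return [(int(iu[t]), int(ju[t])) for t in order]
```

Calling `top_pairs(coupling_scores(toy_alignment(), q=4))` ranks the column pairs of the toy alignment, where the artificially coupled pair (0, 2) is the one expected to stand out. On a real alignment the top-ranked pairs would serve as candidate residue contacts, as the abstract describes.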

Citations

Highly accurate protein structure prediction with AlphaFold

John M. Jumper, +33 more

- 15 Jul 2021 -

Nature

TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases, using a novel deep learning architecture, even in cases where no homologous structure is available.

HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

Michael Remmert, +3 more

- 01 Feb 2012 -

Nature Methods

TL;DR: HHblits ('HMM-HMM–based lightning-fast iterative sequence search'; http://toolkit.genzentrum.lmu.de/hhblits/) is an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs).

Opportunities and obstacles for deep learning in biology and medicine.

Travers Ching, +38 more

- 01 Apr 2018 -

Journal of the Royal Society Interface

TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.

JPred4: a protein secondary structure prediction server

Alexey Drozdetskiy, +3 more

- 01 Jul 2015 -

Nucleic Acids Research

TL;DR: JPred4 is the latest version of the popular JPred protein secondary structure prediction server, which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction.

The Protein-Folding Problem, 50 Years On

Ken A. Dill, +1 more

- 23 Nov 2012 -

Science

TL;DR: Progress is reviewed on three broad questions: What is the physical code by which an amino acid sequence dictates a protein’s native structure?
