PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments.

doi:10.1093/BIOINFORMATICS/BTR638

David T. Jones, +3 more

- 15 Jan 2012 -

Bioinformatics

- Vol. 28, Iss: 2, pp 184-190

TLDR

A novel method, PSICOV, is presented, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction and displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks.

Abstract:

Motivation: The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction, to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA.

Results: PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment.

Availability: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
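The L/5 long-range precision quoted in the Results can be made concrete with a small sketch. The function name `top_l_over_k_precision`, the score matrix, and the toy contact map below are illustrative assumptions and are not taken from the PSICOV distribution; "sequence separation >23" is read here as |i - j| >= 24.

```python
import numpy as np

def top_l_over_k_precision(scores, contacts, k=5, min_sep=24):
    """Fraction of true contacts among the top-L/k scored long-range pairs.

    scores   : (L, L) array of predicted contact scores (higher = more likely)
    contacts : (L, L) boolean array of true contacts
    min_sep  : minimum sequence separation j - i for a pair to be counted
    """
    L = scores.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep
    i_idx, j_idx = np.triu_indices(L, k=min_sep)
    n_top = max(1, L // k)
    # Indices of the n_top highest-scoring long-range pairs
    order = np.argsort(scores[i_idx, j_idx])[::-1][:n_top]
    return float(contacts[i_idx[order], j_idx[order]].mean())

# Toy example (hypothetical data): L = 50, so the top L/5 = 10 pairs are scored.
L = 50
scores = np.zeros((L, L))
contacts = np.zeros((L, L), dtype=bool)
for rank in range(10):
    scores[rank, rank + 30] = 1.0 - 0.01 * rank   # ten long-range predictions
    contacts[rank, rank + 30] = rank < 7          # seven of them are true contacts
scores[0, 5] = 5.0  # short-range pair with a high score; excluded by min_sep

precision = top_l_over_k_precision(scores, contacts)
print(f"L/5 long-range precision: {precision:.2f}")
```

Restricting the ranking to long-range pairs before taking the top L/5 keeps high-scoring short-range pairs, which are comparatively easy to predict, from inflating the figure.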

Citations

Journal ArticleDOI

Highly accurate protein structure prediction with AlphaFold

John M. Jumper, +33 more

- 15 Jul 2021 -

Nature

TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture, including cases in which no similar structure is known.

Journal ArticleDOI

Improved protein structure prediction using potentials from deep learning

Andrew W. Senior, +19 more

- 15 Jan 2020 -

Nature

TL;DR: It is shown that a neural network can be trained to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions, and the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures.

Journal ArticleDOI

The Protein-Folding Problem, 50 Years On

Ken A. Dill, +1 more

- 23 Nov 2012 -

Science

TL;DR: Progress is reviewed on three broad questions: What is the physical code by which an amino acid sequence dictates a protein’s native structure?

Journal ArticleDOI

A series of PDB related databases for everyday needs.

Robbie P. Joosten, +7 more

- 01 Jan 2011 -

Nucleic Acids Research

TL;DR: A series of databases that run parallel to the Protein Data Bank, used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design, are presented.

Journal ArticleDOI

Sparse and Compositionally Robust Inference of Microbial Ecological Networks

Zachary D. Kurtz, +5 more

- 07 May 2015 -

PLOS Computational Biology

TL;DR: SParse InversE Covariance Estimation for Ecological Association Inference is presented, a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios.

References

Journal ArticleDOI

The Protein Data Bank

Helen M. Berman, +7 more

- 01 Jan 2000 -

Nucleic Acids Research

TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.

Journal ArticleDOI

The Pfam protein families database

Marco Punta, +15 more

- 01 Jan 2000 -

Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

Journal ArticleDOI

Amino acid substitution matrices from protein blocks

Steven Henikoff, +1 more

- 15 Nov 1992 -

Proceedings of the National Academy of S...

TL;DR: This work has derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins, leading to marked improvements in alignments and in searches using queries from each of the groups.

Journal ArticleDOI

Sparse inverse covariance estimation with the graphical lasso

Jerome H. Friedman, +2 more

- 01 Jul 2008 -

Biostatistics

TL;DR: Using a coordinate descent procedure for the lasso, a simple algorithm is developed that solves a 1000-node problem in at most a minute and is 30-4000 times faster than competing methods.
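As a rough illustration of what sparse inverse covariance estimation provides, the sketch below fits scikit-learn's `GraphicalLasso` estimator to simulated data from a three-variable chain (x0 -> x1 -> x2). This is an assumption-laden toy: it uses scikit-learn rather than the authors' own code, and the penalty `alpha=0.05`, the coupling strength 0.8, and the sample size are arbitrary choices. In such a chain, x0 and x2 are correlated only indirectly, so their covariance is clearly non-zero while the corresponding entry of the estimated precision (inverse covariance) matrix should be close to zero.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n = 2000

# Simulated chain: x0 influences x1, x1 influences x2; no direct x0-x2 link.
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

# Empirical covariance: the indirect pair (0, 2) still looks strongly coupled.
cov = np.cov(X, rowvar=False)

# L1-penalised estimate of the precision (inverse covariance) matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
prec = model.precision_

print("cov[0, 2]  =", round(float(cov[0, 2]), 3))
print("prec[0, 1] =", round(float(prec[0, 1]), 3))
print("prec[0, 2] =", round(float(prec[0, 2]), 3))
```

Off-diagonal precision entries that stay clearly non-zero mark pairs that remain dependent once all other variables are accounted for, which is the sense in which the approach separates direct from indirect coupling.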

Journal ArticleDOI

High-dimensional graphs and variable selection with the Lasso

Nicolai Meinshausen, +1 more

- 01 Jun 2006 -

Annals of Statistics

TL;DR: It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models.
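Neighborhood selection can be sketched in a few lines: regress each variable on all the others with an L1 penalty and treat the non-zero coefficients as that node's neighbours. The code below is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the helper names `lasso_cd` and `neighborhood_selection`, the penalty `lam=0.2`, the simulated four-variable chain, and the choice of the AND rule (an edge is kept only if both regressions select it) are all illustrative.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding update; gives exact zeros for weak predictors
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def neighborhood_selection(X, lam):
    """Estimate an undirected graph via one Lasso regression per node."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise columns
    p = X.shape[1]
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        adj[j, others] = lasso_cd(X[:, others], X[:, j], lam) != 0
    return adj & adj.T  # AND rule: keep edges selected in both directions

# Simulated chain x0 -> x1 -> x2 -> x3 (hypothetical data)
rng = np.random.default_rng(0)
n = 5000
cols = [rng.normal(size=n)]
for _ in range(3):
    cols.append(0.8 * cols[-1] + rng.normal(size=n))
X = np.column_stack(cols)

adj = neighborhood_selection(X, lam=0.2)
edges = sorted((int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(adj))))
print(edges)
```

Each regression involves only one response variable, so the per-node problems are independent and can be solved separately; the penalty `lam` controls how sparse the recovered graph is.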
