Open Access, Journal Article

PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments.

Abstract
Motivation: The accurate prediction of residue-residue contacts, which are critical for maintaining the native fold of a protein, remains an open problem in structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts on both structure and function prediction, to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on earlier work that demonstrated corrections for phylogenetic and entropic correlation noise, and it allows accurate discrimination of directly from indirectly coupled mutation correlations in the MSA.

Results: PSICOV displays a mean precision substantially better than that of the best performing normalized mutual information approach and of Bayesian networks. For 118 out of 150 targets, the L/5 precision (i.e. the precision of the top-L/5 predictions for a protein of length L) for long-range contacts (sequence separation >23) was ≥ 0.5, an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment.

Availability: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
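The L/5 long-range precision reported in the abstract can be illustrated with a minimal sketch. This is not the PSICOV code or its evaluation script. The function name, the dictionary layout of the scores, the toy data, and the reading of "sequence separation" as |i - j| are all assumptions made for illustration.

```python
def top_l_over_5_precision(scores, true_contacts, length, min_separation=23):
    """Precision of the top-L/5 long-range predictions (illustrative sketch).

    scores: dict mapping a residue pair (i, j), with i < j, to a predicted score.
    true_contacts: set of (i, j) pairs, with i < j, in contact in the native structure.
    length: protein length L.
    min_separation: pairs are kept only if j - i > min_separation
                    (assumed meaning of "sequence separation > 23").
    """
    # Keep long-range pairs only.
    long_range = [
        (score, pair)
        for pair, score in scores.items()
        if pair[1] - pair[0] > min_separation
    ]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[0], reverse=True)

    # Take the top L/5 predictions (at least one).
    n_top = max(1, length // 5)
    top_pairs = [pair for _, pair in long_range[:n_top]]
    if not top_pairs:
        return 0.0

    # Fraction of selected pairs that are true contacts.
    hits = sum(1 for pair in top_pairs if pair in true_contacts)
    return hits / len(top_pairs)


# Toy example (made-up data): a protein of length 50, so L/5 = 10.
toy_length = 50
toy_scores = {(i, i + 30): 1.0 - 0.05 * i for i in range(15)}
toy_scores[(0, 5)] = 5.0  # short-range pair with a high score; filtered out
toy_contacts = {(0, 30), (1, 31), (2, 32), (3, 33), (4, 34), (5, 35), (20, 48)}

toy_precision = top_l_over_5_precision(toy_scores, toy_contacts, toy_length)
print(f"L/5 long-range precision: {toy_precision:.2f}")
```

In this toy case the ten highest-scoring long-range pairs include six true contacts, so the precision is 0.6, which would count toward the "≥ 0.5" tally quoted above. When fewer than L/5 long-range pairs are available, the sketch divides by the number of pairs actually selected, which is one possible convention among several.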

