PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments.
TLDR
A novel method, PSICOV, is presented, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction and displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks.
Abstract:
Motivation: The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction, to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA.
Results: PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment.
Availability: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
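Two quantities named in the abstract can be made concrete with a short sketch: the mutual information between two alignment columns (the baseline signal that PSICOV aims to improve on, not PSICOV itself) and the top-L/5 precision for long-range contacts. The function names, the dictionary-based score format and the 0-based residue indexing are assumptions made for illustration; none of this is taken from the PSICOV source code.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two MSA columns.

    col_i and col_j are equal-length sequences of residue symbols,
    one symbol per aligned sequence. No phylogenetic or entropic
    correction is applied here.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), expressed with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_i[a] * count_j[b]))
    return mi


def top_l5_precision(scores, true_contacts, length, min_separation=24):
    """Precision of the top-L/5 long-range predictions.

    scores: dict mapping residue pairs (i, j) with i < j to a score.
    true_contacts: set of residue pairs (i, j) observed in the structure.
    length: protein length L.
    min_separation: smallest |i - j| counted as long-range; the default
        of 24 corresponds to "sequence separation > 23".
    """
    long_range = [(pair, s) for pair, s in scores.items()
                  if abs(pair[1] - pair[0]) >= min_separation]
    # Highest-scoring pairs first
    long_range.sort(key=lambda item: item[1], reverse=True)
    top = [pair for pair, _ in long_range[:max(1, length // 5)]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)
```

Two perfectly co-varying columns such as "AACC" and "GGTT" give a mutual information of log 2, while "AACC" and "GTGT" give zero. When fewer than L/5 long-range pairs have scores, this sketch divides by the number of pairs actually ranked, which is one possible convention among several.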
Citations
Journal Article
Highly accurate protein structure prediction with AlphaFold
John M. Jumper, Richard O. Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russell Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, R. D. Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David L. Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis
TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture, even when no homologous structure is available.
Journal Article
Improved protein structure prediction using potentials from deep learning
Andrew W. Senior, Richard Evans, John M. Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis
TL;DR: It is shown that a neural network can be trained to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions, and the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures.
Journal Article
The Protein-Folding Problem, 50 Years On
Ken A. Dill, Justin L. MacCallum
TL;DR: Progress is reviewed on three broad questions: What is the physical code by which an amino acid sequence dictates a protein’s native structure?
Journal Article
A series of PDB related databases for everyday needs.
Robbie P. Joosten, Tim A. H. te Beek, Elmar Krieger, Maarten L. Hekkelman, Rob Hooft, Reinhard Schneider, Chris Sander, Gert Vriend
TL;DR: A series of databases that run parallel to the Protein Data Bank, used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design, are presented.
Journal Article
Sparse and Compositionally Robust Inference of Microbial Ecological Networks
Zachary D. Kurtz, Christian L. Müller, Emily R. Miraldi, Dan R. Littman, Martin J. Blaser, Richard Bonneau
TL;DR: SParse InversE Covariance Estimation for Ecological Association Inference is presented, a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios.
References
Journal Article
The Protein Data Bank
Helen M. Berman, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
Journal Article
The Pfam protein families database
Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Journal Article
Amino acid substitution matrices from protein blocks
TL;DR: This work has derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins, leading to marked improvements in alignments and in searches using queries from each of the groups.
Journal Article
Sparse inverse covariance estimation with the graphical lasso
TL;DR: Using a coordinate descent procedure for the lasso, a simple algorithm is developed that solves a 1000-node problem in at most a minute and is 30-4000 times faster than competing methods.
Journal Article
High-dimensional graphs and variable selection with the Lasso
TL;DR: It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs and is hence equivalent to variable selection for Gaussian linear models.