Highly accurate protein structure prediction for the human proteome

doi:10.1038/S41586-021-03828-1

Home
/
Papers
/
Highly accurate protein structure prediction for the human proteome

Journal Article•DOI•

Highly accurate protein structure prediction for the human proteome

Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michal Zielinski, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, Sameer Velankar¹, Gerard J. Kleywegt¹, Alex Bateman¹, Richard Evans, Alexander Pritzel, Michael Figurnov, Olaf Ronneberger, Russell Bates, Simon A. A. Kohl, Anna Potapenko, Andrew J. Ballard, Bernardino Romera-Paredes, Stanislav Nikolov, R. D. Jain, Ellen Clancy, David Reiman, Stig Petersen, Andrew W. Senior, Koray Kavukcuoglu, Ewan Birney¹, Pushmeet Kohli, John M. Jumper, Demis Hassabis - Show less +29 more•Institutions (1)

European Bioinformatics Institute¹

22 Jul 2021-Nature (Nature Publishing Group)-Vol. 596, Iss: 7873, pp 590-596

TL;DR: The AlphaFold2 dataset as discussed by the authors is a large-scale and high-accuracy structure prediction dataset for protein structures, which is used to evaluate the structural properties of proteins.

read less

Abstract: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally-determined structure1. Here we dramatically expand structural coverage by applying the state-of-the-art machine learning method, AlphaFold2, at scale to almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model, and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions likely to be disordered. Finally, we provide some case studies illustrating how high-quality predictions may be used to generate biological hypotheses. Importantly, we are making our predictions freely available to the community via a public database (hosted by the European Bioinformatics Institute at https://alphafold.ebi.ac.uk/ ). We anticipate that routine large-scale and high-accuracy structure prediction will become an important tool, allowing new questions to be addressed from a structural perspective.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Highly accurate protein structure prediction with AlphaFold

[...]

John M. Jumper, Richard O. Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russell Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, R. D. Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger¹, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David L. Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis - Show less +30 more•Institutions (1)

Seoul National University¹

15 Jul 2021-Nature

TL;DR: For example, AlphaFold as mentioned in this paper predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture. But the accuracy is limited by the fact that no homologous structure is available.

...read moreread less

Abstract: Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort1–4, the structures of around 100,000 unique proteins have been determined5, but this represents a small fraction of the billions of known protein sequences6,7. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’8—has been an important open research problem for more than 50 years9. Despite recent progress10–14, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14)15, demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

...read moreread less

10,601 citations

Journal Article•DOI•

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.

[...]

Mihaly Varadi¹, Stephen Anyango¹, Mandar Deshpande¹, Sreenath Nair¹, Cindy Natassia¹, Galabina Yordanova¹, David Yu Yuan¹, Oana Stroe¹, Gemma Wood¹, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John M. Jumper, Ellen Clancy, Richard E. Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard J. Kleywegt¹, Ewan Birney¹, Demis Hassabis, Sameer Velankar¹ - Show less +23 more•Institutions (1)

European Bioinformatics Institute¹

17 Nov 2021-Nucleic Acids Research

TL;DR: The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions.

...read moreread less

Abstract: The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.

...read moreread less

2,008 citations

Journal Article•DOI•

ColabFold: making protein folding accessible to all

[...]

Milot Mirdita¹, Tatiana Valdez Bubnova², Oi Wah Liew³•Institutions (3)

Seoul National University¹, Harvard University², University of Göttingen³

30 May 2022-Nature Methods

TL;DR: ColabFold as discussed by the authors combines the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold for protein folding and achieves 40-60fold faster search and optimized model utilization.

...read moreread less

Abstract: ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40-60-fold faster search and optimized model utilization enables prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are available at https://colabfold.mmseqs.com .

...read moreread less

1,553 citations

Journal Article•DOI•

PROTAC targeted protein degraders: the past is prologue

[...]

Miklós Békés, David Lange, Craig M. Crews

18 Jan 2022-Nature Reviews Drug Discovery

TL;DR: Targeted protein degradation with proteolysis-targeting chimeras (PROTACs) has the potential to tackle disease-causing proteins that have historically been highly challenging to target with conventional small molecules as mentioned in this paper .

...read moreread less

Abstract: Targeted protein degradation (TPD) is an emerging therapeutic modality with the potential to tackle disease-causing proteins that have historically been highly challenging to target with conventional small molecules. In the 20 years since the concept of a proteolysis-targeting chimera (PROTAC) molecule harnessing the ubiquitin–proteasome system to degrade a target protein was reported, TPD has moved from academia to industry, where numerous companies have disclosed programmes in preclinical and early clinical development. With clinical proof-of-concept for PROTAC molecules against two well-established cancer targets provided in 2020, the field is poised to pursue targets that were previously considered ‘undruggable’. In this Review, we summarize the first two decades of PROTAC discovery and assess the current landscape, with a focus on industry activity. We then discuss key areas for the future of TPD, including establishing the target classes for which TPD is most suitable, expanding the use of ubiquitin ligases to enable precision medicine and extending the modality beyond oncology. Targeted protein degradation with proteolysis-targeting chimeras (PROTACs) has the potential to tackle disease-causing proteins that have historically been highly challenging to target with conventional small molecules. This article summarizes the first two decades of PROTAC discovery and discusses key areas for the future of this therapeutic modality, including establishing the target classes for which it is most suitable and extending its application beyond oncology.

...read moreread less

527 citations

Journal Article•DOI•

Harnessing protein folding neural networks for peptide–protein docking

[...]

Tomer Tsaban, Julia Varga, Orly Avraham, Ziv Ben-Aharon, Alisa Khramushin, Ora Schueler-Furman - Show less +2 more

10 Jan 2022-Nature Communications

TL;DR: For example, AlphaFold2 as discussed by the authors generates peptide-protein complex models without requiring multiple sequence alignment information for the peptide partner, and can handle binding-induced conformational changes of the receptor.

...read moreread less

Abstract: Highly accurate protein structure predictions by deep neural networks such as AlphaFold2 and RoseTTAFold have tremendous impact on structural biology and beyond. Here, we show that, although these deep learning approaches have originally been developed for the in silico folding of protein monomers, AlphaFold2 also enables quick and accurate modeling of peptide-protein interactions. Our simple implementation of AlphaFold2 generates peptide-protein complex models without requiring multiple sequence alignment information for the peptide partner, and can handle binding-induced conformational changes of the receptor. We explore what AlphaFold2 has memorized and learned, and describe specific examples that highlight differences compared to state-of-the-art peptide docking protocol PIPER-FlexPepDock. These results show that AlphaFold2 holds great promise for providing structural insight into a wide range of peptide-protein complexes, serving as a starting point for the detailed characterization and manipulation of these interactions.

...read moreread less

390 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Gene Ontology: tool for the unification of biology

[...]

M Ashburner¹, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock - Show less +16 more•Institutions (1)

Stanford University¹

01 May 2000-Nature Genetics

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

...read moreread less

Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

...read moreread less

35,225 citations

Journal Article•DOI•

Initial sequencing and analysis of the human genome.

[...]

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹ +245 more•Institutions (29)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

...read moreread less

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

...read moreread less

22,269 citations

Journal Article•DOI•

AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading

[...]

Oleg Trott¹, Arthur J. Olson¹•Institutions (1)

Scripps Research Institute¹

04 Jun 2009-Journal of Computational Chemistry

TL;DR: AutoDock Vina achieves an approximately two orders of magnitude speed‐up compared with the molecular docking software previously developed in the lab, while also significantly improving the accuracy of the binding mode predictions, judging by tests on the training set used in AutoDock 4 development.

...read moreread less

Abstract: AutoDock Vina, a new program for molecular docking and virtual screening, is presented. AutoDock Vina achieves an approximately two orders of magnitude speed-up compared with the molecular docking software previously developed in our lab (AutoDock 4), while also significantly improving the accuracy of the binding mode predictions, judging by our tests on the training set used in AutoDock 4 development. Further speed-up is achieved from parallelism, by using multithreading on multicore machines. AutoDock Vina automatically calculates the grid maps and clusters the results in a way transparent to the user.

...read moreread less

20,059 citations

Journal Article•DOI•

AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility

[...]

Garrett M. Morris¹, Ruth Huey¹, William Lindstrom¹, Michel F. Sanner¹, Richard K. Belew², David S. Goodsell¹, Arthur J. Olson¹ - Show less +3 more•Institutions (2)

Scripps Research Institute¹, University of California, San Diego²

01 Dec 2009-Journal of Computational Chemistry

TL;DR: AutoDock4 incorporates limited flexibility in the receptor and its utility in analysis of covalently bound ligands is reported, using both a grid‐based docking method and a modification of the flexible sidechain technique.

...read moreread less

Abstract: We describe the testing and release of AutoDock4 and the accompanying graphical user interface AutoDockTools. AutoDock4 incorporates limited flexibility in the receptor. Several tests are reported here, including a redocking experiment with 188 diverse ligand-protein complexes and a cross-docking experiment using flexible sidechains in 87 HIV protease complexes. We also report its utility in analysis of covalently bound ligands, using both a grid-based docking method and a modification of the flexible sidechain technique.

...read moreread less

15,616 citations