The coming of age of de novo protein design

doi:10.1038/NATURE19946

Journal Article•DOI•

Po-Ssu Huang¹, Scott E. Boyken¹, David Baker¹•Institutions (1)

University of Washington¹

15 Sep 2016-Nature (Nature Publishing Group)-Vol. 537, Iss: 7620, pp 320-327

TL;DR: De novo protein design explores the full sequence space, guided by the physical principles that underlie protein folding, to design new functional proteins from the ground up to tackle current challenges in biomedicine and nanotechnology.

Abstract: There are 20^200 possible amino-acid sequences for a 200-residue protein, of which the natural evolutionary process has sampled only an infinitesimal subset. De novo protein design explores the full sequence space, guided by the physical principles that underlie protein folding. Computational methodology has advanced to the point that a wide range of structures can be designed from scratch with atomic-level accuracy. Almost all protein engineering so far has involved the modification of naturally occurring proteins; it should now be possible to design new functional proteins from the ground up to tackle current challenges in biomedicine and nanotechnology.
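The size of the sequence space quoted in the abstract can be checked with a few lines of Python. This is only an illustrative sketch: the function names `sequence_space_size` and `log10_sequence_space` are hypothetical, and the calculation assumes 20 possible amino acids at each of 200 independent positions, as the abstract states.

```python
import math


def sequence_space_size(length: int, alphabet: int = 20) -> int:
    """Exact number of sequences of a given length over an alphabet."""
    return alphabet ** length


def log10_sequence_space(length: int, alphabet: int = 20) -> float:
    """Base-10 logarithm of the sequence count, handy for very long chains."""
    return length * math.log10(alphabet)


if __name__ == "__main__":
    n = sequence_space_size(200)
    # Python integers are arbitrary precision, so the exact value is available.
    print("decimal digits in 20^200:", len(str(n)))
    print("log10(20^200):", round(log10_sequence_space(200), 1))
```

Under these assumptions the count is a 261-digit number, roughly 1.6 × 10^260, which gives a sense of why natural evolution can only have sampled a tiny fraction of it.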

Citations

Journal Article•DOI•

Highly accurate protein structure prediction for the human proteome

Kathryn Tunyasuvunakool, Jonas Adler, Zachary Wu, Tim Green, Michal Zielinski, Augustin Žídek, Alex Bridgland, Andrew Cowie, Clemens Meyer, Agata Laydon, Sameer Velankar¹, Gerard J. Kleywegt¹, Alex Bateman¹, Richard Evans, Alexander Pritzel, Michael Figurnov, Olaf Ronneberger, Russell Bates, Simon A. A. Kohl, Anna Potapenko, Andrew J. Ballard, Bernardino Romera-Paredes, Stanislav Nikolov, R. D. Jain, Ellen Clancy, David Reiman, Stig Petersen, Andrew W. Senior, Koray Kavukcuoglu, Ewan Birney¹, Pushmeet Kohli, John M. Jumper, Demis Hassabis•Institutions (1)

European Bioinformatics Institute¹

22 Jul 2021-Nature

TL;DR: AlphaFold2 is applied at scale to almost the entire human proteome (98.5% of human proteins), producing a freely available dataset that covers 58% of residues with a confident prediction.

Abstract: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally-determined structure [1]. Here we dramatically expand structural coverage by applying the state-of-the-art machine learning method, AlphaFold2, at scale to almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model, and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions likely to be disordered. Finally, we provide some case studies illustrating how high-quality predictions may be used to generate biological hypotheses. Importantly, we are making our predictions freely available to the community via a public database (hosted by the European Bioinformatics Institute at https://alphafold.ebi.ac.uk/). We anticipate that routine large-scale and high-accuracy structure prediction will become an important tool, allowing new questions to be addressed from a structural perspective.

1,238 citations

Journal Article•DOI•

Unified rational protein engineering with sequence-based deep representation learning

Ethan C. Alley¹, Grigory Khimulya, Surojit Biswas¹,², Mohammed AlQuraishi², George M. Church¹,²•Institutions (2)

Wyss Institute for Biologically Inspired Engineering¹, Harvard University²

21 Oct 2019-Nature Methods

TL;DR: Deep learning is applied to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded and broadly applicable to unseen regions of sequence space.

Abstract: Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.

560 citations

Journal Article•DOI•

Machine-learning-guided directed evolution for protein engineering.

Kevin K. Yang¹, Zachary Wu¹, Frances H. Arnold¹•Institutions (1)

California Institute of Technology¹

15 Jul 2019-Nature Methods

TL;DR: The steps required to build machine-learning sequence–function models and to use those models to guide engineering are introduced and the underlying principles of this engineering paradigm are illustrated with the help of case studies.

Abstract: Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence-function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.

527 citations

Journal Article•DOI•

Artificial Metalloenzymes: Reaction Scope and Optimization Strategies.

Fabian Schwizer¹, Yasunori Okamoto¹, Tillmann Heinisch¹, Yifan Gu², Michela M. Pellizzoni¹, Vincent Lebrun¹, Raphael Reuter¹, Valentin Köhler¹, Jared C. Lewis², Thomas R. Ward¹•Institutions (2)

University of Basel¹, University of Chicago²

10 Jan 2018-Chemical Reviews

TL;DR: The intent is to provide a comprehensive overview of all work in the field up to December 2016, organized according to reaction class, which allows for comparison of similar reactions catalyzed by ArMs constructed using different metallocofactor anchoring strategies, cofactors, protein scaffolds, and mutagenesis strategies.

Abstract: The incorporation of a synthetic, catalytically competent metallocofactor into a protein scaffold to generate an artificial metalloenzyme (ArM) has been explored since the late 1970’s. Progress in the ensuing years was limited by the tools available for both organometallic synthesis and protein engineering. Advances in both of these areas, combined with increased appreciation of the potential benefits of combining attractive features of both homogeneous catalysis and enzymatic catalysis, led to a resurgence of interest in ArMs starting in the early 2000’s. Perhaps the most intriguing of potential ArM properties is their ability to endow homogeneous catalysts with a genetic memory. Indeed, incorporating a homogeneous catalyst into a genetically encoded scaffold offers the opportunity to improve ArM performance by directed evolution. This capability could, in turn, lead to improvements in ArM efficiency similar to those obtained for natural enzymes, providing systems suitable for practical applications and ...

504 citations

Journal Article•DOI•

Advances in protein structure prediction and design

Brian Kuhlman¹, Philip Bradley²•Institutions (2)

University of North Carolina at Chapel Hill¹, Fred Hutchinson Cancer Research Center²

15 Aug 2019-Nature Reviews Molecular Cell Biology

TL;DR: Improvements in computational algorithms and technological advances have dramatically increased the accuracy and speed of protein structure modelling, providing novel opportunities for controlling protein function, with potential applications in biomedicine, industry and research.

Abstract: The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem - designing an amino acid sequence that will fold into a specified three-dimensional structure - has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein-protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.

462 citations

References

Journal Article•DOI•

DNA in a material world

Nadrian C. Seeman¹•Institutions (1)

New York University¹

23 Jan 2003-Nature

TL;DR: The specific bonding of DNA base pairs provides the chemical foundation for genetics and this powerful molecular recognition system can be used in nanotechnology to direct the assembly of highly structured materials with specific nanoscale features, as well as in DNA computation to process complex information.

Abstract: The specific bonding of DNA base pairs provides the chemical foundation for genetics. This powerful molecular recognition system can be used in nanotechnology to direct the assembly of highly structured materials with specific nanoscale features, as well as in DNA computation to process complex information. The exploitation of DNA for material purposes presents a new chapter in the history of the molecule.

2,528 citations

Journal Article•DOI•

The structure and function of G-protein-coupled receptors

Daniel M. Rosenbaum¹, Søren G. F. Rasmussen¹, Brian K. Kobilka¹•Institutions (1)

Stanford University¹

21 May 2009-Nature

TL;DR: G-protein-coupled receptors mediate most of our physiological responses to hormones, neurotransmitters and environmental stimulants, and so have great potential as therapeutic targets for a broad spectrum of diseases.

Abstract: G-protein-coupled receptors (GPCRs) mediate most of our physiological responses to hormones, neurotransmitters and environmental stimulants, and so have great potential as therapeutic targets for a broad spectrum of diseases. They are also fascinating molecules from the perspective of membrane-protein structure and biology. Great progress has been made over the past three decades in understanding diverse GPCRs, from pharmacology to functional characterization in vivo. Recent high-resolution structural studies have provided insights into the molecular mechanisms of GPCR activation and constitutive activity.

1,965 citations

Book Chapter•DOI•

ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules.

Andrew Leaver-Fay¹, Michael D. Tyka², Steven M. Lewis¹, Oliver F. Lange², James Thompson², Ron Jacak¹, Kristian W. Kaufman³, P. Douglas Renfrew⁴, Colin A. Smith⁵, William Sheffler², Ian W. Davis, Seth Cooper², Adrien Treuille⁶, Daniel J. Mandell⁵, Florian Richter², Yih-En Andrew Ban, Sarel J. Fleishman², Jacob E. Corn², David E. Kim², Sergey Lyskov⁷, Monica Berrondo, Stuart Mentzer, Zoran Popović, James J. Havranek⁸, John Karanicolas⁹, Rhiju Das¹⁰, Jens Meiler³, Tanja Kortemme⁵, Jeffrey J. Gray⁷, Brian Kuhlman¹, David Baker², Philip Bradley¹¹•Institutions (11)

University of North Carolina at Chapel Hill¹, University of Washington², Vanderbilt University³, New York University⁴, University of California, San Francisco⁵, Carnegie Mellon University⁶, Johns Hopkins University⁷, Washington University in St. Louis⁸, University of Kansas⁹, Stanford University¹⁰, Fred Hutchinson Cancer Research Center¹¹

01 Jan 2011-Methods in Enzymology

TL;DR: This chapter describes the requirements for the ROSETTA molecular modeling program's new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.

Abstract: We have recently completed a full re-architecturing of the ROSETTA molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as ROSETTA3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.

1,676 citations

Journal Article•DOI•

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Brian Kuhlman¹, Gautam Dantas¹, Gregory C. Ireton², Gabriele Varani¹, Barry L. Stoddard², David Baker¹•Institutions (2)

University of Washington¹, Fred Hutchinson Cancer Research Center²

21 Nov 2003-Science

TL;DR: A general computational strategy that iterates between sequence design and structure prediction to design a 93-residue α/β protein called Top7 with a novel sequence and topology, found experimentally to be folded and extremely stable.

Abstract: A major challenge of computational protein design is the creation of novel proteins with arbitrarily chosen three-dimensional structures. Here, we used a general computational strategy that iterates between sequence design and structure prediction to design a 93-residue α/β protein called Top7 with a novel sequence and topology. Top7 was found experimentally to be folded and extremely stable, and the x-ray crystal structure of Top7 is similar (root mean square deviation equals 1.2 angstroms) to the design model. The ability to design a new protein fold makes possible the exploration of the large regions of the protein universe not yet observed in nature.
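The root mean square deviation quoted above measures the average distance between matched atoms in two structures. The sketch below shows the basic formula only. It assumes the two coordinate sets are already superimposed and list the same atoms in the same order; real structure comparisons usually include an optimal superposition step, which is omitted here. The function name `rmsd` and the toy coordinates are hypothetical and are not data from the Top7 study.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumes the points are already aligned and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between one matched pair of atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom is displaced by 1.2 angstroms along y.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(0.0, 1.2, 0.0), (3.8, 1.2, 0.0), (7.6, 1.2, 0.0)]

if __name__ == "__main__":
    print(f"RMSD = {rmsd(design, crystal):.2f} angstroms")
```

In this toy case every atom is off by the same 1.2 angstroms, so the RMSD is also about 1.2 angstroms; in a real comparison the per-atom deviations vary and the RMSD summarizes them in one number.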

1,595 citations

Journal Article•DOI•

The packing of α-helices: simple coiled-coils

Francis Crick

10 Sep 1953-Acta Crystallographica

TL;DR: In this paper, the two-strand rope and three-strand rope models are described and used to illustrate the diffraction theory already developed, and it is shown that they would give a diffuse pattern.

Abstract: It is shown in this paper by Crick that when α-helices of the same sense pack together they will probably do so about 20° away from parallel. For very long chains this may lead to a coiled-coil. The two simplest models - the two-strand rope and the three-strand rope - are described, and used to illustrate the diffraction theory already developed. It is shown that they would give a diffuse α-pattern. Possible examples of these models are briefly discussed.

1,518 citations