Journal Article

The coming of age of de novo protein design

15 Sep 2016 - Nature (Nature Publishing Group) - Vol. 537, Iss. 7620, pp 320-327
TL;DR: De novo protein design explores the full sequence space, guided by the physical principles that underlie protein folding, to design new functional proteins from the ground up to tackle current challenges in biomedicine and nanotechnology.
Abstract: There are 20^200 possible amino-acid sequences for a 200-residue protein, of which the natural evolutionary process has sampled only an infinitesimal subset. De novo protein design explores the full sequence space, guided by the physical principles that underlie protein folding. Computational methodology has advanced to the point that a wide range of structures can be designed from scratch with atomic-level accuracy. Almost all protein engineering so far has involved the modification of naturally occurring proteins; it should now be possible to design new functional proteins from the ground up to tackle current challenges in biomedicine and nanotechnology.
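To give a sense of scale for the sequence-space figure in the abstract, the short sketch below computes the number of possible sequences for a 200-residue protein over the 20 standard amino acids. The variable names are illustrative only and do not come from the paper.

```python
import math

ALPHABET_SIZE = 20   # standard amino acids
LENGTH = 200         # residues, as in the abstract

# Python integers are arbitrary precision, so the count is exact.
n_sequences = ALPHABET_SIZE ** LENGTH

# Order of magnitude via logarithms: 200 * log10(20).
log10_n = LENGTH * math.log10(ALPHABET_SIZE)

print(f"Exact count has {len(str(n_sequences))} decimal digits")
print(f"log10 of the count is about {log10_n:.1f}")
```

The count is roughly 10^260, which is why the abstract describes the naturally sampled subset as infinitesimal.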
Citations
Journal Article
22 Jul 2021 - Nature
TL;DR: AlphaFold2 is applied at scale to almost the entire human proteome (98.5% of human proteins), giving confident predictions for 58% of residues, and the predictions are made freely available through a public database.
Abstract: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally-determined structure. Here we dramatically expand structural coverage by applying the state-of-the-art machine learning method, AlphaFold2, at scale to almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model, and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions likely to be disordered. Finally, we provide some case studies illustrating how high-quality predictions may be used to generate biological hypotheses. Importantly, we are making our predictions freely available to the community via a public database (hosted by the European Bioinformatics Institute at https://alphafold.ebi.ac.uk/). We anticipate that routine large-scale and high-accuracy structure prediction will become an important tool, allowing new questions to be addressed from a structural perspective.

1,238 citations

Journal Article
TL;DR: Deep learning is applied to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded and broadly applicable to unseen regions of sequence space.
Abstract: Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.

560 citations

Journal Article
TL;DR: The steps required to build machine-learning sequence–function models and to use those models to guide engineering are introduced and the underlying principles of this engineering paradigm are illustrated with the help of case studies.
Abstract: Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence-function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.

527 citations

Journal Article
TL;DR: The intent is to provide a comprehensive overview of all work in the field up to December 2016, organized according to reaction class, which allows for comparison of similar reactions catalyzed by ArMs constructed using different metallocofactor anchoring strategies, cofactors, protein scaffolds, and mutagenesis strategies.
Abstract: The incorporation of a synthetic, catalytically competent metallocofactor into a protein scaffold to generate an artificial metalloenzyme (ArM) has been explored since the late 1970’s. Progress in the ensuing years was limited by the tools available for both organometallic synthesis and protein engineering. Advances in both of these areas, combined with increased appreciation of the potential benefits of combining attractive features of both homogeneous catalysis and enzymatic catalysis, led to a resurgence of interest in ArMs starting in the early 2000’s. Perhaps the most intriguing of potential ArM properties is their ability to endow homogeneous catalysts with a genetic memory. Indeed, incorporating a homogeneous catalyst into a genetically encoded scaffold offers the opportunity to improve ArM performance by directed evolution. This capability could, in turn, lead to improvements in ArM efficiency similar to those obtained for natural enzymes, providing systems suitable for practical applications and ...

504 citations

Journal Article
TL;DR: Improvements in computational algorithms and technological advances have dramatically increased the accuracy and speed of protein structure modelling, providing novel opportunities for controlling protein function, with potential applications in biomedicine, industry and research.
Abstract: The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem - designing an amino acid sequence that will fold into a specified three-dimensional structure - has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein-protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.

462 citations

References
Journal Article
23 Jan 2003 - Nature
TL;DR: The specific bonding of DNA base pairs provides the chemical foundation for genetics and this powerful molecular recognition system can be used in nanotechnology to direct the assembly of highly structured materials with specific nanoscale features, as well as in DNA computation to process complex information.
Abstract: The specific bonding of DNA base pairs provides the chemical foundation for genetics. This powerful molecular recognition system can be used in nanotechnology to direct the assembly of highly structured materials with specific nanoscale features, as well as in DNA computation to process complex information. The exploitation of DNA for material purposes presents a new chapter in the history of the molecule.

2,528 citations

Journal Article
21 May 2009 - Nature
TL;DR: G-protein-coupled receptors mediate most physiological responses to hormones, neurotransmitters and environmental stimulants, and so have great potential as therapeutic targets for a broad spectrum of diseases.
Abstract: G-protein-coupled receptors (GPCRs) mediate most of our physiological responses to hormones, neurotransmitters and environmental stimulants, and so have great potential as therapeutic targets for a broad spectrum of diseases. They are also fascinating molecules from the perspective of membrane-protein structure and biology. Great progress has been made over the past three decades in understanding diverse GPCRs, from pharmacology to functional characterization in vivo. Recent high-resolution structural studies have provided insights into the molecular mechanisms of GPCR activation and constitutive activity.

1,965 citations

Book Chapter
TL;DR: This chapter describes the requirements for the ROSETTA molecular modeling program's new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
Abstract: We have recently completed a full re-architecturing of the ROSETTA molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as ROSETTA3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.

1,676 citations

Journal Article
21 Nov 2003 - Science
TL;DR: A general computational strategy that iterates between sequence design and structure prediction to design a 93-residue α/β protein called Top7 with a novel sequence and topology, found experimentally to be folded and extremely stable.
Abstract: A major challenge of computational protein design is the creation of novel proteins with arbitrarily chosen three-dimensional structures. Here, we used a general computational strategy that iterates between sequence design and structure prediction to design a 93-residue α/β protein called Top7 with a novel sequence and topology. Top7 was found experimentally to be folded and extremely stable, and the x-ray crystal structure of Top7 is similar (root mean square deviation equals 1.2 angstroms) to the design model. The ability to design a new protein fold makes possible the exploration of the large regions of the protein universe not yet observed in nature.

1,595 citations
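
The Top7 abstract above reports agreement between crystal structure and design model as a root mean square deviation (RMSD). As a rough illustration of what that number measures, here is a minimal sketch that assumes the two coordinate sets are already superimposed; in practice an optimal superposition step normally comes first. The function name and the toy coordinates are hypothetical and are not Top7 data.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared distance between matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates in angstroms: every atom is shifted by 1.2 along y.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(0.0, 1.2, 0.0), (3.8, 1.2, 0.0), (7.6, 1.2, 0.0)]
print(f"RMSD: {rmsd(design, crystal):.2f} angstroms")
```

Because every toy atom is displaced by the same 1.2 angstroms, the RMSD equals that displacement; with real structures the value averages over atoms that deviate by different amounts.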

Journal Article
TL;DR: In this paper, the two-strand rope and three-stranded rope models were described and used to illustrate the diffraction theory already developed, and it was shown that they would give a diffuse pattern.
Abstract: It is shown in this paper by Crick that when α-helices of the same sense pack together they will probably do so about 20° away from parallel. For very long chains this may lead to a coiled-coil. The two simplest models - the two-strand rope and the three-strand rope - are described, and used to illustrate the diffraction theory already developed. It is shown that they would give a diffuse α-pattern. Possible examples of these models are briefly discussed.

1,518 citations
