AlphaFold: Improved protein structure prediction using potentials from deep learning

Andrew W. Senior^1*, Richard Evans^1*, John Jumper^1*, James Kirkpatrick^1*, Laurent Sifre^1*, Tim Green^1, Chongli Qin^1, Augustin Žídek^1, Alexander W. R. Nelson^1, Alex Bridgland^1, Hugo Penedones^1, Stig Petersen^1, Karen Simonyan^1, Steve Crossan^1, Pushmeet Kohli^1, David T. Jones^2,3, David Silver^1, Koray Kavukcuoglu^1, Demis Hassabis^1

^1 DeepMind, London, UK
^2 The Francis Crick Institute, London, UK
^3 University College London, London, UK
^* These authors contributed equally to this work.
Protein structure prediction aims to determine the three-dimensional shape of a protein from its amino acid sequence^1. This problem is of fundamental importance to biology, as the structure of a protein largely determines its function^2 but can be hard to determine experimentally. In recent years, considerable progress has been made by leveraging genetic information: analysing the co-variation of homologous sequences can allow one to infer which amino acid residues are in contact, which in turn can aid structure prediction^3. In this work, we show that we can train a neural network to accurately predict the distances between pairs of residues in a protein, which convey more information about structure than contact predictions. With this information we construct a potential of mean force^4 that can accurately describe the shape of a protein. We find that the resulting potential can be optimised by a simple gradient descent algorithm to realise structures without the need for complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with relatively few homologous sequences. In the most recent Critical Assessment of Protein Structure Prediction^5 (CASP13), a blind assessment of the state of the field of protein structure prediction, AlphaFold created high-accuracy structures (with TM-scores† of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, using sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a significant advance in protein structure prediction. We expect the increased accuracy of structure predictions to yield insights into the function and malfunction of proteins, especially in cases where no structure of a homologous protein has been experimentally determined^7.
† The Template Modelling score^6, between 0 and 1, measures the degree of match of the overall (backbone) shape of a proposed structure to a native structure.

Proteins are at the core of most biological processes. Since the function of a protein is dependent on its structure, understanding protein structure has been a grand challenge in biology for decades. While several experimental structure determination techniques have been developed and improved in accuracy, they remain difficult and time-consuming^2. As a result, decades of theoretical work has attempted to predict protein structure from amino acid sequences.
[Figure 1 panels: (a) plot of FM domain count against TM-score cutoff for AlphaFold versus the other groups; (b) bar chart of TM-scores for the six new-fold targets T0953s2-D3, T0968s2-D1, T0990-D1, T0990-D2, T0990-D3 and T1017s2-D1; (c) the table below.]

Contact precisions      L long             L/2 long           L/5 long
Set      N        AF    498   032    AF    498   032    AF    498   032
FM       31       45.5  42.9  39.8   58.0  55.1  51.7   70.1  67.3  61.6
FM/TBM   12       59.1  53.0  48.9   74.2  64.5  64.2   85.3  81.0  79.6
TBM      61       68.3  65.5  61.9   82.4  80.3  76.4   90.6  90.5  87.1

Fig. 1 | AlphaFold's performance in the CASP13 assessment. (a) Number of free modelling (FM + FM/TBM) domains predicted to a given TM-score threshold for AlphaFold and the other 97 groups. (b) For the six new folds identified by the CASP13 assessors, AlphaFold's TM-score compared with the other groups, with native structures. The structure of T1017s2-D1 is unavailable for publication. (c) Precisions for long-range contact prediction in CASP13 for the most probable L, L/2 or L/5 contacts, where L is the length of the domain. The distance distributions used by AlphaFold (AF) in CASP13, thresholded to contact predictions, are compared with the submissions of the two best-ranked contact prediction methods in CASP13: 498 (RaptorX-Contact^8) and 032 (TripletRes^9), on "all groups" targets, excluding T0999.
CASP^5 is a biennial blind protein structure prediction assessment run by the structure prediction community to benchmark progress in accuracy. In 2018, AlphaFold joined 97 groups from around the world in entering CASP13. Each group submitted up to 5 structure predictions for each of 84 protein sequences whose experimentally determined structures were sequestered. Assessors divided the proteins into 104 domains for scoring and classified each as being amenable to template-based modelling (TBM, where a protein with a similar sequence has a known structure, and that homologous structure is modified in accordance with the sequence differences) or as requiring free modelling (FM, when no homologous structure is available), with an intermediate (FM/TBM) category. Figure 1a shows that AlphaFold stands out in performance above the other entrants, predicting more FM domains to high accuracy than any other system, particularly in the 0.6–0.7 TM-score range. The assessors ranked the 98 participating groups by the summed, capped z-scores of the structures, separated according to category. AlphaFold achieved a summed z-score of 52.8 in the FM category (best-of-5) vs 36.6 for the next closest group (322)‡. Combining the FM and FM/TBM categories, AlphaFold scored 68.3 vs 48.2. AlphaFold is able to predict previously unknown folds to high accuracy, as shown in Figure 1b. Despite using only free modelling techniques and not using templates, AlphaFold also scored well in the TBM category according to the assessors' formula (0-capped z-score), ranking fourth by the top-1 model or first by the best-of-5 models. Much of the accuracy of AlphaFold is due to the accuracy of the distance predictions, which is evident from the high precision of the contact predictions shown in Fig. 1c.
The most successful free modelling approaches so far^10–12 have relied on fragment assembly to determine the shape of the protein of interest. In these approaches a structure is created through a stochastic sampling process, such as simulated annealing^13, that minimises a statistical potential derived from summary statistics extracted from structures in the Protein Data Bank (PDB^14). In fragment assembly, a structure hypothesis is repeatedly modified, typically by changing the shape of a short section, retaining changes that lower the potential and ultimately leading to low-potential structures. Simulated annealing requires many thousands of such moves and must be repeated many times to achieve good coverage of low-potential structures.
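The sampling loop described above can be sketched as follows. This is a generic simulated-annealing skeleton, not Rosetta or QUARK internals: `energy` and `propose` are placeholders for a statistical potential and a fragment-substitution move.

```python
import math
import random

def simulated_annealing(energy, propose, x0, n_steps=10000, t0=1.0, t_min=1e-3):
    """Minimise an energy function by stochastic sampling.

    `energy` maps a structure hypothesis to a scalar potential;
    `propose` returns a locally modified copy of a hypothesis
    (e.g. a fragment swap). Both are illustrative placeholders.
    """
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for step in range(n_steps):
        # Geometric cooling schedule from t0 down to t_min.
        t = max(t_min, t0 * (t_min / t0) ** (step / n_steps))
        cand = propose(x)
        e_cand = energy(cand)
        # Metropolis criterion: always accept improvements, and
        # sometimes accept uphill moves to escape local minima.
        if e_cand < e or random.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

The need to repeat this loop many times from different starting points, noted above, is what the gradient-descent approach introduced later avoids.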
In recent years, structure prediction accuracy has improved through the use of evolutionary covariation data^15 found in sets of related sequences. Sequences similar to the target sequence are found by searching large datasets of protein sequences derived from DNA sequencing and are aligned to the target sequence to make a multiple sequence alignment (MSA). Correlated changes in two amino acid residue positions across the sequences of the MSA can be used to infer which residues might be in contact. Contacts are typically defined to occur when the β-carbon atoms of two residues are within 8 Å of one another. Several methods, including neural networks^20–23, have been used to predict the probability that a pair of residues is in contact based on features computed from MSAs^16–19. Contact predictions are incorporated into structure prediction by modifying the statistical potential to guide the folding process towards structures that satisfy more of the predicted contacts^12,24. Previous work^25,26 has made predictions of the distance between residues, particularly for distance geometry approaches^8,27–29. Neural network distance predictions without covariation features were used to make the EPAD potential^26, which was used for ranking structure hypotheses, and the QUARK pipeline^12 used a template-based distance profile restraint for template-based modelling.
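The 8 Å Cβ contact definition above translates directly into code; `contact_map` is an illustrative helper, not part of any of the cited pipelines:

```python
import numpy as np

def contact_map(cbeta, threshold=8.0):
    """Binary contact map from an (L, 3) array of C-beta coordinates.

    Residues i and j are 'in contact' when their C-beta atoms lie
    within `threshold` angstroms (8 A is the conventional cutoff).
    Pipelines usually also exclude short sequence separations |i - j|,
    which is omitted here for brevity.
    """
    diff = cbeta[:, None, :] - cbeta[None, :, :]   # (L, L, 3) pairwise offsets
    dist = np.sqrt((diff ** 2).sum(-1))            # (L, L) Euclidean distances
    return dist < threshold
```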
In this work we present a new deep-learning approach to protein structure prediction, whose stages are illustrated in Figure 2a. We show that it is possible to construct a learned, protein-specific potential by training a neural network (Fig. 2b) to make accurate predictions about the structure of the protein given its sequence, and to predict the structure itself accurately by minimising the

‡ Results from http://predictioncenter.org/casp13/zscores_final.cgi?formula=assessors
[Figure 2 panels: (a) pipeline from sequence & MSA features, through the deep neural network's distance and torsion distribution predictions, to gradient descent on a protein-specific potential; (b) network schematic: tiled L×1 1-D sequence & profile features and L×L 2-D covariation features feed 220 residual convolution blocks that predict 64×64 distogram regions, 64 bins deep; (c–e) folding trajectory over 1,200 gradient descent steps.]
Fig. 2 | The folding process illustrated for CASP13 target T0986s2 (length L = 155). (a) Steps of structure prediction. (b) The neural network predicts the entire L × L distogram based on MSA features, accumulating separate predictions for 64 × 64-residue regions. (c) One iteration of gradient descent (1,200 steps) is shown, with TM-score and RMSD plotted against step number, with five snapshots of the structure. The secondary structure (from SST^30) is also shown (helix in blue, strand in red), along with the native secondary structure (SS), the network's secondary structure prediction probabilities, and the uncertainty in torsion angle predictions (as κ^−1 of the von Mises distributions fitted to the predictions for φ and ψ). While each step of gradient descent greedily lowers the potential, large global conformation changes are effected, resulting in a well-packed chain. (d) The final first submission overlaid on the native structure (in grey). (e) The average (across the test set, n = 377) TM-score of the lowest-potential structure against the number of repeats of gradient descent (log scale).
potential by gradient descent (Fig. 2c). The neural network predictions include backbone torsion angles and pairwise distances between residues. Distance predictions provide more specific information about the structure than contact predictions and provide a richer training signal for the neural network. Predicting distances, rather than contacts as in most prior work, models detailed interactions rather than simple binary decisions. By jointly predicting many distances, the network can propagate distance information respecting covariation, local structure and residue identities to nearby residues. The predicted probability distributions can be combined to form a simple, principled protein-specific potential. We show that with gradient descent, it is simple to find a set of torsion angles that minimises this protein-specific potential using only limited sampling. We also show that whole chains can be optimised together, avoiding the need to segment long proteins into hypothesised domains that are modelled independently.
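As a toy illustration of this idea (not AlphaFold's actual geometry, potential, or optimiser), gradient descent on the joint angles of a 2-D chain can recover a conformation whose pairwise distances match a target; every function here is hypothetical:

```python
import numpy as np

def chain_coords(angles):
    """Toy 2-D analogue of the differentiable geometry x = G(phi, psi):
    unit-length links whose cumulative joint angles fix each point."""
    headings = np.cumsum(angles)
    steps = np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])

def potential(angles, target_d):
    """Sum of squared deviations from target pairwise distances;
    a crude stand-in for the spline-based distance potential."""
    x = chain_coords(angles)
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    return float(((d - target_d) ** 2).sum())

def grad_descent(angles, target_d, lr=1e-3, n=500, h=1e-5):
    """Plain gradient descent. A central-difference gradient stands in
    for the analytic derivatives obtained by backpropagating through G."""
    angles = angles.copy()
    for _ in range(n):
        g = np.zeros_like(angles)
        for i in range(angles.size):
            da = np.zeros_like(angles)
            da[i] = h
            g[i] = (potential(angles + da, target_d)
                    - potential(angles - da, target_d)) / (2 * h)
        angles -= lr * g
    return angles
```

Because the whole angle vector is updated at once, the entire chain is optimised together, mirroring the whole-chain optimisation described above.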
The central component of AlphaFold is a convolutional neural network, trained on PDB structures, that predicts the distances d_ij between the C_β atoms of pairs ij of a protein's residues. Based on a representation of the protein's amino acid sequence, S, and features derived from the sequence's MSA, the network, similar in structure to those used for image recognition tasks^31, predicts a discrete probability distribution P(d_ij | S, MSA(S)) for every ij pair in a 64 × 64 residue region, as shown in Fig. 2b. The full set of distance distribution predictions is constructed by averaging predictions for overlapping regions and is termed a distogram (from distance histogram). Figure 3 shows an example distogram prediction for one CASP protein, T0955. The modes of the distribution (Fig. 3c) can be seen to closely match the true distances (Fig. 3b). Example distributions for all distances to one residue (29) are shown in Fig. 3c. Further analysis of how the network predicts the distances is shown in Methods Figure 14.
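The averaging of overlapping 64 × 64 crop predictions into a full distogram might be sketched as follows; `crop_predict`, the crop size, and the stride are illustrative stand-ins for the trained network and its actual inference schedule:

```python
import numpy as np

def assemble_distogram(L, crop_predict, crop=64, stride=32):
    """Average overlapping crop predictions into one L x L distogram.

    `crop_predict(i, j)` stands in for the network: it returns a
    (crop, crop, n_bins) array of distance-bin probabilities for
    the region with top-left corner (i, j).
    """
    n_bins = crop_predict(0, 0).shape[-1]
    total = np.zeros((L, L, n_bins))
    count = np.zeros((L, L, 1))
    last = max(L - crop, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)  # ensure the final rows/columns are covered
    for i in starts:
        for j in starts:
            h, w = min(crop, L - i), min(crop, L - j)
            total[i:i + h, j:j + w] += crop_predict(i, j)[:h, :w]
            count[i:i + h, j:j + w] += 1
    return total / count  # per-pair average over overlapping crops
```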
In order to realise structures that conform to the distance predictions, we construct a smooth potential V_distance by fitting a spline to the negative log probabilities and summing across all residue pairs. We parameterise protein structures by the backbone torsion angles (φ, ψ) of all residues and build a differentiable model of protein geometry x = G(φ, ψ) to compute the C_β coordinates, x, and thus the inter-residue distances, d_ij = ‖x_i − x_j‖, for each structure, and so express V_distance as a function of φ and ψ. For a protein with L residues, this potential accumulates L² terms from marginal distribution predictions. To correct for the over-representation of the prior, we subtract a reference distribution^32 from the distance potential in the log domain. The reference distribution models the distance distributions P(d_ij | length) independent of the protein sequence and is computed by training a small version of the distance prediction neural network on the same structures, without sequence or MSA input features. A separate output head of the contact prediction network is trained to predict discrete probability distributions of backbone torsion angles P(φ_i, ψ_i | S, MSA(S)). After fitting a von Mises distribution, this is used to add a smooth torsion modelling term V_torsion = −Σ_i log p_vonMises(φ_i, ψ_i | S, MSA(S)) to the potential. Finally, to prevent steric clashes, we add Rosetta's V_score2_smooth^10 to the potential, as this incorporates a van der Waals term. We used multiplicative weights for each of the three terms in the potential, but no weighting noticeably outperformed equal weighting.
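A minimal sketch of the first ingredient, fitting a smooth potential to one pair's predicted distance distribution, is shown below (here with a SciPy cubic spline as a stand-in for the paper's spline; the bin layout, the summation over pairs, the reference-distribution subtraction, and the torsion and steric terms are all omitted):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def pair_distance_potential(probs, bin_centres, eps=1e-8):
    """Fit a smooth potential for a single residue pair: a cubic
    spline through the negative log probabilities of the predicted
    distance bins. Returns the potential V(d) and its derivative
    dV/dd as callables; the derivative is what gradient descent
    on the torsion angles would consume via the chain rule.
    `eps` guards against log(0) in near-empty bins.
    """
    neg_log_p = -np.log(probs + eps)
    spline = CubicSpline(bin_centres, neg_log_p)
    return spline, spline.derivative()
```

Because the spline is smooth and differentiable, the summed potential can be minimised by gradient descent on (φ, ψ) as described above, rather than by stochastic sampling.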