Accurate prediction of protein structures and interactions using a three-track neural network
Citations
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.
ColabFold: making protein folding accessible to all
Protein complex prediction with AlphaFold-Multimer
PROTAC targeted protein degraders: the past is prologue
Harnessing protein folding neural networks for peptide–protein docking
References
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Clustal W and Clustal X version 2.0
Features and development of Coot.
Phaser crystallographic software
UniProt: the Universal Protein knowledgebase
Related Papers (5)
Highly accurate protein structure prediction with AlphaFold
Improved protein structure prediction using potentials from deep learning
UniProt: the universal protein knowledgebase in 2021
Frequently Asked Questions (17)
Q2. What are the contributions in "Accurate prediction of protein structures and interactions using a three-track neural network"?
The authors explored network architectures incorporating related ideas and obtained the best performance with a 3-track network in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The 3-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging X-ray crystallography and cryo-EM structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short circuiting traditional approaches which require modeling of individual subunits followed by docking. One-Sentence Summary: Accurate protein structure modeling enables the rapid solution of protein structures and provides insights into function.
Q3. What is the way to predict structure?
Since the network can take as input templates of known structures, the authors experimented with a further coupling of 3D structural information and 1D sequence information: they iteratively fed the predicted structures back into the network as templates and randomly subsampled the multiple sequence alignments to sample a broader range of models.
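The random subsampling step can be illustrated with a minimal sketch. It is not the authors' code: the function name `subsample_msa`, the `n_keep` parameter, the toy sequences, and the choice to always retain the query as the first row are assumptions made for illustration.

```python
import random


def subsample_msa(msa, n_keep, rng):
    """Return the query (first row) plus a random subset of the other rows.

    Keeping the query fixed is an assumption of this sketch, not a detail
    taken from the paper.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    query, others = msa[0], msa[1:]
    # Never ask for more sequences than are available.
    n_pick = min(max(n_keep - 1, 0), len(others))
    return [query] + rng.sample(others, n_pick)


# Toy alignment (hypothetical sequences, all the same length).
msa = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK", "MKSAYIAK", "MKTGYIAK"]
rng = random.Random(0)

# Each subset would be one input to the network, giving a different model.
subsets = [subsample_msa(msa, 3, rng) for _ in range(4)]
```

Each subset would be passed through the network separately, so that differences between subsets produce a broader range of candidate models.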
Q4. What is the final layer of the end-to-end version of their 3-track network?
The final layer of the end-to-end version of their 3-track network generates 3D structure models by combining features from discontinuous crops of the protein sequence (two segments of the protein with a chain break between them).
Q5. What is the role of TANGO2 in the metabolic process?
Deficiencies in TANGO2 (transport and Golgi organization protein 2) lead to metabolic disorders, and the protein plays an unknown role in Golgi membrane redistribution into the ER (16, 17).
Q6. Why did the authors not train the network directly?
Because of computer hardware memory limitations, the authors could not train models on large proteins directly as the 3-track models have many millions of parameters; instead, the authors presented to the network many discontinuous crops of the input sequence consisting of two discontinuous sequence segments spanning a total of 260 residues.
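A discontinuous crop of this kind can be sketched as follows. Only the total of 260 residues and the two-segment layout come from the text; the function name, the `min_seg` parameter, the uniform sampling scheme, and the handling of short proteins are assumptions of this sketch.

```python
import random


def sample_discontinuous_crop(seq_len, total=260, rng=None, min_seg=20):
    """Pick two non-overlapping residue ranges whose lengths sum to `total`.

    Ranges are half-open (start, end) pairs separated by at least one
    residue, so there is a chain break between them. Proteins no longer than
    `total` are returned whole (an assumption of this sketch).
    """
    rng = rng or random.Random()
    if seq_len <= total:
        return [(0, seq_len)]
    len_a = rng.randint(min_seg, total - min_seg)
    len_b = total - len_a
    slack = seq_len - total          # residues left outside the crop
    gap = rng.randint(1, slack)      # chain break between the two segments
    start_a = rng.randint(0, slack - gap)
    start_b = start_a + len_a + gap
    return [(start_a, start_a + len_a), (start_b, start_b + len_b)]


crop = sample_discontinuous_crop(700, rng=random.Random(1))
```

Presenting many such crops to the network lets every part of a large protein, and many long-range residue pairs, be seen during training without holding the full-length protein in memory at once.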
Q7. What is the method for generating all-atom models?
In the first approach, the predicted residue-residue distance and orientation distributions are fed into pyRosetta (5) to generate all-atom models.
Q8. What is the role of TANGO2 in the ER?
Ntn superfamily members with structures similar to the RoseTTAFold model suggest that TANGO2 functions as an enzyme that might hydrolyze a carbon-nitrogen bond in a membrane component (18).
Q9. What is the way to solve the problem of atomic models?
Building atomic models of protein assemblies from cryo-EM maps can be challenging in the absence of homologs with known structures.
Q10. How many GPUs are used to make individual predictions?
DeepMind reported using several GPUs for days to make individual predictions, whereas RoseTTAFold predictions are made in a single pass through the network, in the same manner that would be used for a server. Following sequence and template search (~1.5 hours), the end-to-end version of RoseTTAFold requires ~10 minutes on an RTX2080 GPU to generate backbone coordinates for proteins with fewer than 400 residues; the pyRosetta version requires 5 minutes for network calculations on a single RTX2080 GPU and an hour for all-atom structure generation with 15 CPU cores.
Q11. What is the main reason for the increased accuracy of the RoseTTAFold method?
The increased prediction accuracy was critical for success in all cases, as models made with trRosetta did not yield MR solutions.
Q12. What was the method for generating the final 3D structures?
To generate final models, the authors combined and averaged the 1D features and 2D distance and orientation predictions produced for each of the crops and then used two approaches to generate final 3D structures.
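The averaging over crops can be sketched as below, assuming each crop yields one scalar prediction per residue pair. The actual predictions are distance and orientation distributions, so the scalar values, the function name, and the data layout here are simplifications introduced for illustration.

```python
def average_pair_predictions(seq_len, crops):
    """Average per-pair predictions over all crops that cover each pair.

    `crops` is a list of (indices, pred) tuples, where `indices` holds the
    residue positions in the crop and pred[i][j] is the prediction for the
    pair (indices[i], indices[j]). Pairs seen in no crop are left as None.
    """
    total = [[0.0] * seq_len for _ in range(seq_len)]
    count = [[0] * seq_len for _ in range(seq_len)]
    for indices, pred in crops:
        for i, ri in enumerate(indices):
            for j, rj in enumerate(indices):
                total[ri][rj] += pred[i][j]
                count[ri][rj] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else None
         for c in range(seq_len)]
        for r in range(seq_len)
    ]


# Two toy crops of a 4-residue protein; each prediction map is constant.
crop_a = ([0, 1, 3], [[1.0] * 3 for _ in range(3)])
crop_b = ([1, 2, 3], [[3.0] * 3 for _ in range(3)])
averaged = average_pair_predictions(4, [crop_a, crop_b])
```

In this toy case the pair (1, 3) is covered by both crops and receives the mean of the two predictions, while the pair (0, 2) is covered by neither and stays undefined.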
Q13. What is the method for generating the backbone coordinates?
In the second approach, the averaged 1D and 2D features are fed into a final SE(3)-equivariant layer (6), and following end-to-end training from amino acid sequence to 3D coordinates, backbone coordinates are generated directly by the network (see Methods).
Q14. How many structures have a predicted lDDT?
Over one-third of these models have a predicted lDDT > 0.8, which corresponded to an average Cα-RMSD of 2.6 Å on CASP14 targets (fig. S8).
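For readers unfamiliar with the score, the sketch below shows how an lDDT value can be computed when a reference structure is available. The network itself predicts this score without a reference. This is a simplified Cα-only version using the standard 15 Å inclusion radius and 0.5, 1, 2, and 4 Å tolerance thresholds. It gives a single global score and omits the stereochemistry checks of the full method. The function name and toy coordinates are hypothetical.

```python
import math


def ca_lddt(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Global Calpha-only lDDT of `model` against `ref` (lists of xyz tuples).

    For every residue pair closer than `radius` in the reference, count how
    many thresholds the change in that distance stays under, then return the
    fraction preserved over all such pairs and thresholds.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must have the same number of residues")
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(len(ref)):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only local distances are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            total += len(thresholds)
            preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0


ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
perfect_score = ca_lddt(ref, ref)
shifted_score = ca_lddt(ref, shifted)
```

Because the score depends only on internal distances, it needs no superposition of model onto reference, unlike Cα-RMSD.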
Q15. What is the main purpose of the network?
The network enables the direct building of structure models for protein-protein complexes from sequence information, short-circuiting the standard procedure of building models for individual subunits and then carrying out rigid-body docking.
Q16. Why is the performance of the 3-track model less than that of the AlphaFold2 model?
Incomplete optimization due to computer memory limitations and neglect of side chain information likely explain the poorer performance of the end-to-end version compared to the pyRosetta version (Fig. 1B; the latter incorporates side chain information at the all-atom relaxation stage). Since SE(3)-equivariant layers are used in the main body of the 3-track model, the added gain from the final SE(3) layer is likely less than in the AlphaFold2 case.
Q17. What is the structure of the TANGO2 fold?
The RoseTTAFold model of TANGO2 adopts an N-terminal nucleophile aminohydrolase (Ntn) fold (Fig. 3A) with well-aligned active site residues that are conserved in TANGO2 orthologs (Fig. 3B).