Predicting Binding from Screening Assays with Transformer Network Embeddings.
Frequently Asked Questions (16)
Q2. What future work is suggested in "Predicting binding from screening assays with transformer network embeddings"?
While overall accuracy was somewhat limited and varied per-target, these results suggest a promising direction for further research into the application of deep learning to direct modeling of assay experiment results as a computational screening aid to existing drug discovery pipelines. Data-driven models trained on the Transformer embeddings can be applied as a quick, inexpensive computational screening method to assist the early drug discovery process for targets where a functional assay has been designed.
Q3. What is the definition of an encoder layer?
An encoder layer consists of a self-attention operation which modifies each character vector based on its relation to other characters in the sequence, followed by a simple matrix multiplication and nonlinearity which is applied on each character vector individually.
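The encoder layer described above can be sketched in a few lines of numpy. This is a minimal single-head illustration, not the paper's implementation; all weight shapes and names (`Wq`, `Wk`, `Wv`, `W1`, `W2`) are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_layer(X, Wq, Wk, Wv, W1, W2):
    """One (single-head) encoder layer: self-attention relates each
    character vector to the others, then a position-wise matrix
    multiplication plus nonlinearity updates each vector individually."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d)) @ V   # mix information across the sequence
    hidden = np.maximum(0, attn @ W1)          # per-position matmul + ReLU
    return hidden @ W2

# toy usage: a sequence of 6 characters, each an 8-dimensional vector
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Ws = [rng.normal(size=(8, 8)) for _ in range(5)]
out = encoder_layer(X, *Ws)
print(out.shape)  # (6, 8): one updated vector per character
```

Residual connections and layer normalization, which standard Transformer encoders also include, are omitted here for brevity.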
Q4. How many epochs did the training take?
The learning rate during optimization begins at 0.001 and decreases two orders of magnitude, following half a period of a cosine function, over the course of a single pass, or epoch, over the 83,000,000 molecule training set.
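That schedule (start at 0.001, decay two orders of magnitude over half a cosine period within one epoch) can be written as a small function; the end value of 1e-5 follows from "two orders of magnitude" below 1e-3.

```python
import math

def cosine_lr(step, total_steps, lr_start=1e-3, lr_end=1e-5):
    """Half-period cosine decay from lr_start to lr_end over one epoch
    of total_steps optimizer steps."""
    frac = step / total_steps
    return lr_end + 0.5 * (lr_start - lr_end) * (1 + math.cos(math.pi * frac))

print(cosine_lr(0, 1000))     # 0.001 at the start of the epoch
print(cosine_lr(1000, 1000))  # 1e-05 at the end of the epoch
```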
Q5. What is the purpose of the transformer model?
The operations in the transformer model used to compute molecular embeddings are easily parallelizable on modern computing infrastructure (GPUs), enabling rapid screening of millions of molecules to assist wet-lab screening assays and other drug discovery pipelines.

Q6. How many IUPAC names are sourced from the PubChem database?
To train the Transformer network for translation, pairs of SMILES strings and IUPAC names are sourced directly from the PubChem compound database for 83,000,000 molecules.
Q7. What is the effect of learning a mapping of chemical space?
The authors found that learning a mapping of chemical space via a Transformer network achieved increased accuracy of data-driven models on multiple binding affinity prediction tasks compared to models trained on hand-designed or untrained representations.
Q8. How much time and resources do the authors need to spend on a docking model?
Though physics-based molecular docking models are less constrained than wet-lab screening approaches, they can still be computationally expensive and require significant time and/or resources.
Q9. How many vectors are used for each character?
In this case, the SMILES string for each molecule is converted to a sequence of random 512-dimensional embedding vectors, one for each character.
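A random character embedding of this kind can be sketched as a lookup table of fixed random vectors, one per character type; the function name and seed here are illustrative assumptions.

```python
import numpy as np

def random_embedding(smiles, dim=512, seed=0):
    """Map each character of a SMILES string to a fixed random
    dim-dimensional vector (one shared vector per character type)."""
    rng = np.random.default_rng(seed)
    table = {c: rng.normal(size=dim) for c in sorted(set(smiles))}
    return np.stack([table[c] for c in smiles])

emb = random_embedding("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, 21 characters
print(emb.shape)  # (21, 512)
```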
Q10. How many properties were normalized between 0 and 1?
Numeric property values were normalized between 0 and 1 according to the minimum and maximum values of all screened molecules, on a per-dataset basis.
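The per-dataset min-max normalization described above amounts to the following one-liner:

```python
def minmax_normalize(values):
    """Scale numeric property values to [0, 1] using the minimum and
    maximum over all screened molecules in one dataset."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(minmax_normalize([2.0, 5.0, 8.0]))  # [0.0, 0.5, 1.0]
```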
Q11. How many input neurons are needed to classify binding affinity?
The networks used to classify binding affinity are identical to the Transformer and random embedding networks, except only 20 input neurons are needed in this case.
Q12. How is the learning of the embedding performed?
An unsupervised evaluation of the learned embedding is performed by visualizing how changes in molecular structure correspond to changes in the embedding.
Q13. How did the authors classify the learned molecular embeddings?
To analyze how the learned molecular embeddings encode binding properties, the authors modified molecular sequences and observed changes in binding confidence to HIV-1 Protease from a binary classifier.
Q14. What is the main reason why the application of deep learning to screening assays has been made?
While the application of deep learning to prediction of molecular properties and other tasks has shown promise in aiding drug discovery, the direct application of deep learning to prediction of screening assay results has been made difficult by the limited quantity of available data.
Q15. How many non-binding sets were randomly undersampled?
To account for this, the non-binding sets were randomly undersampled to match the count of binding molecules for the purpose of training and evaluating a balanced binding classifier.
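Random undersampling to a balanced set is straightforward; this sketch (with an assumed fixed seed for reproducibility) shows the idea.

```python
import random

def undersample(binding, non_binding, seed=0):
    """Randomly undersample the non-binding set so both classes have
    the same number of molecules for a balanced classifier."""
    rng = random.Random(seed)
    return binding, rng.sample(non_binding, len(binding))

pos = ["mol%d" % i for i in range(3)]
neg = ["mol%d" % i for i in range(3, 50)]
b, nb = undersample(pos, neg)
print(len(b), len(nb))  # 3 3
```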
Q16. What was the model used in all three experiments?
The same model was used in all three experiments: a simple CNN composed of an input layer, two hidden convolutional layers with ReLU activations, and a fully connected output layer, originally trained on the HIV dataset for target binding classification using a random embedding of SMILES strings.
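The architecture described above (two ReLU convolutional layers over the character embeddings, then a fully connected output) can be sketched in numpy. This is an illustrative forward pass only, not the paper's trained model; the kernel sizes, channel counts, and mean-pooling step are assumptions.

```python
import numpy as np

def conv1d(X, W):
    """Valid 1-D convolution along the sequence axis.
    X is (seq_len, in_ch); W is (kernel, in_ch, out_ch)."""
    k = W.shape[0]
    return np.stack([np.tensordot(X[i:i + k], W, axes=([0, 1], [0, 1]))
                     for i in range(X.shape[0] - k + 1)])

def simple_cnn(X, W1, W2, Wout):
    """Input embeddings -> two ReLU conv layers -> fully connected output."""
    h = np.maximum(0, conv1d(X, W1))
    h = np.maximum(0, conv1d(h, W2))
    return h.mean(axis=0) @ Wout   # pool over positions, then classify

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))        # 20 characters, 8-dim random embedding
W1 = rng.normal(size=(3, 8, 16))
W2 = rng.normal(size=(3, 16, 16))
Wout = rng.normal(size=(16, 1))
score = simple_cnn(X, W1, W2, Wout)
print(score.shape)  # (1,): a single binding score
```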