SMILES-based deep generative scaffold decorator for de-novo drug design

doi:10.1186/S13321-020-00441-8

Open Access journal article

Josep Arús-Pous et al.

29 May 2020

Journal of Cheminformatics, Vol. 12, Issue 1, Article 38

TLDR

A new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained on any arbitrary set of molecules. Its preprocessing step also serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets.

Abstract:

Molecular generative models trained on small sets of molecules represented as SMILES strings can generate large regions of chemical space. Unfortunately, because SMILES strings are sequential, these models cannot generate molecules from a given scaffold (i.e., a partially built molecule with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained on any arbitrary set of molecules. This approach is made possible by a new preprocessing algorithm that exhaustively slices every possible combination of acyclic bonds in each molecule, combinatorially obtaining a large number of scaffolds with their respective decorations. This slicing also serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described. First, models were trained on a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators; they were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted to be active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered to allow only those that included fragment-like decorations. This filtering allowed models trained on this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge learned from their training sets, without the need to add it through other techniques such as reinforcement learning.
We envision that this architecture will become a useful addition to the existing architectures for de novo molecular generation.
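The exhaustive slicing step described in the abstract can be illustrated on a plain molecular graph. This is a minimal sketch under stated assumptions, not the authors' implementation. It represents a molecule as a hypothetical adjacency dict of heavy atoms and treats "acyclic bonds" as graph bridges, meaning bonds whose removal disconnects their two atoms. It caps the number of simultaneous cuts with a hypothetical `max_cuts` parameter. It keeps a slicing only when one fragment carries every attachment point (the scaffold) and each remaining fragment carries exactly one (a decoration). The real pipeline emits SMILES with explicit attachment points, and that step is not reproduced here.

```python
from itertools import combinations


def acyclic_bonds(adj):
    """Return bonds (i, j) with i < j that belong to no ring.

    A bond lies outside every ring exactly when it is a bridge: after
    removing it, its two atoms are no longer connected.
    """
    bonds = sorted({(min(a, b), max(a, b)) for a in adj for b in adj[a]})
    result = []
    for a, b in bonds:
        # Depth-first search from a that never crosses the bond (a, b).
        seen, stack = {a}, [a]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if {u, v} == {a, b} or v in seen:
                    continue
                seen.add(v)
                stack.append(v)
        if b not in seen:
            result.append((a, b))
    return result


def fragments(adj, cut):
    """Split the graph along the cut bonds; return (fragments, atom -> fragment index)."""
    cut_set = {frozenset(bond) for bond in cut}
    comp, frags = {}, []
    for start in adj:
        if start in comp:
            continue
        comp[start] = len(frags)
        members, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in comp or frozenset((u, v)) in cut_set:
                    continue
                comp[v] = len(frags)
                members.add(v)
                stack.append(v)
        frags.append(frozenset(members))
    return frags, comp


def slice_molecule(adj, max_cuts=3):
    """Enumerate (scaffold, decorations) pairs from every combination of
    up to max_cuts acyclic bonds (max_cuts is a hypothetical cap)."""
    candidates = acyclic_bonds(adj)
    pairs = []
    for n in range(1, max_cuts + 1):
        for cut in combinations(candidates, n):
            frags, comp = fragments(adj, cut)
            # Number of attachment points (cut-bond ends) on each fragment.
            points = [0] * len(frags)
            for a, b in cut:
                points[comp[a]] += 1
                points[comp[b]] += 1
            for s, scaffold in enumerate(frags):
                others = [i for i in range(len(frags)) if i != s]
                # Scaffold carries all n attachment points; each decoration exactly one.
                if points[s] == n and all(points[i] == 1 for i in others):
                    pairs.append((scaffold, tuple(frags[i] for i in others)))
    return pairs


# Hypothetical toy molecule (heavy atoms only): a three-membered ring of
# atoms 0-2, with atom 3 bonded to atom 0 and atom 4 bonded to atom 1.
toy = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1], 3: [0], 4: [1]}
for scaffold, decorations in slice_molecule(toy, max_cuts=2):
    print(sorted(scaffold), [sorted(d) for d in decorations])
```

For a connected molecule, cutting n bridges always yields exactly n + 1 fragments. The number of candidate cut sets for k acyclic bonds and up to n cuts is C(k, 1) + C(k, 2) + ... + C(k, n). This sum is why a single molecule can produce many scaffold/decoration pairs, and why the slicing doubles as data augmentation.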
