SMILES-based deep generative scaffold decorator for de-novo drug design
Josep Arús-Pous, Atanas Patronov, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
TLDR
A new SMILES-based molecular generative architecture that generates molecules from scaffolds, can be trained on any arbitrary molecular set, serves as a data augmentation technique, and is readily coupled with randomized SMILES to obtain even better results with small sets.
Abstract:
Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., a partially built molecule with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained on any arbitrary molecular set. This approach is made possible by a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorially obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described. First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered to keep only those that included fragment-like decorations. This filtering allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge, without the need to add it through other techniques such as reinforcement learning.
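The exhaustive slicing step can be illustrated with a small sketch. This is not the authors' implementation: it works on a toy molecular graph (atom indices plus a bond list) instead of parsed SMILES, treats a bond as acyclic when deleting it disconnects the graph, and assumes a slice is kept only when one fragment carries every cut (the scaffold) and each remaining fragment carries exactly one cut (a decoration). The function names and the `max_cuts` limit are hypothetical; a real pipeline would rely on a cheminformatics toolkit such as RDKit.

```python
from itertools import combinations


def connected_components(n_atoms, bonds):
    """Return a list of atom-index sets, one per connected component."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        components.append(component)
    return components


def acyclic_bonds(n_atoms, bonds):
    """A bond is acyclic (in no ring) if removing it increases the component count."""
    base = len(connected_components(n_atoms, bonds))
    return [bond for bond in bonds
            if len(connected_components(n_atoms, [b for b in bonds if b != bond])) > base]


def slice_molecule(n_atoms, bonds, max_cuts=4):
    """Enumerate (scaffold, decorations) pairs over every combination of acyclic bonds."""
    results = []
    cuttable = acyclic_bonds(n_atoms, bonds)
    for n_cuts in range(1, min(max_cuts, len(cuttable)) + 1):
        for cuts in combinations(cuttable, n_cuts):
            remaining = [b for b in bonds if b not in cuts]
            fragments = connected_components(n_atoms, remaining)
            # Number of cut bonds touching each fragment = its attachment points.
            touches = [sum(1 for a, b in cuts if a in f or b in f) for f in fragments]
            for i, scaffold in enumerate(fragments):
                if touches[i] != n_cuts:
                    continue  # a scaffold must carry every attachment point
                others = [j for j in range(len(fragments)) if j != i]
                if all(touches[j] == 1 for j in others):  # one attachment per decoration
                    decorations = sorted(tuple(sorted(fragments[j])) for j in others)
                    results.append((tuple(sorted(scaffold)), tuple(decorations)))
    return results


if __name__ == "__main__":
    # Toy 4-methylcyclohexanol: ring atoms 0-5, methyl carbon 6 on atom 0, oxygen 7 on atom 3.
    ring = [(i, (i + 1) % 6) for i in range(6)]
    bonds = ring + [(0, 6), (3, 7)]
    for scaffold, decorations in slice_molecule(8, bonds):
        print("scaffold", scaffold, "decorations", decorations)
```

Under this assumed rule a single cut yields two pairings, because either side can act as the scaffold; with two or more cuts at most one fragment can touch every cut, since the fragments joined by acyclic bonds form a tree.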
We envision that this architecture will become a useful addition to the existing architectures for de novo molecular generation.
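A randomized SMILES encodes the same molecule starting from a random atom and following a random branch order, which is why it can be coupled with slicing for data augmentation. The sketch below shows the idea on a toy graph; it is a hedged illustration rather than the authors' code and assumes single bonds only, organic-subset atoms with implicit hydrogens, no charges or stereochemistry, and fewer than ten rings. The function name `randomized_smiles` is hypothetical; in practice a toolkit would produce these strings, for example RDKit's `Chem.MolToSmiles` with `canonical=False, doRandom=True`.

```python
import random


def randomized_smiles(symbols, bonds, rng=random):
    """Write one randomized SMILES for a simple single-bonded molecular graph."""
    n = len(symbols)
    adjacency = {i: [] for i in range(n)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    for neighbours in adjacency.values():
        rng.shuffle(neighbours)  # random branch order
    start = rng.randrange(n)     # random root atom

    # Pass 1: depth-first search fixes the spanning tree and the ring-closure bonds.
    children = {i: [] for i in range(n)}
    ring_digits = {i: [] for i in range(n)}
    visited, closures = set(), set()

    def explore(atom, parent):
        visited.add(atom)
        for nb in adjacency[atom]:
            if nb == parent:
                continue
            if nb not in visited:
                children[atom].append(nb)
                explore(nb, atom)
            elif frozenset((atom, nb)) not in closures:
                # A visited non-parent neighbour closes a ring: give the bond a fresh digit.
                closures.add(frozenset((atom, nb)))
                digit = len(closures)
                assert digit < 10, "sketch only supports single-digit ring closures"
                ring_digits[atom].append(digit)
                ring_digits[nb].append(digit)

    explore(start, None)

    # Pass 2: emit atoms; every child but the last becomes a parenthesised branch.
    def write(atom):
        text = symbols[atom] + "".join(str(d) for d in ring_digits[atom])
        kids = children[atom]
        for kid in kids[:-1]:
            text += "(" + write(kid) + ")"
        if kids:
            text += write(kids[-1])  # the last child continues the main chain
        return text

    return write(start)


if __name__ == "__main__":
    rng = random.Random(0)
    symbols = ["C"] * 7 + ["O"]  # ring carbons 0-5, methyl carbon 6, oxygen 7
    bonds = [(i, (i + 1) % 6) for i in range(6)] + [(0, 6), (3, 7)]
    for _ in range(3):
        print(randomized_smiles(symbols, bonds, rng))
```

For ethanol (`["C", "C", "O"]` with bonds `(0, 1)` and `(1, 2)`) every string this sketch can produce is one of `CCO`, `C(C)O`, `C(O)C` or `OCC`, all describing the same molecule.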