Posted Content

Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

TL;DR: In this paper, a graph-to-graph translation method for molecular optimization is proposed, which realizes coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph.
Abstract: Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to give them better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that it outperforms previous state-of-the-art baselines by a large margin.
Citations
Journal Article
TL;DR: This review presents some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations, and describes applications of these representations in AI-driven drug discovery.
Abstract: The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, owing both to the rapid development of computers and to the difficulty of producing a single representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.

190 citations

Journal Article
TL;DR: A simple approach to focused molecular generation for drug design that constructs a conditional recurrent neural network (cRNN), which aggregates selected molecular descriptors and transforms them into the initial memory state of the network before starting the generation of alphanumeric strings that describe molecules.
Abstract: Deep learning has acquired considerable momentum over the past couple of years in the domain of de novo drug design. Here, we propose a simple approach to the task of focused molecular generation for drug design purposes by constructing a conditional recurrent neural network (cRNN). We aggregate selected molecular descriptors and transform them into the initial memory state of the network before starting the generation of alphanumeric strings that describe molecules. We thus tackle the inverse design problem directly, as the cRNNs may generate molecules near the specified conditions. Moreover, we exemplify a novel way of assessing the focus of the conditional output of such a model using negative log-likelihood plots. The output is more focused than traditional unbiased RNNs, yet less focused than autoencoders, thus representing a novel method with intermediate output specificity between well-established methods. Conceptually, our architecture shows promise for the generalized problem of steering of sequential data generation with recurrent neural networks. The rise of deep neural networks allows for new ways to design molecules that interact with biological structures. An approach that uses conditional recurrent neural networks generates molecules with properties near specified conditions.

116 citations

Journal Article
02 Mar 2021
TL;DR: This work compares six different GNN-based generative models in GraphINVENT, and shows that ultimately the gated-graph neural network performs best against the metrics considered here.
Abstract: Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.

87 citations

Posted Content
TL;DR: MIMOSA enables flexible encoding of multiple property and similarity constraints and can efficiently generate new molecules that satisfy various property constraints, achieving up to 49.6% relative improvement over the best baseline in terms of success rate.
Abstract: Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties. To address such challenges, we propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution. MIMOSA first pretrains two property-agnostic graph neural networks (GNNs) for molecule topology and substructure-type prediction, where a substructure can be either an atom or a single ring. In each iteration, MIMOSA uses the GNNs' predictions and employs three basic substructure operations (add, replace, delete) to generate new molecules and associated weights. The weights can encode multiple constraints, including similarity and drug property constraints, upon which we select promising molecules for the next iteration. MIMOSA enables flexible encoding of multiple property and similarity constraints and can efficiently generate new molecules that satisfy various property constraints, achieving up to 49.6% relative improvement over the best baseline in terms of success rate.
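
The iterate-propose-weight-select loop described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the MIMOSA implementation: a molecule is reduced to a tuple of tokens, the vocabulary `VOCAB`, the Jaccard `similarity`, the product-form `weight`, and the greedy top-k selection are all hypothetical stand-ins (the paper uses pretrained GNNs and a sampling scheme instead).

```python
import random

VOCAB = ["C", "N", "O", "ring"]  # hypothetical substructure vocabulary


def propose(mol, rng):
    """Apply one of the three basic operations (add, replace, delete) to a toy molecule."""
    mol = list(mol)
    op = rng.choice(["add", "replace", "delete"])
    if op == "delete" and len(mol) <= 1:
        op = "add"  # keep the toy molecule non-empty
    if op == "add":
        mol.insert(rng.randrange(len(mol) + 1), rng.choice(VOCAB))
    elif op == "replace":
        mol[rng.randrange(len(mol))] = rng.choice(VOCAB)
    else:
        del mol[rng.randrange(len(mol))]
    return tuple(mol)


def similarity(a, b):
    """Jaccard overlap of token sets; a stand-in for a real molecular similarity."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def weight(mol, source, property_score):
    """Fold the similarity and property constraints into one weight (assumed product form)."""
    return similarity(mol, source) * property_score(mol)


def optimize(source, property_score, iterations=20, proposals=30, keep=5, seed=0):
    """Greedy stand-in for the sampler: propose edits, weight them, keep the best."""
    rng = random.Random(seed)
    population = [source]
    for _ in range(iterations):
        candidates = set(population)
        for _ in range(proposals):
            candidates.add(propose(rng.choice(population), rng))
        population = sorted(
            candidates,
            key=lambda m: weight(m, source, property_score),
            reverse=True,
        )[:keep]
    return population


# Usage: favor nitrogen-rich toy molecules that still resemble the source.
source = ("C", "C", "O")
nitrogen_fraction = lambda m: m.count("N") / len(m)
best = optimize(source, nitrogen_fraction)
```

Because the current population is always among the candidates, the best weight in this sketch never decreases from one iteration to the next; a real sampler would instead accept or reject proposals probabilistically.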

34 citations

Posted Content
TL;DR: This paper proposes a surprisingly effective self-training approach for iteratively creating additional molecular targets by pre-training the generative model together with a simple property predictor, and demonstrates significant gains over strong baselines for both unconditional and conditional molecular design.
Abstract: Generative models in molecular design tend to be richly parameterized, data-hungry neural models, as they must create complex structured objects as outputs. Estimating such models from data may be challenging due to the lack of sufficient training data. In this paper, we propose a surprisingly effective self-training approach for iteratively creating additional molecular targets. We first pre-train the generative model together with a simple property predictor. The property predictor is then used as a likelihood model for filtering candidate structures from the generative model. Additional targets are iteratively produced and used in the course of stochastic EM iterations to maximize the log-likelihood that the candidate structures are accepted. A simple rejection (re-weighting) sampler suffices to draw posterior samples since the generative model is already reasonable after pre-training. We demonstrate significant gains over strong baselines for both unconditional and conditional molecular design. In particular, our approach outperforms the previous state-of-the-art in conditional molecular design by over 10% in absolute gain. Finally, we show that our approach is useful in other domains as well, such as program synthesis.
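
The filter-and-refit loop in this abstract can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the authors' method: the generative model is replaced by a categorical distribution over a few hypothetical candidates, the property predictor by a fixed acceptance-probability function `accept_prob`, and each refit by smoothed counting of accepted samples.

```python
import random
from collections import Counter


def sample(probs, rng):
    """Draw one candidate from a categorical 'generative model'."""
    items = list(probs)
    return rng.choices(items, weights=[probs[x] for x in items], k=1)[0]


def self_train(probs, accept_prob, rounds=5, draws=2000, smoothing=1.0, seed=0):
    """Alternate rejection sampling with refitting the generator on accepted samples."""
    rng = random.Random(seed)
    for _ in range(rounds):
        accepted = Counter()
        for _ in range(draws):
            x = sample(probs, rng)
            # Rejection step: keep x with the probability the predictor assigns to it.
            if rng.random() < accept_prob(x):
                accepted[x] += 1
        # Refit step: smoothed maximum-likelihood estimate from the accepted samples.
        total = sum(accepted.values()) + smoothing * len(probs)
        probs = {x: (accepted[x] + smoothing) / total for x in probs}
    return probs


# Usage: the hypothetical predictor strongly prefers candidate "a".
prior = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
preference = {"a": 0.9, "b": 0.1, "c": 0.1}
posterior = self_train(prior, lambda x: preference[x])
```

In this toy setting the generator's mass shifts toward candidates the predictor accepts more often, which is the qualitative behavior the abstract describes; the smoothing term is an added assumption that keeps rarely accepted candidates from being assigned zero probability.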

13 citations