Open access | Journal Article | DOI: 10.1016/J.ISCI.2021.102269

PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning.

05 Mar 2021 | iScience (Elsevier) | Vol. 24, Iss. 4, Article 102269
Abstract: With the advent of deep generative models in computational chemistry, in-silico drug design is undergoing an unprecedented transformation. Although deep learning approaches have shown potential in generating compounds with desired chemical properties, they disregard the cellular environment of target diseases. Bridging systems biology and drug design, we present a reinforcement learning method for de novo molecular design from gene expression profiles. We construct a hybrid Variational Autoencoder that tailors molecules to target-specific transcriptomic profiles, using an anticancer drug sensitivity prediction model (PaccMann) as reward function. Without incorporating information about anticancer drugs, the molecule generation is biased toward compounds with high predicted efficacy against cell lines or cancer types. The generation can be further refined by subsidiary constraints such as toxicity. Our cancer-type-specific candidate drugs are similar to cancer drugs in drug-likeness, synthesizability, and solubility and frequently exhibit the highest structural similarity to compounds with known efficacy against these cancer types.
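The reward-driven biasing described in the abstract can be illustrated with a minimal policy-gradient (REINFORCE) sketch. This is not the PaccMannRL implementation: the generator here is a toy categorical policy over four tokens rather than a hybrid VAE, and `toy_reward` is a hypothetical stand-in for a learned drug sensitivity predictor such as PaccMann. All names and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def toy_reward(token):
    """Hypothetical stand-in for a learned efficacy predictor.

    Token 2 plays the role of a 'molecule' with high predicted efficacy.
    """
    return 1.0 if token == 2 else 0.1


def reinforce(n_tokens=4, steps=500, lr=0.5, seed=0):
    """Bias a categorical policy toward high-reward samples."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(n_tokens)   # uniform policy at the start
    baseline = 0.0                # running mean reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        token = rng.choice(n_tokens, p=probs)
        reward = toy_reward(token)
        # Gradient of log pi(token) w.r.t. the logits: one_hot(token) - probs
        grad_log_prob = -probs
        grad_log_prob[token] += 1.0
        logits += lr * (reward - baseline) * grad_log_prob
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


probs = reinforce()
print(probs)
```

After training, most of the probability mass should sit on the high-reward token, although the policy never saw a labeled example of a "good" sample. This mirrors, in a very reduced form, the abstract's point that generation is biased by the reward alone.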


Citations

9 results found


Open access | Journal Article | DOI: 10.1016/J.DRUDIS.2021.05.019
Abstract: Molecular design strategies are integral to therapeutic progress in drug discovery. Computational approaches for de novo molecular design have been developed over the past three decades and, recently, thanks in part to advances in machine learning (ML) and artificial intelligence (AI), the drug discovery field has gained practical experience. Here, we review these learnings and present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based, or reaction-based paradigm. Furthermore, we emphasize the value of strong benchmarks, describe the main challenges to using these methods in practice, and provide a viewpoint on further opportunities for exploration and challenges to be tackled in the upcoming years.


3 Citations


Open access | Posted Content | DOI: 10.1101/2021.07.09.451519
09 Jul 2021 | bioRxiv
Abstract: We are interested in generating new small molecules which could act as inhibitors of a biological target, when there is limited prior information on target-specific inhibitors. This form of drug-design is assuming increasing importance with the advent of new disease threats for which known chemicals only provide limited information about target inhibition. In this paper, we propose the combined use of deep neural networks and Inductive Logic Programming (ILP) that allows the use of symbolic domain-knowledge (B) to explore the large space of possible molecules. Assuming molecules and their activities to be instances of random variables X and Y, the problem is to draw instances from the conditional distribution of X, given Y,B (D_{X|Y,B}). We decompose this into the constituent parts of obtaining the distributions D_{X|B} and D_{Y|X,B}, and describe the design and implementation of models to approximate the distributions. The design consists of generators (to approximate D_{X|B} and D_{X|Y,B}) and a discriminator (to approximate D_{Y|X,B}). We investigate our approach using the well-studied problem of inhibitors for the Janus kinase (JAK) class of proteins. We assume first that no data on inhibitors are available for a target protein (JAK2), but a small number of inhibitors are known for homologous proteins (JAK1, JAK3 and TYK2). We show that the inclusion of relational domain-knowledge results in a potentially more effective generator of inhibitors than simple random sampling from the space of molecules or a generator without access to symbolic relations. The results suggest a way of combining symbolic domain-knowledge and deep generative models to constrain the exploration of the chemical space of molecules, when there is limited information on target-inhibitors. We also show how samples from the conditional generator can be used to identify potentially novel target inhibitors.
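The abstract does not spell out the decomposition it uses. One standard reading, offered here as an assumption rather than as the authors' derivation, is Bayes' rule applied to the conditional distribution of molecules X given activity Y and domain knowledge B, written with generic densities P:

```latex
P(X \mid Y, B)
  = \frac{P(Y \mid X, B)\, P(X \mid B)}{P(Y \mid B)}
  \;\propto\; P(Y \mid X, B)\, P(X \mid B)
```

Under this reading, the unconditional generator supplies the factor P(X | B), the discriminator supplies P(Y | X, B), and the normalizing term P(Y | B) does not depend on the molecule being sampled.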


1 Citation


Open access | Journal Article | DOI: 10.1038/S41598-021-94564-Z
Krzysztof Koras, Ewa Kizling, Dilafruz Juraeva, Eike Staub +1 more
06 Aug 2021 | Scientific Reports
Abstract: Computational models for drug sensitivity prediction have the potential to significantly improve personalized cancer medicine. Drug sensitivity assays, combined with profiling of cancer cell lines and drugs, are becoming increasingly available for training such models. Multiple methods were proposed for predicting drug sensitivity from cancer cell line features, some in a multi-task fashion. So far, no such model leveraged drug inhibition profiles. Importantly, multi-task models require a tailored approach to model interpretability. In this work, we develop DEERS, a neural network recommender system for kinase inhibitor sensitivity prediction. The model utilizes molecular features of the cancer cell lines and kinase inhibition profiles of the drugs. DEERS incorporates two autoencoders to project cell line and drug features into 10-dimensional hidden representations and a feed-forward neural network to combine them into response prediction. We propose a novel interpretability approach, which in addition to the set of modeled features considers also the genes and processes outside of this set. Our approach outperforms simpler matrix factorization models, achieving R = 0.82 correlation between true and predicted response for the unseen cell lines. The interpretability analysis identifies 67 biological processes that drive the cell line sensitivity to particular compounds. Detailed case studies are shown for PHA-793887, XMD14-99 and Dabrafenib.
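The two-branch architecture described in the abstract can be sketched as a forward pass in NumPy. This is an illustrative sketch only, not the DEERS code: it shows just the encoder halves of the two autoencoders, the weights are random and untrained, and the input sizes, the concatenation step, the 16-unit combiner layer, and the `tanh` activations are all assumptions. Only the 10-dimensional hidden representation comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CELL_FEATURES = 50   # hypothetical number of cell line features
N_DRUG_FEATURES = 30   # hypothetical number of kinase inhibition features
HIDDEN = 10            # hidden size stated in the abstract


def dense(n_in, n_out):
    """Random (untrained) weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def forward(x, layer, activation=np.tanh):
    """Apply one dense layer followed by an activation."""
    weights, bias = layer
    return activation(x @ weights + bias)


cell_encoder = dense(N_CELL_FEATURES, HIDDEN)
drug_encoder = dense(N_DRUG_FEATURES, HIDDEN)
combiner_hidden = dense(2 * HIDDEN, 16)
combiner_out = dense(16, 1)


def predict_response(cell_x, drug_x):
    """Encode both inputs, join the codes, and map them to one response value."""
    cell_z = forward(cell_x, cell_encoder)
    drug_z = forward(drug_x, drug_encoder)
    joint = np.concatenate([cell_z, drug_z], axis=-1)  # assumed combination step
    h = forward(joint, combiner_hidden)
    return forward(h, combiner_out, activation=lambda v: v).squeeze(-1)


cells = rng.normal(size=(4, N_CELL_FEATURES))
drugs = rng.normal(size=(4, N_DRUG_FEATURES))
responses = predict_response(cells, drugs)
print(responses.shape)
```

A trained model of this shape would also need the decoder halves and a reconstruction loss, which are omitted here for brevity.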


Topics: Interpretability (53%)

1 Citation


Open access | Posted Content | DOI: 10.33774/CHEMRXIV-2021-XZGST
04 Oct 2021 | ChemRxiv
Abstract: Generative chemical language models (CLMs) can be used for de novo molecular structure generation. These CLMs learn from the structural information of known molecules to generate new ones. In this paper, we show that “hybrid” CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), we created a large collection of virtual molecules with a generative CLM. This primary virtual compound library was further refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ binders and non-binders by transfer learning. Several of the computer-generated molecular designs were commercially available, which allowed for fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design in low-data situations.


1 Citation


Open access | Posted Content
Yi Zhang
Abstract: As a promising tool to navigate in the vast chemical space, artificial intelligence (AI) is leveraged for drug design. From the year 2017 to 2021, the number of applications of several recent AI models (i.e. graph neural network (GNN), recurrent neural network (RNN), variational autoencoder (VAE), generative adversarial network (GAN), flow and reinforcement learning (RL)) in drug design increased significantly. Many relevant literature reviews exist. However, none of them provides an in-depth summary of many applications of the recent AI models in drug design. To complement the existing literature, this survey includes the theoretical development of the previously mentioned AI models and detailed summaries of 42 recent applications of AI in drug design. Concretely, 13 of them leverage GNN for molecular property prediction and 29 of them use RL and/or deep generative models for molecule generation and optimization. In most cases, the focus of the summary is the models, their variants, and modifications for specific tasks in drug design. Moreover, 60 additional applications of AI in molecule generation and optimization are briefly summarized in a table. Finally, this survey provides a holistic discussion of the abundant applications so that the tasks, potential solutions, and challenges in AI-based drug design become evident.



References

66 results found


Open access | Journal Article | DOI: 10.1162/089976698300017467
01 Jul 1998 | Neural Computation
Abstract: A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
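The method summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's code: it uses a homogeneous polynomial kernel of degree 2, projects only the training points, and the function name `kernel_pca` and its parameters are chosen here for illustration.

```python
import numpy as np


def kernel_pca(X, n_components=2, degree=2):
    """Project training points onto the leading kernel principal components."""
    # Homogeneous polynomial kernel: implicit feature space of all
    # degree-th order products of the input coordinates.
    K = (X @ X.T) ** degree
    n = K.shape[0]
    # Center the kernel matrix, i.e. center the data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition of the symmetric centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # With unit-norm feature-space eigenvectors, the projection of training
    # point i onto component k equals sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Z = kernel_pca(X, n_components=2, degree=2)
print(Z.shape)
```

The feature map is never computed explicitly; only the n-by-n kernel matrix is formed, which is what makes principal components in a high-dimensional feature space tractable.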


Topics: Kernel principal component analysis (69%), Kernel method (63%), Polynomial kernel (63%)

7,611 Citations


Open access | Journal Article | DOI: 10.1038/NG.2764
01 Oct 2013 | Nature Genetics
Abstract: The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.


Topics: Genomics (50%)

4,022 Citations


Journal Article | DOI: 10.1021/CI100050T
David Rogers, Mathew Hahn
Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure−activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.


2,865 Citations


Open access | Journal Article | DOI: 10.1093/NAR/GKX1037
Abstract: DrugBank (www.drugbank.ca) is a web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions and their targets. First described in 2006, DrugBank has continued to evolve over the past 12 years in response to marked improvements to web standards and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total number of investigational drugs in the database has grown by almost 300%, the number of drug-drug interactions has grown by nearly 600% and the number of SNP-associated drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacological research, pharmaceutical science and drug education.


Topics: DrugBank (81%)

2,626 Citations


Journal Article | DOI: 10.1038/NRC1951
Robert H. Shoemaker
Abstract: The US National Cancer Institute (NCI) 60 human tumour cell line anticancer drug screen (NCI60) was developed in the late 1980s as an in vitro drug-discovery tool intended to supplant the use of transplantable animal tumours in anticancer drug screening. This screening model was rapidly recognized as a rich source of information about the mechanisms of growth inhibition and tumour-cell kill. Recently, its role has changed to that of a service screen supporting the cancer research community. Here I review the development, use and productivity of the screen, highlighting several outcomes that have contributed to advances in cancer chemotherapy.


2,005 Citations


Performance
Metrics
No. of citations received by the paper in previous years
Year | Citations
2021 | 9