Journal Article

PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning.

05 Mar 2021 - iScience (Elsevier) - Vol. 24, Iss. 4, p. 102269
TL;DR: In this article, a hybrid VAE is used to generate drugs with high predicted efficacy against cell lines or cancer types, with an anticancer drug sensitivity prediction model serving as the reward function.
About: This article is published in iScience. The article was published on 2021-03-05 and is currently open access. It has received 38 citations to date.
Citations
Journal Article
TL;DR: In this paper, the authors present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based or reaction-based paradigm.

55 citations

Journal Article
Anna Weber, Jannis Born, María Rodríguez Martínez
TL;DR: This paper proposes TITAN, a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes.
Abstract: Motivation: The activity of the adaptive immune system is governed by T-cells and their specific T-cell receptors (TCR), which selectively recognize foreign antigens. Recent advances in experimental techniques have enabled sequencing of TCRs and their antigenic targets (epitopes), making it possible to investigate the missing link between TCR sequence and epitope binding specificity. Scarcity of data and a large sequence space make this task challenging, and to date only models limited to a small set of epitopes have achieved good performance. Here, we establish a k-nearest-neighbor (K-NN) classifier as a strong baseline and then propose Tcr epITope bimodal Attention Networks (TITAN), a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes. Results: By encoding epitopes at the atomic level with SMILES sequences, we leverage transfer learning and data augmentation to enrich the input data space and boost performance. TITAN achieves high performance in the prediction of specificity of unseen TCRs (ROC-AUC 0.87 in 10-fold CV) and surpasses the results of the current state-of-the-art (ImRex) by a large margin. Notably, our Levenshtein-based K-NN classifier also exhibits competitive performance on unseen TCRs. While the generalization to unseen epitopes remains challenging, we report two major breakthroughs. First, by dissecting the attention heatmaps, we demonstrate that the sparsity of available epitope data favors an implicit treatment of epitopes as classes. This may be a general problem that limits unseen epitope performance for sufficiently complex models. Second, we show that TITAN nevertheless exhibits significantly improved performance on unseen epitopes and is capable of focusing attention on chemically meaningful molecular structures.
Availability and implementation: The code and the dataset used in this study are publicly available at https://github.com/PaccMann/TITAN. Supplementary information: Supplementary data are available at Bioinformatics online.
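
The Levenshtein-based K-NN baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simplified single-epitope setting in which each training TCR sequence carries a binary binding label; the sequences, labels, and function names below are made up for illustration and are not taken from the TITAN code base.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def knn_predict(query, train, k=3):
    """Majority label among the k training sequences closest to `query`."""
    nearest = sorted(train, key=lambda item: levenshtein(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 1 = binds the epitope of interest, 0 = does not.
train = [
    ("CASSLGQAYEQYF", 1),
    ("CASSLGQGYEQYF", 1),
    ("CASSLGTAYEQYF", 1),
    ("CAWSVRDNSPLHF", 0),
    ("CAWSVGDNSPLHF", 0),
]
prediction = knn_predict("CASSLGQAYEQFF", train, k=3)
```

A classifier of this kind needs no training beyond storing the labeled sequences, which is one reason it is a natural baseline when data are scarce.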

41 citations

Journal Article
TL;DR: This work proposes a deep learning-based method in which knowledge of the active site structure of the target protein is sufficient to design new molecules, and validates it against two well-studied proteins.
Abstract: In recent years, deep learning-based methods have emerged as promising tools for de novo drug design. Most of these methods are ligand-based, where an initial target-specific ligand data set is necessary to design potent molecules with optimized properties. Although there have been attempts to develop alternative ways to design target-specific ligand data sets, availability of such data sets remains a challenge while designing molecules against novel target proteins. In this work, we propose a deep learning-based method, where the knowledge of the active site structure of the target protein is sufficient to design new molecules. First, a graph attention model was used to learn the structure and features of the amino acids in the active site of proteins that are experimentally known to form protein-ligand complexes. Next, the learned active site features were used along with a pretrained generative model for conditional generation of new molecules. A bioactivity prediction model was then used in a reinforcement learning framework to optimize the conditional generative model. We validated our method against two well-studied proteins, Janus kinase 2 (JAK2) and dopamine receptor D2 (DRD2), where we produce molecules similar to the known inhibitors. The graph attention model could identify the probable key active site residues, which influenced the conditional molecule generator to design new molecules with pharmacophoric features similar to the known inhibitors.

20 citations

Journal Article
TL;DR: DeepTTA is a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anti-cancer drug responses.
Abstract: Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anti-cancer activity is generally validated by in vitro cellular experiments. Therefore, accurate prediction of cancer drug response is a critical and challenging task for anti-cancer drug design and precision medicine. With the development of pharmacogenomics, the combination of efficient drug feature extraction methods and omics data has made it possible to use computational models to assist in drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anti-cancer drug responses. Specifically, DeepTTA uses transcriptomic gene expression data and chemical substructures of drugs for drug response prediction. Compared to existing methods, DeepTTA achieved higher performance in terms of root mean square error, Pearson correlation coefficient and Spearman's rank correlation coefficient on multiple test sets. Moreover, we discovered that the anti-cancer drugs bortezomib and dactinomycin provide potential therapeutic options with multiple clinical indications. With its excellent performance, DeepTTA is expected to be an effective method in cancer drug design.
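
The three evaluation metrics named in this abstract (root mean square error, Pearson correlation, Spearman's rank correlation) can be written out in a few lines. The sketch below uses only the standard library; the function names are illustrative and not part of DeepTTA, and it assumes inputs with non-zero variance.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

RMSE measures absolute error in the units of the response, whereas the two correlation coefficients measure linear and monotonic agreement respectively, so a model can rank drugs well (high Spearman) while still being poorly calibrated (high RMSE).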

18 citations

Journal Article
TL;DR: This article presents a systematic literature review of experimental articles and reviews from the last five years, covering machine learning models, the challenges of computational molecule design along with proposed solutions, and molecular encoding methods.

18 citations

References
Journal Article
TL;DR: A new method for performing a nonlinear form of principal component analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
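
The procedure summarized in this abstract can be sketched with NumPy as follows. This is a minimal illustration assuming a polynomial kernel; the function name and the `degree` and `coef0` parameters are choices made for this sketch, not notation from the paper.

```python
import numpy as np


def kernel_pca(X, n_components=2, degree=2, coef0=1.0):
    """Project the rows of X onto the leading kernel principal components.

    Uses the polynomial kernel k(x, y) = (x . y + coef0) ** degree.
    """
    K = (X @ X.T + coef0) ** degree

    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]

    # Projection of training point i on component k is sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Toy data: 8 random points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Z = kernel_pca(X, n_components=2)
```

The feature map is never computed explicitly: only the n × n kernel matrix is needed, which is what makes principal components in very high-dimensional feature spaces tractable. With `degree=1` and `coef0=0.0` the sketch reduces to ordinary linear PCA.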

8,175 citations

Journal Article
TL;DR: The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA; the resulting data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages.
Abstract: The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

5,294 citations

Journal Article
TL;DR: This year’s update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years and significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions.
Abstract: DrugBank (www.drugbank.ca) is a web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions and their targets. First described in 2006, DrugBank has continued to evolve over the past 12 years in response to marked improvements to web standards and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total number of investigational drugs in the database has grown by almost 300%, the number of drug-drug interactions has grown by nearly 600% and the number of SNP-associated drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacological research, pharmaceutical science and drug education.

4,797 citations

Journal Article
TL;DR: Extended-connectivity fingerprints (ECFPs) are circular topological fingerprints that can be calculated very rapidly and can represent an essentially infinite number of different molecular features; although widely adopted, a description of their implementation had not previously been presented in the literature.
Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure−activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
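
The idea of a circular fingerprint can be illustrated with a simplified sketch. This is not the ECFP algorithm as published: it uses a reduced set of atom invariants (element symbol and degree only), a CRC32 hash chosen for reproducibility, and it omits the removal of duplicate substructures. The molecule encoding (atom list plus bond tuples) and all names are assumptions made for this example.

```python
import zlib


def _hash(obj) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() is salted for strings)."""
    return zlib.crc32(repr(obj).encode("utf-8"))


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Return the set of on-bits of a simplified ECFP-style fingerprint.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples over atom indices
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((order, j))
        neighbors[j].append((order, i))

    # Iteration 0: each atom is described by its own invariants only.
    ids = [_hash((atoms[i], len(neighbors[i]))) for i in range(len(atoms))]
    features = set(ids)

    # Each further iteration widens every atom's neighborhood by one bond.
    for layer in range(1, radius + 1):
        new_ids = []
        for i in range(len(atoms)):
            env = sorted((order, ids[j]) for order, j in neighbors[i])
            new_ids.append(_hash((layer, ids[i], env)))
        ids = new_ids
        features.update(ids)

    # Fold the identifiers into a fixed-length bit vector.
    return {f % n_bits for f in features}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    return len(a & b) / len(a | b)


# Heavy-atom graphs of ethanol (C-C-O) and propan-1-ol (C-C-C-O).
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
propanol = circular_fingerprint(["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
similarity = tanimoto(ethanol, propanol)
```

Because neighbor identifiers are sorted before hashing and the features are collected in a set, the result does not depend on the order in which atoms are numbered. Each on-bit corresponds to the presence of a particular atom-centered substructure, which is the property the abstract credits with easing interpretation.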

4,173 citations

Journal Article
TL;DR: The development, use and productivity of the NCI60 screen are reviewed, highlighting several outcomes that have contributed to advances in cancer chemotherapy.
Abstract: The US National Cancer Institute (NCI) 60 human tumour cell line anticancer drug screen (NCI60) was developed in the late 1980s as an in vitro drug-discovery tool intended to supplant the use of transplantable animal tumours in anticancer drug screening. This screening model was rapidly recognized as a rich source of information about the mechanisms of growth inhibition and tumour-cell kill. Recently, its role has changed to that of a service screen supporting the cancer research community. Here I review the development, use and productivity of the screen, highlighting several outcomes that have contributed to advances in cancer chemotherapy.

2,257 citations