PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning.

doi:10.1016/J.ISCI.2021.102269

Journal Article

Jannis Born¹,², Matteo Manica², Ali Oskooei², Joris Cadow², Greta Markert¹,², María Rodríguez Martínez²

ETH Zurich¹, IBM²

05 Mar 2021, iScience (Elsevier), Vol. 24, Iss. 4, p. 102269

TL;DR: In this article, a hybrid VAE is used to generate molecules with high predicted efficacy against specific cell lines or cancer types, using an anticancer drug sensitivity prediction model as the reward function.

About: This article was published in iScience on 2021-03-05 and is open access. It has received 38 citations to date.
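
The reward-driven optimization summarized in the TL;DR can be illustrated with a generic REINFORCE-style policy-gradient update. This is a minimal sketch, not PaccMannRL's implementation, and whether the paper uses exactly this estimator is not stated here: the generative model is replaced by a softmax policy over three fixed SMILES strings, and `TOY_REWARD` is a hypothetical stand-in for the drug sensitivity predictor, with made-up scores. It only shows how a predictor's score, minus a running baseline, scales the gradient of the log-probability of each sampled molecule.

```python
import math
import random

# Hypothetical stand-in for a drug sensitivity predictor: made-up scores in
# [0, 1] for three fixed SMILES strings (higher = more desirable).
TOY_REWARD = {"CCO": 0.2, "c1ccccc1O": 0.9, "CC(=O)O": 0.4}
CANDIDATES = list(TOY_REWARD)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline on a categorical policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a "molecule" from the current policy and score it.
        a = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        reward = TOY_REWARD[CANDIDATES[a]]
        advantage = reward - baseline
        # d/d logit_j of log pi(a) is 1[j == a] - pi_j for a softmax policy.
        for j in range(len(logits)):
            grad = (1.0 if j == a else 0.0) - probs[j]
            logits[j] += lr * advantage * grad  # gradient ascent on expected reward
        # Exponential moving average of past rewards reduces gradient variance.
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


final_probs = reinforce()
for smiles, p in zip(CANDIDATES, final_probs):
    print(f"{smiles}: {p:.3f}")
```

In a full generator the sampled action would be a decoded SMILES sequence and the same advantage would weight the sum of per-token log-probabilities; the moving-average baseline is one common variance-reduction choice, assumed here.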

Citations

Journal Article

De novo molecular design and generative models

Joshua Meyers, Benedek Fabian, Nathan J. Brown

01 Jun 2021, Drug Discovery Today

TL;DR: In this paper, the authors categorize de novo approaches by the coarseness of their molecular representation, that is, whether molecular design is modeled on an atom-based, fragment-based or reaction-based paradigm.

55 citations

Journal Article

TITAN: T-cell receptor specificity prediction with bimodal attention networks

Anna Weber¹,², Jannis Born¹,², María Rodríguez Martínez²

ETH Zurich¹, IBM²

01 Jul 2021, Bioinformatics

TL;DR: Weber et al. propose a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes.

Abstract: Motivation: The activity of the adaptive immune system is governed by T-cells and their specific T-cell receptors (TCR), which selectively recognize foreign antigens. Recent advances in experimental techniques have enabled sequencing of TCRs and their antigenic targets (epitopes), making it possible to study the missing link between TCR sequence and epitope binding specificity. The scarcity of data and the large sequence space make this task challenging, and to date only models limited to a small set of epitopes have achieved good performance. Here, we establish a k-nearest-neighbor (K-NN) classifier as a strong baseline and then propose Tcr epITope bimodal Attention Networks (TITAN), a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes.
Results: By encoding epitopes at the atomic level with SMILES sequences, we leverage transfer learning and data augmentation to enrich the input data space and boost performance. TITAN achieves high performance in predicting the specificity of unseen TCRs (ROC-AUC 0.87 in 10-fold CV) and surpasses the current state of the art (ImRex) by a large margin. Notably, our Levenshtein-based K-NN classifier also exhibits competitive performance on unseen TCRs. While generalization to unseen epitopes remains challenging, we report two major breakthroughs. First, by dissecting the attention heatmaps, we demonstrate that the sparsity of available epitope data favors an implicit treatment of epitopes as classes. This may be a general problem that limits unseen-epitope performance for sufficiently complex models. Second, we show that TITAN nevertheless exhibits significantly improved performance on unseen epitopes and is capable of focusing attention on chemically meaningful molecular structures.
Availability and implementation: The code and the dataset used in this study are publicly available at https://github.com/PaccMann/TITAN.
Supplementary information: Supplementary data are available at Bioinformatics online.

41 citations
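
The Levenshtein-based K-NN baseline mentioned in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not TITAN's code: scoring a (TCR, epitope) pair by the sum of the two edit distances is an assumption, and the sequences and binding labels in `TRAIN` are made up.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def pair_distance(p, q):
    # Assumption: distance between two (TCR, epitope) pairs is the sum of the
    # per-sequence edit distances.
    return levenshtein(p[0], q[0]) + levenshtein(p[1], q[1])


def knn_predict(query, train, k=3):
    """Majority label among the k training pairs closest to `query`."""
    nearest = sorted(train, key=lambda item: pair_distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: ((TCR CDR3 sequence, epitope), binds? 1 = yes) pairs.
TRAIN = [
    (("CASSLG", "GILGFV"), 1),
    (("CASSLA", "GILGFV"), 1),
    (("CASRPG", "GILGFV"), 1),
    (("CAWSVQ", "NLVPMV"), 0),
    (("CAWSVE", "NLVPMV"), 0),
]
print(knn_predict(("CASSLG", "GILGFL"), TRAIN))
```

Sorting the full training set for each query is O(n log n) distance evaluations, which is acceptable for a sketch; a real baseline over large TCR datasets would likely need a faster neighbor search.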

Journal Article

De Novo Structure-Based Drug Design Using Deep Learning.

Sowmya Ramaswamy Krishnan¹, Navneet Bung¹, Sarveswara Rao Vangala¹, Rajgopal Srinivasan¹, Gopalakrishnan Bulusu¹, Arijit Roy¹

Tata Consultancy Services¹

18 Nov 2021, Journal of Chemical Information and Modeling

TL;DR: This work proposes a deep learning-based method in which knowledge of the active site structure of the target protein is sufficient to design new molecules, and validates it against two well-studied proteins.

Abstract: In recent years, deep learning-based methods have emerged as promising tools for de novo drug design. Most of these methods are ligand-based, where an initial target-specific ligand data set is necessary to design potent molecules with optimized properties. Although there have been attempts to develop alternative ways to design target-specific ligand data sets, the availability of such data sets remains a challenge when designing molecules against novel target proteins. In this work, we propose a deep learning-based method in which knowledge of the active site structure of the target protein is sufficient to design new molecules. First, a graph attention model was used to learn the structure and features of the amino acids in the active sites of proteins that are experimentally known to form protein-ligand complexes. Next, the learned active site features were used along with a pretrained generative model for conditional generation of new molecules. A bioactivity prediction model was then used in a reinforcement learning framework to optimize the conditional generative model. We validated our method against two well-studied proteins, Janus kinase 2 (JAK2) and dopamine receptor D2 (DRD2), producing molecules similar to the known inhibitors. The graph attention model could identify the probable key active site residues, which influenced the conditional molecule generator to design new molecules with pharmacophoric features similar to those of the known inhibitors.

20 citations

Journal Article

OUP accepted manuscript

25 Mar 2022, Briefings in Bioinformatics

TL;DR: DeepTTA is a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anticancer drug responses.

...read moreread less

Abstract: Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anticancer activity is generally validated by in vitro cellular experiments. Accurate prediction of cancer drug response is therefore a critical and challenging task for anticancer drug design and precision medicine. With the development of pharmacogenomics, combining efficient drug feature extraction methods with omics data has made it possible to use computational models to assist in drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anticancer drug responses. Specifically, DeepTTA uses transcriptomic gene expression data and the chemical substructures of drugs for drug response prediction. Compared to existing methods, DeepTTA achieved better root mean square error, Pearson correlation coefficient and Spearman's rank correlation coefficient on multiple test sets. Moreover, we found that the anticancer drugs bortezomib and dactinomycin provide a potential therapeutic option with multiple clinical indications. Given its strong performance, DeepTTA is expected to be an effective method in cancer drug design.

18 citations
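
DeepTTA is evaluated with root mean square error and with Pearson and Spearman correlation coefficients. For reference, the standard definitions can be written in plain Python as below; whether DeepTTA's evaluation handles ties or aggregation across test sets exactly this way is not stated here, so treat this as a generic sketch. The `observed` and `predicted` values are made-up example responses.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


observed = [0.5, 1.2, 2.0, 3.1]
predicted = [0.7, 1.0, 2.4, 2.9]
print(rmse(observed, predicted), pearson(observed, predicted), spearman(observed, predicted))
```

Computing Spearman's coefficient as the Pearson correlation of average ranks keeps it correct in the presence of ties, where the shortcut formula based on squared rank differences does not apply.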

Journal Article

Generative machine learning for de novo drug discovery: A systematic review

Dominic D. Martinelli

01 Mar 2022, Computers in Biology and Medicine

TL;DR: This article presents a systematic literature review of experimental articles and reviews from the last five years, discussing machine learning models, challenges associated with computational molecule design along with proposed solutions, and molecular encoding methods.

18 citations

References

Journal Article

Nonlinear component analysis as a kernel eigenvalue problem

Bernhard Schölkopf¹, Alexander J. Smola, Klaus-Robert Müller

Max Planck Society¹

01 Jul 1998, Neural Computation

TL;DR: A new method for performing a nonlinear form of principal component analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.

Abstract: A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.

8,175 citations
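
The kernel method described in the abstract replaces explicit feature maps with a kernel matrix K. A minimal sketch in plain Python, assuming a polynomial kernel and extracting only the first component: center K in feature space, find its leading eigenvector by power iteration (a convenience choice here, not the paper's solver), rescale it so the feature-space eigenvector has unit norm, and project the training points. The five 2-D points in `X` are arbitrary example data.

```python
import math


def poly_kernel(x, y, degree=2):
    """Polynomial kernel k(x, y) = (x . y) ** degree."""
    return sum(a * b for a, b in zip(x, y)) ** degree


def center_kernel(K):
    """Center K in feature space: Kc = K - 1n K - K 1n + 1n K 1n, with 1n = ones / n."""
    n = len(K)
    row_mean = [sum(K[i]) / n for i in range(n)]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def first_kernel_pc(X, degree=2, iters=500):
    """Project the training points onto the first kernel principal component."""
    Kc = center_kernel([[poly_kernel(a, b, degree) for b in X] for a in X])
    # Power iteration for the leading eigenvector of the (PSD) centered kernel matrix.
    v = [math.sin(i + 1.0) for i in range(len(X))]
    for _ in range(iters):
        w = matvec(Kc, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(Kc, v)))  # Rayleigh quotient
    # Scale the expansion coefficients so lam * |alpha|^2 = 1, i.e. the
    # eigenvector in feature space has unit length (requires lam > 0).
    alpha = [x / math.sqrt(lam) for x in v]
    # Component of training point i: sum_j alpha_j * Kc[i][j].
    return matvec(Kc, alpha)


X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (0.0, 1.0), (2.0, 2.0)]
print(first_kernel_pc(X))
```

With `degree=1` the kernel is the plain dot product and the projections match ordinary PCA scores on the first component, up to sign. Because the centered kernel matrix has zero row and column sums, the projections of the training points always sum to zero.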

Journal Article

The Cancer Genome Atlas Pan-Cancer analysis project

John N. Weinstein¹,², Eric A. Collisson³, Gordon B. Mills², and 376 more

01 Oct 2013, Nature Genetics

TL;DR: The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA; the resulting data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages.

Abstract: The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

5,294 citations

Journal Article

DrugBank 5.0: a major update to the DrugBank database for 2018

David S. Wishart, Yannick Djoumbou Feunang¹, An Chi Guo¹, Elvis J. Lo¹, Ana Marcu¹, Jason R. Grant¹, Tanvir Sajed¹, Daniel Johnson¹, Carin Li¹, Zinat Sayeeda¹, Nazanin Assempour¹, Ithayavani Iynkkaran¹, Yifeng Liu¹, Adam Maciejewski¹, Nicola Gale, Alex Wilson, Lucy Chin, Ryan Cummings, Diana Le, Allison Pon¹, Craig Knox¹, Michael Wilson¹

University of Alberta¹

04 Jan 2018, Nucleic Acids Research

TL;DR: This year’s update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years and significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions.

Abstract: DrugBank (www.drugbank.ca) is a web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions and their targets. First described in 2006, DrugBank has continued to evolve over the past 12 years in response to marked improvements to web standards and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total number of investigational drugs in the database has grown by almost 300%, the number of drug-drug interactions has grown by nearly 600% and the number of SNP-associated drug effects has grown by more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacological research, pharmaceutical science and drug education.

4,797 citations

Journal Article

Extended-Connectivity Fingerprints

David Rogers¹, Mathew Hahn¹

Symyx Technologies¹

28 Apr 2010, Journal of Chemical Information and Modeling

TL;DR: ECFPs can be calculated very rapidly and can represent an essentially infinite number of different molecular features; although they are widely used, a description of their implementation had not previously been presented in the literature.

Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.

4,173 citations
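
The circular-fingerprint idea can be sketched as iterative neighborhood hashing on a molecular graph. This simplified version is not the published ECFP implementation: it assumes a reduced atom invariant (element symbol and heavy-atom degree), uses SHA-1 from `hashlib` as a stand-in hash function, mixes the iteration number into each hash, folds identifiers into a fixed-length bit set, and omits refinements of the published algorithm such as duplicate-feature removal.

```python
import hashlib


def stable_hash(values) -> int:
    """Deterministic 32-bit hash of a tuple (built-in hash() of str is salted per process)."""
    digest = hashlib.sha1(repr(values).encode()).digest()
    return int.from_bytes(digest[:4], "little")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP-style fingerprint.

    atoms: element symbols, one per heavy atom.
    bonds: (i, j, bond_order) tuples between atom indices.
    Returns the set of 'on' bit positions after folding into n_bits.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i].append((order, j))
        neighbours[j].append((order, i))
    # Radius 0: identifier from a reduced atom invariant (element, heavy-atom degree).
    ids = [stable_hash((sym, len(neighbours[i]))) for i, sym in enumerate(atoms)]
    features = set(ids)
    for iteration in range(1, radius + 1):
        # Combine each atom's identifier with its sorted (bond order, neighbour id)
        # pairs; sorting makes the result independent of atom numbering.
        ids = [
            stable_hash((iteration, ids[i], tuple(sorted((o, ids[j]) for o, j in neighbours[i]))))
            for i in range(len(atoms))
        ]
        features.update(ids)
    return {f % n_bits for f in features}


ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(sorted(ethanol))
```

The list comprehension builds all new identifiers from the previous iteration's values before replacing them, so every atom is updated synchronously, and renumbering the atoms of the same molecule yields the same bit set.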

Journal Article

The NCI60 human tumour cell line anticancer drug screen

Robert H. Shoemaker¹

National Institutes of Health¹

01 Oct 2006, Nature Reviews Cancer

TL;DR: The development, use and productivity of the NCI60 screen are reviewed, highlighting several outcomes that have contributed to advances in cancer chemotherapy.

Abstract: The US National Cancer Institute (NCI) 60 human tumour cell line anticancer drug screen (NCI60) was developed in the late 1980s as an in vitro drug-discovery tool intended to supplant the use of transplantable animal tumours in anticancer drug screening. This screening model was rapidly recognized as a rich source of information about the mechanisms of growth inhibition and tumour-cell kill. Recently, its role has changed to that of a service screen supporting the cancer research community. Here I review the development, use and productivity of the screen, highlighting several outcomes that have contributed to advances in cancer chemotherapy.

2,257 citations