scispace - formally typeset
Search or ask a question
Author

Krzysztof Koras

Bio: Krzysztof Koras is an academic researcher from University of Warsaw. The author has contributed to research in topics: Gene & Druggability. The author has an hindex of 1, co-authored 4 publications receiving 10 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: Standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures are compared, finding small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms.
Abstract: Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We asses these methodologies on Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with the best performing example of Dabrafenib. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are indicative for therapy design.

27 citations

Journal ArticleDOI
TL;DR: DEERS as mentioned in this paper is a neural network recommender system for kinase inhibitor sensitivity prediction, which combines two autoencoders to project cell line and drug features into 10-dimensional hidden representations and combines them into response prediction.
Abstract: Computational models for drug sensitivity prediction have the potential to significantly improve personalized cancer medicine. Drug sensitivity assays, combined with profiling of cancer cell lines and drugs become increasingly available for training such models. Multiple methods were proposed for predicting drug sensitivity from cancer cell line features, some in a multi-task fashion. So far, no such model leveraged drug inhibition profiles. Importantly, multi-task models require a tailored approach to model interpretability. In this work, we develop DEERS, a neural network recommender system for kinase inhibitor sensitivity prediction. The model utilizes molecular features of the cancer cell lines and kinase inhibition profiles of the drugs. DEERS incorporates two autoencoders to project cell line and drug features into 10-dimensional hidden representations and a feed-forward neural network to combine them into response prediction. We propose a novel interpretability approach, which in addition to the set of modeled features considers also the genes and processes outside of this set. Our approach outperforms simpler matrix factorization models, achieving R $$=$$ 0.82 correlation between true and predicted response for the unseen cell lines. The interpretability analysis identifies 67 biological processes that drive the cell line sensitivity to particular compounds. Detailed case studies are shown for PHA-793887, XMD14-99 and Dabrafenib.

4 citations

Posted ContentDOI
29 Nov 2019-bioRxiv
TL;DR: Based on GDSC drug data, it is found that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with the genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication.
Abstract: Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. The major difficulty of this problem stems from the fact that the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Although feature selection is the key to interpretable results and identification of potential biomarkers, a comprehensive assessment of feature selection methods for drug sensitivity prediction has so far not been performed. We propose feature selection approaches driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We asses these methodologies on Genomics of Drug Sensitivity in Cancer (GDSC) dataset, a panel of around 1000 cell lines screened against multiple anticancer compounds. We compare our results with a baseline model utilizing genome-wide gene expression features and common data-driven feature selection techniques. Together, 2484 unique models were evaluated, providing a comprehensive study of feature selection strategies for the drug response prediction problem. For 23 drugs, the models achieve better predictive performance when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r=0.75). Extending the drug-dependent features with gene expression signatures yields models that are most predictive of drug response for 60 drugs, with the best performing example of Dabrafenib. Examples of how pre-selection of features benefits the model interpretability are given for Dabrafenib, Linifanib and Quizartinib. Based on GDSC drug data, we find that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with the genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication. For a significant group of the compounds, even a very small number of features based on simple drug properties is often highly predictive of drug sensitivity, can explain the mechanism of drug action and be used as guidelines for their prescription. In general, choosing appropriate feature selection strategies has the potential to develop interpretable models that are indicative for therapy design.
Posted ContentDOI
27 Jan 2021-bioRxiv
TL;DR: In this paper, a neural network recommender system for kinase inhibitor sensitivity prediction called DEERS is proposed. But, the model does not consider genes and processes that were not included in the set of modeled features, and it requires a tailored approach to model interpretability.
Abstract: Computational models for drug sensitivity prediction have the potential to revolutionise personalized cancer medicine. Drug sensitivity assays, as well as profiling of cancer cell lines and drugs becomes increasingly available for training such models. Machine learning methods for drug sensitivity prediction must be optimized for: (i) leveraging the wealth of information about both cancer cell lines and drugs, (ii) predictive performance and (iii) interpretability. Multiple methods were proposed for predicting drug sensitivity from cancer cell line features, some in a multi-task fashion. So far, no such model leveraged drug inhibition profiles. Recent neural network-based recommender systems arise as models capable of predicting cancer cell line response to drugs from their biological features with high prediction accuracy. These models, however, require a tailored approach to model interpretability. In this work, we develop a neural network recommender system for kinase inhibitor sensitivity prediction called DEERS. The model utilizes molecular features of the cancer cell lines and kinase inhibition profiles of the drugs. DEERS incorporates two autoencoders to project cell line and drug features into 10-dimensional hidden representations and a feed-forward neural network to combine them into response prediction. We propose a novel model interpretability approach offering the widest possible assessment of the specific genes and biological processes that underlie the action of the drugs on the cell lines. The approach considers also such genes and processes that were not included in the set of modeled features. Our approach outperforms simpler matrix factorization models, achieving R=0.82 correlation between true and predicted response for the unseen cell lines. Using the interpretability analysis, we evaluate correlation of all human genes with each of the hidden cell line dimensions. Subsequently, we identify 67 biological processes associated with these dimensions. Combined with drug response data, these associations point at the processes that drive the cell line sensitivity to particular compounds. Detailed case studies are shown for PHA-793887, XMD14-99 and Dabrafenib. Our framework provides an expressive, multitask neural network model with a custom interpretability approach for inferring underlying biological factors and explaining cancer cell response to drugs.
Journal ArticleDOI
TL;DR: VADEERS offers a comprehensive model of drugs’ and cell lines’ properties and relationships between them, as well as a pre-computed clustering of the drugs by their inhibitory profiles.
Abstract: Recent emergence of high-throughput drug screening assays sparkled an intensive development of machine learning methods, including models for prediction of sensitivity of cancer cell lines to anti-cancer drugs, as well as methods for generation of potential drug candidates. However, a concept of generation of compounds with specific properties and simultaneous modeling of their efficacy against cancer cell lines has not been comprehensively explored. To address this need, we present VADEERS, a Variational Autoencoder-based Drug Efficacy Estimation Recommender System. The generation of compounds is performed by a novel variational autoencoder with a semi-supervised Gaussian Mixture Model (GMM) prior. The prior defines a clustering in the latent space, where the clusters are associated with specific drug properties. In addition, VADEERS is equipped with a cell line autoencoder and a sensitivity prediction network. The model combines data for SMILES string representations of anti-cancer drugs, their inhibition profiles against a panel of protein kinases, cell lines biological features and measurements of the sensitivity of the cell lines to the drugs. The evaluated variants of VADEERS achieve a high r=0.87 Pearson correlation between true and predicted drug sensitivity estimates. We train the GMM prior in such a way that the clusters in the latent space correspond to a pre-computed clustering of the drugs by their inhibitory profiles. We show that the learned latent representations and new generated data points accurately reflect the given clustering. In summary, VADEERS offers a comprehensive model of drugs and cell lines properties and relationships between them, as well as a guided generation of novel compounds.

Cited by
More filters
25 Apr 2017
TL;DR: This presentation is a case study taken from the travel and holiday industry and describes the effectiveness of various techniques as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib).
Abstract: This presentation is a case study taken from the travel and holiday industry. Paxport/Multicom, based in UK and Sweden, have recently adopted a recommendation system for holiday accommodation bookings. Machine learning techniques such as Collaborative Filtering have been applied using Python (3.5.1), with Jupyter (4.0.6) as the main framework. Data scale and sparsity present significant challenges in the case study, and so the effectiveness of various techniques are described as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib). The presentation is suitable for all levels of programmers.

1,338 citations

Journal ArticleDOI
TL;DR: In this paper, the authors focus on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication.
Abstract: Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.

54 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide an overview of recent advancements in therapeutic response prediction using machine learning, which is the most widely used branch of artificial intelligence, and highlight the current challenges in therapy response prediction for clinical practice.
Abstract: Resistance to therapy remains a major cause of cancer treatment failures, resulting in many cancer-related deaths. Resistance can occur at any time during the treatment, even at the beginning. The current treatment plan is dependent mainly on cancer subtypes and the presence of genetic mutations. Evidently, the presence of a genetic mutation does not always predict the therapeutic response and can vary for different cancer subtypes. Therefore, there is an unmet need for predictive models to match a cancer patient with a specific drug or drug combination. Recent advancements in predictive models using artificial intelligence have shown great promise in preclinical settings. However, despite massive improvements in computational power, building clinically useable models remains challenging due to a lack of clinically meaningful pharmacogenomic data. In this review, we provide an overview of recent advancements in therapeutic response prediction using machine learning, which is the most widely used branch of artificial intelligence. We describe the basics of machine learning algorithms, illustrate their use, and highlight the current challenges in therapy response prediction for clinical practice.

29 citations

Journal ArticleDOI
TL;DR: This study proposed a 13-DNA repair gene-signature as a prognostic factor for LUAD patients, which can independently predict clinical outcomes by complement of the stage and characterized two LUAD subtypes with distinct clinical outcomes, somatic gene mutations, and drug sensitivity in cancer based on DNA repair genes.
Abstract: Objective: To conduct a robust prognostic gene expression signature and characterize molecular subtypes with distinct clinical characteristics for lung adenocarcinoma (LUAD). Methods: Based on DNA repair genes from the GSEA database, a prognostic signature was conducted in the TCGA-LUAD training set via univariate and multivariate cox regression analysis. Its prediction power was validated by overall survival analysis, relative operating characteristic (ROC) curves and stratification analysis in the GSE72094 verification set. Involved pathways in the high- and low-risk groups were analyzed by GSEA. A nomogram was built based on the signature and clinical features and its performance was assessed by calibration plots. LUAD samples were clustered via the ConsensusClusterPlus package. The differences in clinical outcomes, single nucleotide polymorphism (SNP) and sensitivity to chemotherapy drugs between molecular subtypes were analyzed. Results: A 13-DNA repair gene-signature was constructed for LUAD prognosis. Following validation, it can robustly and independently predict patients' clinical outcomes. The GSEA results exhibited the differences in pathways between high- and low- risk groups. A nomogram combining the signature and stage could accurately predict 1-, 3-, and 5-year survival probability. Two distinct molecular subtypes were characterized based on DNA repair genes. Patients in the Cluster 2 exhibited a worse prognosis and were more sensitive to common chemotherapy than those in the Cluster 1. Conclusion:This study proposed a 13-DNA repair gene-signature as a prognostic factor for LUAD patients, which can independently predict clinical outcomes by complement of the stage. Moreover, we characterized two LUAD subtypes with distinct clinical outcomes, somatic gene mutations, and drug sensitivity in cancer based on DNA repair genes.

10 citations

Journal ArticleDOI
TL;DR: PharmacoDB 2.0 as mentioned in this paper integrates multiple cancer pharmacogenomics datasets profiling approved and investigational drugs across cell lines from diverse tissue types, enabling users to efficiently navigate across datasets, view and compare drug dose-response data for a specific drug-cell line pair.
Abstract: Cancer pharmacogenomics studies provide valuable insights into disease progression and associations between genomic features and drug response. PharmacoDB integrates multiple cancer pharmacogenomics datasets profiling approved and investigational drugs across cell lines from diverse tissue types. The web-application enables users to efficiently navigate across datasets, view and compare drug dose-response data for a specific drug-cell line pair. In the new version of PharmacoDB (version 2.0, https://pharmacodb.ca/), we present (i) new datasets such as NCI-60, the Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) dataset, as well as updated data from the Genomics of Drug Sensitivity in Cancer (GDSC) and the Genentech Cell Line Screening Initiative (gCSI); (ii) implementation of FAIR data pipelines using ORCESTRA and PharmacoDI; (iii) enhancements to drug-response analysis such as tissue distribution of dose-response metrics and biomarker analysis; and (iv) improved connectivity to drug and cell line databases in the community. The web interface has been rewritten using a modern technology stack to ensure scalability and standardization to accommodate growing pharmacogenomics datasets. PharmacoDB 2.0 is a valuable tool for mining pharmacogenomics datasets, comparing and assessing drug-response phenotypes of cancer models.

8 citations