Velodrome: Out-of-Distribution Generalization from Labeled and Unlabeled Gene Expression Data for Drug Response Prediction

Abstract
Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Different methods of transfer learning have been proposed to address this data discrepancy. These methods generally use cell lines as source domains and patients, patient-derived xenografts, or other cell lines as target domains. However, they assume that they have access to the target domain during training or fine-tuning and they can only take labeled source domains as input. The former is a strong assumption that is not satisfied during deployment of these models in the clinic. The latter means these methods rely on labeled source domains which are of limited size. To avoid these assumptions, we formulate drug response prediction as an out-of-distribution generalization problem which does not assume that the target domain is accessible during training. Moreover, to exploit unlabeled source domain data, which tends to be much more plentiful than labeled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labeled and unlabeled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization, and a consistency loss to incorporate unlabeled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts, and patients. Finally, we showed that Velodrome models generalize to different tissue types that were well-represented, under-represented, or completely absent in the training data. Overall, our results suggest that Velodrome may guide precision oncology more accurately.

Hossein Sharifi-Noghabi 1,2, Parsa Alamzadeh Harjandi 1, Olga Zolotareva 3,4, Colin C. Collins 2,5, Martin Ester 1,2,*
1 School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
2 Vancouver Prostate Center, Vancouver, British Columbia, Canada.
3 Chair of Experimental Bioinformatics, School of Life Sciences, Technical University of Munich, Germany.
4 Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany.
5 Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada.
*Corresponding author: ester@sfu.ca
bioRxiv preprint doi: https://doi.org/10.1101/2021.05.25.445658; this version posted June 21, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Introduction
The goal of drug response prediction based on the genomic profile of a patient (also known
as pharmacogenomics) -- a crucial task of precision oncology -- is to utilize the omics
features of a patient to predict response to a given drug (Garraway, Verweij, and Ballman
2013; Cronin et al. 2018; Marquart, Chen, and Prasad 2018; Pal et al. 2019). Unfortunately,
patient datasets with drug response are often small or not publicly available, which
has motivated the creation of large-scale preclinical resources such as patient-derived
xenografts (PDX) (Gao et al. 2015) or cancer cell lines (Garnett et al. 2012; Barretina et al.
2012; Basu et al. 2013; Seashore-Ludlow et al. 2015; Klijn et al. 2015; Iorio et al. 2016;
Haverty et al. 2016) as proxies for patients.
Although preclinical datasets are viable proxies for patients, they differ in important ways
from patients due to basic biological differences such as the lack of tumor
microenvironment/the immune system (Mourragui et al. 2019; Sharifi-Noghabi et al. 2020)
-- even two preclinical datasets may have discrepancies with each other (Haibe-Kains et al.
2013; Safikhani et al. 2016; Haverty et al. 2016; Mpindi et al. 2016; Geeleher et al. 2016).
Transfer learning has emerged as a machine learning paradigm for such scenarios (Pan and
Yang 2010; Neyshabur, Sedghi, and Zhang 2020), where we have access to different datasets
from multiple resources (known as source domains) and want to make predictions for a
dataset of interest (known as the target domain). It has been employed in a variety of
problems (Taroni et al. 2019; Raghu et al. 2019; Holmberg et al. 2020; Hu et al. 2020).
Various methods of transfer learning have been proposed in the context of drug response
prediction. These methods address the discrepancies either implicitly (Sharifi-Noghabi et
al. 2019; Snow et al. 2020; Kuenzi et al. 2020) or explicitly, which means they assume that
the model has access to the desired labeled or unlabeled target domain during training
(Sharifi-Noghabi et al. 2020; Mourragui et al. 2019, 2020; Ma et al. 2021; Zhu et al. 2020;
Warren et al. 2020; Peres da Silva, Suphavilai, and Nagarajan 2021).
However, in the real world we do not have access to the target domain(s) while training
the model on the source domains; e.g., we do not know which future patients may walk into a
clinic. Nevertheless, the trained model should generalize to the target domain and be able
to make predictions for samples encountered at deployment time. Since generating
large high-quality labeled preclinical datasets is an expensive and time-consuming process
and we do not know response to a given drug in the target domain (e.g., future patients),
there is a need for a computational method that takes not only labeled but also unlabeled
source domain data as input and learns a representation that generalizes to a future target
domain. This problem is known as out-of-distribution generalization or domain
generalization, where the target domain is not accessible during training (Gulrajani and
Lopez-Paz 2020; J. Wang et al. 2021; Zhou et al. 2021). Out-of-distribution generalization is
particularly important for biomedical applications (Zhang et al. 2021).
There are two main approaches to out-of-distribution generalization: 1) generalizing via
learning domain-invariant features (J. Wang et al. 2021), and 2) generalizing via learning
hypothesis-invariant features (Zhao et al. 2020; Z. Wang, Loog, and van Gemert 2021). In
the first approach, the goal is to map the input domains to a shared feature space in which
the features of all domains are aligned, i.e. look similar to each other. However, forcing
different domains to have very similar features is not always feasible because different
domains may have unique characteristics, and completely aligning them ignores these
unique characteristics. The second approach does not align the features but rather the
predictions across domains. The idea is that if the extracted features of input domains are
similar enough for an accurate predictor to make similar predictions, forcing the features to
be more similar is not required anymore. We note that there is no existing method for
out-of-distribution generalization, for either of the two approaches, that can exploit both
labeled and unlabeled source domains.
In this paper, we propose Velodrome, a deep neural network method that combines the two
above approaches and exploits both labeled and unlabeled samples. Velodrome takes gene
expression from cell line (labeled) and patient (unlabeled) datasets as input domains and
predicts the drug response (measured as the area above the dose-response curve, AAC) via a
shared (between cell lines and patients) feature extractor and domain-specific predictors.
The feature extractor and the predictors are trained using a novel loss function with three
components: 1) a standard supervised loss to make the features predictive of drug
response, 2) a consistency loss to exploit unlabeled samples in learning the feature
representation, and 3) an alignment loss to make the features generalizable. We designed
the loss function to balance between learning domain-invariant and hypothesis-invariant
features. To the best of our knowledge, Velodrome is the first method for semi-supervised
out-of-distribution generalization from labeled cell lines and unlabeled patients to different
preclinical and clinical datasets.
We evaluated the performance of Velodrome and state-of-the-art methods of supervised
out-of-distribution generalization, domain adaptation, and semi-supervised learning in
terms of a diverse range of metrics including Pearson and Spearman correlation, the Area
Under the Receiver Operating Characteristic curve (AUROC), and the Area Under the
Precision-Recall curve (AUPR). We observed that Velodrome achieved substantially better
performance across different clinical and preclinical pharmacogenomics datasets for
multiple drugs, demonstrating the potential of semi-supervised out-of-distribution
generalization for drug response prediction, a crucial task of precision oncology. Moreover,
we showed that the responses predicted by Velodrome for TCGA patients (unlabeled, i.e.
without drug response) with prostate and kidney cancers had statistically significant
associations with the expression values of the target genes of the studied drugs. This shows
that Velodrome captures biological aspects of drug response. Finally, although Velodrome
was trained only on solid tissue types, we showed that it made accurate predictions for cell
lines originating from non-solid tissue types, showcasing the out-of-distribution
capabilities of the Velodrome model.
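The correlation and ranking metrics named in the evaluation above can be computed in a few lines. The following is a minimal NumPy-only sketch, not the authors' evaluation code: the helper names (`rank`, `pearson`, `spearman`, `auroc`) and the toy values are assumptions for illustration, and AUPR is omitted for brevity.

```python
import numpy as np

def rank(a):
    """Average ranks (1-based); tied values share their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        tied = a == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; labels are 0/1."""
    labels = np.asarray(labels)
    r = rank(scores)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = r[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

# Toy values (hypothetical): measured vs. predicted AAC for four samples,
# and binary responder labels scored by a continuous prediction.
aac_true = np.array([0.10, 0.40, 0.35, 0.80])
aac_pred = np.array([0.20, 0.30, 0.50, 0.90])
responder = np.array([0, 0, 1, 1])
score = np.array([0.10, 0.40, 0.35, 0.80])

rho = spearman(aac_true, aac_pred)
auc = auroc(responder, score)
```

Correlations suit the continuous AAC labels of cell lines, whereas AUROC and AUPR suit categorical clinical outcomes such as RECIST-based response.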
Results
Datasets. We employed the following resources throughout this paper:
1. Patients without drug response: more than 3,000 samples obtained from TCGA
(Cancer Genome Atlas Research Network et al. 2013) breast (TCGA-BRCA), lung
(TCGA-LUAD), pancreatic (TCGA-PAAD), kidney (TCGA-KIRC), and prostate
(TCGA-PRAD) cohorts with RNA-seq data.
2. Cell lines with drug response: The Cancer Therapeutics Response Portal (CTRPv2)
(Basu et al. 2013; Seashore-Ludlow et al. 2015), The Genomics of Drug Sensitivity in
Cancer (GDSCv2) (Garnett et al. 2012; Iorio et al. 2016), and The Genentech Cell Line
Screening Initiative (gCSI) (Haverty et al. 2016; Klijn et al. 2015) pan-cancer
datasets with a total of more than 2,000 samples with RNA-seq data and AAC as the
measure of the drug response across 11 drugs (in common for the three datasets).
We focused on the following drugs for this paper: Erlotinib, Docetaxel, Paclitaxel,
and Gemcitabine.
3. PDX samples with drug response: The PDX Encyclopedia (PDXE) dataset (Gao et al.
2015) is a collection of more than 300 PDX samples with RNA-seq data screened
with 34 drugs. We use the reported RECIST measure of response (Schwartz et al.
2016) for Gemcitabine, Erlotinib, and Paclitaxel, obtained from the supplementary
material of Gao et al. (2015).
4. Patients with drug response: two cancer-specific datasets with microarray data and
RECIST as the measure of drug response for Docetaxel (Hatzis et al. 2011), Paclitaxel
(Hatzis et al. 2011), and Erlotinib (Byers et al. 2013), plus a pan-cancer dataset
obtained from TCGA patients treated with Gemcitabine (Ding, Zu, and Gu 2016). We
use clinical annotations of the drug response for some patients, which were obtained
from the supplementary material of Ding, Zu, and Gu (2016).
Table S1 presents characteristics of these datasets and indicates whether they were used as
source domains for training or target domains for testing.
Velodrome Overview. The proposed Velodrome method takes gene expression and AAC of
cell line datasets (CTRPv2 and GDSCv2) as well as gene expression of patients without drug
response (TCGA dataset) and learns a predictive and generalizable representation. To
achieve this, Velodrome employs a shared feature extractor, which takes the gene
expression of CTRPv2 and GDSCv2 samples and maps them to a shared feature space, and
domain-specific predictors (e.g. one for CTRPv2 and one for GDSCv2), which take the
feature representation of the gene expression and predict the drug response.
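The shared-extractor and per-domain-predictor layout described above can be sketched as a plain forward pass. This is an illustrative NumPy sketch only: the layer sizes, the single dense layer with ReLU, the random stand-in data, and all function names are assumptions, not the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(n_in, n_out):
    """Small random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def feature_extractor(x, params):
    """Shared extractor: one dense layer followed by ReLU (assumed depth)."""
    w, b = params
    return np.maximum(x @ w + b, 0.0)

def predictor(z, params):
    """Domain-specific head: linear map from features to one AAC value per sample."""
    w, b = params
    return (z @ w + b).ravel()

n_genes, n_features = 50, 8               # illustrative sizes only
shared = init_linear(n_genes, n_features)  # one extractor for all domains
heads = {                                  # one predictor per labeled domain
    "CTRPv2": init_linear(n_features, 1),
    "GDSCv2": init_linear(n_features, 1),
}

x_ctrp = rng.normal(size=(4, n_genes))     # random stand-in for CTRPv2 expression
z = feature_extractor(x_ctrp, shared)      # map samples into the shared feature space
pred = predictor(z, heads["CTRPv2"])       # use the head matching the sample's domain
```

Keeping the extractor shared while the heads stay separate is what lets the later loss terms compare predictions from different heads on the same features.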
The parameters are optimized using a novel objective function consisting of three loss
components: 1) a standard supervised loss to make the representation predictive of drug
response, 2) a consistency loss to exploit unlabeled samples in learning the representation,
and 3) an alignment loss to make the representation generalizable.
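One plausible way to write such an objective is as a weighted sum of the three components. The weights $\lambda_1$ and $\lambda_2$, the parameter symbol $\theta$, and the subscript names below are illustrative notation assumed here, not taken from the text above:

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{sup}}(\theta)
  \;+\; \lambda_1 \, \mathcal{L}_{\text{cons}}(\theta)
  \;+\; \lambda_2 \, \mathcal{L}_{\text{align}}(\theta)
```

Under this reading, $\lambda_1$ and $\lambda_2$ would control how strongly the unlabeled samples and the alignment of domains regularize the supervised fit.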
The idea of the standard supervised loss is to make the representation predictive of the
drug response via a mean squared error loss.
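As a small illustration of the supervised term, the sketch below computes a mean squared error per labeled domain and adds the results. Summing across domains, the function name, and the toy AAC values are assumptions made for this example.

```python
import numpy as np

def supervised_loss(preds_by_domain, aac_by_domain):
    """Sum over labeled domains of the mean squared error between predicted and measured AAC."""
    return sum(
        float(np.mean((preds_by_domain[d] - aac_by_domain[d]) ** 2))
        for d in preds_by_domain
    )

# Toy predictions and labels (hypothetical values) for the two labeled domains.
preds = {"CTRPv2": np.array([0.2, 0.4]), "GDSCv2": np.array([0.1, 0.3])}
aac = {"CTRPv2": np.array([0.2, 0.6]), "GDSCv2": np.array([0.1, 0.1])}
sup = supervised_loss(preds, aac)
```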
To incorporate unlabeled patient samples, we add a consistency loss. The idea is to first
extract features from patient samples using the feature extractor and then assign
pseudo-labels to them by utilizing the predictors associated with CTRPv2 and GDSCv2. The
consistency loss takes the pseudo-labels (i.e., predictions) from the predictors and
regularizes the parameters of the feature extractor and the predictors by the $l_2$ distance
between the predictions of the CTRPv2 predictor and those of the GDSCv2 predictor.
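The consistency term can be illustrated with two predictors scoring the same unlabeled samples. In this sketch the distance is squared and averaged over samples, which is an assumption; the text only specifies an l2 distance between the two sets of predictions. The function name and values are hypothetical.

```python
import numpy as np

def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two predictors' outputs on the same unlabeled samples."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    return float(np.mean((pred_a - pred_b) ** 2))

# Pseudo-labels for three unlabeled patient samples (hypothetical values),
# one set from each domain-specific predictor.
p_ctrp = np.array([0.30, 0.50, 0.10])
p_gdsc = np.array([0.30, 0.40, 0.30])
cons = consistency_loss(p_ctrp, p_gdsc)
```

The loss is zero only when both predictors agree on every unlabeled sample, so minimizing it pushes the shared features toward a space where either head gives the same answer.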
Finally, to make the feature representation generalizable, we add an alignment loss that
regularizes the parameters of the feature extractor. This alignment loss takes the extracted
features of any two input domains (e.g., CTRPv2 and TCGA or CTRPv2 and GDSCv2) and
minimizes the difference between the covariance matrices of those domains.
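A covariance-matching penalty of this kind can be sketched as follows. The squared Frobenius norm and the 1/(4d^2) scaling follow the common CORAL-style formulation and are assumptions here, as are the function names and the random stand-in features.

```python
import numpy as np

def covariance(z):
    """Unbiased feature covariance matrix; rows are samples, columns are features."""
    zc = z - z.mean(axis=0, keepdims=True)
    return zc.T @ zc / (z.shape[0] - 1)

def alignment_loss(z_a, z_b):
    """Squared Frobenius norm of the covariance difference, scaled by 1/(4 d^2)."""
    d = z_a.shape[1]
    diff = covariance(z_a) - covariance(z_b)
    return float(np.sum(diff ** 2) / (4 * d * d))

rng = np.random.default_rng(0)
z_ctrp = rng.normal(size=(20, 5))          # stand-in features for one domain
z_tcga = 2.0 * rng.normal(size=(30, 5))    # second domain with a larger spread
align = alignment_loss(z_ctrp, z_tcga)
```

The two domains may have different numbers of samples, since only their d-by-d covariance matrices are compared.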
Figure 1 illustrates the schematic overview of the Velodrome method.
Evaluation. Drug response prediction using multiple labeled and unlabeled domains can be
viewed in three approaches: 1) under the assumption that there is no data discrepancy, it
can be viewed as a semi-supervised learning problem, 2) under the assumption that
unlabeled patient samples are proxies to future patients, it can be viewed as an
unsupervised domain adaptation problem, and 3) under the assumption that a
generalizable representation can be obtained via only labeled domains, it can be viewed as
a supervised domain generalization problem. It is important to note that the main
contribution of the Velodrome method is that it is the first semi-supervised domain
generalization method for drug response prediction.
To evaluate the performance of Velodrome, we compared it against the state-of-the-art
methods of each approach. For the first approach, we compared Velodrome to Mean
Teacher (Tarvainen and Valpola 2017) which is the state-of-the-art deep neural network for
semi-supervised learning (Yang and Xu 2020). For the second approach, we compared
Velodrome to PRECISE as a non-deep learning method based on subspace alignment and
to the method of Saito et al. (2018) as a deep learning method based on adversarial domain adaptation via
disagreement between predictors. Finally, for the third approach, we compared Velodrome
to Ridge-ERM (Ridge Regression) as a non-deep learning baseline and DeepAll-ERM as a
deep learning baseline. Both of them are categorized as methods of Empirical Risk
Minimization (ERM). ERM methods achieve state-of-the-art performance for
References
A Survey on Transfer Learning.
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
The Cancer Genome Atlas Pan-Cancer analysis project. John N. Weinstein et al., 2013.
Adversarial Discriminative Domain Adaptation.