What is the promising aspect of the study?

Especially promising are proteomics data (Ali et al. 2018) and germline variants (Menden et al. 2018), due to their predictive power.

What methods were designed to take both labeled and unlabeled samples?

Velodrome, PRECISE, and Mean Teacher were designed to take both labeled and unlabeled samples and, therefore, were expected to achieve better performance on patients than DeepAll-ERM and Ridge-ERM.

What is the recent method to adjust for the output space discrepancy?

A recent method adjusts for this output space discrepancy and improves the prediction performance (Sharifi-Noghabi et al. 2020), but it requires access to the target domain during training, which violates the assumption of out-of-distribution generalization.

What was the significance of Velodrome on non-solid tissues?

For that, the authors tested the trained Velodrome models for the studied drugs on samples originating from non-solid tissues in the gCSI cell line dataset, evaluated the performance in terms of Pearson correlation between the predictions and the actual AAC values, and reported the two-tailed p-value as well.

What is the relationship between CMPK1 and EGFR?

Prostate cancer progression and lethal outcome have been associated with metabolic signaling pathways, and CMPK1 (which mediates the mechanism of action of Gemcitabine) was shown to be highly expressed in prostate cancer patients (Kelly et al. 2016).

What is the average performance of Velodrome over the studied drugs?

Although the authors observed that the average performance (over the studied drugs) of all methods decreased, Velodrome still achieved the best performance on patients in terms of both AUROC and AUPR, and also the best performance in terms of both Pearson and Spearman correlation on cell lines.

What is the significance of Velodrome on non-solid tissues?

These results suggest that Velodrome is as accurate (P > 0.05) as a non-solid predictor on these tissues, and even more accurate in the case of Erlotinib (P = 4 × 10^-3), even though it did not utilize them during training.

What is the version of Velodrome?

Their results on patients demonstrate that on average ± std (over all drugs for 10 independent runs), the complete version of Velodrome outperforms its variants, which indicates the added value of both alignment and consistency losses (Figure 3-B).

What is the role of BCL2 in kidney cancer?

BCL2 can also act as an oncoprotein in kidney cancer (Paraf et al. 1995) and has therapeutic roles (Adams and Cory 2007; Delbridge et al. 2016).

What was the correlation between Velodrome and the predicted drugs?

Similar to the Velodrome results, this predictor also achieved significant correlations of 0.34 and 0.39 (P = 10^-2) for Erlotinib and Gemcitabine and negative correlations of -0.11 (P = 5 × 10^-3) and -0.4 for Docetaxel and Paclitaxel, respectively.

(Open Access) Velodrome: Out-of-Distribution Generalization from Labeled and Unlabeled Gene Expression Data for Drug Response Prediction (2021) | Hossein Sharifi-Noghabi

Q: What have the authors contributed in "Velodrome: out-of-distribution generalization from labeled and unlabeled gene expression data for drug response prediction" ?

The authors propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labeled and unlabeled data from different resources as input and makes generalizable predictions.

Q: What is the description of Velodrome?

To the best of their knowledge, Velodrome is the first method for semi-supervised out-of-distribution generalization from labeled cell lines and unlabeled patients to different preclinical and clinical datasets.

Velodrome: Out-of-Distribution Generalization from Labeled and Unlabeled Gene Expression Data for Drug Response Prediction

Hossein Sharifi-Noghabi (1,2), Parsa Alamzadeh Harjandi, Olga Zolotareva (3,4), Colin C. Collins (2,5), Martin Ester (1,2,*)

(1) School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.

(2) Vancouver Prostate Center, Vancouver, British Columbia, Canada.

(3) Chair of Experimental Bioinformatics, School of Life Sciences, Technical University of Munich, Germany.

(4) Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany.

(5) Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada.

*Corresponding author: ester@sfu.ca

Abstract

Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Different methods of transfer learning have been proposed to address this data discrepancy. These methods generally use cell lines as source domains and patients, patient-derived xenografts, or other cell lines as target domains. However, they assume that they have access to the target domain during training or fine-tuning and they can only take labeled source domains as input. The former is a strong assumption that is not satisfied during deployment of these models in the clinic. The latter means these methods rely on labeled source domains which are of limited size. To avoid these assumptions, we formulate drug response prediction as an out-of-distribution generalization problem which does not assume that the target domain is accessible during training. Moreover, to exploit unlabeled source domain data, which tends to be much more plentiful than labeled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labeled and unlabeled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization, and a consistency loss to incorporate unlabeled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts, and patients and, therefore, may guide precision oncology more accurately.

bioRxiv preprint doi: https://doi.org/10.1101/2021.05.25.445658; this version posted June 21, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Introduction

The goal of drug response prediction based on the genomic profile of a patient (also known as pharmacogenomics), a crucial task of precision oncology, is to utilize the omics features of a patient to predict response to a given drug (Garraway, Verweij, and Ballman 2013; Cronin et al. 2018; Marquart, Chen, and Prasad 2018; Pal et al. 2019). Unfortunately, patient datasets with drug response are often small or not publicly available, which motivated the creation of large-scale preclinical resources such as patient-derived xenografts (PDX) (Gao et al. 2015) or cancer cell lines (Garnett et al. 2012; Barretina et al. 2012; Basu et al. 2013; Seashore-Ludlow et al. 2015; Klijn et al. 2015; Iorio et al. 2016; Haverty et al. 2016) as proxies for patients.

Although preclinical datasets are viable proxies for patients, they differ in important ways from patients due to basic biological differences such as the lack of tumor microenvironment/the immune system (Mourragui et al. 2019; Sharifi-Noghabi et al. 2020); even two preclinical datasets may have discrepancies with each other (Haibe-Kains et al. 2013; Safikhani et al. 2016; Haverty et al. 2016; Mpindi et al. 2016; Geeleher et al. 2016).

Transfer learning has emerged as a machine learning paradigm for such scenarios (Pan and Yang 2010; Neyshabur, Sedghi, and Zhang 2020), where we have access to different datasets from multiple resources (known as source domains) and want to make predictions for a dataset of interest (known as target domain), and it has been employed in different problems (Taroni et al. 2019; Raghu et al. 2019; Holmberg et al. 2020; Hu et al. 2020).

Various methods of transfer learning have been proposed in the context of drug response prediction. These methods address these discrepancies either implicitly (Sharifi-Noghabi et al. 2019; Snow et al. 2020; Kuenzi et al. 2020) or explicitly, which means they assume that the model has access to the desired labeled or unlabeled target domain during training (Sharifi-Noghabi et al. 2020; Mourragui et al. 2019, 2020; Ma et al. 2021; Zhu et al. 2020; Warren et al. 2020; Peres da Silva, Suphavilai, and Nagarajan 2021).

However, in the real world we do not have access to the target domain(s) while training the model on the source domain, e.g., we do not know future patients that may walk into a clinic. Nevertheless, the trained model should generalize to the target domain and be able to make predictions for samples encountered at deployment time. Since generating large high-quality labeled preclinical datasets is an expensive and time-consuming process and we do not know response to a given drug in the target domain (e.g., future patients), there is a need for a computational method that takes not only labeled but also unlabeled source domain data as input and learns a representation that generalizes to a future target domain. This problem is known as out-of-distribution generalization or domain generalization, where the target domain is not accessible during training (Gulrajani and Lopez-Paz 2020; J. Wang et al. 2021; Zhou et al. 2021). Out-of-distribution generalization is particularly important for biomedical applications (Zhang et al. 2021).

There are two main approaches to out-of-distribution generalization: 1) generalizing via learning domain-invariant features (J. Wang et al. 2021), and 2) generalizing via learning hypothesis-invariant features (Zhao et al. 2020; Z. Wang, Loog, and van Gemert 2021). In the first approach, the goal is to map the input domains to a shared feature space in which the features of all domains are aligned, i.e., look similar to each other. However, forcing different domains to have very similar features is not always feasible because different domains may have unique characteristics, and completely aligning them ignores these unique characteristics. The second approach does not align the features but rather the predictions across domains. The idea is that if the extracted features of input domains are similar enough for an accurate predictor to make similar predictions, forcing the features to be more similar is no longer required. We note that there is no existing method for out-of-distribution generalization, for either of the two approaches, that can exploit both labeled and unlabeled source domains.

In this paper, we propose Velodrome, a deep neural network method that combines the two above approaches and exploits both labeled and unlabeled samples. Velodrome takes gene expression from cell line (labeled) and patient (unlabeled) datasets as input domains and predicts the drug response (measured as area above dose-response curve, AAC) via a shared (between cell lines and patients) feature extractor and domain-specific predictors. The feature extractor and the predictors are trained using a novel loss function with three components: 1) a standard supervised loss to make the features predictive of drug response, 2) a consistency loss to exploit unlabeled samples in learning the feature representation, and 3) an alignment loss to make the features generalizable. We designed the loss function to balance between learning domain-invariant and hypothesis-invariant features. To the best of our knowledge, Velodrome is the first method for semi-supervised out-of-distribution generalization from labeled cell lines and unlabeled patients to different preclinical and clinical datasets.

We evaluated the performance of Velodrome and state-of-the-art methods of supervised out-of-distribution generalization, domain adaptation, and semi-supervised learning in terms of a diverse range of metrics including Pearson and Spearman correlation, the Area Under the Receiver Operating Characteristic curve (AUROC), and the Area Under the Precision-Recall curve (AUPR). We observed that Velodrome achieved substantially better performance across different clinical and preclinical pharmacogenomics datasets for multiple drugs, demonstrating the potential of semi-supervised out-of-distribution generalization for drug response prediction, a crucial task of precision oncology. Moreover, we showed that the responses predicted by Velodrome for TCGA patients (unlabeled, i.e., without drug response) with prostate and kidney cancers had statistically significant associations with the expression values of the target genes of the studied drugs. This shows that Velodrome captures biological aspects of drug response. Finally, although Velodrome was trained only on solid tissue types, we showed that it made accurate predictions for cell lines originating from non-solid tissue types, showcasing the out-of-distribution capabilities of the Velodrome model.
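Two of the metrics named in this paragraph can be sketched in a few lines of plain Python. This is only an illustration of what the metrics measure: the function names and the toy numbers are assumptions made here, not the evaluation code or data of the paper. Pearson correlation applies when the ground truth is continuous (such as AAC), while AUROC applies when it is binary (responder versus non-responder).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy continuous case: predicted vs. measured AAC (made-up values).
r = pearson([0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8])

# Toy binary case: predicted response score vs. responder label (made-up values).
auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form of AUROC is quadratic in the number of samples, which is fine for a sketch but not for large cohorts.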

Results

Datasets. We employed the following resources throughout this paper:

1. Patients without drug response: more than 3,000 samples obtained from TCGA (Cancer Genome Atlas Research Network et al. 2013) breast (TCGA-BRCA), lung (TCGA-LUAD), pancreatic (TCGA-PAAD), kidney (TCGA-KIRC), and prostate (TCGA-PRAD) cohorts with RNA-seq data.

2. Cell lines with drug response: The Cancer Therapeutics Response Portal (CTRPv2) (Basu et al. 2013; Seashore-Ludlow et al. 2015), The Genomics of Drug Sensitivity in Cancer (GDSCv2) (Garnett et al. 2012; Iorio et al. 2016), and The Genentech Cell Line Screening Initiative (gCSI) (Haverty et al. 2016; Klijn et al. 2015) pan-cancer datasets with a total of more than 2,000 samples with RNA-seq data and AAC as the measure of the drug response across 11 drugs (in common for the three datasets). We focused on the following drugs for this paper: Erlotinib, Docetaxel, Paclitaxel, and Gemcitabine.

3. PDX samples with drug response: the PDX Encyclopedia (PDXE) dataset (Gao et al. 2015) is a collection of more than 300 PDX samples with RNA-seq data screened with 34 drugs. We use the reported measure of response in RECIST (Schwartz et al. 2016) for Gemcitabine, Erlotinib, and Paclitaxel obtained from the supplementary material of Gao et al. (2015).

4. Patients with drug response: two cancer-specific datasets with microarray data and RECIST as the measure of drug response for Docetaxel (Hatzis et al. 2011), Paclitaxel (Hatzis et al. 2011), and Erlotinib (Byers et al. 2013), plus a pan-cancer dataset obtained from TCGA patients treated with Gemcitabine (Ding, Zu, and Gu 2016). We use clinical annotations of the drug response for some patients, which were obtained from the supplementary material of Ding, Zu, and Gu (2016).

Table S1 presents characteristics of these datasets and indicates whether they were used as a source domain for training or a target domain for testing.

Velodrome Overview. The proposed Velodrome method takes gene expression and AAC of cell line datasets (CTRPv2 and GDSCv2) as well as gene expression of patients without drug response (TCGA dataset) and learns a predictive and generalizable representation. To achieve this, Velodrome employs a shared feature extractor, which takes the gene expression of CTRPv2 and GDSCv2 samples and maps them to a shared feature space, and domain-specific predictors (e.g., one for CTRPv2 and one for GDSCv2), which take the feature representation of the gene expression and predict the drug response.
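The data flow described here, one shared feature extractor followed by one predictor per labeled source domain, can be sketched as follows. Everything concrete in the sketch is an assumption for illustration: the single linear layer with a ReLU, the layer sizes, the weights, and the names `extractor`, `predictors`, and `predict` are not taken from the paper.

```python
def linear_layer(weights, bias):
    """Return a function computing weights @ x + bias for a list-based vector x."""
    def forward(x):
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return forward


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


# Shared feature extractor: maps 3 toy "genes" to 2 features (hypothetical weights).
extractor = linear_layer([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]], [0.0, 0.0])

# One predictor head per labeled source domain; each maps features to one AAC value.
predictors = {
    "CTRPv2": linear_layer([[0.1, 0.2]], [0.0]),
    "GDSCv2": linear_layer([[0.2, 0.1]], [0.05]),
}


def predict(expression, domain):
    """Predict drug response for one sample using the head of the given domain."""
    features = relu(extractor(expression))
    return predictors[domain](features)[0]


# The same sample passes through the same extractor but through different heads.
sample = [2.0, 1.0, 1.0]
ctrp_prediction = predict(sample, "CTRPv2")
gdsc_prediction = predict(sample, "GDSCv2")
```

Because the extractor is shared, any loss computed on either head also updates the representation used by the other head.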

The parameters are optimized using a novel objective function consisting of three loss components: 1) a standard supervised loss to make the representation predictive of drug response, 2) a consistency loss to exploit unlabeled samples in learning the representation, and 3) an alignment loss to make the representation generalizable.
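One way to write an objective of this shape is as a weighted sum. This is a sketch only: the weighted-sum form, the weights λ1 and λ2, and all symbol names are notation assumed here, since the text does not state how the three components are combined.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{sup}}
  + \lambda_{1}\,\mathcal{L}_{\text{cons}}
  + \lambda_{2}\,\mathcal{L}_{\text{align}},
\qquad \lambda_{1}, \lambda_{2} \ge 0
```

Under this assumed form, setting λ1 or λ2 to zero would switch off the consistency or alignment component, respectively.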

The idea of the standard supervised loss is to make the representation predictive of the drug response via a mean squared loss.
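As a minimal sketch of this component, assuming the predicted and measured AAC values are available as plain Python lists (the function name `mse_loss` and the toy numbers are illustrative, not from the paper):

```python
def mse_loss(predicted, observed):
    """Mean squared error between predicted and observed AAC values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Toy AAC values for three cell lines (made up for illustration).
observed_aac = [0.10, 0.40, 0.25]
predicted_aac = [0.20, 0.30, 0.25]
loss = mse_loss(predicted_aac, observed_aac)
```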

To incorporate unlabeled patient samples, we add a consistency loss. The idea is to first extract features from patient samples using the feature extractor and then assign pseudo-labels to them by utilizing the predictors associated with CTRPv2 and GDSCv2. The consistency loss takes the pseudo-labels (i.e., predictions) from the predictors and regularizes the parameters of the feature extractor and the predictors by the distance l between the predictions of the CTRPv2 predictor and those of the GDSCv2 predictor.
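A small sketch of such a consistency term follows. The text does not specify which distance is used, so the mean squared difference below is an assumption, as are the linear predictor heads, their weights, and the names `linear_predictor` and `consistency_loss`.

```python
def linear_predictor(weights, bias):
    """Return a predictor mapping a feature vector to a scalar response."""
    def predict(features):
        return sum(w * f for w, f in zip(weights, features)) + bias
    return predict


def consistency_loss(feature_batch, predictor_a, predictor_b):
    """Mean squared disagreement between two predictors on unlabeled samples.

    No ground-truth label is needed: each predictor's output acts as a
    pseudo-label for the other.
    """
    diffs = [(predictor_a(f) - predictor_b(f)) ** 2 for f in feature_batch]
    return sum(diffs) / len(diffs)


# Hypothetical domain-specific heads (stand-ins for the CTRPv2 and GDSCv2 predictors).
ctrp_head = linear_predictor([0.5, -0.2], 0.1)
gdsc_head = linear_predictor([0.4, -0.1], 0.1)

# Toy features of unlabeled patient samples, as a shared extractor might produce them.
patient_features = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]

loss = consistency_loss(patient_features, ctrp_head, gdsc_head)
```

The loss is zero exactly when both heads agree on every unlabeled sample, which is the hypothesis-invariance idea: predictions, not features, are pulled together.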

Finally, to make the feature representation generalizable, we add an alignment loss that regularizes the parameters of the feature extractor. This alignment loss takes the extracted features of any two input domains (e.g., CTRPv2 and TCGA or CTRPv2 and GDSCv2) and minimizes the difference between the covariance matrices of those domains.
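The covariance-matching idea can be sketched as below. The text only says that the difference between covariance matrices is minimized; measuring that difference with the squared Frobenius norm is an assumption made here, and the function names and toy features are illustrative.

```python
def covariance_matrix(samples):
    """Unbiased sample covariance of a batch of feature vectors (rows = samples)."""
    n, d = len(samples), len(samples[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def alignment_loss(features_a, features_b):
    """Squared Frobenius norm of the difference between two covariance matrices."""
    cov_a = covariance_matrix(features_a)
    cov_b = covariance_matrix(features_b)
    size = len(cov_a)
    return sum(
        (cov_a[i][j] - cov_b[i][j]) ** 2 for i in range(size) for j in range(size)
    )


# Toy extracted features for two domains, e.g. a cell line batch and a patient batch.
cell_line_features = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
patient_features = [[0.0, 0.0], [4.0, 0.0], [0.0, 2.0], [4.0, 2.0]]

loss = alignment_loss(cell_line_features, patient_features)
```

Note that this term compares second-order statistics only: shifting every sample of one domain by a constant leaves its covariance, and therefore the loss, unchanged.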

Figure 1 illustrates the schematic overview of the Velodrome method.

Evaluation. Drug response prediction using multiple labeled and unlabeled domains can be framed under three approaches: 1) under the assumption that there is no data discrepancy, it can be viewed as a semi-supervised learning problem, 2) under the assumption that unlabeled patient samples are proxies to future patients, it can be viewed as an unsupervised domain adaptation problem, and 3) under the assumption that a generalizable representation can be obtained via only labeled domains, it can be viewed as a supervised domain generalization problem. It is important to note that the main contribution of the Velodrome method is that it is the first semi-supervised domain generalization method for drug response prediction.

To evaluate the performance of Velodrome, we compared it against the state-of-the-art methods of each approach. For the first approach, we compared Velodrome to Mean Teacher (Tarvainen and Valpola 2017), which is the state-of-the-art deep neural network for semi-supervised learning (Yang and Xu 2020). For the second approach, we compared Velodrome to PRECISE as a non-deep learning method based on subspace alignment and to the method of Saito et al. (2018) as a deep learning method based on adversarial domain adaptation via disagreement between predictors. Finally, for the third approach, we compared Velodrome to Ridge-ERM (Ridge Regression) as a non-deep learning baseline and DeepAll-ERM as a deep learning baseline. Both of them are categorized as methods of Empirical Risk Minimization (ERM). ERM methods achieve state-of-the-art performance for

References

A Survey on Transfer Learning

The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity

The Cancer Genome Atlas Pan-Cancer analysis project

Adversarial Discriminative Domain Adaptation
