To be presented at the ICLR Workshop on AI for Public Health 2021
A MULTITASK TRANSFER LEARNING FRAMEWORK FOR
NOVEL VIRUS-HUMAN PROTEIN INTERACTIONS
Ngan Thi Dong & Megha Khosla
L3S Research Center, Leibniz University Hannover, Germany
ABSTRACT
Understanding the interaction patterns between a particular virus and human proteins
plays a crucial role in unveiling the underlying mechanism of viral infection.
This could further help in developing treatments for viral diseases. The main issues
in tackling it as a machine learning problem are the scarcity of training data as well
as of input information about the viral proteins. We overcome these limitations by
exploiting powerful statistical protein representations derived from a corpus of around
24 million protein sequences in a multitask framework. Our experiments on 7 varied
benchmark datasets support the superiority of our approach.
1 INTRODUCTION
Viral infections have been increasingly burdening healthcare systems. Biologically, a viral
infection involves many protein-protein interactions (PPIs) between the virus and its host. These
interactions range from the initial binding of viral coat proteins to the host membrane receptor to
the hijacking of the host transcription machinery by viral proteins. In this work we develop a deep
learning based computational model for predicting interactions between a novel (i.e., completely
new) virus and human proteins.
One of the key challenges in tackling this learning task for novel, unseen viruses is the
limited training data. Often, known interactions of related viruses are used to train supervised
models. These data are usually collected by wet lab experiments and are usually too scarce to ensure
generalizability of the trained models. In effect, the trained models might overfit the training data
and give inaccurate predictions for the novel virus.
Moreover, viral proteins are substantially different from human or bacterial proteins. They are
structurally dynamic, so they cannot be easily detected by common sequence-structure
comparison (Requião et al., 2020). Virus protein sequences of different species have very little in
common (Eid et al., 2016). Therefore, models trained for human PPI (Li & Ilie, 2020; Sun et al.,
2017; Li, 2020; Chen et al., 2019; Sarkar & Saha, 2019) or for other pathogen-human PPI (Sudhakar
et al., 2020; Mei & Zhang, 2020; Dick et al., 2020; Li et al., 2014; Guven-Maiorov et al., 2019;
Basit et al., 2018), for which more data might be available, cannot be directly used for predicting
novel viral-human protein interactions.
While for human proteins, features related to their function, semantic annotation, domain, structure,
pathway, etc. can be extracted from public databases, such information is not readily available for
viral proteins. The only reliable source of viral protein information is its amino acid sequence.
Learning effective representations of the viral proteins is thus an important step towards building
the prediction model. Heuristics such as K-mer composition usually used for protein representations
are bound to fail as it is known that viral proteins with completely different sequences might show
similar interaction patterns.
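To make the K-mer heuristic concrete, the following minimal sketch counts overlapping k-mers in an amino acid sequence and normalizes the counts to frequencies. It is an illustration only and is not taken from any of the cited methods; the function name `kmer_composition` and the two toy sequences are hypothetical.

```python
from collections import Counter


def kmer_composition(sequence: str, k: int = 3) -> dict:
    """Return the relative frequency of every overlapping k-mer in `sequence`."""
    if k <= 0 or len(sequence) < k:
        return {}
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    total = len(kmers)
    return {kmer: count / total for kmer, count in Counter(kmers).items()}


# Two toy sequences (hypothetical, not real viral proteins) that share no 3-mer.
seq_a = "MKTAYIAK"
seq_b = "GGSLVPRG"
comp_a = kmer_composition(seq_a)
comp_b = kmer_composition(seq_b)
shared = set(comp_a) & set(comp_b)  # k-mers present in both sequences
```

Because the two toy sequences share no 3-mer, their composition vectors have no common non-zero entry. A model that relies on such features alone has no basis for treating the two proteins as similar, even if they were to interact with the same human proteins.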
Other existing works employ additional features to represent viral proteins, such as protein
functional information or GO annotations (Wang, 2020), protein domain-domain association
information (Barman et al., 2014), protein structure information (Lasso et al., 2019;
Guven-Maiorov et al., 2019), and the disease phenotype of clinical symptoms (Wang, 2020). A major
limitation of these approaches is that they cannot generalize to novel viruses, for which such
information is not available or lacks experimentally supported evidence.
bioRxiv preprint doi: https://doi.org/10.1101/2021.03.25.437037; this version posted March 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure 1: MULTITASK TRANSFER (MTT) model for pathogen-human PPI.
In this work we tackle the above limitations by exploiting powerful statistical protein representations
derived from a corpus of around 24 million protein sequences in a multitask framework. Noting that
viruses tend to mimic human proteins in order to build interactions with the host's proteins, we use
the prediction of human PPI as a side task to further regularize our model and improve generalization.
Our large scale experiments on a number of datasets showcase the superiority of our approach.
2 OUR APPROACH
The schematic diagram of our proposed model is presented in Figure 1. We use raw human and virus
protein sequences as input. As side or domain information we use a human protein-protein interaction
network of around 20,000 proteins and over 22M interactions from the STRING database (Szklarczyk
et al., 2015).
We note that the protein sequence determines the protein’s structural conformation (fold), which
further determines its function and its interaction pattern with other proteins. However, the
underlying mechanism of the sequence-to-structure matching process is very complex and cannot be
easily specified by hand-crafted rules. Therefore, rather than using hand-crafted features extracted
from amino acid sequences, we employ the pre-trained UNIREP model (Alley et al., 2019) to generate
latent representations or protein embeddings. The protein representations extracted from the UNIREP
model are empirically shown to preserve fundamental properties of the proteins and are hypothesized
to be statistically more powerful and generalizable than hand-crafted sequence features.
We further fine-tune these representations by training 2 simple neural networks (single layer MLPs
with ReLU activation) using the objective of predicting human PPI in addition to the main
task. We use logistic regression modules to predict the likelihood of an interaction between
virus-human or human-human protein pairs. The two networks’ parameters are not shared among
tasks, allowing them to extract more task-specific representations.
The rationale behind using the human PPI task is that viruses have been shown to mimic and compete
with human proteins in their binding and interaction patterns with other human proteins (Mei &
Zhang, 2020). Therefore, we believe that the patterns learned from the human interactome (or
human PPI network) should be a rich source of knowledge to guide our virus-human PPI task and
further help to regularize our model.
Let $\Theta$, $\Phi$ denote the sets of learnable parameters corresponding to the representation tuning
components, i.e., the Multilayer Perceptrons (MLPs) corresponding to the virus and human proteins,
respectively. Let $W_1$, $W_2$ denote the two learnable weight matrices (parameters) of the logistic
regression modules for the virus-human and human-human PPI prediction tasks. We use $VH$ and $HH$ to
denote the training sets of virus-human and human-human PPI, respectively.
We use the binary cross entropy loss for virus-human PPI predictions, as given below:
$$L_1 = \sum_{(v,h) \in VH} -z_{vh} \log y_{vh}(\Theta, \Phi, W_1) - (1 - z_{vh}) \log\big(1 - y_{vh}(\Theta, \Phi, W_1)\big), \tag{1}$$
where $z_{vh}$ is the binary target variable and $y_{vh}$ is the predicted probability of a
virus-human PPI, i.e., the output of the logistic regression (LR) module. The input to the LR
module is the element-wise product of the fine-tuned representations (outputs of the MLPs) of the
virus and human proteins.
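The prediction path described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not our released implementation: the bias terms, the toy dimensions, the random initialization, and the helper names `relu_layer`, `random_layer`, and `predict_interaction` are all hypothetical, and the random input vectors merely stand in for pre-trained protein embeddings.

```python
import math
import random


def relu_layer(x, weights, bias):
    """Single-layer MLP with ReLU activation: max(0, W x + b) per output unit."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict_interaction(virus_emb, human_emb, theta, phi, w1):
    """Return y_vh, the predicted probability of a virus-human interaction."""
    v = relu_layer(virus_emb, *theta)            # fine-tuned virus representation
    h = relu_layer(human_emb, *phi)              # fine-tuned human representation
    product = [vi * hi for vi, hi in zip(v, h)]  # element-wise product fed to the LR module
    return sigmoid(sum(w * p for w, p in zip(w1, product)))


def random_layer(rng, n_in, n_out):
    """Random weights and zero biases for one layer (assumed initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
emb_dim, hidden_dim = 8, 4                           # toy sizes, not the real embedding size
theta = random_layer(rng, emb_dim, hidden_dim)       # virus MLP parameters
phi = random_layer(rng, emb_dim, hidden_dim)         # human MLP parameters
w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
virus_emb = [rng.random() for _ in range(emb_dim)]   # stand-in for a pre-trained embedding
human_emb = [rng.random() for _ in range(emb_dim)]
y_vh = predict_interaction(virus_emb, human_emb, theta, phi, w1)
```

The element-wise product requires both fine-tuned representations to have the same dimension, which is why the two toy MLPs share `hidden_dim`.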
For human PPI, the target variables ($z_{hh'}$) are the normalized confidence scores, which can be
interpreted as the probability of observing an interaction. We use the binary cross entropy loss
given below, where $y_{hh'}$ is the output of the LR module for the human PPI task, whose input is
the element-wise product of the fine-tuned representations (output of the second MLP) of the two
human proteins.
$$L_2 = \sum_{(h,h') \in HH} -z_{hh'} \log y_{hh'}(\Phi, W_2) - (1 - z_{hh'}) \log\big(1 - y_{hh'}(\Phi, W_2)\big) \tag{2}$$
We use a linear combination of the two loss functions to train our model, i.e., $L = L_1 + \alpha \cdot L_2$,
where $\alpha$ is the human PPI weight factor. We set it to $10^{-3}$ in our experiments.
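A minimal sketch of the combined objective follows, assuming the predicted probabilities have already been computed. The function names, the clipping constant `eps`, and the toy target and prediction values are hypothetical; the default `alpha=1e-3` is an assumption that reads the weight factor above as $10^{-3}$.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Sum of -z*log(y) - (1-z)*log(1-y); targets may be soft values in [0, 1]."""
    total = 0.0
    for z, y in zip(targets, predictions):
        y = min(max(y, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -z * math.log(y) - (1.0 - z) * math.log(1.0 - y)
    return total


def multitask_loss(z_vh, y_vh, z_hh, y_hh, alpha=1e-3):
    """L = L1 + alpha * L2 for the virus-human (main) and human-human (side) tasks."""
    loss_vh = binary_cross_entropy(z_vh, y_vh)  # L1: binary targets
    loss_hh = binary_cross_entropy(z_hh, y_hh)  # L2: normalized confidence scores as targets
    return loss_vh + alpha * loss_hh


# Toy values (hypothetical): binary labels for virus-human pairs,
# soft confidence scores for human-human pairs.
z_vh, y_vh = [1.0, 0.0], [0.9, 0.2]
z_hh, y_hh = [0.7, 0.15], [0.6, 0.3]
total = multitask_loss(z_vh, y_vh, z_hh, y_hh)
```

With a small `alpha`, the human PPI term acts as a mild regularizer: it contributes to the gradients of the shared parameters without dominating the main virus-human objective.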
3 EXPERIMENTAL EVALUATION
We compare our method with the following six baseline methods and two simpler variants of our model.
(1) GENERALIZED (Zhou et al., 2018): It is a generalized SVM model trained on hand-crafted
features extracted from protein sequences for the novel virus-human PPI task.
(2) HYBRID (Deng et al., 2020): It is a complex deep model with convolutional and LSTM layers
for extracting latent representations of virus and human proteins from their input sequence features
and is trained using L1-regularized logistic regression.
(3) DOC2VEC (Yang et al., 2020): It employs the doc2vec (Le & Mikolov, 2014) approach to generate
protein embeddings from a corpus of protein sequences. A random forest model is then trained
for the PPI prediction.
(4) MOTIFTRANSFORMER (Lanchantin et al., 2020): It first generates protein embeddings using
supervised protein structure and function prediction tasks. Those embeddings are then passed as
input to an order-independent classifier for the PPI prediction task.
(5) DENOVO (Eid et al., 2016): It trains an SVM classifier on a hand-crafted feature set extracted
from the K-mer amino acid composition information, using a novel negative sampling strategy.
(6) BARMAN (Barman et al., 2014): It uses an SVM model trained on a feature set consisting of the
protein domain-domain association and the methionine, serine, and valine amino acid composition of
viral proteins.
(7) 2 simpler variants of MTT: As an ablation study we evaluate two simpler variants: (i)
SINGLETASK TRANSFER (STT), which is trained on the single objective of predicting pathogen-human
PPI, and (ii) NAIVE BASELINE, which is a logistic regression model using concatenated human and
pathogen protein UNIREP representations as input.
3.1 BENCHMARK DATASETS AND RESULTS
We evaluate our approach on 7 benchmark datasets. As several of our competitors do not release
their code, we use the performance scores reported in the original papers (under the same evaluation
metrics), giving them full advantage. Moreover, many of the methods use hand-crafted features that
might not be available for benchmark datasets not evaluated in their original papers. Detailed
data statistics can be found in Appendix A.1.
Novel Viral-Human PPI. We use the benchmark datasets for the human H1N1 and human Ebola
viruses as released by Zhou et al. (2018). The datasets are prepared for testing predictions for a
novel virus. The known PPIs between virus and human were retrieved from four databases: APID, IntAct,
Mentha, and UniProt. The training data for the human-H1N1 dataset includes PPIs between human
and all viruses except H1N1. Similarly, the training data for the human-Ebola dataset includes PPIs
between human and all viruses except Ebola. The statistics for both datasets are presented in
Table 4 in the Appendix. The results (Area under the ROC curve (AUC) and Area under the Precision
Recall curve (AUPR) scores) are given in Table 1.
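The novel-virus setting amounts to holding out every interaction of the target virus from training. The following sketch illustrates this split on toy records; the record layout, the protein identifiers, and the function name `novel_virus_split` are hypothetical, and the released benchmark files (which also contain negative pairs) may be organized differently.

```python
def novel_virus_split(interactions, held_out_virus):
    """Split (virus, viral_protein, human_protein) records by virus name."""
    train = [rec for rec in interactions if rec[0] != held_out_virus]
    test = [rec for rec in interactions if rec[0] == held_out_virus]
    return train, test


# Toy records with hypothetical protein identifiers.
interactions = [
    ("H1N1", "vp1", "hpA"),
    ("Ebola", "vp2", "hpB"),
    ("HIV-1", "vp3", "hpA"),
    ("H1N1", "vp4", "hpC"),
]
train, test = novel_virus_split(interactions, "H1N1")
```

No interaction of the held-out virus reaches the training set, so the test performance reflects how well the model transfers to a virus it has never seen.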
                                  H1N1            EBOLA
MODEL                             AUC    AUPR     AUC    AUPR
GENERALIZED                       0.886  -        0.867  -
HYBRID                            0.937  -        -      -
MOTIFTRANSFORMER                  0.945  0.948    0.968  0.974
DOC2VEC                           0.817  0.542    -      -
MULTITASK TRANSFER (MTT) (our)    0.957  0.966    0.976  0.981
SINGLETASK TRANSFER (STT) (our)   0.950  0.962    0.963  0.974
NAIVE BASELINE (our)              0.834  0.806    0.893  0.870
Table 1: Comparison on the novel virus-human PPI prediction task. “-” denotes that the corresponding
score is not reported in the original paper.
Viral-Human PPI prediction on Datasets with Rich Viral information. We use the datasets
from the DeNovo (Eid et al., 2016) and Barman (Barman et al., 2014) studies. DeNovo’s SLIM dataset
encapsulated viral proteins based on the presence of Short Linear Motifs (SLiMs) (short recurring
protein sequences with a specific biological function). Barman’s dataset was retrieved from the
VirusMINT database by removing interacting protein pairs that did not have any “InterPro” domain hit.
Barman’s dataset is evaluated using 5-fold cross validation with the original data splits.
                            DENOVO SLIM                   BARMAN'S DATASET
MODEL                       SN     SP     ACC    AUC      SN     SP     ACC    AUC
GENERALIZED                 80.00  88.94  84.47  0.897    76.14  83.77  79.95  0.858
DENOVO (Eid et al., 2016)   82.59  81.65  83.53  -        -      -      -      -
MTT (our)                   88.00  87.76  87.88  0.955    90.05  89.57  89.81  0.958
STT (our)                   86.12  85.88  86.00  0.941    90.14  89.66  89.9   0.957
NAIVE BASELINE (our)        84.00  83.76  83.88  0.885    74.20  73.72  73.96  0.809
Table 2: Comparison on datasets with rich feature information. SN, SP, ACC refer to Sensitivity,
Specificity, and Accuracy (in %), respectively. “-” denotes that the corresponding score is not
reported in the original paper.
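For reference, the three threshold-based metrics follow the standard definitions over confusion-matrix counts. The sketch below computes them as percentages; the function name and the toy counts are hypothetical and do not correspond to any row of the tables.

```python
def sensitivity_specificity_accuracy(tp, fn, tn, fp):
    """Return (SN, SP, ACC) in percent from confusion-matrix counts."""
    sn = 100.0 * tp / (tp + fn)                    # share of true interactions recovered
    sp = 100.0 * tn / (tn + fp)                    # share of non-interactions rejected
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # share of all pairs classified correctly
    return sn, sp, acc


# Toy counts (hypothetical).
sn, sp, acc = sensitivity_specificity_accuracy(tp=88, fn=12, tn=90, fp=10)
```

On a test set with equally many positive and negative pairs, ACC is the mean of SN and SP, as in this toy example.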
Additional results on novel bacteria-human PPI prediction. We further demonstrate the
effectiveness of our model on the novel bacteria-human PPI prediction task. We compare our method
with DeNovo on the datasets for three human bacteria: BACILLUS ANTHRACIS (B1), YERSINIA PESTIS
(B2), and FRANCISELLA TULARENSIS (B3), obtained from Eid et al. (2016). The results are
shown in Table 3. MTT outperforms the baseline method (DeNovo) on all reported scores except
sensitivity on BACILLUS ANTHRACIS.
             BACILLUS ANTHRACIS      YERSINIA PESTIS         FRANCISELLA TULARENSIS
MODEL        SN     SP     ACC       SN     SP     ACC       SN     SP     ACC
DENOVO       94     97.2   96.42     94.8   98.3   97.47     94.9   98.3   97.32
MTT (our)    93.46  97.83  96.74     96.93  98.99  98.49     98.22  99.27  98.98
Table 3: Comparison for the novel bacteria-human PPI prediction task. SN, SP, ACC refer to
Sensitivity, Specificity, and Accuracy, respectively.
3.2 DISCUSSION AND FUTURE WORK
Our method shows superior performance on a wide range of tested datasets. Note that this is despite
the fact that each of our baselines has been proposed to exploit a specific kind of information
which was used in the first place to construct the dataset. MTT also outperforms its simpler variants
developed with a single task objective. Note that our naive baseline, which directly trains a logistic
regression classifier on pretrained embeddings, already outperforms several methods. This points
to the superiority of these representations as compared to hand-crafted features. As future work,
we will enhance our multitask approach by incorporating more domain information as well as
exploiting more sophisticated multitask model architectures.
REFERENCES
Ethan C Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, and George M Church.
Unified rational protein engineering with sequence-based deep representation learning. Nature
methods, 16(12):1315–1322, 2019.
Mais G Ammari, Cathy R Gresham, Fiona M McCarthy, and Bindu Nanduri. Hpidb 2.0: a curated
database for host–pathogen interactions. Database, 2016, 2016.
Ranjan Kumar Barman, Sudipto Saha, and Santasabuj Das. Prediction of interactions between viral
and host proteins using supervised machine learning methods. PloS one, 9(11):e112034, 2014.
Abdul Hannan Basit, Wajid Arshad Abbasi, Amina Asif, Sadaf Gull, and Fayyaz Ul Amir Afsar
Minhas. Training host-pathogen protein–protein interaction predictors. Journal of bioinformatics
and computational biology, 16(04):1850014, 2018.
Alberto Calderone, Luana Licata, and Gianni Cesareni. Virusmentha: a new resource for virus-host
protein interactions. Nucleic acids research, 43(D1):D588–D592, 2015.
Andrew Chatr-Aryamontri, Arnaud Ceol, Daniele Peluso, Aurelio Nardozza, Simona Panni,
Francesca Sacco, Michele Tinti, Alex Smolyar, Luisa Castagnoli, Marc Vidal, et al. Virusmint: a
viral protein interaction database. Nucleic acids research, 37(suppl 1):D669–D673, 2009.
Kuan-Hsi Chen, Tsai-Feng Wang, and Yuh-Jyh Hu. Protein-protein interaction prediction using a
hybrid feature representation and a stacked generalization scheme. BMC bioinformatics, 20(1):
1–17, 2019.
Lei Deng, Jiaojiao Zhao, and Jingpu Zhang. Predict the protein-protein interaction between virus
and host through hybrid deep neural network. In 2020 IEEE International Conference on Bioin-
formatics and Biomedicine (BIBM), pp. 11–16. IEEE, 2020.
Kevin Dick, Bahram Samanfar, Bradley Barnes, Elroy R Cober, Benjamin Mimee, Stephen J Mol-
nar, Kyle K Biggar, Ashkan Golshani, Frank Dehne, James R Green, et al. Pipe4: Fast ppi
predictor for comprehensive inter-and cross-species interactomes. Scientific reports, 10(1):1–15,
2020.
Francesca Diella, Niall Haslam, Claudia Chica, Aidan Budd, Sushama Michael, Nigel P Brown,
Gilles Travé, and Toby J Gibson. Understanding eukaryotic linear motifs and their role in cell
signaling and regulation. Front Biosci, 13(6580):603, 2008.
Fatma-Elzahraa Eid, Mahmoud ElHefnawi, and Lenwood S Heath. Denovo: virus-host sequence-
based protein–protein interaction prediction. Bioinformatics, 32(8):1144–1150, 2016.
Emine Guven-Maiorov, Chung-Jung Tsai, Buyong Ma, and Ruth Nussinov. Interface-based struc-
tural prediction of novel host-pathogen interactions. In Computational Methods in Protein Evo-
lution, pp. 317–335. Springer, 2019.
Jack Lanchantin, Arshdeep Sekhon, Clint Miller, and Yanjun Qi. Transfer learning with motiftrans-
formers for predicting protein-protein interactions between a novel virus and humans. bioRxiv,
2020.
Gorka Lasso, Sandra V Mayer, Evandro R Winkelmann, Tim Chu, Oliver Elliot, Juan Angel Patino-
Galindo, Kernyu Park, Raul Rabadan, Barry Honig, and Sagi D Shapira. A structure-informed
atlas of human-virus interactions. Cell, 178(6):1526–1541, 2019.
Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Interna-
tional conference on machine learning, pp. 1188–1196. PMLR, 2014.
Benjamin Yee Shing Li, Lam Fat Yeung, and Genke Yang. Pathogen host interaction prediction via
matrix factorization. In 2014 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM), pp. 357–362. IEEE, 2014.
Yiwei Li. Computational methods for predicting protein-protein interactions and binding sites. 2020.