Transfer Learning using Computational Intelligence: A Survey
Jie Lu, Vahid Behbood, Peng Hao, Hua Zuo, Shan Xue, Guangquan Zhang
Decision Systems & e-Service Intelligence (DeSI) Lab, Centre for Quantum Computation & Intelligent
Systems (QCIS), Faculty of Engineering and Information Technology,
University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
jie.lu@uts.edu.au, vahid.behbood@uts.edu.au, peng.hao@student.uts.edu.au, hua.zuo@student.uts.edu.au,
shan.xue@student.uts.edu.au, guangquan.zhang@uts.edu.au
Abstract
Transfer learning aims to provide a framework to utilize previously-acquired knowledge to solve new but similar
problems much more quickly and effectively. In contrast to classical machine learning methods, transfer learning
methods exploit the knowledge accumulated from data in auxiliary domains to facilitate predictive modeling consisting
of different data patterns in the current domain. To improve the performance of existing transfer learning methods and
handle the knowledge transfer process in real-world systems, computational intelligence has recently been applied in
transfer learning. This paper systematically examines computational intelligence-based transfer learning techniques and
clusters related technique developments into four main categories: a) neural network-based transfer learning; b)
Bayes-based transfer learning; c) fuzzy transfer learning; and d) applications of computational intelligence-based transfer
learning. By providing state-of-the-art knowledge, this survey will directly support researchers and practice-based
professionals to understand the developments in computational intelligence-based transfer learning research and
applications.
Keywords: Transfer learning, computational intelligence, neural network, Bayes, fuzzy sets and systems, genetic
algorithm.
1. Introduction
Although machine learning technologies have attracted a remarkable level of attention from researchers in different
computational fields, most of these technologies work under the common assumption that the training data (source
domain) and the test data (target domain) share an identical feature space and the same underlying distribution. As a
result, once the feature space or the feature distribution of the test data changes, the prediction models cannot be used
and must be rebuilt and retrained from scratch using newly-collected training data, which is very expensive and
sometimes not practically possible. Similarly, since learning-based models need adequate labeled data for training, it is
nearly impossible to establish a learning-based model for a target domain in which very little labeled data is available for
supervised learning. If we can transfer and exploit the knowledge from an existing similar, but not identical, source
domain with plenty of labeled data, however, we can pave the way for the construction of a learning-based model for the
target domain. In real-world scenarios, there are many situations in which very little labeled data is available, and
collecting new labeled training data and building a dedicated model are practically impossible.
Transfer learning has emerged in the computer science literature as a means of transferring knowledge from a source
domain to a target domain. Unlike traditional machine learning and semi-supervised algorithms [1-4], transfer learning
considers that the domains of the training data and the test data may be different [5]. Traditional machine learning
algorithms make predictions on future data using mathematical models trained on previously collected labeled or
unlabeled training data drawn from the same distribution as the future data [6-8]. Transfer learning, in contrast, allows the
domains, tasks, and distributions used in training and testing to be different. In the real world, we observe many examples
of transfer learning. We may find that learning to recognize apples might help us to recognize pears, or learning to play
the electronic organ may facilitate learning the piano. The study of transfer learning has been inspired by the fact that
human beings can utilize previously-acquired knowledge to solve new but similar problems much more quickly and
effectively. The fundamental motivation for transfer learning in the field of machine learning focuses on the need for
lifelong machine learning methods that retain and reuse previously learned knowledge. Research on transfer learning has
been undertaken since 1995 under a variety of names: learning to learn; life-long learning; knowledge transfer;
meta-learning; inductive transfer; knowledge consolidation; context-sensitive learning and multi-task learning [9]. In 2005,
the Broad Agency Announcement of the Defense Advanced Research Projects Agency's Information Processing
Technology Office gave a new mission to transfer learning: the ability of a system to recognize and apply knowledge and
skills learned in previous tasks to novel tasks. In this definition, transfer learning aims to extract the knowledge from one
or more source tasks and then apply it to a target task. Traditional machine learning techniques try to learn each task from
scratch, while transfer learning techniques try to transfer knowledge from other tasks and/or domains to a target task when
the latter has little high-quality training data.
Several survey papers on transfer learning have been published in the last few years. For example, [9]
presented an extensive overview of transfer learning and its different categories. However, these papers focus on transfer
learning techniques and approaches only; none of them discusses how computational intelligence approaches can be
used in transfer learning. Since computational intelligence has been applied in transfer learning only recently
and has already demonstrated its advantages, this survey is timely.
Three main types of articles are reviewed in this survey: Type 1 articles on transfer learning
techniques (including related methods and approaches); Type 2 articles on transfer learning using computational
intelligence techniques; and Type 3 articles on related computational intelligence techniques. The search and selection of
these articles were performed according to the following five steps:
Step 1. Publication database identification and determination: Eminent publication databases, such as ScienceDirect,
ACM Digital Library, IEEE Xplore and SpringerLink, were searched to provide a comprehensive bibliography of
research papers on transfer learning and transfer learning using computational intelligence.
Step 2. Type 1 article selection: These papers were selected according to two criteria: 1) novelty; 2) impact:
published in high-quality (high impact factor) journals, or in conference proceedings or book chapters with high
citations. These types of articles are mainly used in Section 2.
Step 3. Preliminary screening of Type 2 articles: The search was first performed based on keywords related to
computational intelligence in transfer learning.
Step 4. Result filtering for Type 2 articles: The keywords of the preliminary references were extracted and clustered
manually. Based on the keywords related to application domain, these papers were divided, using “topic clustering”, into
four groups: a) neural networks in transfer learning; b) Bayes in transfer learning; c) fuzzy and genetic algorithms in
transfer learning; and d) applications of transfer learning. This article selection process was based on the following criteria:
1) novelty: published within the last few years; 2) impact: see Step 2; 3) coverage: reported a new or particular
application domain; 4) typicality: only the most typical methodologies and applications were retained.
Step 5. Type 3 article selection: These papers were selected according to the requirement of Step 4, aiming to
introduce related concepts of computational intelligence techniques.
The main contributions of this paper are: 1) it comprehensively and perceptively summarizes research achievements
on transfer learning from the point of view of applications of computational intelligence, and strategically clusters
transfer learning research into four computational intelligence application domains; 2) for each computational intelligence
technique it carefully analyses typical transfer learning frameworks and effectively identifies the specific requirements of
computational intelligence techniques in transfer learning. This will directly support researchers and practitioners to
promote the popularization and application of computational intelligence in transfer learning in different domains; 3) it
also covers several very new transfer learning techniques with computational intelligence, and reveals their successful
applications.
The remainder of this paper is structured as follows. In Section 2, the main transfer learning techniques are reviewed and
analyzed. Sections 3 to 5 respectively present the three categories of computational intelligence-based transfer learning:
neural network-based, Bayes-based, and fuzzy transfer learning. Section 6 discusses the applications of computational
intelligence-based transfer learning methods. Section 7 presents our analysis and main findings.
2. Basic transfer learning techniques
To understand and analyze the application developments of transfer learning by using computational intelligence, this
section first reviews the main transfer learning techniques. The notations and definitions that will be used throughout the
section are introduced. Based on these definitions, we then categorize the various settings of transfer learning methods
that exist in the machine learning literature.
Definition 2.1 (Domain) [9] A domain, which is denoted by $\mathcal{D} = \{\chi, P(X)\}$, consists of two components:
(1) a feature space $\chi$; and
(2) a marginal probability distribution $P(X)$, where $X = \{x_1, \ldots, x_n\} \in \chi$.
Definition 2.2 (Task) [9] A task, which is denoted by $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, consists of two components:
(1) a label space $\mathcal{Y} = \{y_1, \ldots, y_n\}$; and
(2) an objective predictive function $f(\cdot)$, which is not observed but is to be learned from pairs $\{x_i, y_i\}$.
The function $f(\cdot)$ can be used to predict the corresponding label, $f(x)$, of a new instance $x$. From a probabilistic
viewpoint, $f(x)$ can be written as $P(y \mid x)$. In the bank failure prediction example, which is a binary prediction task,
$y$ can be the label failed or survived. More specifically, the source domain can be denoted as
$\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}$, where $x_{S_i} \in \chi_S$
is the source instance (a bank in the bank failure prediction example) and $y_{S_i} \in \mathcal{Y}_S$
is the corresponding class label, which can be failed or survived for bank failure prediction. Similarly, the target
domain can be denoted as $\mathcal{D}_T = \{(x_{T_1}, y_{T_1}), \ldots, (x_{T_{n_T}}, y_{T_{n_T}})\}$, where
$x_{T_i} \in \chi_T$ is the target instance and $y_{T_i} \in \mathcal{Y}_T$ is the
corresponding class label, and in most scenarios $0 \le n_T \ll n_S$.
Definition 2.3 (Transfer learning) [9] Given a source domain $\mathcal{D}_S$ and learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and
learning task $\mathcal{T}_T$, transfer learning aims to improve the learning of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ using the
knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.
In the above definition, the condition $\mathcal{D}_S \neq \mathcal{D}_T$ implies that either $\chi_S \neq \chi_T$ or $P_S(X) \neq P_T(X)$. Similarly, the condition
$\mathcal{T}_S \neq \mathcal{T}_T$ implies that either $\mathcal{Y}_S \neq \mathcal{Y}_T$ or $P(Y_S \mid X_S) \neq P(Y_T \mid X_T)$. In addition, some explicit or implicit relationship is
assumed to exist between the feature spaces of the two domains, which implies that the source domain and target domain
are related. It should be noted that when the target and source domains are the same ($\mathcal{D}_S = \mathcal{D}_T$) and their learning tasks
are also the same ($\mathcal{T}_S = \mathcal{T}_T$), the learning problem becomes a traditional machine learning problem.
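To make these definitions concrete, the following minimal Python sketch (with entirely hypothetical data and feature names) sets up a source and target domain that share a feature space ($\chi_S = \chi_T$) but differ in their marginal distributions ($P_S(X) \neq P_T(X)$), which is the typical transfer learning situation:

```python
# Minimal sketch of Definitions 2.1-2.3 with hypothetical data: the source
# and target domains share the feature space chi but have different marginal
# distributions P_S(X) != P_T(X).
import numpy as np

rng = np.random.default_rng(0)

# Source domain D_S: many labeled instances (e.g. banks described by two
# illustrative features such as capital ratio and return on assets).
X_s = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
y_s = (X_s.sum(axis=1) > 0).astype(int)       # 1 = survived, 0 = failed

# Target domain D_T: same feature space, shifted marginal distribution,
# and only a handful of labels (0 <= n_T << n_S).
X_t = rng.normal(loc=0.8, scale=1.3, size=(200, 2))
labeled_idx = rng.choice(len(X_t), size=10, replace=False)

# Transfer learning aims to learn the target predictive function f_T using
# (X_s, y_s) plus the few labeled target instances, instead of training on
# the target domain alone from scratch.
```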
According to the unified definition of transfer learning given in Definition 2.3, transfer learning techniques can be
divided into three main categories [9]: 1) inductive transfer learning, in which the learning task in the target domain is
different from the task in the source domain ($\mathcal{T}_S \neq \mathcal{T}_T$); 2) unsupervised transfer learning, which is similar to
inductive transfer learning but focuses on solving unsupervised learning tasks in the target domain such as clustering,
dimensionality reduction and density estimation ($\mathcal{T}_S \neq \mathcal{T}_T$); and 3) transductive transfer learning, in which the learning
tasks are the same in both domains, while the source and target domains are different ($\mathcal{T}_S = \mathcal{T}_T$, $\mathcal{D}_S \neq \mathcal{D}_T$). In the
literature, transductive transfer learning, domain adaptation, covariate shift, sample selection bias, transfer learning,
multi-task learning, robust learning, and concept drift are all terms that have been used to describe related scenarios.
More specifically, when a method aims to optimize performance on multiple tasks or domains simultaneously, it is
considered to be multi-task learning. If it optimizes performance on one domain, given training data from a
different but related domain, it is considered to be transductive transfer learning or domain adaptation. Transfer learning
and transductive transfer learning have often been used interchangeably with domain adaptation. Concept drift refers to a
scenario in which data arrives sequentially with a changing distribution, and the goal is to predict the next batch given the
previously-arrived data [10]. The goal of robust learning is to build a classifier that is less sensitive to certain types of
change, such as feature change or deletion in the test data. In addition, unsupervised domain adaptation can be considered
a form of semi-supervised learning, but it assumes that the labeled training data and the unlabeled test data are drawn
from different distributions. The existing techniques and methods that have thus far been used to handle the domain
adaptation problem can be divided into four main classes [11]:
1) Instance weighting for covariate shift: methods that weight samples in the source domain to match the target domain.
The covariate shift scenario might arise in cases where the training data has been biased toward one region of the input
space or has been selected in a non-I.I.D. manner. It is closely related to the idea of sample-selection bias, which has long
been studied in statistics [12] and has in recent years been explored in machine learning. Huang et al. [13] proposed a
procedure called Kernel Mean Matching (KMM) to estimate weights for each instance in the source domain, with the
goal of making the weighted distribution of the source domain look similar to the distribution of the target domain.
Sugiyama et al. [14] and Tsuboi et al. [15] proposed a similar idea called the Kullback-Leibler Importance Estimation
Procedure (KLIEP); here too, the goal is to estimate weights that maximize the similarity between the target and
weight-corrected source distributions.
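As an illustration of the instance-weighting class above, the following is a simplified sketch of KMM; the RBF kernel width and the weight bound below are illustrative assumptions rather than settings from [13], and the normalization constraint on the weights used in the original paper is omitted for brevity:

```python
# Simplified Kernel Mean Matching (KMM) sketch: estimate per-instance source
# weights beta so that the weighted source distribution resembles the target
# distribution in an RBF feature space. The kernel width gamma and the upper
# bound B are illustrative; the constraint |mean(beta) - 1| <= eps from [13]
# is omitted here for brevity.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kmm_weights(X_s, X_t, gamma=1.0, B=10.0):
    n_s, n_t = len(X_s), len(X_t)
    K = rbf_kernel(X_s, X_s, gamma)                          # source-source
    kappa = (n_s / n_t) * rbf_kernel(X_s, X_t, gamma).sum(axis=1)

    # Quadratic objective: up to constants, the squared distance between the
    # weighted source mean and the target mean in feature space.
    obj = lambda beta: 0.5 * beta @ K @ beta - kappa @ beta
    grad = lambda beta: K @ beta - kappa

    result = minimize(obj, np.ones(n_s), jac=grad, bounds=[(0.0, B)] * n_s)
    return result.x    # use as sample weights when fitting a source model
```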
2) Self-labeling methods which include unlabeled target domain samples in the training process and initialize their labels
and then iteratively refine the labels. Self-training has a close relationship with the Expectation Maximization (EM)
algorithm, which has hard and soft versions. The hard version adds samples with a single, certain label, while the soft
version assigns label confidences when fitting the model. Tan et al. [16] modified the relative contributions of the source
and target domains in EM. They increased the weight on the target data at each iteration, while Dai et al. [17] specified
the tradeoff between the source and target data terms by estimating KL divergence between the source and target
distributions, placing more weight on the target data as KL divergence increases. Self-training methods have been applied
to domain adaptation on Natural Language Processing (NLP) tasks including parsing [18-21]; part-of-speech tagging [22];
conversation summarization [23]; entity recognition [22, 24, 25]; sentiment classification [26]; spam detection [22];
cross-language document classification [27, 28]; and speech act classification [29].
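The hard self-labeling loop described above can be sketched as follows; the classifier choice, the number of rounds, the batch size per round, and the per-round weighting schedule are illustrative assumptions in the spirit of [16], not the authors' exact algorithm:

```python
# Hard self-training sketch: start from a source-trained classifier, then
# repeatedly pseudo-label the most confident unlabeled target samples and
# retrain, giving target data more influence each round (in the spirit of
# the EM-style weighting of [16]).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_s, y_s, X_t, n_rounds=5, per_round=20):
    pool = X_t.copy()
    pseudo_X, pseudo_y = [], []
    clf = LogisticRegression(max_iter=1000).fit(X_s, y_s)
    for r in range(n_rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        top = np.argsort(-proba.max(axis=1))[:per_round]   # most confident
        pseudo_X.append(pool[top])
        pseudo_y.append(proba[top].argmax(axis=1))         # hard labels
        pool = np.delete(pool, top, axis=0)

        X_aug = np.vstack([X_s] + pseudo_X)
        y_aug = np.concatenate([y_s] + pseudo_y)
        # Increase the weight on pseudo-labeled target data at each round.
        w = np.concatenate([np.ones(len(y_s)),
                            (1.0 + r) * np.ones(len(y_aug) - len(y_s))])
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug,
                                                    sample_weight=w)
    return clf
```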
3) Feature representation methods which try to find a new feature representation of the data, either to make the target and
source distributions look similar, or to find an abstracted representation for domain-specific features. The feature
representation approaches can be categorized into two classes [11]: (A) Distribution similarity approaches aim explicitly
to make the source and target domain sample distributions similar, either by penalizing or removing features whose
statistics vary between domains [24, 30-32] or by learning a feature space projection in which a distribution divergence
statistic is minimized [33-35]; (B) Latent feature approaches aim to construct new features by analyzing large amounts of
unlabeled source and target domain data [25, 36-42].
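For the distribution-similarity branch (A), the divergence statistic being minimized is often a discrepancy between empirical feature means, such as the Maximum Mean Discrepancy (MMD). The sketch below only illustrates the objective: it scores candidate one-dimensional projections by a linear-kernel MMD; real methods such as [33-35] solve for the projection in closed form or as an eigenproblem rather than by random search:

```python
# Distribution-similarity sketch: measure how far apart the source and
# target distributions are with a linear-kernel Maximum Mean Discrepancy
# (squared distance between empirical feature means), and pick the 1-D
# projection that minimizes it by brute-force random search.
import numpy as np

def linear_mmd2(A, B):
    diff = A.mean(axis=0) - B.mean(axis=0)
    return float(diff @ diff)

def best_projection(X_s, X_t, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    best_w, best_score = None, np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=X_s.shape[1])
        w /= np.linalg.norm(w)                 # unit-norm direction
        score = linear_mmd2(X_s @ w[:, None], X_t @ w[:, None])
        if score < best_score:
            best_w, best_score = w, score
    return best_w   # project both domains onto best_w before training
```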
4) Cluster-based learning methods rely on the assumption that samples are likely to have the same label if there is a
high-density path between them [43]. These methods aim to construct a graph in which the
labeled and unlabeled samples are the nodes, with the edge weights among samples based on their similarity. Dai et al.
[17] proposed a co-clustering based algorithm to propagate the label information across domains for document
classification. Xue et al. [44] proposed a cross-domain text classification algorithm known as TPLSA to integrate labeled
and unlabeled data from different but related domains.
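A minimal sketch of the graph-based idea behind these cluster-based methods appears below; the Gaussian edge weighting, the row normalization, and the fixed iteration count are illustrative assumptions, not the specific algorithms of [17] or [44]:

```python
# Graph-based label propagation sketch: build a similarity graph over all
# labeled and unlabeled samples, then iteratively push label mass along
# high-weight edges, re-clamping the known labels after every step.
import numpy as np

def propagate_labels(X, y, labeled_mask, gamma=1.0, n_iter=50):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)              # Gaussian edge weights
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic transitions

    n_classes = int(y[labeled_mask].max()) + 1
    F = np.zeros((len(X), n_classes))
    F[labeled_mask, y[labeled_mask]] = 1.0     # one-hot known labels

    for _ in range(n_iter):
        F = P @ F                              # diffuse labels to neighbours
        F[labeled_mask] = 0.0                  # clamp: restore known labels
        F[labeled_mask, y[labeled_mask]] = 1.0
    return F.argmax(axis=1)                    # labels for every node
```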
3. Transfer learning using neural network
Neural networks aim to solve complex non-linear problems using a learning-based method inspired by the structure and
processes of the human brain. In classical machine learning problems, many studies have demonstrated the superior
performance of neural networks compared to statistical methods. This fact has encouraged many researchers to use neural
networks for transfer learning, particularly in complicated problems. To address problems in transfer learning, a number
of neural network-based transfer learning algorithms have been developed in recent years. This section reviews three of
the principal neural network techniques: the deep neural network, the multiple task neural network, and the radial basis
function neural network, and presents their applications in transfer learning.
3.1. Transfer learning using deep neural network
Deep neural networks are considered to be intelligent feature extraction modules that offer great flexibility in extracting
high-level features in transfer learning. The prominent characteristic of a deep neural network is its multiple hidden layers,
which can capture intricate non-linear representations of data. Multi-stage Hubel-Wiesel architectures, inspired by the
findings of Hubel and Wiesel [45], consist of alternating layers of convolutions and max pooling to extract data features. A
new model blending this structure with multiple tasks has been proposed for transfer learning [46]. In this model, a target
task and related tasks are trained together with shared input and hidden layers but separate output neurons. The model
was then extended to the case in which each task has multiple output neurons [47]. Likewise, based on the multi-stage
Hubel-Wiesel architectures, it has been investigated whether shared hidden layers trained on a source task can be reused
on a different target task. For the target task model, only the last classification layer needs to be retrained, but any layer of
the new model can be fine-tuned if desired. In this case, the parameters of the hidden layers in the source task model act as
initialization parameters of the new target task model, and this strategy is especially promising for models in which good
initialization is very important [48].
As mentioned above, generally all the layers except the last layer are treated as feature extractors in a deep neural
network. In contrast to this network structure, a new feature extractor structure was proposed by Collobert and Weston
[49]: only the first two layers are used to extract features at different levels, such as the word level and sentence level in
Natural Language Processing, and subsequent layers are classical neural network layers used for prediction. The Stacked
Denoising Autoencoder (SDA) is another deep neural network structure [50]. The core idea of SDA is that unsupervised
learning is used to pre-train each layer, and ultimately all layers are fine-tuned in a supervised manner. Based on the SDA
model, different feature transference strategies have been introduced for target tasks with varying degrees of complexity.
The number of layers transferred to the new model depends on whether high-level or low-level feature representations are
needed; if low-level features are needed, only the first-layer parameters are transferred to the target task [50].
In addition, an interpolating path has been presented to transfer knowledge from the source task to the target task in a
deep neural network. The original high-dimensional features of the source and target domains are projected to
lower-dimensional subspaces that lie on the Grassmann manifold, which provides a way to interpolate smoothly between
the source and target domains; thus, a series of feature sets is generated on the interpolating path and intermediate feature
extractors are formed based on deep neural networks [51]. Deep neural networks can also be combined with other
techniques to improve the performance of transfer learning. Swietojanski et al. [52] applied restricted Boltzmann
machines to pre-train a deep neural network, the outputs of which are used as features for a hidden Markov model.
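The layer-transfer strategy described above can be sketched in PyTorch as follows; the network shape, class counts, and learning rate are illustrative (the digits-to-lowercase-letters pairing loosely mirrors the reuse experiment of [50]), and only the idea of reusing source-trained hidden layers while retraining the final classification layer comes from the text:

```python
# Layer-transfer sketch in PyTorch: reuse the hidden layers of a network
# trained on the source task, attach a new classification head for the
# target task, and either freeze the transferred layers or fine-tune them.
import torch
import torch.nn as nn

# Pretend this network was already trained on the source task
# (e.g. handwritten digits 0-9).
source_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),                 # source classification layer
)

# Target model: keep everything except the last layer, add a new head
# (e.g. 26 classes for lowercase letters, loosely mirroring [50]).
target_net = nn.Sequential(
    *list(source_net.children())[:-1],  # transferred, source-trained layers
    nn.Linear(128, 26),                 # new, randomly initialized head
)

# Strategy 1: retrain only the new classification layer.
for p in list(target_net.parameters())[:-2]:    # all but the head's W and b
    p.requires_grad = False
optimizer = torch.optim.SGD(
    (p for p in target_net.parameters() if p.requires_grad), lr=0.01)

# Strategy 2: fine-tune instead -- leave requires_grad True everywhere and
# train the whole network with a small learning rate, so the transferred
# layers serve as a good initialization rather than fixed feature extractors.
```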
3.2. Transfer learning using multiple task neural network
To improve learning for the target task, multiple task learning (MTL) has been proposed. Information contained in other