Transfer Learning using Computational Intelligence: A Survey
Jie Lu, Vahid Behbood, Peng Hao, Hua Zuo, Shan Xue, Guangquan Zhang
Decision Systems & e-Service Intelligence (DeSI) Lab, Centre for Quantum Computation & Intelligent
Systems (QCIS), Faculty of Engineering and Information Technology,
University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
jie.lu@uts.edu.au, vahid.behbood@uts.edu.au, peng.hao@student.uts.edu.au, hua.zuo@student.uts.edu.au,
shan.xue@student.uts.edu.au, guangquan.zhang@uts.edu.au
Abstract
Transfer learning aims to provide a framework to utilize previously-acquired knowledge to solve new but similar
problems much more quickly and effectively. In contrast to classical machine learning methods, transfer learning
methods exploit the knowledge accumulated from data in auxiliary domains to facilitate predictive modeling consisting
of different data patterns in the current domain. To improve the performance of existing transfer learning methods and
handle the knowledge transfer process in real-world systems, computational intelligence has recently been applied in
transfer learning. This paper systematically examines computational intelligence-based transfer learning techniques and
clusters related technique developments into four main categories: a) neural network-based transfer learning; b)
Bayes-based transfer learning; c) fuzzy transfer learning; and d) applications of computational intelligence-based transfer
learning. By providing state-of-the-art knowledge, this survey will directly support researchers and practice-based
professionals to understand the developments in computational intelligence-based transfer learning research and
applications.
Keywords: Transfer learning, computational intelligence, neural network, Bayes, fuzzy sets and systems, genetic
algorithm.
1. Introduction
Although machine learning technologies have attracted a remarkable level of attention from researchers in different
computational fields, most of these technologies work under the common assumption that the training data (source
domain) and the test data (target domain) share an identical feature space and the same underlying distribution. As a
result, once the feature space or the feature distribution of the test data changes, the prediction models cannot be used
and must be rebuilt and retrained from scratch using newly-collected training data, which is very expensive and
sometimes not practically possible. Similarly, since learning-based models need adequate labeled data for training, it is
nearly impossible to establish a learning-based model for a target domain in which very little labeled data is available for
supervised learning. If we can transfer and exploit the knowledge from an existing similar, but not identical, source
domain with plenty of labeled data, however, we can pave the way for the construction of a learning-based model for the
target domain. In real-world scenarios, there are many situations in which very little labeled data is available, and
collecting new labeled training data and building a dedicated model are practically impossible.
Transfer learning has emerged in the computer science literature as a means of transferring knowledge from a source
domain to a target domain. Unlike traditional machine learning and semi-supervised algorithms [1-4], transfer learning
considers that the domains of the training data and the test data may be different [5]. Traditional machine learning
algorithms make predictions on future data using mathematical models trained on previously collected labeled or
unlabeled training data drawn from the same distribution as the future data [6-8]. Transfer learning, in contrast, allows the
domains, tasks, and distributions used in training and testing to be different. In the real world, we observe many examples
of transfer learning. We may find that learning to recognize apples might help us to recognize pears, or learning to play
the electronic organ may facilitate learning the piano. The study of transfer learning has been inspired by the fact that
human beings can utilize previously-acquired knowledge to solve new but similar problems much more quickly and
effectively. The fundamental motivation for transfer learning in the field of machine learning focuses on the need for
lifelong machine learning methods that retain and reuse previously learned knowledge. Research on transfer learning has
been undertaken since 1995 under a variety of names: learning to learn; life-long learning; knowledge transfer;
meta-learning; inductive transfer; knowledge consolidation; context-sensitive learning and multi-task learning [9]. In 2005,
the Broad Agency Announcement of the Defense Advanced Research Projects Agency's Information Processing
Technology Office gave a new mission to transfer learning: the ability of a system to recognize and apply knowledge and
skills learned in previous tasks to novel tasks. In this definition, transfer learning aims to extract the knowledge from one
or more source tasks and then apply it to a target task. Traditional machine learning techniques try to learn each task from
scratch, while transfer learning techniques try to transfer knowledge from other tasks and/or domains to a target task when
the latter has little high-quality training data.
Several survey papers on transfer learning have been published in the last few years. For example, [9]
presented an extensive overview of transfer learning and its different categories. However, these papers focus on transfer
learning techniques and approaches only; none of them discusses how computational intelligence approaches can be
used in transfer learning. Since computational intelligence has been applied in transfer learning only recently
and has already demonstrated its advantages, this survey is timely.
Three main types of articles are reviewed in this survey: Type 1 articles on transfer learning
techniques (including related methods and approaches); Type 2 articles on transfer learning using computational
intelligence techniques; and Type 3 articles on related computational intelligence techniques. The search and selection of
these articles were performed according to the following five steps:
Step 1. Publication database identification and determination: Eminent publication databases, such as ScienceDirect,
ACM Digital Library, IEEE Xplore and SpringerLink, were searched to provide a comprehensive bibliography of
research papers on transfer learning and transfer learning using computational intelligence.
Step 2. Type 1 article selection: These papers were selected according to two criteria: 1) novelty; 2) impact:
published in high-quality (high impact factor) journals, or in conference proceedings or book chapters with high
citations. These types of articles are mainly used in Section 2.
Step 3. Preliminary screening of Type 2 articles: The search was first performed based on keywords related to
computational intelligence in transfer learning.
Step 4. Result filtering for Type 2 articles: The keywords of the preliminary references were extracted and clustered
manually. Based on the keywords related to application domain, these papers were divided, using “topic clustering”, into
four groups: a) neural networks in transfer learning; b) Bayes in transfer learning; c) fuzzy and genetic algorithms in
transfer learning; and d) applications of transfer learning. This article selection process was based on the following criteria:
1) novelty: published within the last few years; 2) impact: see Step 2; 3) coverage: reported a new or particular
application domain; 4) typicality: only the most typical methodologies and applications were retained.
Step 5. Type 3 article selection: These papers were selected according to the requirement of Step 4, aiming to
introduce related concepts of computational intelligence techniques.
The main contributions of this paper are: 1) it comprehensively and perceptively summarizes research achievements
on transfer learning from the point of view of applications of computational intelligence, and strategically clusters
transfer learning research into four computational intelligence application domains; 2) for each computational intelligence
technique it carefully analyses typical transfer learning frameworks and effectively identifies the specific requirements of
computational intelligence techniques in transfer learning. This will directly support researchers and practitioners to
promote the popularization and application of computational intelligence in transfer learning in different domains; 3) it
also covers several very new transfer learning techniques with computational intelligence, and reveals their successful
applications.
The remainder of this paper is structured as follows. In Section 2, the main transfer learning techniques are reviewed and
analyzed. Sections 3 to 5 respectively present the three categories of computational intelligence-based transfer learning:
neural network-based, Bayes-based, and fuzzy transfer learning. Section 6 discusses the applications of computational
intelligence-based transfer learning methods. Section 7 presents our analysis and main findings.
2. Basic transfer learning techniques
To understand and analyze the application developments of transfer learning by using computational intelligence, this
section first reviews the main transfer learning techniques. The notations and definitions that will be used throughout the
section are introduced. Based on these definitions, we then categorize the various settings of transfer learning methods
that exist in the machine learning literature.
Definition 2.1 (Domain) [9] A domain, which is denoted by $\mathcal{D} = \{\chi, P(X)\}$, consists of two components:
(1) a feature space $\chi$; and
(2) a marginal probability distribution $P(X)$, where $X = \{x_1, \ldots, x_n\} \in \chi$.
Definition 2.2 (Task) [9] A task, which is denoted by $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, consists of two components:
(1) a label space $\mathcal{Y} = \{y_1, \ldots, y_n\}$; and
(2) an objective predictive function $f(\cdot)$, which is not observed but is to be learned from pairs $\{x_i, y_i\}$.
The function $f(\cdot)$ can be used to predict the corresponding label, $f(x)$, of a new instance $x$. From a probabilistic
viewpoint, $f(x)$ can be written as $P(y \mid x)$. In the bank failure prediction example, which is a binary prediction task,
$y$ can be the label failed or survived. More specifically, the source domain can be denoted as
$\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}$, where $x_{S_i} \in \chi_S$
is the source instance (a bank in the bank failure prediction example) and $y_{S_i} \in \mathcal{Y}_S$
is the corresponding class label, which can be failed or survived for bank failure prediction. Similarly, the target
domain can be denoted as $\mathcal{D}_T = \{(x_{T_1}, y_{T_1}), \ldots, (x_{T_{n_T}}, y_{T_{n_T}})\}$, where
$x_{T_i} \in \chi_T$ is the target instance and $y_{T_i} \in \mathcal{Y}_T$ is the
corresponding class label, and in most scenarios $0 \le n_T \ll n_S$.
Definition 2.3 (Transfer learning) [9] Given a source domain $\mathcal{D}_S$ and learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and
learning task $\mathcal{T}_T$, transfer learning aims to improve the learning of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ using the
knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.
In the above definition, the condition $\mathcal{D}_S \neq \mathcal{D}_T$ implies that either $\chi_S \neq \chi_T$ or $P_S(X) \neq P_T(X)$. Similarly, the condition
$\mathcal{T}_S \neq \mathcal{T}_T$ implies that either $\mathcal{Y}_S \neq \mathcal{Y}_T$ or $P(Y_S \mid X_S) \neq P(Y_T \mid X_T)$. In addition, some explicit or implicit relationship is
assumed to exist between the feature spaces of the two domains, which implies that the source domain and target domain
are related. It should be noted that when the target and source domains are the same ($\mathcal{D}_S = \mathcal{D}_T$) and their learning tasks
are also the same ($\mathcal{T}_S = \mathcal{T}_T$), the learning problem becomes a traditional machine learning problem.
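To make these definitions concrete, the following minimal Python sketch (with entirely hypothetical data and feature names) sets up a source and target domain that share a feature space ($\chi_S = \chi_T$) but differ in their marginal distributions ($P_S(X) \neq P_T(X)$), which is the typical transfer learning situation:

```python
# Minimal sketch of Definitions 2.1-2.3 with hypothetical data: the source
# and target domains share the feature space chi but have different marginal
# distributions P_S(X) != P_T(X).
import numpy as np

rng = np.random.default_rng(0)

# Source domain D_S: many labeled instances (e.g. banks described by two
# illustrative features such as capital ratio and return on assets).
X_s = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
y_s = (X_s.sum(axis=1) > 0).astype(int)       # 1 = survived, 0 = failed

# Target domain D_T: same feature space, shifted marginal distribution,
# and only a handful of labels (0 <= n_T << n_S).
X_t = rng.normal(loc=0.8, scale=1.3, size=(200, 2))
labeled_idx = rng.choice(len(X_t), size=10, replace=False)

# Transfer learning aims to learn the target predictive function f_T using
# (X_s, y_s) plus the few labeled target instances, instead of training on
# the target domain alone from scratch.
```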
According to the unified definition of transfer learning given in Definition 2.3, transfer learning techniques can be
divided into three main categories [9]: 1) inductive transfer learning, in which the learning task in the target domain is
different from the task in the source domain ($\mathcal{T}_S \neq \mathcal{T}_T$); 2) unsupervised transfer learning, which is similar to
inductive transfer learning but focuses on solving unsupervised learning tasks in the target domain such as clustering,
dimensionality reduction and density estimation ($\mathcal{T}_S \neq \mathcal{T}_T$); and 3) transductive transfer learning, in which the learning
tasks are the same in both domains, while the source and target domains are different ($\mathcal{T}_S = \mathcal{T}_T$, $\mathcal{D}_S \neq \mathcal{D}_T$). In the
literature, transductive transfer learning, domain adaptation, covariate shift, sample selection bias, transfer learning,
multi-task learning, robust learning, and concept drift are all terms that have been used to describe related scenarios.
More specifically, when a method aims to optimize performance on multiple tasks or domains simultaneously, it is
considered to be multi-task learning. If it optimizes performance on one domain, given training data from a
different but related domain, it is considered to be transductive transfer learning or domain adaptation. Transfer learning
and transductive transfer learning have often been used interchangeably with domain adaptation. Concept drift refers to a
scenario in which data arrives sequentially with a changing distribution, and the goal is to predict the next batch given the
previously-arrived data [10]. The goal of robust learning is to build a classifier that is less sensitive to certain types of
change, such as feature change or deletion in the test data. In addition, unsupervised domain adaptation can be considered
a form of semi-supervised learning, but it assumes that the labeled training data and the unlabeled test data are drawn
from different distributions. The existing techniques and methods that have thus far been used to handle the domain
adaptation problem can be divided into four main classes [11]:
1) Instance weighting for covariate shift: methods that weight samples in the source domain to match the target domain.
The covariate shift scenario might arise in cases where the training data has been biased toward one region of the input
space or has been selected in a non-I.I.D. manner. It is closely related to the idea of sample-selection bias, which has long
been studied in statistics [12] and has in recent years been explored in machine learning. Huang et al. [13] proposed a
procedure called Kernel Mean Matching (KMM) to estimate weights for each instance in the source domain, with the
goal of making the weighted distribution of the source domain look similar to the distribution of the target domain.
Sugiyama et al. [14] and Tsuboi et al. [15] proposed a similar idea called the Kullback-Leibler Importance Estimation
Procedure (KLIEP); here too, the goal is to estimate weights that maximize the similarity between the target and
weight-corrected source distributions.
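As an illustration of the instance-weighting class above, the following is a simplified sketch of KMM; the RBF kernel width and the weight bound below are illustrative assumptions rather than settings from [13], and the normalization constraint on the weights used in the original paper is omitted for brevity:

```python
# Simplified Kernel Mean Matching (KMM) sketch: estimate per-instance source
# weights beta so that the weighted source distribution resembles the target
# distribution in an RBF feature space. The kernel width gamma and the upper
# bound B are illustrative; the constraint |mean(beta) - 1| <= eps from [13]
# is omitted here for brevity.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kmm_weights(X_s, X_t, gamma=1.0, B=10.0):
    n_s, n_t = len(X_s), len(X_t)
    K = rbf_kernel(X_s, X_s, gamma)                          # source-source
    kappa = (n_s / n_t) * rbf_kernel(X_s, X_t, gamma).sum(axis=1)

    # Quadratic objective: up to constants, the squared distance between the
    # weighted source mean and the target mean in feature space.
    obj = lambda beta: 0.5 * beta @ K @ beta - kappa @ beta
    grad = lambda beta: K @ beta - kappa

    result = minimize(obj, np.ones(n_s), jac=grad, bounds=[(0.0, B)] * n_s)
    return result.x    # use as sample weights when fitting a source model
```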
2) Self-labeling methods which include unlabeled target domain samples in the training process and initialize their labels
and then iteratively refine the labels. Self-training has a close relationship with the Expectation Maximization (EM)
algorithm, which has hard and soft versions. The hard version adds samples with a single, certain label, while the soft
version assigns label confidences when fitting the model. Tan et al. [16] modified the relative contributions of the source
and target domains in EM. They increased the weight on the target data at each iteration, while Dai et al. [17] specified
the tradeoff between the source and target data terms by estimating KL divergence between the source and target
distributions, placing more weight on the target data as KL divergence increases. Self-training methods have been applied
to domain adaptation on Natural Language Processing (NLP) tasks including parsing [18-21]; part-of-speech tagging [22];
conversation summarization [23]; entity recognition [22, 24, 25]; sentiment classification [26]; spam detection [22];
cross-language document classification [27, 28]; and speech act classification [29].
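The hard self-labeling loop described above can be sketched as follows; the classifier choice, the number of rounds, the batch size per round, and the per-round weighting schedule are illustrative assumptions in the spirit of [16], not the authors' exact algorithm:

```python
# Hard self-training sketch: start from a source-trained classifier, then
# repeatedly pseudo-label the most confident unlabeled target samples and
# retrain, giving target data more influence each round (in the spirit of
# the EM-style weighting of [16]).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_s, y_s, X_t, n_rounds=5, per_round=20):
    pool = X_t.copy()
    pseudo_X, pseudo_y = [], []
    clf = LogisticRegression(max_iter=1000).fit(X_s, y_s)
    for r in range(n_rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        top = np.argsort(-proba.max(axis=1))[:per_round]   # most confident
        pseudo_X.append(pool[top])
        pseudo_y.append(proba[top].argmax(axis=1))         # hard labels
        pool = np.delete(pool, top, axis=0)

        X_aug = np.vstack([X_s] + pseudo_X)
        y_aug = np.concatenate([y_s] + pseudo_y)
        # Increase the weight on pseudo-labeled target data at each round.
        w = np.concatenate([np.ones(len(y_s)),
                            (1.0 + r) * np.ones(len(y_aug) - len(y_s))])
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug,
                                                    sample_weight=w)
    return clf
```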
3) Feature representation methods which try to find a new feature representation of the data, either to make the target and
source distributions look similar, or to find an abstracted representation for domain-specific features. The feature
representation approaches can be categorized into two classes [11]: (A) Distribution similarity approaches aim explicitly
to make the source and target domain sample distributions similar, either by penalizing or removing features whose
statistics vary between domains [24, 30-32] or by learning a feature space projection in which a distribution divergence
statistic is minimized [33-35]; (B) Latent feature approaches aim to construct new features by analyzing large amounts of
unlabeled source and target domain data [25, 36-42].
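For the distribution-similarity branch (A), the divergence statistic being minimized is often a discrepancy between empirical feature means, such as the Maximum Mean Discrepancy (MMD). The sketch below only illustrates the objective: it scores candidate one-dimensional projections by a linear-kernel MMD; real methods such as [33-35] solve for the projection in closed form or as an eigenproblem rather than by random search:

```python
# Distribution-similarity sketch: measure how far apart the source and
# target distributions are with a linear-kernel Maximum Mean Discrepancy
# (squared distance between empirical feature means), and pick the 1-D
# projection that minimizes it by brute-force random search.
import numpy as np

def linear_mmd2(A, B):
    diff = A.mean(axis=0) - B.mean(axis=0)
    return float(diff @ diff)

def best_projection(X_s, X_t, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    best_w, best_score = None, np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=X_s.shape[1])
        w /= np.linalg.norm(w)                 # unit-norm direction
        score = linear_mmd2(X_s @ w[:, None], X_t @ w[:, None])
        if score < best_score:
            best_w, best_score = w, score
    return best_w   # project both domains onto best_w before training
```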
4) Cluster-based learning methods rely on the assumption that samples are likely to have the same label if there is a
high-density path between them [43]. These methods aim to construct a graph in which the
labeled and unlabeled samples are the nodes, with the edge weights among samples based on their similarity. Dai et al.
[17] proposed a co-clustering based algorithm to propagate the label information across domains for document
classification. Xue et al. [44] proposed a cross-domain text classification algorithm known as TPLSA to integrate labeled
and unlabeled data from different but related domains.
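A minimal sketch of the graph-based idea behind these cluster-based methods appears below; the Gaussian edge weighting, the row normalization, and the fixed iteration count are illustrative assumptions, not the specific algorithms of [17] or [44]:

```python
# Graph-based label propagation sketch: build a similarity graph over all
# labeled and unlabeled samples, then iteratively push label mass along
# high-weight edges, re-clamping the known labels after every step.
import numpy as np

def propagate_labels(X, y, labeled_mask, gamma=1.0, n_iter=50):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)              # Gaussian edge weights
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic transitions

    n_classes = int(y[labeled_mask].max()) + 1
    F = np.zeros((len(X), n_classes))
    F[labeled_mask, y[labeled_mask]] = 1.0     # one-hot known labels

    for _ in range(n_iter):
        F = P @ F                              # diffuse labels to neighbours
        F[labeled_mask] = 0.0                  # clamp: restore known labels
        F[labeled_mask, y[labeled_mask]] = 1.0
    return F.argmax(axis=1)                    # labels for every node
```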
3. Transfer learning using neural network
Neural networks aim to solve complex non-linear problems using a learning-based method inspired by the structure and
processes of the human brain. In classical machine learning problems, many studies have demonstrated the superior
performance of neural networks compared to statistical methods. This fact has encouraged many researchers to use neural
networks for transfer learning, particularly in complicated problems. To address problems in transfer learning, a number
of neural network-based transfer learning algorithms have been developed in recent years. This section reviews three of
the principal neural network techniques: the deep neural network, the multiple task neural network, and the radial basis
function neural network, and presents their applications in transfer learning.
3.1. Transfer learning using deep neural network
Deep neural networks are considered to be intelligent feature extraction modules that offer great flexibility in extracting
high-level features in transfer learning. The prominent characteristic of a deep neural network is its multiple hidden layers,
which can capture intricate non-linear representations of data. Multi-stage Hubel-Wiesel architectures, inspired by the
findings of Hubel and Wiesel [45], consist of alternating layers of convolutions and max pooling to extract data features. A
new model blending this structure with multiple tasks has been proposed for transfer learning [46]. In this model, a target
task and related tasks are trained together with shared input and hidden layers but separate output neurons. The model
was then extended to the case in which each task has multiple output neurons [47]. Likewise, based on the multi-stage
Hubel-Wiesel architectures, it has been investigated whether shared hidden layers trained on a source task can be reused
on a different target task. For the target task model, only the last classification layer needs to be retrained, but any layer of
the new model can be fine-tuned if desired. In this case, the parameters of the hidden layers in the source task model act as
initialization parameters of the new target task model, and this strategy is especially promising for models in which good
initialization is very important [48].
As mentioned above, generally all the layers except the last layer are treated as feature extractors in a deep neural
network. In contrast to this network structure, a new feature extractor structure was proposed by Collobert and Weston
[49]: only the first two layers are used to extract features at different levels, such as the word level and sentence level in
Natural Language Processing, and subsequent layers are classical neural network layers used for prediction. The Stacked
Denoising Autoencoder (SDA) is another deep neural network structure [50]. The core idea of SDA is that unsupervised
learning is used to pre-train each layer, and ultimately all layers are fine-tuned in a supervised manner. Based on the SDA
model, different feature transference strategies have been introduced for target tasks with varying degrees of complexity.
The number of layers transferred to the new model depends on whether high-level or low-level feature representations are
needed; if low-level features are needed, only the first-layer parameters are transferred to the target task [50].
In addition, an interpolating path has been presented to transfer knowledge from the source task to the target task in a
deep neural network. The original high-dimensional features of the source and target domains are projected to
lower-dimensional subspaces that lie on the Grassmann manifold, which provides a way to interpolate smoothly between
the source and target domains; thus, a series of feature sets is generated on the interpolating path and intermediate feature
extractors are formed based on deep neural networks [51]. Deep neural networks can also be combined with other
techniques to improve the performance of transfer learning. Swietojanski et al. [52] applied restricted Boltzmann
machines to pre-train a deep neural network, the outputs of which are used as features for a hidden Markov model.
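The layer-transfer strategy described above can be sketched in PyTorch as follows; the network shape, class counts, and learning rate are illustrative (the digits-to-lowercase-letters pairing loosely mirrors the reuse experiment of [50]), and only the idea of reusing source-trained hidden layers while retraining the final classification layer comes from the text:

```python
# Layer-transfer sketch in PyTorch: reuse the hidden layers of a network
# trained on the source task, attach a new classification head for the
# target task, and either freeze the transferred layers or fine-tune them.
import torch
import torch.nn as nn

# Pretend this network was already trained on the source task
# (e.g. handwritten digits 0-9).
source_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),                 # source classification layer
)

# Target model: keep everything except the last layer, add a new head
# (e.g. 26 classes for lowercase letters, loosely mirroring [50]).
target_net = nn.Sequential(
    *list(source_net.children())[:-1],  # transferred, source-trained layers
    nn.Linear(128, 26),                 # new, randomly initialized head
)

# Strategy 1: retrain only the new classification layer.
for p in list(target_net.parameters())[:-2]:    # all but the head's W and b
    p.requires_grad = False
optimizer = torch.optim.SGD(
    (p for p in target_net.parameters() if p.requires_grad), lr=0.01)

# Strategy 2: fine-tune instead -- leave requires_grad True everywhere and
# train the whole network with a small learning rate, so the transferred
# layers serve as a good initialization rather than fixed feature extractors.
```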
3.2. Transfer learning using multiple task neural network
To improve learning for the target task, multiple task learning (MTL) has been proposed. Information contained in other