Survey of Machine Learning Algorithms for Disease Diagnostic

doi:10.4236/JILSA.2017.91001

Journal of Intelligent Learning Systems and Applications, 2017, 9, 1-16

http://www.scirp.org/journal/jilsa

ISSN Online: 2150-8410

ISSN Print: 2150-8402


January 24, 2017


Meherwar Fatima¹, Maruf Pasha²

¹Institute of CS & IT, The Women University Multan, Multan, Pakistan

²Department of Information Technology, Bahauddin Zakariya University, Multan, Pakistan

Abstract

In medical imaging, Computer Aided Diagnosis (CAD) is a rapidly growing, dynamic area of research. In recent years, significant efforts have been made to enhance computer aided diagnosis applications, because errors in medical diagnostic systems can result in seriously misleading medical treatments. Machine learning is important in computer aided diagnosis: objects such as organs cannot always be identified accurately with a simple equation, so pattern recognition fundamentally involves learning from examples. In the biomedical field, pattern recognition and machine learning promise improved accuracy in the perception and diagnosis of disease. They also promote objectivity in the decision-making process. For the analysis of high-dimensional and multimodal biomedical data, machine learning offers a worthy approach for building sophisticated, automated algorithms. This survey paper provides a comparative analysis of different machine learning algorithms for the diagnosis of diseases such as heart disease, diabetes, liver disease, dengue and hepatitis. It draws attention to the suite of machine learning algorithms and tools that are used for the analysis of diseases and for the related decision-making process.

Keywords

Machine Learning, Artificial Intelligence, Machine Learning Techniques

How to cite this paper: Fatima, M. and Pasha, M. (2017) Survey of Machine Learning Algorithms for Disease Diagnostic. Journal of Intelligent Learning Systems and Applications, 9, 1-16. https://doi.org/10.4236/jilsa.2017.91001

Received: October 17, 2016

Accepted: January 21, 2017

Published: January 24, 2017

Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/

Open Access

1. Introduction

Artificial Intelligence (AI) can enable computers to think, making them much more intelligent. Machine learning is a subfield of AI, and many researchers believe that intelligence cannot be developed without learning. Figure 1 shows the main types of machine learning techniques: supervised, unsupervised, semi-supervised, reinforcement, evolutionary and deep learning. These techniques are used to classify data sets.

Figure 1. Types of machine learning techniques.

1) Supervised learning: Given a training set of examples with suitable targets, the algorithm learns to respond correctly to all possible inputs. Supervised learning is also called learning from exemplars. Classification and regression are the two types of supervised learning.

Classification: predicts a category such as Yes or No, for example, “Is this tumor cancerous?” or “Does this cookie meet our quality standards?”

Regression: predicts a numeric quantity, answering “How much?” or “How many?”
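As an illustration of the two supervised tasks, the following minimal sketch (not taken from the surveyed studies) uses hand-made toy data: a 1-nearest-neighbour rule answers a yes/no classification question, and an ordinary least-squares line answers a “how much” regression question. The function names and all data values are hypothetical.

```python
import math

def nearest_neighbour_classify(train, query):
    """Classification: return the label of the training example closest to query.

    train is a list of (feature_vector, label) pairs, e.g. labels "yes"/"no".
    """
    closest = min(train, key=lambda pair: math.dist(pair[0], query))
    return closest[1]

def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical labeled cases: (tumor size in cm, growth rate) -> "Is this tumor cancerous?"
tumors = [((1.0, 0.1), "no"), ((1.2, 0.2), "no"), ((3.5, 0.9), "yes"), ((4.0, 1.1), "yes")]
print(nearest_neighbour_classify(tumors, (3.8, 1.0)))  # yes

# Hypothetical numeric targets: "How much?" for a new input x = 5
slope, intercept = fit_line([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0])
print(slope * 5 + intercept)
```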

2) Unsupervised learning: Correct responses or targets are not provided. Instead, an unsupervised technique looks for similarities among the input data and groups the data on the basis of these similarities. The statistical approach to unsupervised learning is known as density estimation. Clustering is a form of unsupervised learning [1].

Clustering: groups the data into clusters on the basis of similarity.
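The survey does not name a specific clustering algorithm; k-means, one of the most common, is sketched below as an assumption. Points are repeatedly assigned to the nearest centroid, and each centroid is then moved to the mean of its assigned points. The toy 2-D points and the initial centroids are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    """Minimal k-means: alternately assign each point to its nearest centroid
    and move every centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # A centroid with no assigned points stays where it is.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
```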

3) Semi-supervised learning: Semi-supervised learning is a class of supervised learning techniques that also uses unlabeled data for training, generally a small amount of labeled data together with a large amount of unlabeled data. It therefore lies between unsupervised learning (unlabeled data) and supervised learning (labeled data).
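Self-training (pseudo-labelling) is one simple semi-supervised strategy, chosen here only for illustration. Starting from a few labeled examples, the sketch repeatedly takes the unlabeled point closest to any labeled point, copies that neighbour's label, and adds the point to the labeled set. The 1-D toy data and the nearest-neighbour labelling rule are assumptions.

```python
import math

def self_train(labeled, unlabeled):
    """Self-training sketch: repeatedly pseudo-label the unlabeled point that is
    closest to any labeled point, copying that neighbour's label."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Find the (unlabeled point, labeled example) pair with the smallest distance.
        point, (_, label) = min(
            ((u, example) for u in pool for example in labeled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labeled.append((point, label))
        pool.remove(point)
    return labeled

# Only two points carry labels; the six others are unlabeled.
seeds = [((0.0,), "healthy"), ((10.0,), "ill")]
unlabeled = [(1.0,), (2.0,), (3.0,), (9.0,), (8.0,), (7.0,)]
result = dict(self_train(seeds, unlabeled))
print(result)
```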

4) Reinforcement learning: This type of learning is inspired by behaviorist psychology. The algorithm is told when its answer is wrong, but not how to correct it, so it has to explore and test various possibilities until it finds the right answer. It is also known as learning with a critic, because the feedback scores an answer without recommending improvements. Reinforcement learning differs from supervised learning in that correct input/output pairs are never presented and sub-optimal actions are not explicitly corrected. Moreover, it focuses on on-line performance.
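The “learning with a critic” loop can be illustrated with a minimal sketch, assuming a hypothetical critic that only answers right (1) or wrong (0). Exploration here comes from optimistic initial value estimates, one standard technique among several (the survey does not name one): every untried action looks promising until the critic says otherwise.

```python
def learn_with_critic(critic, n_actions, steps=20):
    """Greedy agent with optimistic initial values.

    The critic only returns 1 (right) or 0 (wrong); it never says which action
    would have been right, so the agent must explore to find out.
    """
    estimates = [1.0] * n_actions  # optimism: every untried action looks good
    counts = [0] * n_actions
    history = []
    for _ in range(steps):
        action = max(range(n_actions), key=lambda a: estimates[a])
        reward = critic(action)
        counts[action] += 1
        # Incremental average of the rewards received for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        history.append(action)  # on-line performance: every step counts
    return estimates, history

def critic(action):
    # Hypothetical environment: action 2 is the only correct answer.
    return 1.0 if action == 2 else 0.0

estimates, history = learn_with_critic(critic, n_actions=3)
print(history[:5])  # [0, 1, 2, 2, 2]
```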

5) Evolutionary learning: Biological evolution can be considered a learning process: organisms adapt to improve their survival rates and their chances of having offspring. This model can be used in a computer by introducing the idea of fitness, which measures how good a candidate solution is [1].
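A minimal genetic algorithm makes the fitness idea concrete. This sketch assumes the toy “OneMax” objective (count the 1-bits in a bit string) in place of a real diagnostic objective; tournament selection, one-point crossover, the mutation rate and elitism are illustrative choices, not taken from the survey. Because the fittest individual is always carried over, the best fitness never decreases from one generation to the next.

```python
import random

def fitness(individual):
    # Hypothetical fitness: the number of 1-bits (the classic "OneMax" toy problem).
    return sum(individual)

def tournament(population, rng):
    # Selection: of two random individuals, the fitter one becomes a parent.
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def evolve(length=20, population_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        best_per_generation.append(fitness(max(population, key=fitness)))
        offspring = [max(population, key=fitness)]  # elitism: the best survives unchanged
        while len(offspring) < population_size:
            mother, father = tournament(population, rng), tournament(population, rng)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness), best_per_generation

best, history = evolve()
print(fitness(best), "of 20 bits set")
```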

6) Deep learning: This branch of machine learning is based on a set of algorithms that model high-level abstractions in data. It uses a deep graph with multiple processing layers, made up of many linear and nonlinear transformations.
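To show what “many linear and nonlinear transformations” means, the following sketch runs a forward pass through a small stack of layers: each layer applies a linear map (weights and biases) and, except for the output layer, a ReLU nonlinearity. The 2-3-2-1 architecture, the ReLU choice and the hand-set weights are assumptions; training (for example by backpropagation) is omitted.

```python
def relu(vector):
    # Nonlinear transformation, applied element-wise.
    return [max(0.0, v) for v in vector]

def linear(weights, biases, vector):
    # Linear transformation: weights @ vector + biases.
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]

def forward(layers, vector):
    """Pass an input through a stack of (weights, biases) layers.
    Hidden layers are linear followed by ReLU; the output layer stays linear."""
    for i, (weights, biases) in enumerate(layers):
        vector = linear(weights, biases, vector)
        if i < len(layers) - 1:
            vector = relu(vector)
    return vector

# Hypothetical hand-set weights for a 2 -> 3 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.0, 0.5]),
    ([[1.0, 1.0]], [0.0]),
]
print(forward(layers, [2.0, 1.0]))  # [3.0]
```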

Pattern recognition and data classification have long been valuable. Humans have a very strong ability to sense their environment and to act on what they perceive [2]. Through the combined, multidisciplinary efforts of machine learning, databases and statistics, big data can be broken down into manageable chunks. In medical science today, diagnosing disease is a serious task, and it is very important to establish the exact diagnosis of a patient through clinical examination and assessment. Computer-based decision support systems can play a vital role in effective diagnosis and cost-effective management. The health care field generates big data about clinical assessments, patient reports, cures, follow-ups, medication and so on, which is difficult to organize suitably, and inappropriate management of these data has affected the quality of their organization. The growing amount of data calls for proper means of extracting and processing it effectively and efficiently [3]. One of the many applications of machine learning is to build classifiers that divide data into two or more classes on the basis of their attributes. Such classifiers are used for medical data analysis and disease detection.

ML algorithms were originally designed and employed to analyze medical data sets, and today ML offers various tools for efficient data analysis. Especially in the last few years, the digital revolution has provided relatively low-cost and readily available means of collecting and storing data. Modern hospitals are equipped with machines for data collection and examination, which make them capable of collecting and sharing data in large information systems. ML technologies are very effective for the analysis of medical data, and a great deal of work has been done on diagnostic problems. In modern hospitals, correct diagnostic data are kept as medical records or reports, or in a dedicated data section. To run an algorithm, correctly diagnosed patient records are entered into a computer as input, and results are obtained automatically from the previously solved cases. Physicians can use the derived classifier to help diagnose new patients faster and more accurately. Such classifiers can also be used to train non-specialists or students to diagnose the problem [4].

In the past, ML has given us self-driving cars, speech recognition, efficient web search and a vastly improved understanding of the human genome. Today machine learning is so pervasive that one may use it many times a day without knowing it. Many researchers consider it the best way to make progress towards human-level AI. Machine learning techniques (MLT) can mine electronic health records, which generally contain high-dimensional patterns and multiple data sets. Pattern recognition is central to MLT: it supports prediction and decision-making for diagnosis and for treatment planning. Machine learning algorithms are capable of handling large amounts of data, combining data from dissimilar sources, and integrating background information into the study [3].

2. Diagnosis of Diseases by Using Different Machine Learning Algorithms

Many researchers have applied different machine learning algorithms to disease diagnosis, and it is widely accepted that these algorithms work well for diagnosing a variety of diseases. Figure 2 gives a figurative overview of the diseases diagnosed by machine learning techniques. The diseases covered in this survey are heart disease, diabetes, liver disease, dengue and hepatitis.

2.1. Heart Disease

Otoom et al. [5] presented a system for analyzing and monitoring coronary artery disease. The Cleveland heart data set, taken from UCI, consists of 303 cases and 76 attributes (features), of which 13 are used. Two tests were performed with three algorithms, Bayes Net, Support Vector Machine (SVM) and Functional Trees (FT), using the WEKA tool for detection. In the holdout test, SVM attained 88.3% accuracy. In the cross-validation test, both SVM and Bayes Net achieved 83.8% accuracy. FT attained 81.5% accuracy. The 7 best features were then selected with the Best First selection algorithm and validated with cross-validation. On these 7 selected features, Bayes Net attained 84.5% accuracy, SVM 85.1% and FT 84.5%.

Figure 2. Diseases diagnosed by MLT.
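The holdout and cross-validation tests in [5] differ only in how the cases are divided between training and testing. The sketch below shows both splitting schemes for 303 cases; the 70/30 holdout ratio, k = 10 and the hypothetical evaluate(train, test) callback (which would train a classifier such as SVM and return its test accuracy) are assumptions, since the survey does not report them. Real tools usually also shuffle and stratify the data before splitting, which this sketch omits.

```python
def holdout_split(n_samples, train_fraction=0.7):
    """Holdout: a single split, e.g. 70% of the cases for training, the rest for testing."""
    cut = int(n_samples * train_fraction)
    indices = list(range(n_samples))
    return indices[:cut], indices[cut:]

def k_fold_splits(n_samples, k=10):
    """k-fold cross-validation: every case is used for testing exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validated_accuracy(evaluate, n_samples, k=10):
    """Average the per-fold accuracies returned by a hypothetical evaluate(train, test)."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

train, test = holdout_split(303)  # 303 cases, as in the Cleveland data set
print(len(train), len(test))      # 212 91
```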

Vembandasamy et al. [6] diagnosed heart disease using the Naive Bayes algorithm. Naive Bayes applies Bayes' theorem together with a strong (naive) assumption that the features are independent of one another given the class. The data set, obtained from one of the leading diabetes research institutes in Chennai, consists of 500 patients. WEKA was used as the tool, and classification was performed with a 70% percentage split (70% of the data for training and the remaining 30% for testing). Naive Bayes achieved 86.419% accuracy.
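The following minimal sketch shows how Naive Bayes combines Bayes' theorem with the independence assumption, here for categorical attributes with add-one (Laplace) smoothing, evaluated with a 70% percentage split. The two attributes and the ten records are hypothetical; this is neither the Chennai data set nor WEKA's implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    attribute_values = defaultdict(set)  # attribute index -> all values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values

def predict(model, row):
    """Pick the class maximising log P(class) + sum_i log P(x_i | class).

    Summing per-attribute terms is where the naive independence assumption enters;
    add-one smoothing keeps an unseen value from zeroing out a class.
    """
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            count = value_counts[(c, i)][value] + 1
            score += math.log(count / (n_c + len(attribute_values[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical records: (chest pain type, high blood pressure) -> diagnosis.
rows = [("typical", "yes"), ("typical", "yes"), ("typical", "no"), ("none", "no"),
        ("none", "no"), ("none", "yes"), ("typical", "yes"),
        ("typical", "yes"), ("none", "no"), ("none", "no")]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy", "disease",
          "disease", "healthy", "healthy"]

split = round(len(rows) * 0.7)  # 70% percentage split: first 7 records train, last 3 test
model = train_naive_bayes(rows[:split], labels[:split])
correct = sum(predict(model, row) == label for row, label in zip(rows[split:], labels[split:]))
print(f"accuracy: {correct / (len(rows) - split):.3f}")
```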

Chaurasia and Pal [7] suggested data mining approaches for heart disease detection. They used the WEKA data mining tool, which contains a set of machine learning algorithms for mining purposes, and applied Naive Bayes, J48 and bagging. The heart disease data set, provided by the UCI Machine Learning Repository, consists of 76 attributes, of which only 11 are used for prediction. Naive Bayes provides 82.31% accuracy, J48 gives 84.35% and bagging achieves 85.03%, so bagging offers the best classification rate on this data set.
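Bagging (bootstrap aggregating) trains each base classifier on a bootstrap resample of the training set, drawn with replacement, and combines their predictions by majority vote. In the sketch below the base learners are one-feature decision stumps and the 1-D risk scores are hypothetical; both are assumptions, since the survey does not describe the configuration used in [7].

```python
import random
from collections import Counter

def train_stump(sample):
    """Decision stump: predict 1 when x >= threshold, choosing the threshold
    (a sample value, or +inf meaning 'always 0') with the fewest training errors."""
    candidates = sorted({x for x, _ in sample}) + [float("inf")]

    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in sample)

    return min(candidates, key=errors)

def bagging(data, n_estimators=25, seed=0):
    """Train each stump on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    return [train_stump(rng.choices(data, k=len(data))) for _ in range(n_estimators)]

def bagged_predict(thresholds, x):
    """Majority vote over the individual stumps."""
    votes = Counter(1 if x >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D risk score: values 1-4 are healthy (0), values 6-10 diseased (1).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
thresholds = bagging(data)
print([bagged_predict(thresholds, x) for x in (0.5, 5.5, 11)])
```

An odd number of stumps is used so that a binary vote cannot tie.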

Parthiban and Srivatsa [8] diagnosed heart disease in diabetic patients using machine learning methods. Naive Bayes and SVM were applied using WEKA to a data set of 500 patients collected from a research institute in Chennai; 142 patients have the disease and 358 do not. Naive Bayes obtained 74% accuracy, while SVM provided the highest accuracy of 94.60%.
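Because the classes in [8] are imbalanced (142 patients with the disease, 358 without), a useful sanity check, which is not part of the original study, is the accuracy of a trivial classifier that always predicts the majority class:

```python
def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())

counts = {"disease": 142, "no disease": 358}  # class sizes reported for the data set in [8]
baseline = majority_baseline(counts)
print(f"{baseline:.1%}")  # 71.6%
```

By this measure, the 74% reported for Naive Bayes is only slightly above the 71.6% trivial baseline, whereas the 94.60% reported for SVM is well above it.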

Tan et al. [9] proposed a hybrid technique in which two machine learning algorithms, Genetic Algorithm (GA) and Support Vector Machine (SVM), are effectively combined using a wrapper approach. LIBSVM and the WEKA data mining tool are used in this analysis. Five data sets (Iris, diabetes, breast cancer, heart disease and hepatitis) were taken from the UC Irvine machine learning repository for the experiment. The hybrid GA and SVM approach attained 84.07% accuracy for heart disease, 78.26% for diabetes, 76.20% for breast cancer and 86.12% for hepatitis. Figure 3 shows the accuracy of heart disease detection over time.
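The survey only states that GA and SVM were joined through a wrapper approach. The sketch below illustrates what “wrapper” means: each candidate feature subset is scored by actually training and testing the classifier on those features. For brevity, a leave-one-out 1-nearest-neighbour classifier stands in for SVM and exhaustive search stands in for the GA; both substitutions and the toy data are assumptions.

```python
import math
from itertools import combinations

def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to the
    given feature indices (a stand-in for the SVM wrapped in [9])."""
    projected = [[row[f] for f in features] for row in rows]
    correct = 0
    for i, point in enumerate(projected):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: math.dist(projected[j], point))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def wrapper_fitness(rows, labels, mask):
    """Wrapper approach: a feature subset (bit mask) is scored by running the classifier on it."""
    features = [f for f, keep in enumerate(mask) if keep]
    return loo_1nn_accuracy(rows, labels, features) if features else 0.0

# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [(0.0, 5.0, 1.0), (0.2, 1.0, 9.0), (0.1, 9.0, 4.0),
        (1.0, 5.2, 8.0), (0.9, 1.1, 2.0), (1.1, 8.8, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

# With 3 features, all 7 non-empty masks can be scored exhaustively; a GA, as in [9],
# searches this space instead when there are too many features to enumerate.
masks = [tuple(1 if f in subset else 0 for f in range(3))
         for size in range(1, 4) for subset in combinations(range(3), size)]
best = max(masks, key=lambda mask: wrapper_fitness(rows, labels, mask))
print(best, wrapper_fitness(rows, labels, best))
```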

Analysis:

In the existing literature, SVM offers the highest accuracy, 94.60%, reported in 2012 (Table 1). SVM shows good performance in many application areas, and it responded well to the attributes used by Parthiban and Srivatsa in 2012. In 2015, Otoom et al. used SMO (Sequential Minimal Optimization), an algorithm for training SVMs, together with a feature selection (FS) technique to find the best features. On these features SVM offered 85.1% accuracy, which is comparatively lower than the 2012 result. However, the two studies used different data sets, with different training and testing sets and different data types, so the figures are not directly comparable.
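SMO is a way of training an SVM rather than a different classifier: whichever optimizer is used, the goal is a separating hyperplane with a large margin. The sketch below trains a linear SVM with Pegasos-style sub-gradient steps on the regularized hinge loss, a simpler optimizer swapped in for SMO, cycling through the data in a fixed order. The toy data, the absence of a bias term and the value of lam are assumptions.

```python
def train_linear_svm(data, lam=0.1, epochs=50):
    """Linear SVM (no bias term) trained with Pegasos-style sub-gradient steps
    on the regularized hinge loss, cycling through the data in a fixed order."""
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of (lam/2)*||w||^2
            if margin < 1:  # hinge loss is active for this example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Hypothetical, linearly separable toy data with labels +1 / -1.
data = [((2.0, 1.0), 1), ((1.0, 2.0), 1), ((2.0, 2.0), 1),
        ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-2.0, -2.0), -1)]
w = train_linear_svm(data)
print([svm_predict(w, x) for x, _ in data])
```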


References

Machine learning for medical diagnosis: history, state of the art and perspective

Machine Learning: An Algorithmic Perspective

Diagnosis of diabetes using classification mining techniques

Advantage and drawback of support vector machine functionality

Classification Of Diabetes Disease Using Support Vector Machine
