International Journal of Computer Applications (0975-8887)
Volume 24 No.3, June 2011
Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method
G. Parthiban, Research Scholar, Dr. MGR Educational and Research Institute, Maduravoyal, Chennai, India.
A. Rajesh, Professor, Dept. of CSE, C. Abdul Hakkeem College of Engineering and Technology, Melvishram, Vellore, India.
S. K. Srivatsa, Sr. Professor, Dept. of E & I, St. Joseph's College of Engineering, Chennai, India.
ABSTRACT
The objective of this paper is to predict the chances of a diabetic patient developing heart disease. In this study, we apply the Naïve Bayes data mining classifier, which produces an optimal prediction model using a minimal training set. Data mining is the analysis step of the Knowledge Discovery in Databases (KDD) process. Data mining involves the use of techniques to find underlying structures and relationships in a large database. Diabetes is a set of related diseases in which the body cannot regulate the amount of sugar, specifically glucose, in the blood (hyperglycemia). Diagnosis plays a vital role in the medical field. Using attributes from the diagnosis of diabetes, such as age, sex, blood pressure and blood sugar, the proposed system predicts the chances of a diabetic patient developing heart disease.
Keywords: Knowledge Discovery, Data Mining, Diabetes, Heart disease, Naïve Bayes Method.
1. INTRODUCTION
Knowledge discovery in medical databases is a well-defined process, and data mining is an essential step in it. Data mining is the non-trivial extraction of potentially useful information from data [1][2]. Data mining would therefore have been more appropriately named “knowledge mining from data” [3]. Diabetes mellitus is a chronic disease that causes serious health complications, including renal (kidney) failure, heart disease, stroke, and blindness [4]. People with diabetes either do not produce enough insulin (type 1 diabetes), cannot use insulin properly (type 2 diabetes), or both. Type 1 diabetes was formerly called Insulin-Dependent Diabetes Mellitus (IDDM) or childhood-onset diabetes, and type 2 diabetes was formerly referred to as Non-Insulin-Dependent Diabetes Mellitus (NIDDM) or adult-onset diabetes [5]. Type 1 diabetes is typically recognized in childhood or adolescence. At least 90% of patients with diabetes have type 2 diabetes, which is typically recognized in adulthood and in which the body cannot effectively use the insulin it produces [6][13]. The causes of diabetes mellitus are unclear; however, both hereditary factors (genetic factors passed on in families) and environmental factors seem to be involved.
The risk factors for type 2 diabetes are being 45 years of age or older, being overweight, having a parent or sibling with diabetes (family heredity), having high blood pressure (140/90 mmHg or higher), having abnormal blood lipids (HDL cholesterol of 35 mg/dL or lower, or triglycerides of 250 mg/dL or higher), and acute stress [7]. Over 80 per cent of people with type 2 diabetes are overweight. Type 2 diabetes is treated with diet and exercise, and blood sugar levels are lowered with drugs [8][15].
Research on family history has shown that people are more at risk if there is a history of diabetes in close family members. Research on physical inactivity has shown that people who do not lead an active life are more at risk of developing type 2 diabetes [9][14].
Diabetes also increases the risk of microvascular damage and macrovascular complications. People with diabetes are two to four times more likely to develop cardiovascular disease. Diabetes is therefore one of the leading causes of death by disease worldwide. The literature contains several methods for diagnosing diabetes or heart disease individually, but to our knowledge there is no automated method that diagnoses heart disease in diabetic patients based on diabetes diagnosis attributes.
In this paper, we propose a Naïve Bayes-based method to diagnose heart disease in diabetic patients. It should be noted that the attributes used in our proposed method are those used for the diagnosis of diabetes and are not direct indicators of heart disease.
2. BACKGROUND
A Naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. It assumes that, given the class, the presence or absence of a particular feature is unrelated to the presence or absence of any other feature [10].
The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data. Bayes' theorem gives the probability of an event occurring given that another event has already occurred. If B represents the dependent event and A represents the prior event, the conditional probability and Bayes' theorem can be stated as follows.
Prob(B given A) = Prob(A and B) / Prob(A) = Prob(A given B) × Prob(B) / Prob(A)

To calculate the probability of B given A, the algorithm counts the number of cases where A and B occur together and divides it by the total number of cases in which A occurs.
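The counting procedure can be sketched in a few lines of Python. The records below are hypothetical toy data, not the paper's clinical data set, and the helper names (`conditional_probability`, `in_age_group_45_55`, `high_cholesterol`) are ours; the final assertion checks that the count-based estimate agrees with Bayes' theorem in the form Prob(A given B) × Prob(B) / Prob(A).

```python
from fractions import Fraction


def conditional_probability(records, given, event):
    """Estimate Prob(event given `given`): count the cases where both occur
    and divide by the total number of cases where `given` occurs."""
    n_given = sum(1 for r in records if given(r))
    n_both = sum(1 for r in records if given(r) and event(r))
    if n_given == 0:
        raise ValueError("conditioning event never occurs in the data")
    return Fraction(n_both, n_given)


def in_age_group_45_55(record):  # event A (hypothetical)
    return record[0] == "45-55"


def high_cholesterol(record):  # event B (hypothetical)
    return record[1]


# Hypothetical toy records, not the paper's data: (age group, high cholesterol?)
records = [
    ("45-55", True), ("45-55", True), ("45-55", False),
    ("30-44", False), ("30-44", True), ("56-70", False),
]

p_b_given_a = conditional_probability(records, in_age_group_45_55, high_cholesterol)
print(p_b_given_a)  # 2/3

# Cross-check with Bayes' theorem: Prob(B given A) = Prob(A given B) * Prob(B) / Prob(A)
n = len(records)
p_a = Fraction(sum(1 for r in records if in_age_group_45_55(r)), n)
p_b = Fraction(sum(1 for r in records if high_cholesterol(r)), n)
p_a_given_b = conditional_probability(records, high_cholesterol, in_age_group_45_55)
assert p_a_given_b * p_b / p_a == p_b_given_a
```

Exact fractions are used so that the two routes to Prob(B given A) can be compared without floating-point tolerance.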
An advantage of the Naive Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix. It can be used for both binary and multi-class classification problems.
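For numeric attributes, one common way to realize this is a Gaussian Naive Bayes model, which stores only a class prior and, for each class, the mean and variance of each attribute. The sketch below assumes the attributes are normally distributed within each class; the class `GaussianNaiveBayes` and the toy (age, A1C) values are hypothetical and are not the WEKA implementation used in this work.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Hypothetical minimal Gaussian Naive Bayes (not WEKA's implementation)."""

    def fit(self, X, y):
        # Group the training rows by class label.
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # One variance per attribute per class: no covariances are needed
            # because the attributes are assumed independent given the class.
            # A small floor keeps the density finite if an attribute is constant.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                for col, m in zip(columns, means)
            ]
            prior = len(rows) / len(y)
            self.params[label] = (prior, means, variances)
        return self

    def log_score(self, row, label):
        # log P(class) + sum of log normal densities; the shared evidence term
        # P(x_1, ..., x_n) is omitted because it does not change the arg max.
        prior, means, variances = self.params[label]
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score -= 0.5 * math.log(2 * math.pi * var) + (x - m) ** 2 / (2 * var)
        return score

    def predict(self, row):
        return max(self.params, key=lambda label: self.log_score(row, label))


# Hypothetical toy data: (age, A1C) per patient, labelled with LP Tot Y/N.
X = [[50, 8.9], [52, 9.1], [48, 8.5], [30, 6.0], [28, 6.4], [35, 5.8]]
y = ["Yes", "Yes", "Yes", "No", "No", "No"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([51, 9.0]), model.predict([31, 6.1]))
```

Working in log space avoids underflow when many small per-attribute densities are multiplied together.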
3. EXPERIMENTAL METHODOLOGY
3.1 Data set and used variables
The data set used in this work is a clinical data set collected from one of the leading diabetes research institutes in Chennai and contains records of about 500 patients. The clinical data set specification provides concise, unambiguous definitions for items related to diabetes.
The diabetes data set is developed to ensure that people with diabetes have up-to-date records of their risk factors, current management, treatment target achievements and arrangements, and outcomes of regular surveillance for complications, to help them monitor their care and make informed choices about their management. It will also ensure that when people with diabetes meet health care professionals, the consultation is fully informed by comprehensive, up-to-date and accurate information.
The diabetes attributes used in our proposed system and their
descriptions are shown in Table 1.
Table 1 Diabetes attributes used in the experimentation
Attribute            Description
Sex                  A classification of the sex of the person
Age                  Age of the patient
Family Heredity      Previous history (Father / Mother)
Weight               Patient's weight
BP                   Blood pressure
Fasting              Sugar level after fasting
PP                   Post-prandial blood glucose level
A1C                  Glycosylated haemoglobin (HbA1c) level; sugar level over the last 4 months
LP Tot Cholesterol   Total cholesterol level
3.2 Preprocessing and Sampling
Except for sex and family heredity, all the attributes listed in Table 1 have numeric values. The attribute sex takes on the values M or F to denote male or female, respectively. The attribute family heredity takes on the values Father, Mother or Both; if the patient has no family history of diabetes, the attribute is left empty. Since no attribute value should be left empty for the mining algorithm to work properly, we have used the value No for patients without any previous diabetes history. Likewise, we need a categorical attribute on which to classify the data set. The aim of our work is to predict the chances of a diabetic patient developing heart disease. Hence, we have taken the LP Tot Y/N attribute as the class attribute. Since the underlying total cholesterol (LP Tot) values are numeric, we have categorized them into high cholesterol (Yes) or low cholesterol (No) to obtain LP Tot Y/N.
This categorization is based on the fact that a total cholesterol value of 180 mg/dL or more is taken to be high cholesterol for Indians.
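The two preprocessing steps can be sketched as follows. The field names (`family_heredity`, `lp_tot`, `lp_tot_yn`) are hypothetical stand-ins for the Table 1 attributes, and dropping the numeric cholesterol value once the class label is derived is our assumption, made so that the classifier cannot read the class directly from the value it was computed from.

```python
# Cut-off from the text: total cholesterol of 180 (mg/dL) or more counts as high.
HIGH_CHOLESTEROL_THRESHOLD = 180


def preprocess(record):
    """Return a copy of a raw patient record with both preprocessing steps applied.
    Field names are hypothetical stand-ins for the Table 1 attributes."""
    out = dict(record)
    # An empty family heredity value means no family history of diabetes;
    # encode it explicitly as "No" so no attribute is left empty.
    if not out.get("family_heredity"):
        out["family_heredity"] = "No"
    # Replace the numeric total cholesterol with the categorical class attribute.
    cholesterol = out.pop("lp_tot")
    out["lp_tot_yn"] = "Yes" if cholesterol >= HIGH_CHOLESTEROL_THRESHOLD else "No"
    return out


raw = [
    {"sex": "M", "age": 52, "family_heredity": "Father", "lp_tot": 212},
    {"sex": "F", "age": 47, "family_heredity": "", "lp_tot": 165},
    {"sex": "F", "age": 58, "family_heredity": None, "lp_tot": 180},
]
clean = [preprocess(r) for r in raw]
for r in clean:
    print(r["family_heredity"], r["lp_tot_yn"])
```

The function copies each record, so the raw data is left untouched for later inspection.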
3.3. Data Analysis
The distribution of the attribute values with respect to the class
attribute LP Tot Y/N is shown in Figure 1.

Figure 1 Attribute value distributions with respect to the class attribute LP Tot Y/N
The blue regions in the graphs in Figure 1 denote high cholesterol values. The graphs show that most of the diabetic patients with high cholesterol values are in the age group 45-55, have a body weight in the range 60-71, have a BP value of 148 or 230, have a fasting value in the range 102-135, have a PP value in the range 88-107, and have an A1C value in the range 7.7-9.6.
3.4. Using Data Mining in data set
The WEKA (Waikato Environment for Knowledge Analysis) tool is used for data mining. Data mining finds valuable information hidden in large volumes of data. Weka is a collection of machine learning algorithms for data mining tasks, written in Java, and it contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization [11]. The key features of Weka are that it is open source and platform independent, and that it provides many different algorithms for data mining and machine learning [12]. We have used the Naïve Bayes method to perform the mining and classification process, with 10-fold cross-validation to minimize bias in the evaluation and to make efficient use of the available data.
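A plain 10-fold cross-validation loop can be sketched as below: each instance is predicted exactly once, by a model trained on the other nine folds. This is an illustrative, unstratified version with hypothetical helper names (`k_fold_indices`, `cross_validate`); it does not reproduce WEKA's own cross-validation, and the `MajorityClass` stand-in exists only so the sketch runs on its own (a Naive Bayes model with the same `fit`/`predict` methods would take its place).

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(make_model, X, y, k=10, seed=1):
    """Train on k-1 folds, predict the held-out fold, and pool predictions over all k folds."""
    predictions = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            predictions[i] = model.predict(X[i])
    return predictions


class MajorityClass:
    """Hypothetical stand-in classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, row):
        return self.label


# Hypothetical toy data: 20 instances, 12 labelled "Y" and 8 labelled "N".
X = [[i] for i in range(20)]
y = ["Y"] * 12 + ["N"] * 8
preds = cross_validate(MajorityClass, X, y, k=10)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(accuracy)
```

Because every instance is held out exactly once, the pooled predictions give a single accuracy figure over the whole data set, which is how a cross-validated percentage of correctly classified instances can be read.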
4. RESULTS AND DISCUSSION
The results of our experimentation are shown in Figure 2.

Figure 2 Result window of the data mining process
The proposed Naïve Bayes model classified 74% of the input instances correctly. It exhibited an average precision of 71%, an average recall of 74%, and an average F-measure of 71.2%. The results suggest that the proposed method performs well compared with similar methods in the literature, considering that the attributes used for analysis are not direct indicators of heart disease.
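The reported averages can be read as class-frequency-weighted averages, as in the "Weighted Avg." row of WEKA's per-class output; under that assumption, average recall always equals accuracy, which is consistent with both figures being 74% here. The sketch below computes these weighted metrics from a confusion matrix; the function name `weighted_metrics` is ours, and the counts are hypothetical because the confusion matrix is not reproduced in the text.

```python
def weighted_metrics(confusion, labels):
    """confusion[actual][predicted] -> count. Returns accuracy and the
    class-frequency-weighted precision, recall and F-measure."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in labels)
    w_prec = w_rec = w_f = 0.0
    for c in labels:
        support = sum(confusion[c].values())              # actual instances of class c
        predicted = sum(confusion[a][c] for a in labels)  # instances predicted as c
        tp = confusion[c][c]
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support / total                          # weight by class frequency
        w_prec += weight * precision
        w_rec += weight * recall
        w_f += weight * f
    return correct / total, w_prec, w_rec, w_f


# Hypothetical counts for illustration only (not the paper's results).
confusion = {
    "Yes": {"Yes": 60, "No": 20},
    "No": {"Yes": 10, "No": 10},
}
acc, p, r, f = weighted_metrics(confusion, ["Yes", "No"])
print(round(acc, 3), round(p, 3), round(r, 3), round(f, 3))
```

The weighted F-measure is the weighted mean of the per-class F-measures, not the F-measure of the weighted precision and recall, so it need not lie between those two averages.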
5. CONCLUSIONS AND FUTURE ENHANCEMENTS
Applying data mining to medical data is a good way to uncover the relationships that exist between variables. With our proposed approach, we have shown that data mining helps to retrieve useful correlations even from attributes that are not direct indicators of the class we are trying to predict.
In our work, we have tried to predict the chances of developing heart disease using attributes from the diagnosis of diabetes. This can be extended in future to predict other ailments that arise from diabetes, such as visual impairment. Further, the data analysis results can be used in future research to enhance the accuracy of the prediction system.
6. ACKNOWLEDGEMENTS
We are grateful to Dr. V. Shesiah, Chairman and Managing Director of Dr. V. Shesiah Diabetic Research Institute, Chennai, for providing access to diabetic medical data and for his involvement in this domain.

7. REFERENCES
[1] Frawley and Piatetsky-Shapiro. 1996. Knowledge Discovery in Databases: An Overview. The AAAI/MIT Press, Menlo Park, CA.
[2] Cios, K. J., Pedrycz, W., Swiniarski, R. W. and Kurgan, L. A. 2007. Data Mining: A Knowledge Discovery Approach. New York: Springer.
[3] Han, J. and Kamber, M. 2006. Data Mining: Concepts and Techniques, 2nd ed. San Francisco: Morgan Kaufmann.
[4] World Health Organization. Definition and diagnosis of diabetes mellitus and intermediate hyperglycemia. http://www.who.int/topics/diabetes_mellitus/en/
[5] Diabetes mellitus doctor's knowledge in MedicineNet. http://www.medicinenet.com/diabetes_mellitus/page2.htm#toce
[6] International Diabetes Federation. 2007. Diabetes Atlas, third edition. IDF.
[7] Franciosi, M. and Sacco, M. 2005. Use of the diabetes risk score and impaired glucose tolerance. Diabetes Care, Vol. 28, No. 5, p. 1187.
[8] Kelling, D. G., Wentworth, J. A. et al. 1997. Diabetes mellitus: using a database to implement a systematic management program. N. C. Med. J., 58: 368-371.
[9] International Diabetes Federation (IDF). http://www.idf.org/about-diabetes
[10] Naïve Bayes classifier based on applying Bayes theorem. http://en.wikipedia.org/wiki/Naive_bayes_classifier
[11] Weka Data mining software. http://www.cs.waikato.ac.nz/ml/weka
[12] An Introduction to the WEKA Data mining system. http://www.cs.ccsu.edu/~markov/weka-tutorial.pdf
[13] Han, J., Rodriguez, J. C. and Beheshti, M. 2008. Diabetes Data Analysis and Prediction Model Discovery Using RapidMiner. In Proceedings of the Second International Conference on Future Generation Communication and Networking.
[14] Asuncion, A. and Newman, D. J. 2007. Pima Indians Diabetes Data Set. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes. Irvine, CA: University of California, School of Information and Computer Science.
[15] Georga, E. et al. 2009. Data Mining for Blood Glucose Prediction and Knowledge Discovery in Diabetic Patients: The METABO Diabetes Modeling and Management System. In Proceedings of the 31st Annual International Conference of the IEEE EMBS, Minneapolis, Minnesota, USA.