Book Chapter

Comparative Study of Classification Algorithm for Diabetics Data

01 Jan 2019, pp. 327-351
TL;DR: This chapter describes a solution for early detection of diabetes that applies various data mining techniques to generate informative structures from training data, and it shows the potential of an ensemble predictive model for predicting instances of diabetes using the UCI repository diabetes data.
Abstract: Data mining techniques play a major role in healthcare centers in handling large volumes of data. For diabetes patients, a blood glucose level that diverges from the typical range leads to serious complications, so patients must be monitored regularly to detect any critical variations. Implementing a predictive model for monitoring the glucose level would enable patients to take preventive measures. This paper describes a solution for early detection of diabetes by applying various data mining techniques to generate informative structures to train on specific data. The main goal of the research is to generate clear and understandable pattern descriptions in order to extract the knowledge and information stored in the dataset. We investigate the relative performance of various classifiers, namely Naive Bayes, SMO-Support Vector Machine (SVM), Decision Tree, and Neural Network (multilayer perceptron). The ensemble data mining approaches have been improved by the classification algorithms. The experimental results show that the Naive Bayes algorithm achieves the best accuracy, 83.5%, with the splitting technique (ST), when the dataset is split in a 70–30 ratio. With cross-validation (CV), the decision tree achieves the best result, 78.3%, compared with the other classifiers. The experiments are performed on the diabetes dataset from the UCI repository using the Weka tool. The study shows the potential of an ensemble predictive model for predicting instances of diabetes using the UCI repository diabetes data. The results are compared among the various classifiers, and the accuracy of the test results is measured.
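The two evaluation protocols named in the abstract, a 70/30 split and cross-validation, can be illustrated with a minimal sketch. This is not the chapter's Weka setup: it uses a hand-written Gaussian Naive Bayes in plain Python, synthetic two-class data in place of the UCI diabetes data, and an assumed fold count of 10 (the abstract does not state one). All function names are hypothetical, and the accuracies it prints are unrelated to the 83.5% and 78.3% reported above.

```python
import math
import random
from statistics import mean, pvariance


def make_synthetic(n=768, d=8, seed=0):
    """Hypothetical stand-in data: two Gaussian classes, NOT the UCI diabetes data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        features = [rng.gauss(float(label), 1.0) for _ in range(d)]
        rows.append((features, label))
    return rows


def fit_gaussian_nb(rows):
    """Estimate a prior and per-feature mean/variance for each class."""
    model = {}
    for label in {y for _, y in rows}:
        cols = list(zip(*[x for x, y in rows if y == label]))
        model[label] = {
            "prior": len(cols[0]) / len(rows),
            # Small constant keeps the variance strictly positive.
            "stats": [(mean(c), pvariance(c) + 1e-9) for c in cols],
        }
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(params):
        total = math.log(params["prior"])
        for value, (mu, var) in zip(x, params["stats"]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def holdout_accuracy(rows, train_fraction=0.7, seed=0):
    """Splitting technique: train on 70% of the shuffled data, test on the rest."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return accuracy(fit_gaussian_nb(shuffled[:cut]), shuffled[cut:])


def cross_val_accuracy(rows, k=10, seed=0):
    """Cross-validation: each of k folds is the test set once; scores are averaged."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit_gaussian_nb(train), test_fold))
    return mean(scores)


data = make_synthetic()
print(f"70/30 holdout accuracy: {holdout_accuracy(data):.3f}")
print(f"10-fold CV accuracy:    {cross_val_accuracy(data):.3f}")
```

A single split scores the model on one test partition, while cross-validation averages over k partitions, so the two protocols can rank classifiers differently, as the abstract reports for Naive Bayes and the decision tree.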
References
Journal Article
TL;DR: An aspect-level sentiment analysis method based on ontologies in the diabetes domain, which calculates the sentiment of the aspects by considering the words around the aspect which are obtained through N-gram methods.
Abstract: In recent years, some methods of sentiment analysis have been developed for the health domain; however, the diabetes domain has not been explored yet. In addition, there is a lack of approaches that analyze the positive or negative orientation of each aspect contained in a document (a review, a piece of news, and a tweet, among others). Based on this understanding, we propose an aspect-level sentiment analysis method based on ontologies in the diabetes domain. The sentiment of the aspects is calculated by considering the words around the aspect which are obtained through N-gram methods (N-gram after, N-gram before, and N-gram around). To evaluate the effectiveness of our method, we obtained a corpus from Twitter, which has been manually labelled at aspect level as positive, negative, or neutral. The experimental results show that the best result was obtained through the N-gram around method with a precision of 81.93%, a recall of 81.13%, and an F-measure of 81.24%.

119 citations

Journal Article
TL;DR: A comparative study of different classification techniques using three data mining tools named WEKA, TANAGRA and MATLAB is presented and the classification technique that has the potential to significantly improve the common or conventional methods is suggested for use in large scale data, bioinformatics or other general applications.
Abstract: In the absence of medical diagnostic evidence, it is difficult for experts to give a confident opinion on the grade of a disease. Generally, many tests are done that involve clustering or classification of large-scale data. However, many tests could complicate the main diagnosis process and make it difficult to obtain the end results. This kind of difficulty could be resolved with the aid of machine learning techniques. In this research, we present a comparative study of different classification techniques using three data mining tools named WEKA, TANAGRA and MATLAB. The aim of this paper is to analyze the performance of different classification techniques on a large dataset. A fundamental review of the selected techniques is presented by way of introduction. The diabetes data, with a total of 768 instances and 9 attributes (8 for input and 1 for output), will be used to test and justify the differences between the classification methods. Subsequently, the classification technique that has the potential to significantly improve on the common or conventional methods will be suggested for use in large-scale data, bioinformatics or other general applications.

89 citations

Journal Article
TL;DR: Evidence is provided that the anti-inflammatory effect of C. cassia on H. pylori-infected gastric cells is due to blockage of the NF-κB pathway by cinnamaldehyde, which can be considered as a potential candidate for in vivo and clinical studies against various H.pylori related gastric pathogenic processes.
Abstract: Cinnamomum cassia is widely employed for gastrointestinal complaints such as dyspepsia, flatulence, diarrhea, and vomiting. Studies report cinnamaldehyde (CM) as a major active constituent of cinnamon. The aim of this study was to evaluate the anti-inflammatory mechanism of CM on Helicobacter (H.) pylori-infected gastric epithelial cells in order to validate cinnamon traditional use in gastrointestinal (GI)-related disorders. AGS/MKN-45 cells and H. pylori (193C) were employed for co-culture experiments. Anti-H. pylori cytotoxic and anti-adhesion activity of CM were determined. Enzyme linked immunosorbent assay, real time polymerase chain reaction analysis and immunoblotting were used to measure the effect on interleukin-8 (IL-8) secretion/expression. The effect on activation of nuclear factor kappa B (NF-κB) was determined by immunoblot analysis. The non-cytotoxic CM (≤125 µM) was also non-bactericidal at the given time, suggesting the effect in H. pylori/cell co-culture system was not due to alteration in H. pylori viability or the toxicity to the cells. Also, CM did not show any anti-adhesion effect against H. pylori/cell co-culture. However, pre-incubation of the cells with CM significantly inhibited the IL-8 secretion/expression from H. pylori-infected cells (p<0.01). In addition, CM suppressed H. pylori-induced NF-κB activation and prevented degradation of inhibitor (I)-κB. This study provides evidence that the anti-inflammatory effect of C. cassia on H. pylori-infected gastric cells is due to blockage of the NF-κB pathway by cinnamaldehyde. This agent can be considered as a potential candidate for in vivo and clinical studies against various H. pylori related gastric pathogenic processes.

65 citations

Journal Article
TL;DR: Detailed information about data mining techniques with more focus on classification techniques as one important supervised learning technique is provided and WEKA software as a tool of choice to perform classification analysis for different kinds of available data is discussed.
Abstract: The availability of huge amounts of data resulted in great need of data mining technique in order to generate useful knowledge. In the present study we provide detailed information about data mining techniques with more focus on classification techniques as one important supervised learning technique. We also discuss WEKA software as a tool of choice to perform classification analysis for different kinds of available data. A detailed methodology is provided to facilitate utilizing the software by a wide range of users. The main features of WEKA are 49 data preprocessing tools, 76 classification/regression algorithms, 8 clustering algorithms, 3 algorithms for finding association rules, 15 attribute/subset evaluators plus 10 search algorithms for feature selection. WEKA extracts useful information from data and enables a suitable algorithm for generating an accurate predictive model from it to be identified. Moreover, medical bioinformatics analyses have been performed to illustrate the usage of WEKA in the diagnosis of Leukemia.

46 citations

01 Jan 2015
TL;DR: In this study, decision tree and Bayesian models have been compared and results indicated that the Bayesian model is much more accurate in diabetes diagnosis.
Abstract: There are many feedback loops in the human body which keep the biotic balance. Disability or malfunction of any of these loops causes severe diseases with short- or long-term complications. Diabetes is one such disease which is caused due to the imprecise operation of these natural loops in the body. Diabetes is ascribed to the acute conditions under which the production and consumption of insulin is disturbed in the body which consequently leads to the increase of glucose level in the blood. Bayesian networks are considered as helpful methods for the diagnosis of many diseases. They, in fact, are probable models which have been proved useful in displaying complex systems and showing the relationships between variables in a graphic way. The advantage of this model is that it can take into account the uncertainty and can get the scenarios of the system change for the evaluation of diagnosis procedures. In this study, decision tree and Bayesian models have been compared. The results indicated that the Bayesian model is much more accurate in diabetes diagnosis.

7 citations