
A Survey of Data Mining Techniques on Medical Data for Finding Locally Frequent Diseases

TL;DR: The main focus of this paper is to analyze data mining techniques required for medical data mining especially to discover locally frequent diseases such as heart ailments, lung cancer, breast cancer and so on.
Abstract: In the last decade there has been increasing use of data mining techniques on medical data to discover useful trends or patterns for diagnosis and decision making. Data mining techniques such as clustering, classification, regression, association rule mining and CART (Classification and Regression Tree) are widely used in the healthcare domain. When appropriately used, data mining algorithms can improve the quality of prediction, diagnosis and disease classification. The main focus of this paper is to analyze the data mining techniques required for medical data mining, especially to discover locally frequent diseases such as heart ailments, lung cancer, breast cancer and so on. We evaluate the data mining techniques for finding locally frequent patterns in terms of cost, performance, speed and accuracy. We also compare data mining techniques with conventional methods.
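The abstract does not say which algorithm is used to find locally frequent patterns. As a minimal sketch, assuming an Apriori-style approach to association rule mining, the function below counts how often sets of diagnoses co-occur in patient records and keeps those whose support reaches a minimum threshold. The records and the `min_support` value are hypothetical; to make the patterns "local", one could run the same function separately on each region's records.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style frequent itemset mining.

    Returns {itemset: support} for every itemset whose support (the fraction
    of transactions that contain it) is at least min_support.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    # Level-1 candidates: every individual item.
    candidates = {frozenset([item]) for b in baskets for item in b}
    result = {}
    k = 1
    while candidates:
        # Count the support of each candidate by scanning the records.
        frequent = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        # Prune step: keep a candidate only if all of its k-subsets are frequent.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in frequent for s in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return result


# Hypothetical diagnosis records from one region.
records = [
    {"hypertension", "heart_disease"},
    {"hypertension", "heart_disease", "diabetes"},
    {"lung_cancer", "smoker"},
    {"hypertension", "diabetes"},
    {"hypertension", "heart_disease"},
]
for itemset, support in sorted(frequent_itemsets(records, 0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The prune step relies on the Apriori property (every subset of a frequent itemset is itself frequent), which keeps the candidate set small compared with enumerating all combinations.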
Citations
Proceedings Article
01 Sep 2016
TL;DR: Based on performance, the SMO and Bayes Net techniques perform better than KStar, Multilayer Perceptron and J48.
Abstract: Heart disease is considered one of the major causes of death throughout the world. It is difficult for medical practitioners to predict, as the task demands expertise and advanced knowledge. This paper addresses the prediction of heart disease from input attributes using data mining techniques. We have investigated heart disease prediction with KStar, J48, SMO, Bayes Net and Multilayer Perceptron in the Weka software. The performance of these data mining techniques is measured by combining predictive accuracy, the ROC curve and the AUC value, using a standard data set as well as a collected data set. Based on this performance measure, the SMO and Bayes Net techniques perform better than KStar, Multilayer Perceptron and J48.
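The abstract combines predictive accuracy with the ROC curve and the AUC value. A minimal sketch of how those two numbers can be computed from a classifier's output, assuming binary labels (1 = heart disease) and a probability-like score per patient; the data are hypothetical, and Weka's own implementation may differ in details such as tie handling. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which equals the area under the ROC curve.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score of a positive > score of a negative),
    counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly when predicting 1 for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical labels and classifier scores for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
print(roc_auc(labels, scores), accuracy(labels, scores))
```

The two measures can disagree: accuracy depends on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is one reason to report both. The pairwise loop is quadratic; a rank-based formula is preferable for large data sets.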

79 citations


Cites background from "A Survey of Data Mining Techniques ..."

  • ...Papers [2], [6], [7] studied the applications of different data mining techniques for prediction of various diseases....


  • ...Khaled and Das [2] evaluated the different data mining techniques to find frequent pattern based on cost, performance, speed and accuracy....


Journal Article
TL;DR: The performance of the proposed optimistic multigranulation rough set based classification is compared with other rough set (RS) based, k-nearest neighbor (KNN) and backpropagation neural network (BPN) approaches using various classification measures.
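The TL;DR names k-nearest neighbor (KNN) as one of the baselines. As a minimal sketch of what that baseline does (not the paper's implementation), the classifier below labels a new patient by majority vote among the k closest training records under Euclidean distance. The two features and their values are hypothetical and assumed to be already normalized.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training records."""
    # Sort training records by Euclidean distance to x (math.dist needs Python 3.8+).
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical normalized (feature_1, feature_2) values for six patients.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
train_y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
print(knn_predict(train_X, train_y, (0.2, 0.2)))
print(knn_predict(train_X, train_y, (0.85, 0.85)))
```

An odd k avoids tied votes in two-class problems; because KNN has no training phase, all the cost is paid at prediction time.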

44 citations

Proceedings Article
01 Dec 2017
TL;DR: The experimental results show that Multiclass Decision Forest algorithm gives a better result than the other classification algorithms and produces 99.17% accuracy.
Abstract: Kidney damage and diminished function that lasts longer than three months is known as Chronic Kidney Disease (CKD). The primary goal of this research study is to identify a suitable diet plan for a CKD patient by applying classification algorithms to the test results obtained from patients' medical records. The aim of this work is to control the disease with a suitable diet plan and to identify that plan using classification algorithms. The proposed work recommends different diet plans for CKD patients based on a potassium zone predicted from their blood potassium level. The experiment is performed with different algorithms: Multiclass Decision Jungle, Multiclass Decision Forest, Multiclass Neural Network and Multiclass Logistic Regression. The experimental results show that the Multiclass Decision Forest algorithm gives better results than the other classification algorithms, with 99.17% accuracy.

34 citations


Cites background from "A Survey of Data Mining Techniques ..."

  • ...For example, data mining can be used to mine medical information, as the health sector generates a lot of data about ailments, pathologies and patients [2]....


Proceedings Article
01 Oct 2016
TL;DR: The merits and demerits of frequently used data mining techniques in the domain of health care and medical data have been compared and an analytical approach regarding the uniqueness of medical data in health care is presented.
Abstract: Data mining is an important area of research and is used pragmatically in different domains such as finance, clinical research, education and healthcare. Its scope has been thoroughly reviewed and surveyed by many researchers in the domain of healthcare, which is an active interdisciplinary area of research. Knowledge extraction from medical data is a challenging and complex task. The main motive of this review paper is to review data mining in the purview of healthcare. Moreover, the interrelation of previous research is presented in a novel manner. Furthermore, the merits and demerits of frequently used data mining techniques in the domain of health care and medical data are compared. The use of different data mining tasks in health care is also discussed. An analytical approach regarding the uniqueness of medical data in health care is also presented.

25 citations

Book Chapter
01 Jan 2020
TL;DR: In disease diagnosis, pattern recognition is important for identifying the disease accurately, and machine learning is the field used to build models that predict an output from correlated inputs based on previous data.
Abstract: In disease diagnosis, pattern recognition is important for identifying the disease accurately. Machine learning is the field used to build models that predict an output from inputs, learning their correlations from previous data. Disease identification is the most crucial task in treating any disease. Classification algorithms are used to classify the disease, and several classification and dimensionality reduction algorithms are in use. Machine learning gives computers the ability to learn without being explicitly programmed. Using a classification algorithm, the hypothesis that best fits a set of observations can be selected from a set of alternatives. Machine learning is used for high-dimensional and multi-dimensional data, and sophisticated, automatic algorithms can be developed with it.
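The abstract describes choosing, from a set of alternative hypotheses, the one that best fits a set of observations. A toy illustration, assuming the hypothesis space is the set of single-threshold rules on one measurement ("predict disease if x >= t", or the reverse) and that "best fits" means fewest misclassified training observations; the readings are hypothetical.

```python
def best_stump(xs, ys):
    """Search all threshold rules and return (errors, threshold, polarity) of
    the rule with the fewest training errors.

    polarity 1: predict 1 if x >= threshold; polarity -1: predict 1 if x < threshold.
    """
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            errors = 0
            for x, y in zip(xs, ys):
                pred = 1 if (x >= t) == (polarity == 1) else 0
                errors += pred != y
            # Keep the first rule found with the lowest error count.
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return best


# Hypothetical readings (one measurement per patient) and diagnoses (1 = disease).
xs = [110, 120, 135, 150, 160, 170]
ys = [0, 0, 0, 1, 1, 1]
print(best_stump(xs, ys))
```

Real classification algorithms search far richer hypothesis spaces (trees, networks, kernels), but the principle of scoring each candidate against the observations and keeping the best is the same.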

13 citations

References
Book
01 Jan 2001
TL;DR: This chapter discusses the design and analysis of experiments in the context of response surface methodology, and some of the techniques used in this work were new to the literature at the time.
Abstract: Funkenbusch, P. (2005), Practical Guide to Designed Experiments, New York: Marcel Dekker. Grice, J. (2000), Review of Design and Analysis of Experiments (4th ed.), by D. Montgomery, Technometrics, 42, 208–209. Myers, R., and Montgomery, D. (2002), Response Surface Methodology (2nd ed.), New York: Wiley. Ziegel, E. (2001), Editor’s Report on Design and Analysis of Experiments (5th ed.), by R. Myers and D. Montgomery, Technometrics, 43, 245. (2002), Editor’s Report on Response Surface Methodology (2nd ed.), by R. Myers and D. Montgomery, Technometrics, 44, 298–299.

1,294 citations

Journal Article
TL;DR: An efficient algorithm that eliminates intron code and a demetic approach to virtually parallelize the system on a single processor are discussed, which show that GP performs comparably in classification and generalization.
Abstract: We introduce a new form of linear genetic programming (GP). Two methods of accelerating our GP approach are discussed: 1) an efficient algorithm that eliminates intron code and 2) a demetic approach to virtually parallelize the system on a single processor. Accelerating runtime is especially important when operating on complex data sets, as they occur in real-world applications. We compare GP performance on medical classification problems from a benchmark database with results obtained by neural networks. Our results show that GP performs comparably in classification and generalization.
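In linear GP, a program is a sequence of register instructions, and introns are instructions whose results never reach the output register. A common way to remove them, sketched here under assumptions (an instruction format of (destination, operator, source1, source2), output in register 0, and a hypothetical example program; this is not the authors' code), is a single backward pass that tracks which registers are still needed. The small interpreter checks that the reduced program computes the same result.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def remove_introns(program, output_reg=0):
    """Keep only the effective instructions of a linear GP program.

    Walking backwards, an instruction is effective if it writes a register
    that a later effective instruction (or the output) reads; its source
    registers then become needed in turn.
    """
    needed = {output_reg}
    effective = []
    for instr in reversed(program):
        dest, _op, src1, src2 = instr
        if dest in needed:
            needed.discard(dest)         # this write supplies the later read
            needed.update((src1, src2))  # so its operands are now needed
            effective.append(instr)
    effective.reverse()
    return effective


def run(program, inputs, n_regs=4, output_reg=0):
    """Execute a program on registers initialized with the inputs (rest 0.0)."""
    regs = [0.0] * n_regs
    regs[: len(inputs)] = inputs
    for dest, op, src1, src2 in program:
        regs[dest] = OPS[op](regs[src1], regs[src2])
    return regs[output_reg]


program = [
    (2, "*", 0, 1),  # r2 = r0 * r1   (effective)
    (3, "+", 1, 1),  # r3 = r1 + r1   (intron: r3 is never read)
    (0, "+", 2, 0),  # r0 = r2 + r0   (effective: writes the output)
    (1, "-", 1, 0),  # r1 = r1 - r0   (intron: r1 is not read afterwards)
]
reduced = remove_introns(program)
print(reduced)
print(run(program, [2.0, 3.0]), run(reduced, [2.0, 3.0]))
```

Because the pass is linear in program length, the reduced program can be computed once and then evaluated on every training case, which is where the runtime saving comes from.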

482 citations

Journal Article
TL;DR: This book discusses data-mining-based modeling of human visual perception, the discovery of clinical knowledge in databases extracted from hospital information systems, and knowledge discovery in time series.
Abstract: Medical Data Mining and Knowledge Discovery * Legal Policy and Security Issues in the Handling of Medical Data * Medical Natural Language Understanding as a Supporting Technology for Data Mining in Healthcare * Anatomic Pathology Data Mining * A Data Clustering and Visualization Methodology for Epidemiological Pathology Discoveries * Mining Structure-Function Associations in a Brain Image Database * ADRIS * Knowledge Discovery in Mortality Records: An Info-Fuzzy Approach * Consistent and Complete Data and "Expert" Mining in Medicine * A Medical Data Mining Application Based on Evolutionary Computation * Methods of Temporal Data Validation and Abstraction in High-Frequency Domains * Data Mining the Matrix Associated Regions for Gene Therapy * Discovery of Temporal Patterns in Sparse Course-of-Disease Data * Data Mining-Based Modeling of Human Visual Perception * Discovery of Clinical Knowledge in Databases Extracted from Hospital Information Systems * Knowledge Discovery in Time Series.

136 citations

Proceedings Article
21 Nov 2007
TL;DR: The comparative study of multiple prediction models for the survival of CHD patients, together with 10-fold cross-validation, provided insight into the relative prediction ability of different data mining methods.
Abstract: Predicting the survival of coronary heart disease (CHD) patients has been a challenging research problem for the medical community. The goal of this paper is to develop data mining algorithms for predicting the survival of CHD patients based on 1000 cases. We carry out a clinical observation and a 6-month follow-up covering 1000 CHD cases, and the survival information of each case is obtained via the follow-up. Based on the data, we employed three popular data mining algorithms to develop prediction models using the 502 cases. We also used 10-fold cross-validation to obtain an unbiased estimate of the three prediction models for performance comparison. The results indicated that SVM is the best predictor, with 92.1% accuracy on the holdout sample; artificial neural networks came second with 91.0% accuracy, and the decision tree models came last of the three with 89.6% accuracy. The comparative study of multiple prediction models for the survival of CHD patients, together with 10-fold cross-validation, provided insight into the relative prediction ability of different data mining methods.
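The abstract estimates each model's performance with 10-fold cross-validation. A minimal sketch of that procedure, assuming a generic `fit`/`predict` pair (hypothetical names supplied by the caller) and a shuffled split into folds of near-equal size: each case is held out exactly once, and the reported score is the mean accuracy over the ten held-out folds.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; every index lands in exactly one test fold."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# With 502 cases, the ten test folds hold 51, 51, 50, ..., 50 cases.
print([len(test) for _, test in k_fold_indices(502)])
```

Any classifier can be plugged in through `fit` and `predict`; fixing the seed makes the folds reproducible, so different models can be compared on identical splits.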

125 citations

Proceedings Article
12 Dec 2009
TL;DR: This paper proposes the use of the decision tree C4.5 algorithm, bagging with decision tree C4.5 and bagging with Naive Bayes to identify heart disease in a patient, and compares their effectiveness.
Abstract: Medical data mining has been a popular data mining topic of late. In particular, diagnosing heart disease is one of the important issues, and many researchers have worked to develop intelligent medical decision support systems to help physicians. In this paper, we propose the use of the decision tree C4.5 algorithm, bagging with decision tree C4.5 and bagging with Naive Bayes to identify heart disease in a patient, and we compare their effectiveness and correction rates. The data we study were collected from patients with coronary artery disease.
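Bagging trains several copies of a base classifier on bootstrap samples (rows drawn with replacement) and combines them by majority vote. A minimal sketch, assuming numeric features and, for brevity, a one-level decision stump as a stand-in for C4.5 (a swapped-in base learner, not the paper's setup); the data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Stand-in base learner (not C4.5): a one-split tree on the feature and
    threshold with the fewest training errors, majority label on each side."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] < t]
            right = [lab for row, lab in zip(X, y) if row[f] >= t]  # never empty
            r_lab = Counter(right).most_common(1)[0][0]
            l_lab = Counter(left).most_common(1)[0][0] if left else r_lab
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(model, x):
    f, t, l_lab, r_lab = model
    return l_lab if x[f] < t else r_lab


def bagging_fit(X, y, n_models=15, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the ensemble by majority vote."""
    return Counter(predict_stump(m, x) for m in models).most_common(1)[0][0]


# Hypothetical one-feature records: 0 = no disease, 1 = disease.
X = [(1,), (2,), (3,), (4,), (10,), (11,), (12,), (13,)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, (0,)), bagging_predict(models, (20,)))
```

Bagging mainly reduces variance, so it helps most with unstable base learners such as unpruned decision trees; swapping `fit_stump` for a full tree learner or Naive Bayes leaves the bootstrap-and-vote structure unchanged.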

102 citations