Journal Article

Techniques of Data Mining In Healthcare: A Review

18 Jun 2015-International Journal of Computer Applications (Foundation of Computer Science (FCS))-Vol. 120, Iss: 15, pp 38-50
TL;DR: Various Data Mining techniques such as classification, clustering, association, regression in health domain are reviewed and applications, challenges and future work of Data Mining in healthcare are highlighted.
Abstract: Data mining is gaining popularity in disparate research fields because of its wide range of applications and its approaches to mining data appropriately. Given the changes the world is currently undergoing, it is one of the best approaches for anticipating near-future outcomes. Advances in healthcare research have made enormous amounts of data available, but the main difficulty is how to turn the existing information into useful practices. Data mining is well suited to overcoming this hurdle. It has great potential to enable healthcare systems to use data more efficiently and effectively, and hence to improve care and reduce costs. This paper reviews various data mining techniques, such as classification, clustering, association, and regression, in the health domain. It also highlights applications, challenges, and future work of data mining in healthcare.

Citations
01 Jan 2002

9,314 citations

Journal Article
TL;DR: A literature review of the usage of process mining in healthcare and the most commonly used categories and emerging topics have been identified, as well as future trends, such as enhancing Hospital Information Systems to become process-aware.

453 citations

Journal Article
TL;DR: This paper is specific to reviewing upcoding fraud analysis and detection research, providing an overview of healthcare, upcoding, and a review of the current data mining techniques used therein.
Abstract: From its infancy in the 1910s, healthcare group insurance continues to increase, creating a consistently rising burden on the government and taxpayers. The growing number of people enrolled in healthcare programs such as Medicare, along with the enormous volume of money in the healthcare industry, increases the appeal for and risk of fraudulent activities. One such fraud, known as upcoding, is a means by which a provider can obtain additional reimbursement by coding a certain provided service as a more expensive service than what was actually performed. With the proliferation of data mining techniques and the recent and continued availability of public healthcare data, the application of these techniques towards fraud detection, using this increasing cache of data, has the potential to greatly reduce healthcare costs through a more robust detection of upcoding fraud. Presently, there is a sizable body of healthcare fraud detection research available but upcoding fraud studies are limited. Audit data can be difficult to obtain, limiting the usefulness of supervised learning; therefore, other data mining techniques, such as unsupervised learning, must be explored using mostly unlabeled records in order to detect upcoding fraud. This paper is specific to reviewing upcoding fraud analysis and detection research providing an overview of healthcare, upcoding, and a review of the current data mining techniques used therein.

71 citations


Cites background from "Techniques of Data Mining In Health..."

  • ...For additional information on data mining algorithms, there are many instructive resources to include (Witten and Frank 2005; Tomar and Agarwal 2013; Dave and Dadhich 2013; Gera and Joshi 2015; Ahmad et al. 2015)....

    [...]

  • ...Ahmad et al. (2015) surveyed healthcare data analysis and fraud detection techniques, highlighting applications and challenges in this domain....

    [...]

  • ...Ahmad et al. (2015) note several challenges to healthcare data mining, including the differences in data formats between organizations, quality of data regarding noisy and missing data, and the sharing of data....

    [...]

Journal Article
TL;DR: This paper mainly focuses on cervical cancer prediction through different screening methods using data mining techniques such as boosted decision tree, decision forest, and decision jungle algorithms; performance is evaluated on the basis of the AUROC (area under the receiver operating characteristic curve), accuracy, specificity, and sensitivity.
Abstract: Cervical cancer remains an important cause of death worldwide because effective access to cervical screening methods is a major challenge. Data mining techniques, including decision tree algorithms, are used in biomedical research for predictive analysis. The imbalanced dataset was obtained from the dataset archive of the University of California, Irvine. The Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset, which increased the number of instances. The dataset consists of patient age, number of pregnancies, contraceptive usage, smoking patterns, and chronological records of sexually transmitted diseases (STDs). The Microsoft Azure Machine Learning tool was used to simulate the results. This paper mainly focuses on cervical cancer prediction through different screening methods using data mining techniques such as boosted decision tree, decision forest, and decision jungle algorithms; performance was evaluated on the basis of the AUROC (area under the receiver operating characteristic curve), accuracy, specificity, and sensitivity. The 10-fold cross-validation method was used to validate the results, and the boosted decision tree gave the best results, with a very high AUROC of 0.978 when the Hinselmann screening method was used. The results obtained by the other classifiers were significantly worse than those of the boosted decision tree.
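
The evaluation measures named in this abstract (sensitivity, specificity, accuracy, and AUROC) can be illustrated with a minimal Python sketch. This is not the paper's Azure Machine Learning pipeline: the function names and the toy labels and scores below are hypothetical, and AUROC is computed here with the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive case scores higher, ties counting one half).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that were correctly rejected."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def auroc(y_true, scores):
    """Pairwise-ranking AUROC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the cervical cancer dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))  # 0.75 0.75 0.75
print(auroc(y_true, scores))  # 0.9375
```

On an imbalanced dataset, accuracy alone can look high even when few positive cases are found, which is one reason sensitivity, specificity, and AUROC are usually reported together.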

43 citations


Cites background from "Techniques of Data Mining In Health..."

  • ...Data mining is helpful where large collections of healthcare data are available [15]....

    [...]

Journal Article
01 Nov 2017-Irbm
TL;DR: Attention is needed to develop smart diagnostic systems that raise awareness of, and protect people from, the wide spectrum of critical diseases related to ophthalmology and the oral and digestive systems.
Abstract: Background: Medical informatics has seen unrestrained growth in its databases. Recent advances in the medical sciences have eliminated many critical diseases. Today the medical industry is rich in data sources, but these sources are useful only if they are analyzed effectively and on time. Methods: Data mining techniques draw on artificial intelligence and are used to investigate known and unknown patterns in medical databases. They are now routinely used to mine the abundant data sources of medical science. This paper explores the practice of diverse data mining techniques, the role of the dataset used, the effect of preprocessing, and the performance of different data mining techniques in the diagnosis of different lifestyle-based diseases. The aim of this paper is to provide a clear assessment of the different data mining techniques used in the medical sciences. Results: The survey shows that significant effort has been devoted to mining data related to cardiology and diabetes. According to Google Scholar, over the last seven years the share of published articles on disease diagnosis using data mining was 42% for cardiology, 26% for diabetes, 18% for digestive disorders, 10% for dentistry, and 4% for ophthalmology. Hence, little attention has been paid to developing predictive models for ophthalmic, dental, and digestive disorders. In addition, the rate of preprocessing use in the diagnosis of disorders related to cardiology, diabetes, digestive disorders, dentistry, and ophthalmology lies in the ranges 10.65%–17.75%, 8.48%–14.80%, 4.58%–8.93%, 2.96%–7.73%, and 5.83%–12.93%, respectively. Conclusion: Attention is needed to develop smart diagnostic systems that raise awareness of, and protect people from, the wide spectrum of critical diseases related to ophthalmology and the oral and digestive systems.

32 citations

References
01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Book
01 Jan 1966
TL;DR: In this article, the Straight Line Case is used to fit a straight line by least squares, and the Durbin-Watson Test is used for checking the straight line fit.
Abstract: Basic Prerequisite Knowledge. Fitting a Straight Line by Least Squares. Checking the Straight Line Fit. Fitting Straight Lines: Special Topics. Regression in Matrix Terms: Straight Line Case. The General Regression Situation. Extra Sums of Squares and Tests for Several Parameters Being Zero. Serial Correlation in the Residuals and the Durbin-Watson Test. More of Checking Fitted Models. Multiple Regression: Special Topics. Bias in Regression Estimates, and Expected Values of Mean Squares and Sums of Squares. On Worthwhile Regressions, Big F's, and R^2. Models Containing Functions of the Predictors, Including Polynomial Models. Transformation of the Response Variable. "Dummy" Variables. Selecting the "Best" Regression Equation. Ill-Conditioning in Regression Data. Ridge Regression. Generalized Linear Models (GLIM). Mixture Ingredients as Predictor Variables. The Geometry of Least Squares. More Geometry of Least Squares. Orthogonal Polynomials and Summary Data. Multiple Regression Applied to Analysis of Variance Problems. An Introduction to Nonlinear Estimation. Robust Regression. Resampling Procedures (Bootstrapping). Bibliography. True/False Questions. Answers to Exercises. Tables. Indexes.

18,952 citations

Proceedings Article
01 Jun 1993
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Abstract: We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.

15,645 citations


"Techniques of Data Mining In Health..." refers background in this paper

  • ...[114] U. Abdullah, J. Ahmad and A. Ahmed, “Analysis of Effectiveness of Apriori Algorithm in Medical Billing Data Mining”, 2008 International Conference on Emerging Technologies, IEEE-ICET 2008, Rawalpindi, Pakistan, (2008) October 18-19....

    [...]

  • ...This study proposed a method for detecting the occurrence of diseases using Apriori algorithm in particular geographical locations at particular period of time [117]....

    [...]

  • ...This attention occurred because Apriori resolved the issues identified in KID3 using the “Apriori property” so that association mining can be applied to real databases to extract association rules....

    [...]

  • ...But after R. Agrawal and his colleagues at IBM Almaden Research Center introduced a novel association rule algorithm called Apriori [110,111], association mining has received significant attention....

    [...]
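
The "Apriori property" mentioned in the excerpts above says that every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be discarded before its support is counted. The following is a minimal Python sketch of that idea, not the implementation from the cited papers; the function name `apriori`, the absolute `min_support` threshold, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets seen in at least
    `min_support` transactions (an absolute count, assumed for simplicity)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset, without scanning the transactions.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions (items are placeholder labels).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = apriori(transactions, min_support=3)
print(len(result))                    # 6: three single items and three pairs
print(result[frozenset({"a", "b"})])  # 3
```

In this toy run the triple {a, b, c} appears in only two transactions, so it is counted once and rejected; in larger databases the prune step is what keeps the number of candidates that must be counted manageable.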

Journal Article
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities has made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

14,054 citations


"Techniques of Data Mining In Health..." refers methods in this paper

  • ...First of all it randomly selects k objects and then decomposes these objects into k disjoint groups by iteratively relocating objects based on the similarity between the centroids and objects [92, 93]....

    [...]
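
The procedure described in the excerpt above (pick k objects at random, then repeatedly relocate objects to the nearest centroid) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not code from the reviewed paper: the function name `kmeans`, squared Euclidean distance as the similarity measure, the fixed random seed, and the toy points are all choices made here.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means on a list of numeric tuples.
    Returns (centroids, assignment), where assignment[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    # Step 1: randomly select k objects as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Step 2: assign each object to its nearest centroid
        # (squared Euclidean distance, assumed here as the similarity measure).
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the partition is stable
        assignment = new_assignment
        # Step 3: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
print(sorted(centroids))
```

Because the starting centroids are random, k-means can converge to different partitions on less clearly separated data, so in practice it is commonly run several times with different seeds.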