Proceedings ArticleDOI

Docaid: Predictive Healthcare Analytics Using Naïve Bayes Classification

TL;DR: Visual analytics is a tool for cost-effectively sorting the abundant and rapidly growing data in the field of medical research; it helps practitioners cope with assorted data in an organized manner that the human brain can easily visualize.
Abstract: With the advancement of medical research, there has been a copious increase in the data stored by public and private hospitals, clinics and other places of medical practice. This large store of data needs to be administered properly so that useful insights and conclusions can be derived through a sound analysis system. Such large amounts of data, whether structured or unstructured, are aptly handled by machine learning algorithms. Predictive analytics, a key application of machine learning, helps users make better, supervised decisions. Visual analytics is a tool for cost-effectively sorting the abundant and rapidly growing data in medical research. It helps us cope with assorted data in an organized manner that the human brain can easily visualize, which in turn can lead to new and potentially innovative results. This form of analytics not only provides structured data but also encourages structured thinking. As practitioners analyze anomalous situations, the visual analytics process supplies the sorted, relevant data related to them, which in turn decreases the cost of maintaining large amounts of data.
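As a concrete illustration of the predictive component described above, the sketch below trains a Naïve Bayes classifier on binary symptom indicators and predicts one of the five diseases the DOCAID system targets. The symptom list, toy patient records, and model choice (Bernoulli Naïve Bayes with Laplace smoothing) are illustrative assumptions, not the paper's actual data or implementation.

```python
# Minimal sketch of symptom-based disease prediction with Naive Bayes.
# The symptom list and example records are illustrative assumptions,
# not the DOCAID dataset itself.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary symptom indicators (1 = symptom reported).
symptoms = ["fever", "headache", "jaundiced_skin", "chronic_cough",
            "diarrhea", "abdominal_pain"]

# Toy training records: each row is one patient; labels are the diseases
# named in the paper (Typhoid, Malaria, Jaundice, Tuberculosis, Gastroenteritis).
X_train = np.array([
    [1, 1, 0, 0, 0, 1],   # Typhoid
    [1, 1, 0, 0, 0, 0],   # Malaria
    [0, 0, 1, 0, 0, 1],   # Jaundice
    [1, 0, 0, 1, 0, 0],   # Tuberculosis
    [1, 0, 0, 0, 1, 1],   # Gastroenteritis
])
y_train = ["Typhoid", "Malaria", "Jaundice", "Tuberculosis", "Gastroenteritis"]

model = BernoulliNB(alpha=1.0)          # Laplace smoothing
model.fit(X_train, y_train)

# Predict the most probable disease for a new patient's complaints.
new_patient = np.array([[1, 0, 0, 0, 1, 1]])
print(model.predict(new_patient)[0])
print(dict(zip(model.classes_, model.predict_proba(new_patient)[0].round(3))))
```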
Citations
Journal ArticleDOI
TL;DR: The proposed machine-learning-based ensemble method was tested on the Dermatology dataset and increased dermatological prediction accuracy on the test set compared with a single classifier.
Abstract: Objective: Skin diseases are a major global health problem affecting a large number of people. With the rapid development of technology and the application of various data mining techniques in recent years, dermatological predictive classification has become increasingly accurate. The development of machine learning techniques that can effectively differentiate skin disease classes is therefore of great importance; among the techniques applied to skin disease prediction so far, no single one outperforms all the others. Methods: In this paper, we present a new method that applies five different data mining techniques and then develops an ensemble approach combining all five as a single unit. We use the informative Dermatology dataset to analyze the different data mining techniques for classifying skin disease, and then apply an ensemble machine learning method. Results: The proposed machine-learning-based ensemble method was tested on the Dermatology dataset and classifies skin disease into six classes: C1: psoriasis, C2: seborrheic dermatitis, C3: lichen planus, C4: pityriasis rosea, C5: chronic dermatitis, and C6: pityriasis rubra. The results show that the dermatological prediction accuracy on the test set is increased compared with a single classifier. Conclusion: The ensemble method used on the Dermatology dataset gives better performance than individual classifier algorithms and yields more accurate and effective skin disease prediction.
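To make the ensemble idea concrete, the following sketch combines five classifiers in a majority-vote ensemble on the publicly available six-class Dermatology dataset. The choice of base learners and the OpenML loader are assumptions for illustration; the abstract does not list the five techniques the paper actually uses.

```python
# Hedged sketch of a five-classifier voting ensemble for the six-class
# dermatology problem. The base learners and fetch_openml("dermatology")
# are illustrative assumptions, not necessarily the paper's setup.
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = fetch_openml("dermatology", version=1, return_X_y=True, as_frame=False)

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("lr", LogisticRegression(max_iter=2000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="hard",  # majority vote across the five base classifiers
)
# The dataset has a few missing age values, so impute before fitting.
model = make_pipeline(SimpleImputer(strategy="median"), ensemble)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
model.fit(X_tr, y_tr)
print("ensemble accuracy:", accuracy_score(y_te, model.predict(X_te)))
```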

42 citations

Journal ArticleDOI
TL;DR: The ensemble method provides more accurate and effective skin disease prediction, and feature selection applied to dermatology datasets yields better performance than individual classifier algorithms.

41 citations

Journal ArticleDOI
TL;DR: A new method is presented that applies six different data mining classification techniques and then develops an ensemble approach using bagging, AdaBoost, and gradient boosting classifiers to predict the different classes of skin disease.
Abstract: Nowadays, skin disease is a major problem among people worldwide. Different machine learning techniques have been applied to predict its various classes. In this paper, we apply six different machine learning algorithms to categorize the classes of skin disease, combine them with three ensemble techniques, and then use a feature selection method to compare the results obtained from the different machine learning techniques. In the proposed study, we present a new method that applies six data mining classification techniques and then develops an ensemble approach using bagging, AdaBoost, and gradient boosting classifiers to predict the different classes of skin disease. Further, a feature importance method is used to select the 15 features that play the major role in prediction. A subset of the original dataset containing only these 15 features is used to compare the results of the six machine learning techniques and the ensemble approach against those on the whole dataset. The ensemble method applied to the skin disease dataset is compared with the results on the new feature-selected subset. The outcome shows that the dermatological prediction accuracy on the test dataset is increased compared with an individual classifier, and better accuracy is obtained than with the subset produced by the feature selection method. The ensemble method and feature selection used on the dermatology dataset give better performance than individual classifier algorithms; the ensemble method gives more accurate and effective skin disease prediction.
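A minimal sketch of the two steps described above, bagging/boosting ensembles plus selection of the 15 most important features, is given below. The dataset loader and the use of a random forest to rank feature importance are illustrative assumptions.

```python
# Illustrative sketch: ensemble classifiers plus top-15 feature selection.
# The dermatology loader and the random-forest feature ranking are assumptions.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score

X, y = fetch_openml("dermatology", version=1, return_X_y=True, as_frame=False)
X = SimpleImputer(strategy="median").fit_transform(X)

# Rank features by importance and keep the 15 most important ones.
ranker = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
top15 = np.argsort(ranker.feature_importances_)[::-1][:15]
X_sub = X[:, top15]

for name, clf in [("bagging", BaggingClassifier(n_estimators=100, random_state=0)),
                  ("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=0)),
                  ("gradient boosting", GradientBoostingClassifier(random_state=0))]:
    full = cross_val_score(clf, X, y, cv=5).mean()
    sub = cross_val_score(clf, X_sub, y, cv=5).mean()
    print(f"{name}: all features {full:.3f}, top-15 features {sub:.3f}")
```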

32 citations

Journal ArticleDOI
TL;DR: This paper addresses both theoretical and practical aspects related to the application of six classification models to pressure ulcer prediction in modular critical care data, while utilizing one of the largest available Medical Information Mart for Intensive Care databases.
Abstract: Increasingly available open medical and health datasets encourage data-driven research with a promise of improving patient care through knowledge discovery and algorithm development. Among efficient approaches to such high-dimensional problems are a number of machine learning methods, which are applied in this paper to pressure ulcer prediction in modular critical care data. An inherent property of many health-related datasets is a high number of irregularly sampled, time-variant and scarcely populated features, often exceeding the number of observations. Although machine learning methods are known to work well under such circumstances, many choices regarding model and data processing exist. In particular, this paper addresses both theoretical and practical aspects related to the application of six classification models to pressure ulcer prediction, while utilizing one of the largest available critical care databases, the Medical Information Mart for Intensive Care (MIMIC-IV). Random forest, with an accuracy of 96%, is the best-performing approach among the considered machine learning algorithms.
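The sketch below mirrors the modelling setup described here, a random forest applied to a wide, scarcely populated feature matrix, using synthetic data as a stand-in for MIMIC-IV, which requires credentialed access and cannot be reproduced in a snippet.

```python
# Minimal sketch of a random forest on a wide, sparsely populated feature
# matrix. The synthetic data stands in for MIMIC-IV extracts and is not
# expected to show meaningful accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_patients, n_features = 500, 800          # more features than observations
X = rng.normal(size=(n_patients, n_features))
X[rng.random(X.shape) < 0.7] = np.nan      # ~70% missing: scarcely populated
y = rng.integers(0, 2, size=n_patients)    # 1 = pressure ulcer developed

model = make_pipeline(
    SimpleImputer(strategy="median"),       # simple handling of missingness
    RandomForestClassifier(n_estimators=300, class_weight="balanced",
                           random_state=0),
)
print("5-fold accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```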

11 citations

Book ChapterDOI
01 Jan 2021
TL;DR: A new method using four types of classification methods was developed to classify the different classes of skin disease, and the results are more accurate than those obtained in previous studies.
Abstract: Skin diseases are very common nowadays and are spreading widely among people. With the growth of computer-based technology and the relevance of different machine learning methods in the current decade, skin disease prediction using classifier methods has become analytical and exact. The development of data mining techniques that can efficiently distinguish classes of skin disease is therefore important. A new method using four types of classification methods was developed for this research. We use a skin disease dataset to analyze various machine learning algorithms for classifying the different classes of skin disease. The proposed data mining techniques were tested on the skin disease dataset for six types of skin disease: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra. The results of the base learners used in this paper are more accurate than the results obtained by previous studies.
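For illustration, the snippet below compares four base classifiers on the six-class dermatology task with cross-validation. The abstract does not name the chapter's four methods, so the learners chosen here are placeholders.

```python
# Sketch comparing four base classifiers on the dermatology task.
# The four learners and the OpenML loader are illustrative assumptions.
from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = fetch_openml("dermatology", version=1, return_X_y=True, as_frame=False)
X = SimpleImputer(strategy="median").fit_transform(X)

for name, clf in [("naive Bayes", GaussianNB()),
                  ("k-NN", KNeighborsClassifier()),
                  ("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("SVM", SVC())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))
```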

6 citations

References
Journal ArticleDOI
TL;DR: The Bayesian classifier is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption, and will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain.
Abstract: The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
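The snippet below is a small empirical check of the zero-one-loss claim: a naive Bayes classifier trained on uniformly sampled boolean attributes recovers a conjunctive concept almost exactly, even though the attributes are clearly dependent given the class. The toy data generation is our own assumption, not the article's experimental setup.

```python
# Empirical check: naive Bayes can learn a conjunction under zero-one loss
# even though the conditional independence assumption is violated.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 5))         # five boolean attributes
y = (X[:, 0] & X[:, 1] & X[:, 2]).astype(int)  # target concept: x0 AND x1 AND x2

clf = BernoulliNB().fit(X[:4000], y[:4000])
acc = (clf.predict(X[4000:]) == y[4000:]).mean()
print("zero-one accuracy on a held-out conjunction:", acc)
```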

3,225 citations


"Docaid: Predictive Healthcare Analy..." refers methods in this paper

  • ...We have developed a disease prediction system, DOCAID, for Typhoid, Malaria, Jaundice, Tuberculosis and Gastroenteritis, which bases its diagnosis on the patient symptoms and complaints using Naive Bayes Classification....

    [...]

  • ...We utilize the Naïve Bayes Classification [1] algorithm to develop the predictive analytics system and predict aptly the diseases for the patients....

    [...]

  • ...It is followed by Naïve Bayes Classification implementation to predict diseases for a patient and visual representation for the purpose of graphical analysis....

    [...]

Posted Content
TL;DR: This paper abandons the normality assumption and instead uses statistical methods for nonparametric kernel density estimation; experimental results suggest that kernel estimation is a useful tool for learning Bayesian models.
Abstract: When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
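A hedged sketch of the idea, replacing the single Gaussian per class and attribute with a nonparametric kernel density estimate, is shown below. The dataset, bandwidth, and the small helper class are illustrative choices rather than the paper's implementation.

```python
# Sketch of a "flexible" naive Bayes: per-class, per-feature kernel density
# estimates instead of a single Gaussian. Dataset and bandwidth are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KernelDensity

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

class KDENaiveBayes:
    """Naive Bayes with a Gaussian-kernel density per (class, feature)."""
    def fit(self, X, y, bandwidth=0.3):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([(y == c).mean() for c in self.classes_])
        self.kdes_ = [[KernelDensity(bandwidth=bandwidth).fit(X[y == c][:, [j]])
                       for j in range(X.shape[1])] for c in self.classes_]
        return self
    def predict(self, X):
        # Sum log-densities over features (naive independence) plus log prior.
        log_post = np.stack([
            np.log(p) + sum(kde.score_samples(X[:, [j]])
                            for j, kde in enumerate(kdes))
            for p, kdes in zip(self.priors_, self.kdes_)], axis=1)
        return self.classes_[log_post.argmax(axis=1)]

print("single Gaussian :", (GaussianNB().fit(X_tr, y_tr).predict(X_te) == y_te).mean())
print("kernel estimate :", (KDENaiveBayes().fit(X_tr, y_tr).predict(X_te) == y_te).mean())
```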

3,071 citations

Journal ArticleDOI
TL;DR: The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable with built-in significance testing, multi-way splits, and a new type of predictor which is especially useful in handling missing information.
Abstract: The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable. Some important modifications which are relevant to standard AID include: built-in significance testing with the consequence of using the most significant predictor (rather than the most explanatory), multi-way splits (in contrast to binary) and a new type of predictor which is especially useful in handling missing information.
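The snippet below sketches CHAID's core split-selection step: choose the categorical predictor whose cross-tabulation with the target is most significant under a chi-squared test, then split multi-way on its categories. Category merging and Bonferroni adjustment, which a full CHAID implementation also performs, are omitted, and the toy variables are assumptions.

```python
# Sketch of CHAID-style split selection via chi-squared significance testing.
# Toy categorical data; a full CHAID tree would also merge categories.
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "smoker":   ["yes", "no", "yes", "no", "yes", "no", "yes", "no"] * 25,
    "exercise": ["low", "low", "high", "high"] * 50,
    "outcome":  ["ill", "well", "ill", "well", "ill", "well", "well", "well"] * 25,
})

def best_chaid_split(data, target, predictors):
    """Return (predictor, p_value) of the most significant multi-way split."""
    p_values = {}
    for col in predictors:
        table = pd.crosstab(data[col], data[target])
        _, p, _, _ = chi2_contingency(table)
        p_values[col] = p
    best = min(p_values, key=p_values.get)
    return best, p_values[best]

print(best_chaid_split(df, "outcome", ["smoker", "exercise"]))
```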

2,744 citations

Proceedings Article
18 Aug 1995
TL;DR: In this paper, the authors use statistical methods for nonparametric density estimation for a naive Bayesian classifier, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using non-parametric kernel density estimation.
Abstract: When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

2,524 citations

Proceedings ArticleDOI
25 Jun 2006
TL;DR: A large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps is presented.
Abstract: A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 90's. We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance. An important aspect of our study is the use of a variety of performance criteria to evaluate the learning methods.
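To illustrate the calibration step mentioned above, the sketch below wraps a single base learner in Platt scaling (sigmoid) and isotonic regression and compares Brier scores. Using one learner on synthetic data is a simplification of the paper's ten-method, multi-dataset study.

```python
# Sketch of calibrating a classifier with Platt scaling vs. isotonic regression.
# Single base learner and synthetic data are simplifying assumptions.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = RandomForestClassifier(n_estimators=200, random_state=0)
for name, model in [
    ("uncalibrated", base),
    ("Platt scaling", CalibratedClassifierCV(base, method="sigmoid", cv=5)),
    ("isotonic", CalibratedClassifierCV(base, method="isotonic", cv=5)),
]:
    model.fit(X_tr, y_tr)
    probs = model.predict_proba(X_te)[:, 1]
    print(f"{name:14s} Brier score: {brier_score_loss(y_te, probs):.4f}")
```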

2,450 citations