Proceedings Article

Heart Disease Prediction using Synthetic Minority Oversampling Technique and Soft Voting

TL;DR: The authors train several ML classifiers, such as Gaussian Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), and combine them with Soft Voting.
Abstract: Heart disease is a leading cause of mortality worldwide. The number of patients with this disease rises every day, and it claims millions of lives each year. Yet few effective methods exist for detecting heart disease from basic patient information. Machine Learning (ML) is now used extensively across many fields to achieve strong results, so in this paper we propose a heart disease prediction model based on ML techniques. We use several ML classifiers, such as Gaussian Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), and apply Soft Voting to them. The results show that the voting method is the most effective, with an Accuracy of 92.42%, Precision of 92.50%, Recall of 92.22%, and F1-score of 92.34%. Our aim is to detect heart disease more accurately and so support medical practice.
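As an illustration of the Soft Voting step named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. Soft voting averages the class probabilities produced by each classifier and predicts the class with the highest average. The function names `soft_vote` and `predict_class`, the equal default weights, and the probability values are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def soft_vote(
    class_probabilities: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average probability of each class.

    class_probabilities[i][k] is classifier i's probability for class k.
    """
    if not class_probabilities:
        raise ValueError("need at least one classifier")
    n_classes = len(class_probabilities[0])
    if any(len(p) != n_classes for p in class_probabilities):
        raise ValueError("all classifiers must score the same classes")
    if weights is None:
        # Assumption: every classifier counts equally.
        weights = [1.0] * len(class_probabilities)
    total = float(sum(weights))
    return [
        sum(w * p[k] for w, p in zip(weights, class_probabilities)) / total
        for k in range(n_classes)
    ]


def predict_class(
    class_probabilities: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(class_probabilities, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical outputs [P(no disease), P(disease)] for a single patient.
example_probabilities = [
    [0.45, 0.55],  # e.g. Gaussian Naive Bayes
    [0.45, 0.55],  # e.g. SVM
    [0.90, 0.10],  # e.g. KNN
]

print(predict_class(example_probabilities))  # prints 0
```

In this made-up case two of the three classifiers lean slightly toward class 1, so a simple majority vote would pick class 1. Soft voting picks class 0 because the third classifier is far more confident, which pulls the averaged probabilities to roughly 0.6 and 0.4.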
References
Journal Article
TL;DR: This year's edition of the Statistical Update includes data on the monitoring and benefits of cardiovascular health in the population, metrics to assess and monitor healthy diets, an enhanced focus on social determinants of health, a focus on the global burden of cardiovascular disease, and further evidence-based approaches to changing behaviors, implementation strategies, and implications of the American Heart Association’s 2020 Impact Goals.
Abstract: Background: The American Heart Association, in conjunction with the National Institutes of Health, annually reports on the most up-to-date statistics related to heart disease, stroke, and cardiovas...

5,078 citations

Journal Article
TL;DR: This paper proposes a novel method that finds significant features by applying machine learning techniques, improving the accuracy of cardiovascular disease prediction with the hybrid random forest with a linear model (HRFLM).
Abstract: Heart disease is one of the most significant causes of mortality in the world today. Prediction of cardiovascular disease is a critical challenge in the area of clinical data analysis. Machine learning (ML) has been shown to be effective in assisting decisions and predictions from the large quantity of data produced by the healthcare industry. ML techniques have also been used in recent developments in different areas of the Internet of Things (IoT). Various studies give only a glimpse into predicting heart disease with ML techniques. In this paper, we propose a novel method that finds significant features by applying machine learning techniques, thereby improving the accuracy of cardiovascular disease prediction. The prediction model is introduced with different combinations of features and several known classification techniques. Our prediction model for heart disease, the hybrid random forest with a linear model (HRFLM), achieves an enhanced performance level with an accuracy of 88.7%.

783 citations

Proceedings Article
29 Mar 2018
TL;DR: Neural networks proved to be the most accurate and reliable machine learning algorithm and are therefore used in the proposed system, which predicts vulnerability to heart disease from basic symptoms.
Abstract: With the rampant increase in heart stroke rates at young ages, a system is needed that can detect the symptoms of a heart stroke at an early stage and thus help prevent it. It is impractical for most people to frequently undergo costly tests such as the ECG, so there needs to be a system that is both handy and reliable in predicting the chances of heart disease. We therefore propose an application that predicts vulnerability to heart disease from basic symptoms such as age, sex, and pulse rate. Neural networks proved to be the most accurate and reliable machine learning algorithm and are hence used in the proposed system.

209 citations

Journal Article
TL;DR: The authors propose a model that combines different methods for effective prediction of heart disease, using efficient Data Collection, Data Pre-processing, and Data Transformation methods to create accurate information for the training model.
Abstract: Cardiovascular diseases (CVD) are among the most common serious illnesses affecting human health. CVDs may be prevented or mitigated by early diagnosis, and this may reduce mortality rates. Identifying risk factors using machine learning models is a promising approach. We propose a model that incorporates different methods to achieve effective prediction of heart disease. For our proposed model to be successful, we have used efficient Data Collection, Data Pre-processing, and Data Transformation methods to create accurate information for the training model. We have used a combined dataset (Cleveland, Long Beach VA, Switzerland, Hungarian, and Statlog). Suitable features are selected using the Relief and Least Absolute Shrinkage and Selection Operator (LASSO) techniques. New hybrid classifiers, namely Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM), are developed by integrating the traditional classifiers with bagging and boosting methods and are used in the training process. We have also implemented some machine learning algorithms to calculate the Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE), and F1 Score (F1) of our model, along with the Negative Predictive Value (NPR), False Positive Rate (FPR), and False Negative Rate (FNR). The results are shown separately to provide comparisons. Based on the result analysis, we conclude that our proposed model produced its highest accuracy (99.05%) when using RFBM with Relief feature selection.
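The evaluation measures listed in the abstract above all derive from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions, which are assumed here to match the ones the authors used. The function name `binary_metrics` and the counts in the example are hypothetical.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "negative_predictive_value": ratio(tn, tn + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Hypothetical counts for 100 test patients, not taken from the paper.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, accuracy is 0.85 and precision is 0.80, while sensitivity is higher (40 of 45 true cases found), which shows why the abstract reports several measures rather than accuracy alone.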

169 citations

Journal Article
TL;DR: The authors analyze heart failure survivors in a dataset of 299 hospitalized patients to find significant features and effective data mining techniques that can boost the accuracy of survival prediction for cardiovascular patients.
Abstract: Cardiovascular disease is a substantial cause of mortality and morbidity in the world. In clinical data analytics, predicting the survival of heart disease patients is a great challenge. Data mining transforms huge amounts of raw data generated by the health industry into useful information that can help in making informed decisions. Various studies have shown that significant features play a key role in improving the performance of machine learning models. This study analyzes heart failure survivors in a dataset of 299 patients admitted to hospital. The aim is to find significant features and effective data mining techniques that can boost the accuracy of survival prediction for cardiovascular patients. To predict patient survival, this study employs nine classification models: Decision Tree (DT), Adaptive boosting classifier (AdaBoost), Logistic Regression (LR), Stochastic Gradient classifier (SGD), Random Forest (RF), Gradient Boosting classifier (GBM), Extra Tree Classifier (ETC), Gaussian Naive Bayes classifier (G-NB), and Support Vector Machine (SVM). The class imbalance problem is handled by the Synthetic Minority Oversampling Technique (SMOTE). Furthermore, machine learning models are trained on the highest-ranked features selected by RF. The results are compared with those of machine learning algorithms using the full set of features. Experimental results demonstrate that ETC outperforms the other models and achieves an accuracy of 0.9262 with SMOTE in predicting heart patient survival.
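To make the SMOTE step mentioned above concrete, here is a simplified sketch of its core idea, not the procedure used in the study. Each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch`, the random choice of base sample, the fixed seed, and the example points are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence


def smote_sketch(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Generate synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k minority samples closest to the base (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical two-feature minority-class samples.
minority_samples = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_samples = smote_sketch(minority_samples, n_synthetic=6, k=2)
print(len(new_samples))  # prints 6
```

Because every new point is an interpolation between two existing minority samples, the synthetic data stays inside the region the minority class already occupies instead of duplicating rows exactly.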

162 citations