Improving the Prediction of Heart Failure Patients’ Survival Using SMOTE and Effective Data Mining Techniques

doi:10.1109/ACCESS.2021.3064084

Abid Ishaq and 6 others

04 Mar 2021

IEEE Access

Vol. 9, pp. 39707-39716

TLDR

In this paper, the authors analyze heart failure survival in a dataset of 299 hospitalized patients and identify significant features and effective data mining techniques that can improve the accuracy of survival prediction for cardiovascular patients.

Abstract:

Cardiovascular disease is a substantial cause of mortality and morbidity worldwide. In clinical data analytics, predicting the survival of heart disease patients is a major challenge. Data mining transforms the huge amounts of raw data generated by the health industry into useful information that can support informed decisions. Various studies have shown that significant features play a key role in improving the performance of machine learning models. This study analyzes heart failure survival using a dataset of 299 patients admitted to hospital. The aim is to find significant features and effective data mining techniques that can boost the accuracy of survival prediction for cardiovascular patients. To predict patient survival, this study employs nine classification models: Decision Tree (DT), Adaptive Boosting classifier (AdaBoost), Logistic Regression (LR), Stochastic Gradient Descent classifier (SGD), Random Forest (RF), Gradient Boosting classifier (GBM), Extra Tree Classifier (ETC), Gaussian Naive Bayes classifier (G-NB), and Support Vector Machine (SVM). The class imbalance problem is handled with the Synthetic Minority Oversampling Technique (SMOTE). Furthermore, the machine learning models are trained on the highest-ranked features selected by RF, and the results are compared with those obtained using the full set of features. Experimental results demonstrate that ETC outperforms the other models, achieving an accuracy of 0.9262 with SMOTE in predicting the survival of heart patients.
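The SMOTE step mentioned in the abstract can be illustrated with a minimal, dependency-free sketch of the standard SMOTE interpolation: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' code, and the abstract does not state which implementation or neighbour count was used; the function name `smote`, its parameters `n_new`, `k` and `seed`, and the toy data are hypothetical. In practice a library implementation such as imbalanced-learn's `SMOTE` would typically be used, and k = 5 is a common default.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance), as in the SMOTE interpolation rule.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy example with hypothetical 2-feature minority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=3, k=2)
print(len(new_points))  # 3
```

A common precaution, not stated in the abstract, is to oversample only the training split, so that synthetic points derived from test samples cannot leak into the evaluation.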

Citations

A Neural Network Ensemble With Feature Engineering for Improved Credit Card Fraud Detection

Ebenezer Esenogho and 4 others

01 Jan 2022

IEEE Access

TL;DR: The authors propose an efficient approach to credit card fraud detection that combines a hybrid data resampling method with a neural network ensemble classifier, obtained by using a long short-term memory (LSTM) neural network as the base learner in the adaptive boosting technique. Results show that the classifiers performed better when trained on the resampled data, and the proposed LSTM ensemble outperformed the other algorithms, obtaining a sensitivity of 0.996 and a specificity of 0.998.

A Comprehensive Investigation of the Performances of Different Machine Learning Classifiers with SMOTE-ENN Oversampling Technique and Hyperparameter Optimization for Imbalanced Heart Failure Dataset

Mirza Muntasir Nishat and 6 others

09 Mar 2022

Scientific Programming

TL;DR: This comprehensive investigation illustrates the applicability and compatibility of different machine learning algorithms on such an imbalanced dataset and presents the role of the SMOTE-ENN algorithm and hyperparameter optimization in enhancing the performance of the machine learning algorithms.
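SMOTE-ENN follows SMOTE oversampling with an Edited Nearest Neighbours (ENN) cleaning pass. A minimal sketch of that cleaning pass under the classic (Wilson) majority-vote rule is shown below: a sample is kept only if its label agrees with the majority label of its k nearest neighbours. Euclidean distance, k = 3, the function name `enn_clean`, and the toy data are assumptions for illustration, not details taken from the cited paper; library implementations may offer stricter removal rules.

```python
import math
from collections import Counter


def enn_clean(X, y, k=3):
    """Hypothetical helper implementing Wilson's Edited Nearest Neighbours.

    Keeps a sample only if its label agrees with the majority label of its
    k nearest neighbours (Euclidean distance, excluding the sample itself).
    """
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # (distance, label) pairs for every other sample, nearest first
        neighbours = sorted(
            (math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )[:k]
        majority, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Toy data: the class-1 point at (0.5, 0.5) sits among class-0 points.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [0, 0, 0, 1, 1, 1, 1]
X_clean, y_clean = enn_clean(X, y, k=3)
print(y_clean)  # [0, 0, 0, 1, 1, 1]
```

Applied after SMOTE, this step removes noisy or overlapping samples near the class boundary, which is the motivation usually given for combining the two.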

A CNN-based novel solution for determining the survival status of heart failure patients with clinical record data: numeric to image

Muhammet Fatih Aslan and 2 others

01 Jul 2021

Biomedical Signal Processing and Control

TL;DR: In this paper, a heart failure dataset consisting only of numerical values is converted into image data so that it can be analyzed with the advantages of CNNs. The highest accuracy, 95.13%, is obtained with the ResNet18 model, which is superior to previous studies using the raw numerical data.

Bidimensional and Tridimensional Poincaré Maps in Cardiology: A Multiclass Machine Learning Study

Leandro Donisi and 6 others

02 Feb 2022

Electronics

TL;DR: The study shows that the proposed combination of unconventional features extracted from Poincaré maps with well-known machine learning algorithms is a valuable approach for automatically classifying patients with different cardiac diseases.

References

Random Forests

Leo Breiman

TL;DR: Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the forest; the same ideas are also applicable to regression.

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman

01 Oct 2001

Annals of Statistics

TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
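For the least-squares case named in this summary, gradient boosting reduces to repeatedly fitting a base learner to the current residuals (the negative gradient of the squared loss) and adding a shrunken copy of it to the model. The sketch below assumes one-dimensional inputs, single-split regression stumps as base learners (Friedman's paper uses regression trees more generally) and a fixed learning rate; the names `fit_stump` and `boost_ls` and the toy data are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares regression stump on 1-D inputs (needs >= 2 distinct x values).

    Chooses the threshold that minimises squared error when each side
    predicts the mean of its residuals.
    """
    best = None
    for t in sorted(set(x))[:-1]:  # every candidate keeps both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def boost_ls(x, y, n_rounds=50, lr=0.1):
    """Gradient boosting with squared loss.

    F_0 = mean(y); F_m = F_{m-1} + lr * h_m, where h_m is fitted to the
    residuals y - F_{m-1}, the negative gradient of 0.5 * (y - F)^2.
    """
    f0 = sum(y) / len(y)
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residuals)
        stumps.append(h)
        pred = [pi + lr * h(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + lr * sum(h(xi) for h in stumps)


# Toy step function: y jumps from 1 to 5 between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost_ls(x, y, n_rounds=50, lr=0.1)
print(round(model(1), 2))  # 1.01
```

The learning rate trades speed for regularization: here each round removes 10% of the remaining residual, so the fit approaches the step gradually rather than in one stump.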

UCI Machine Learning Repository

A. Asuncion

Heart Disease and Stroke Statistics—2019 Update: A Report From the American Heart Association

Emelia J. Benjamin and 47 others

05 Mar 2019

Circulation

TL;DR: Report prepared by a writing group on behalf of the American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee.

Extremely randomized trees

Pierre Geurts and 2 others

01 Apr 2006

Machine Learning

TL;DR: A new tree-based ensemble method for supervised classification and regression problems that strongly randomizes both the attribute and the cut-point choice when splitting a tree node; in the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample.
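The node-splitting rule summarized above can be sketched as follows: draw K candidate attributes at random, draw one cut-point for each uniformly between that attribute's minimum and maximum in the node, and keep the best-scoring candidate. With K = 1 the choice ignores the labels entirely, which corresponds to the totally randomized extreme case. As a simplifying assumption, this sketch scores candidates by Gini impurity decrease rather than the score defined in the original paper; the names `gini` and `random_split` and the toy data are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((m / n) ** 2 for m in counts.values())


def random_split(X, y, K, rng):
    """Extra-Trees style split: K random attributes, one random cut-point each.

    Returns (attribute_index, cut_point) of the candidate with the largest
    Gini impurity decrease, or None if every attribute is constant in the node.
    """
    n_features = len(X[0])
    # only attributes that are not constant in this node can be split
    candidates = [a for a in range(n_features) if len({row[a] for row in X}) > 1]
    if not candidates:
        return None
    best, best_gain = None, -1.0
    parent = gini(y)
    for a in rng.sample(candidates, min(K, len(candidates))):
        lo, hi = min(row[a] for row in X), max(row[a] for row in X)
        cut = rng.uniform(lo, hi)  # cut-point drawn without looking at the labels
        left = [yi for row, yi in zip(X, y) if row[a] < cut]
        right = [yi for row, yi in zip(X, y) if row[a] >= cut]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if gain > best_gain:
            best, best_gain = (a, cut), gain
    return best


# Toy node: attribute 1 is constant, so only attribute 0 can be chosen.
rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y = [0, 0, 1, 1]
split = random_split(X, y, K=2, rng=rng)
```

Because the cut-points are random, individual trees are weak and diverse; the ensemble relies on averaging many such trees to reduce variance.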
