Open access · Journal Article · DOI: 10.1109/ACCESS.2021.3064084

Improving the Prediction of Heart Failure Patients’ Survival Using SMOTE and Effective Data Mining Techniques

04 Mar 2021 · IEEE Access (IEEE) · Vol. 9, pp. 39707-39716
Abstract: Cardiovascular disease is a substantial cause of mortality and morbidity in the world. In clinical data analytics, predicting the survival of heart failure patients is a great challenge. Data mining transforms the huge amounts of raw data generated by the health industry into useful information that can support informed decisions. Various studies have shown that significant features play a key role in improving the performance of machine learning models. This study analyzes heart failure survival using a dataset of 299 patients admitted to hospital. The aim is to find significant features and effective data mining techniques that can boost the accuracy of survival prediction for cardiovascular patients. To predict patients' survival, this study employs nine classification models: Decision Tree (DT), Adaptive Boosting classifier (AdaBoost), Logistic Regression (LR), Stochastic Gradient Descent classifier (SGD), Random Forest (RF), Gradient Boosting classifier (GBM), Extra Tree Classifier (ETC), Gaussian Naive Bayes classifier (G-NB) and Support Vector Machine (SVM). The class imbalance problem is handled by the Synthetic Minority Oversampling Technique (SMOTE). Furthermore, the machine learning models are trained on the highest-ranked features selected by RF. The results are compared with those obtained by the machine learning algorithms using the full set of features. Experimental results demonstrate that ETC outperforms the other models, achieving an accuracy of 0.9262 with SMOTE in predicting the survival of heart failure patients.
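A minimal sketch of the pipeline the abstract describes (SMOTE balancing, Random Forest feature ranking, Extra Trees classification), assuming a scikit-learn/imbalanced-learn setting. The file name heart_failure.csv, the DEATH_EVENT target column, the 80/20 split and the choice of eight top-ranked features are illustrative assumptions, not the authors' exact configuration.

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file/column names; the paper uses the 299-patient heart failure dataset.
df = pd.read_csv("heart_failure.csv")
X, y = df.drop(columns=["DEATH_EVENT"]), df["DEATH_EVENT"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Balance the minority class on the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Rank features by Random Forest importance and keep the top k (k = 8 is illustrative).
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_res, y_res)
top_k = X.columns[rf.feature_importances_.argsort()[::-1][:8]]

# Train the Extra Trees classifier on the selected features and score the held-out set.
etc = ExtraTreesClassifier(n_estimators=100, random_state=42).fit(X_res[top_k], y_res)
print("Accuracy:", accuracy_score(y_test, etc.predict(X_test[top_k])))
```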


Topics: Boosting (machine learning) (60%), Naive Bayes classifier (57%), Gradient boosting (56%)
Citations

18 results found


Journal Article · DOI: 10.1016/J.BBE.2021.05.002
Mohammad Nejadeh, Peyman Bayat, Jalal Kheirkhah, +2 more · Institutions (2)
Abstract: Cardiac Resynchronization Therapy Defibrillator (CRT-D) is a method to improve heart rate variability and arrhythmia-related symptoms in heart failure patients. According to clinical reports, CRT, like any other surgery, is not entirely safe and risk-free, but it can reduce heart failure risks, shorten hospital stays, and enhance patients' quality of life. The present study aims to select patients properly before surgery so as to avoid unnecessary costs. This article focuses on collecting data on heart failure patients' activities, on effective feature extraction, and on identifying an optimal pattern using a Deep Learning (DL) algorithm. The main tasks of the proposed method include the use of qualitative indicators for initial feature extraction, oversampling of the minority class, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) hierarchical clustering, selecting features from low-error clusters, selecting samples from high-error clusters, and classification using a customized DL configuration. The data collection consisted of 209 patients with 60 demographic, clinical, laboratory, ECG, and echo features. In addition, features were analyzed based on their significance in predicting CRT response status. The DL algorithm, whose architecture uses dense and convolutional layers, was employed to optimally identify the treatment status of heart failure patients. The proposed method predicted the response to cardiac resynchronization therapy with an accuracy of 91.85%, an Area Under the Curve (AUC) of 0.957 and a sensitivity of 94.22%.
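A rough skeleton of the described pipeline (minority-class oversampling, DBSCAN clustering, a small dense-plus-convolutional classifier), assuming a Keras/scikit-learn/imbalanced-learn environment. The synthetic stand-in data, the eps/min_samples values and the network layout are placeholders; the actual cohort, the cluster-based feature and sample selection rules, and the paper's customized DL configuration are not reproduced here.

```python
import numpy as np
from imblearn.over_sampling import SMOTE          # oversampling of the minority class
from sklearn.cluster import DBSCAN
from tensorflow import keras

# Synthetic stand-in for the 209-patient cohort with 60 demographic/clinical/lab/ECG/echo features.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(209, 60)), rng.integers(0, 2, size=209)   # y: CRT response status (0/1)

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

# Density-based clustering step; eps/min_samples are illustrative, not the paper's values.
clusters = DBSCAN(eps=3.0, min_samples=5).fit_predict(X_res)
print("DBSCAN clusters found:", len(set(clusters) - {-1}))

# A small dense + convolutional classifier standing in for the customized DL configuration.
model = keras.Sequential([
    keras.layers.Input(shape=(60, 1)),
    keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
model.fit(X_res[..., None], y_res, epochs=20, batch_size=16, verbose=0)
```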


2 Citations


Journal Article · DOI: 10.1108/DTA-05-2020-0109
Abstract: Purpose: Gene selection is considered a fundamental process in the bioinformatics field. The existing methodologies pertaining to cancer classification are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems of cancer diagnosis are solved by utilizing gene expression data. Researchers have been introducing many possibilities to diagnose cancer appropriately and effectively. This paper aims to develop cancer data classification using gene expression data. Design/methodology/approach: The proposed classification model involves three main phases: (1) feature extraction, (2) optimal feature selection and (3) classification. Initially, five benchmark gene expression datasets are collected. From the collected gene expression data, feature extraction is performed. To diminish the length of the feature vectors, optimal feature selection is performed, for which a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO) is used. Once the relevant features are selected, classification is performed by a deep learning model called a recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms the other heuristic-based feature selection methods, and the optimized RNN outperforms the other machine learning methods. Findings: The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% better than RNN, 4.3% better than RF, 3.8% better than NB and 2.1% better than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% better than RNN, 8.9% better than RF, and 14.8% better than NB and KNN. Hence, the developed QICO algorithm performs well in classifying cancer data accurately using gene expression data. Originality/value: This paper introduces a new optimal feature selection model using QICO and a QICO-based RNN for the effective classification of cancer data using gene expression data. This is the first work that utilizes an optimal feature selection model using QICO and QICO-RNN for this purpose.
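QICO itself is a novel metaheuristic that cannot be reconstructed from this abstract. The sketch below only illustrates the overall shape of the pipeline on synthetic stand-in data, with a simple ANOVA filter in place of the optimal feature-selection stage and a plain RNN as the classifier; it is not the authors' method.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from tensorflow import keras

# Synthetic gene-expression stand-in: 100 samples x 2000 genes with binary class labels.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)

# Placeholder for the optimal feature-selection stage; the paper uses QICO, not an ANOVA filter.
X_sel = SelectKBest(f_classif, k=50).fit_transform(X, y)

# Recurrent classifier over the selected genes, treated as a length-50 sequence.
model = keras.Sequential([
    keras.layers.Input(shape=(50, 1)),
    keras.layers.SimpleRNN(32),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_sel[..., None], y, epochs=10, batch_size=8, verbose=0)
```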


Topics: Clone (cell biology) (64%)

1 Citation


Journal Article · DOI: 10.1007/S12046-021-01631-2
N Ramshankar, P M Joe Prathap · Institutions (1)
Abstract: Sentiment analysis, also termed opinion mining, is one of the most frequently adopted techniques. E-commerce portals generate a large amount of data that can help online retailers understand customer expectations, and sentiment analysis can process huge amounts of online opinions from numerous sources. This paper develops a novel sentiment classification approach, named BH-GWO-Fuzzy, for framing an efficient recommendation system in e-commerce applications. The proposed model involves five processing steps: (a) data acquisition, (b) pre-processing, (c) feature extraction, (d) weighted feature extraction, and (e) classification. Pre-processing consists of three steps, namely stop-word removal, stemming, and blank-space removal. Further, feature extraction is performed by measuring the joint similarity score and the cross similarity score of the positive, negative and neutral keywords in the tweets. From the resulting features, weighted feature extraction is carried out, in which a weight is multiplied with the features to attain a more scalable feature set suitable for classification. Here, the weight is tuned or optimized by a hybrid Black Hole-based Grey Wolf Optimization (BH-GWO), developed by integrating the BH and GWO algorithms. After that, the extracted features are fed to an adaptive fuzzy classifier, in which the membership function is optimized by the same hybrid BH-GWO algorithm. Finally, the sentiment classification for the recommendation system is empirically evaluated against gathered benchmark datasets using diverse machine learning algorithms. The accuracy of BH-GWO-Fuzzy is 11.7% better than Fuzzy, 28.3% better than K-Nearest Neighbor (KNN), 20.2% better than Support Vector Machine (SVM), and 18.75% better than Neural Network (NN) at a learning percentage of 45 for dataset 1.
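A small sketch of pre-processing step (b) only, assuming NLTK is available; the similarity-based feature extraction, the BH-GWO weight optimization and the adaptive fuzzy classifier are not reproduced here.

```python
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time resource downloads: nltk.download("stopwords"); nltk.download("punkt")
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    """Blank-space removal, stop-word removal and stemming, as in step (b)."""
    text = re.sub(r"\s+", " ", text).strip()      # collapse blank spaces
    tokens = word_tokenize(text.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop_words]

print(preprocess("The delivery was late, but the product quality is really great!"))
# ['deliveri', 'late', 'product', 'qualiti', 'realli', 'great']
```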


Topics: Feature extraction (59%), Sentiment analysis (57%), Feature (machine learning) (56%)

Journal Article · DOI: 10.1016/J.BSPC.2021.103185
Abstract: Heart disease is increasing, and its detection is a significant concern. With the technologies that have been developed, detection approaches can be improved further with better algorithms. Patient health data are huge in volume and are stored in the large space of cloud storage. Accessing cloud storage is easy, and the stored data are available to many cloud users, which creates a need for security. Generally, security can be improved using encryption and decryption algorithms. In this paper, a framework using a ResNet-50 classifier for the secure transmission of heart disease features is presented. This study focuses on an enhanced ElGamal encryption-decryption method, in which data are encrypted and decrypted with a generated private and public key pair to better control access to the data. The encrypted data are decrypted when a user requests them. The refinement or classification process is performed with a ResNet-50 convolutional neural network classifier of nearly 50 layers. The heart disease dataset from the UCI heart disease repository is considered for the evaluation of the proposed work. Further, a feature selection method filters a better selection of inputs from the dataset. The results are obtained with respect to various performance measures, then compared and analyzed against some of the existing methodologies, and they prove to be better than those of other existing frameworks.
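A toy sketch of textbook ElGamal encryption and decryption, for orientation only; the paper's enhanced variant, its key-management details and the ResNet-50 classification stage are not reproduced, and the parameter sizes below are illustrative, not secure.

```python
import random

# Textbook ElGamal over a small prime (toy parameters only).
p, g = 467, 2                       # public prime modulus and generator
x = random.randrange(2, p - 1)      # private key
h = pow(g, x, p)                    # public key component

def encrypt(m: int) -> tuple[int, int]:
    k = random.randrange(2, p - 1)              # per-message ephemeral key
    return pow(g, k, p), (m * pow(h, k, p)) % p

def decrypt(c1: int, c2: int) -> int:
    s = pow(c1, x, p)                           # shared secret
    return (c2 * pow(s, p - 2, p)) % p          # multiply by s^{-1} mod p (Fermat inverse)

c1, c2 = encrypt(123)                           # e.g. an encoded heart-disease feature value
assert decrypt(c1, c2) == 123
```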


Topics: ElGamal encryption (58%), Encryption (57%), Cloud storage (56%)

Proceedings Article · DOI: 10.1109/ACMI53878.2021.9528188
08 Jul 2021
Abstract: Heart disease is a vital cause of mortality in this world. The number of patients with this noxious disease is rising every day, and it takes millions of lives each year. It is dismaying that there are not many effective ways to detect heart disease from elementary information. Nowadays, Machine Learning (ML) has been used extensively in various fields to achieve unprecedented results. So, in this paper we propose a heart disease prediction model using ML techniques to accomplish an effective result. We use different ML classifiers, such as Gaussian Naive Bayes, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN), and apply soft voting to them. The results show that the voting method gives the most effective results, with an accuracy of 92.42%, precision of 92.50%, recall of 92.22% and F1-score of 92.34%. Our purpose is to detect this deleterious disease more precisely to enhance the medical field.
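A minimal sketch of soft voting over the three named classifiers, assuming scikit-learn; the breast-cancer dataset, the scaling step and the default hyperparameters are stand-ins, since the paper's data and settings are not given here.

```python
from sklearn.datasets import load_breast_cancer   # stand-in binary medical dataset
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Soft voting averages the predicted class probabilities of the base classifiers.
voter = make_pipeline(
    StandardScaler(),
    VotingClassifier(
        estimators=[
            ("gnb", GaussianNB()),
            ("svm", SVC(probability=True)),    # probability=True is required for soft voting
            ("knn", KNeighborsClassifier()),
        ],
        voting="soft",
    ),
)
print("5-fold CV accuracy: %.4f" % cross_val_score(voter, X, y, cv=5).mean())
```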


Topics: Naive Bayes classifier (51%)

References

46 results found


Open access · Journal Article · DOI: 10.1023/A:1010933404324
Leo Breiman · Institutions (1)
01 Oct 2001
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to AdaBoost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, 1996, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
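A short illustration, assuming scikit-learn, of the two internal estimates mentioned above: the out-of-bag error and the variable-importance measure. The synthetic dataset and hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# oob_score=True yields the internal (out-of-bag) generalization estimate;
# feature_importances_ exposes the variable-importance measure.
rf = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", oob_score=True, random_state=0
).fit(X, y)

print("Out-of-bag accuracy:", rf.oob_score_)
print("Most important variables:", rf.feature_importances_.argsort()[::-1][:5])
```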


Topics: Random forest (63%), Multivariate random variable (57%), Random subspace method (57%)

58,232 Citations


Open access
01 Jan 2007

17,312 Citations


Open access · Journal Article · DOI: 10.1214/AOS/1013203451
Jerome H. Friedman · Institutions (1)
Abstract: Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and the multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less-than-clean data. Connections between this approach and the boosting methods of Freund and Schapire, and of Friedman, Hastie and Tibshirani, are discussed.
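A compact sketch of the least-squares case of this idea: each regression tree is fit to the negative gradient of the loss (here simply the current residuals) and added with a shrinkage step, i.e. steepest descent in function space. Data, tree depth and learning rate are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

F = np.full(500, y.mean())                         # F_0: constant initial model
trees, lr = [], 0.1
for _ in range(200):
    residuals = y - F                              # negative gradient of 1/2 * (y - F)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    F += lr * tree.predict(X)                      # steepest-descent step in function space

def predict(X_new):
    return y.mean() + lr * sum(t.predict(X_new) for t in trees)

print("Training MSE:", np.mean((y - F) ** 2))
```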


Topics: Gradient boosting (69%), BrownBoost (62%), LogitBoost (60%)

12,602 Citations


Journal Article · DOI: 10.1161/CIR.0000000000000659
05 Mar 2019 · Circulation
Abstract: Writing group members: Emelia J. Benjamin, MD, ScM, FAHA (Chair); Paul Muntner, PhD, MHS, FAHA (Vice Chair); Alvaro Alonso, MD, PhD, FAHA; Marcio S. Bittencourt, MD, PhD, MPH; Clifton W. Callaway, MD, FAHA; April P. Carson, PhD, MSPH, FAHA; Alanna M. Chamberlain, PhD; Alexander R. Chang, MD, MS; Susan Cheng, MD, MMSc, MPH, FAHA; Sandeep R. Das, MD, MPH, MBA, FAHA; Francesca N. Delling, MD, MPH; Luc Djousse, MD, ScD, MPH; Mitchell S.V. Elkind, MD, MS, FAHA; Jane F. Ferguson, PhD, FAHA; Myriam Fornage, PhD, FAHA; Lori Chaffin Jordan, MD, PhD, FAHA; Sadiya S. Khan, MD, MSc; Brett M. Kissela, MD, MS; Kristen L. Knutson, PhD; Tak W. Kwan, MD, FAHA; Daniel T. Lackland, DrPH, FAHA; Tené T. Lewis, PhD; Judith H. Lichtman, PhD, MPH, FAHA; Chris T. Longenecker, MD; Matthew Shane Loop, PhD; Pamela L. Lutsey, PhD, MPH, FAHA; Seth S. Martin, MD, MHS, FAHA; Kunihiro Matsushita, MD, PhD, FAHA; Andrew E. Moran, MD, MPH, FAHA; Michael E. Mussolino, PhD, FAHA; Martin O’Flaherty, MD, MSc, PhD; Ambarish Pandey, MD, MSCS; Amanda M. Perak, MD, MS; Wayne D. Rosamond, PhD, MS, FAHA; Gregory A. Roth, MD, MPH, FAHA; Uchechukwu K.A. Sampson, MD, MBA, MPH, FAHA; Gary M. Satou, MD, FAHA; Emily B. Schroeder, MD, PhD, FAHA; Svati H. Shah, MD, MHS, FAHA; Nicole L. Spartano, PhD; Andrew Stokes, PhD; David L. Tirschwell, MD, MS, MSc, FAHA; Connie W. Tsao, MD, MPH (Vice Chair Elect); Mintu P. Turakhia, MD, MAS, FAHA; Lisa B. VanWagner, MD, MSc, FAST; John T. Wilkins, MD, MS, FAHA; Sally S. Wong, PhD, RD, CDN, FAHA; Salim S. Virani, MD, PhD, FAHA (Chair Elect); on behalf of the American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee.


Topics: Heart disease (60%), Stroke (60%), Epidemiology (51%)

3,812 Citations


Open access · Journal Article · DOI: 10.1007/S10994-006-6226-1
Pierre Geurts, Damien Ernst, Louis Wehenkel · Institutions (1)
01 Apr 2006 · Machine Learning
Abstract: This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.
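A brief illustration, assuming scikit-learn, of Extra-Trees beside a Random Forest on synthetic data; max_features plays the role of the attribute-selection strength parameter, while the additional cut-point randomization is what distinguishes ExtraTreesClassifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10, random_state=0)

# Extra-Trees randomizes both the attribute and the cut-point at each split.
models = {
    "Extra-Trees": ExtraTreesClassifier(n_estimators=200, max_features="sqrt", random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0),
}
for name, clf in models.items():
    print(name, round(cross_val_score(clf, X, y, cv=5).mean(), 4))
```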


Topics: Ensemble learning (55%), Supervised learning (54%), Bias–variance tradeoff (54%)

3,644 Citations


Performance Metrics
No. of citations received by the paper in previous years:
Year    Citations
2022    2
2021    16