Journal Article

An Efficient, Ensemble-Based Classification Framework for Big Medical Data.

TL;DR: In this article, the authors propose an efficient, ensemble-based classification framework for big medical data to address the shortcomings of existing classification algorithms in handling big medical datasets, a complicated task in the big data age.
Abstract: Extracting useful information from big medical datasets is a complicated task in the big data age. Various classification algorithms are used in the data mining process to analyze big medical datasets; nevertheless, these algorithms are insufficient for handling big medical data. To address this problem, this work proposes an efficient, ensemble-based classification framework for big medical data. The framework first applies preprocessing to remove noise, missing values, and unwanted features from the data. It then selects a subset of classifiers from a pool of classifiers and combines the selected classifiers into a hybrid system for efficient classification. The methodology also supports incremental learning from data samples and explanation of the predicted outputs while achieving high classification performance. The framework is simulated in Java, using the Cleveland Heart Disease and Diabetes big datasets for classification. Experimental results show that the proposed ensemble algorithm classifies more efficiently than existing algorithms in terms of accuracy, precision, recall, F-measure, and execution time.
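The abstract names the pipeline stages (preprocessing, selecting a subset of classifiers from a pool, combining them into a hybrid system, and scoring with accuracy, precision, recall, and F-measure) but not the concrete selection or combination rule. The Java sketch below is one plausible reading under stated assumptions, not the authors' implementation: rows with missing (NaN) values are removed, decision stumps stand in for the classifier pool, the top-k members by accuracy are selected, and the hybrid predicts by majority vote. The names `EnsembleSketch`, `Classifier`, `stump`, `completeRowIndices`, `selectTopK`, and `majorityVote` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

public class EnsembleSketch {

    /** Hypothetical classifier contract: maps a feature vector to a label in {0, 1}. */
    interface Classifier {
        int predict(double[] x);
    }

    /** Decision stump (a stand-in pool member): predicts 1 when feature f exceeds threshold t. */
    static Classifier stump(int f, double t) {
        return x -> x[f] > t ? 1 : 0;
    }

    /** Preprocessing step: indices of rows with no missing (NaN) value; other rows are removed. */
    static int[] completeRowIndices(double[][] X) {
        return IntStream.range(0, X.length)
                .filter(i -> Arrays.stream(X[i]).noneMatch(Double::isNaN))
                .toArray();
    }

    static double accuracy(Classifier c, double[][] X, int[] y) {
        int correct = 0;
        for (int i = 0; i < X.length; i++) if (c.predict(X[i]) == y[i]) correct++;
        return (double) correct / X.length;
    }

    /** Selection step: keeps the k pool members with the highest accuracy on (Xval, yval). */
    static List<Classifier> selectTopK(List<Classifier> pool, double[][] Xval, int[] yval, int k) {
        List<Classifier> ranked = new ArrayList<>(pool);
        ranked.sort(Comparator.comparingDouble((Classifier c) -> accuracy(c, Xval, yval)).reversed());
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    /** Combination step: majority vote over the selected members; ties go to class 0. */
    static Classifier majorityVote(List<Classifier> members) {
        return x -> {
            int votes = 0;
            for (Classifier c : members) votes += c.predict(x);
            return 2 * votes > members.size() ? 1 : 0;
        };
    }

    /** Returns {accuracy, precision, recall, F-measure}, treating label 1 as the positive class. */
    static double[] metrics(Classifier c, double[][] X, int[] y) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < X.length; i++) {
            int p = c.predict(X[i]);
            if (p == 1 && y[i] == 1) tp++;
            else if (p == 1) fp++;
            else if (y[i] == 1) fn++;
            else tn++;
        }
        double acc = (double) (tp + tn) / X.length;
        double prec = tp + fp == 0 ? 0 : (double) tp / (tp + fp);
        double rec = tp + fn == 0 ? 0 : (double) tp / (tp + fn);
        double f1 = prec + rec == 0 ? 0 : 2 * prec * rec / (prec + rec);
        return new double[] {acc, prec, rec, f1};
    }

    public static void main(String[] args) {
        // Toy data (not from the paper): label 1 when the first feature is large; row 1 has a gap.
        double[][] rawX = {{1, 5}, {2, Double.NaN}, {8, 1}, {9, 2}, {7, 6}, {3, 3}};
        int[] rawY = {0, 0, 1, 1, 1, 0};
        int[] keep = completeRowIndices(rawX);
        double[][] X = Arrays.stream(keep).mapToObj(i -> rawX[i]).toArray(double[][]::new);
        int[] y = Arrays.stream(keep).map(i -> rawY[i]).toArray();

        List<Classifier> pool = List.of(stump(0, 5), stump(0, 1.5), stump(1, 4), stump(1, 2.5), stump(0, 6));
        // For brevity the same rows drive selection and scoring; use separate splits in practice.
        Classifier ensemble = majorityVote(selectTopK(pool, X, y, 3));
        System.out.println(Arrays.toString(metrics(ensemble, X, y)));
    }
}
```

On this deliberately easy toy data, `main` prints `[1.0, 1.0, 1.0, 1.0]`. A real evaluation would select members on a held-out validation split and report metrics on separate test data; incremental learning and output explanation, which the abstract also mentions, are not covered by this sketch.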
Citations
Journal Article
TL;DR: The proposed magnetic force (MF) classifier calculates the magnetic force at each discrete point in the feature space based on the number of points belonging to each class (magnet).
Abstract: There is a plethora of classifiers in the machine learning literature; however, no classifier is optimal in terms of both accuracy and the time taken to build the trained model, especially given the tremendous growth of big data. Hence, there is still room for improvement. In this paper, we propose a new classification method based on the well-known magnetic force. Based on the number of points belonging to a specific class (magnet), the proposed magnetic force (MF) classifier calculates the magnetic force at each discrete point in the feature space. Unknown examples are then classified using the magnetic forces that the various magnets (classes) recorded in the trained model. Experimental results on 28 different datasets show that the MF classifier achieves classification accuracy comparable to that of existing classifiers. More importantly, we found that the MF classifier is significantly faster than all the other classifiers tested, particularly on big datasets; hence, with some optimization, it could be a viable option for structured big data classification.
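To make the idea of recording per-class forces at discrete points concrete, here is a toy Java sketch over a single feature. It is not the MF authors' algorithm: the abstract gives no force formula, so this assumes each training point of class c contributes 1 / (d² + ε) at a grid point at distance d, summed per class. Prediction snaps an unknown value to the nearest grid point and picks the class with the largest recorded force. `MagneticForceSketch` and its parameters are hypothetical.

```java
public class MagneticForceSketch {
    final double[] grid;      // discrete points along one feature axis
    final double[][] force;   // force[g][c]: recorded force of class c at grid point g
    final int numClasses;

    /**
     * "Training" records, at every grid point, the summed force of each class, assuming an
     * inverse-square contribution 1 / (d^2 + eps) from each training point. Needs gridSize >= 2.
     */
    MagneticForceSketch(double[] x, int[] y, int numClasses, double lo, double hi, int gridSize) {
        this.numClasses = numClasses;
        this.grid = new double[gridSize];
        this.force = new double[gridSize][numClasses];
        double eps = 1e-6; // avoids division by zero when a training point sits on a grid point
        for (int g = 0; g < gridSize; g++) {
            grid[g] = lo + (hi - lo) * g / (gridSize - 1);
            for (int i = 0; i < x.length; i++) {
                double d = x[i] - grid[g];
                force[g][y[i]] += 1.0 / (d * d + eps);
            }
        }
    }

    /** Snaps v to the nearest grid point and returns the class with the strongest force there. */
    int predict(double v) {
        int nearest = 0;
        for (int g = 1; g < grid.length; g++)
            if (Math.abs(grid[g] - v) < Math.abs(grid[nearest] - v)) nearest = g;
        int best = 0;
        for (int c = 1; c < numClasses; c++)
            if (force[nearest][c] > force[nearest][best]) best = c;
        return best;
    }

    public static void main(String[] args) {
        // Toy one-feature data (not from the paper): class 0 near 1.5, class 1 near 8.5.
        double[] x = {1.0, 1.5, 2.0, 8.0, 8.5, 9.0};
        int[] y = {0, 0, 0, 1, 1, 1};
        MagneticForceSketch mf = new MagneticForceSketch(x, y, 2, 0.0, 10.0, 101);
        System.out.println(mf.predict(1.7) + " " + mf.predict(7.6));
    }
}
```

Here `main` prints `0 1`. Because the forces are precomputed per grid point, prediction cost in this sketch depends on the grid size rather than on the number of training points, which is one way a classifier of this kind could be fast at prediction time.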

3 citations

Journal Article
TL;DR: This article highlights the importance of federated learning in healthcare and discusses privacy and security issues in communicating e-health data.
Abstract: Securing medical records is a significant task in healthcare communication. The major setback when transferring medical data electronically is the inherent difficulty of preserving data confidentiality and patient privacy. Technological innovation and improvements in the medical field have brought numerous advances in transferring medical data with strong security. In today's healthcare industry, federated network operation is gaining importance for managing distributed network resources because it handles privacy issues efficiently. The design of a federated security system for healthcare services is an active research topic. This article highlights the importance of federated learning in healthcare and discusses the privacy and security issues in communicating e-health data.

1 citation

Journal Article
TL;DR: Wang et al. explored how clinical uncertainty influences antibiotic prescribing practices among township hospital physicians and village doctors in rural Shandong Province, China, and suggested that interventions to reduce clinical uncertainty may help minimize the unnecessary use of antibiotics in these settings.
Abstract: Objective: This study aimed to explore how clinical uncertainty influences antibiotic prescribing practices among township hospital physicians and village doctors in rural Shandong Province, China. Methods: Qualitative semi-structured interviews were conducted with 30 township hospital physicians and 6 village doctors from rural Shandong Province, China. A multi-stage random sampling method was used to identify respondents. Conceptual content analysis, together with Colaizzi's method, was used to generate qualitative codes and identify themes. Results: Three final thematic categories emerged during the data analysis: (1) incidence and treatment of upper respiratory tract infections (URTIs) in township hospitals and village clinics; (2) antibiotic prescribing practices based on clinicians' clinical experience; and (3) the influence of clinical uncertainty on antibiotic prescribing. Respondents from both township hospitals and village clinics reported that URTIs were the most common reason for antibiotic prescriptions at their facilities and that clinical uncertainty appears to be an important driver of antibiotic overuse for URTIs. Clinical uncertainty was primarily due to (1) diagnostic uncertainty (establishing a relevant diagnosis is hindered by limited diagnostic resources and capacities, as well as patients' limited willingness to pay for investigations) and (2) insufficient prognostic evidence. As a consequence of this diagnostic and prognostic uncertainty, respondents stated that antibiotics are frequently prescribed for URTIs to prevent prolonged courses or recurrence of the disease, as well as clinical worsening, hospital admission, or complications.
Conclusion: Our study suggests that clinical uncertainty is a key driver of the overuse and misuse of antibiotics prescribed for URTIs in both rural township hospitals and village clinics in Shandong Province, China, and that interventions to reduce clinical uncertainty may help minimize the unnecessary use of antibiotics in these settings. Interventions that use clinical rules to identify patients at low risk of complications or hospitalization may be more feasible in the near future than laboratory-based interventions aimed at reducing diagnostic uncertainty.

Journal Article
TL;DR: In this article, ensemble approaches based on support vector machines are proposed for classifying medical data; they predict diseases with accuracy rates of 82.82% and 81.76% without feature selection in the data preprocessing stage.
Abstract: In recent years, the increasing volume and availability of healthcare and biomedical data are opening up new opportunities for computational methods to enhance healthcare in many hospitals. Medical data classification is regarded as a challenging task in developing intelligent medical decision support systems for hospitals. In this paper, ensemble approaches based on support vector machines are proposed for classifying medical data. The key contribution of this research is an ensemble of multiple support vector machines that uses kernel functions in the style of gradient boosting and bagging to produce a more accurate fusion model than the mono-modality models. Extensive experiments were conducted on forty benchmark medical datasets from the University of California at Irvine (UCI) machine learning repository. The classification results show a statistically significant difference (p < 0.05) between the proposed approaches and the best classification models. In addition, the empirical analysis of the forty medical datasets indicates that our models can predict diseases with accuracy rates of 82.82% and 81.76% without feature selection in the data preprocessing stage.
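The abstract combines SVMs by bagging and gradient boosting but gives no implementation details. As a minimal sketch of the bagging half only, the Java code below substitutes a linear SVM trained with the Pegasos stochastic sub-gradient method for the paper's kernel SVMs: each member is trained on a bootstrap sample drawn with replacement, and the ensemble predicts by majority vote over the members' signs. Labels are assumed to be +1/-1, and `BaggedSvmSketch` and its settings (λ = 0.01, 2000 iterations) are illustrative choices, not values from the paper.

```java
import java.util.Random;

public class BaggedSvmSketch {

    /**
     * Linear SVM trained with the Pegasos stochastic sub-gradient method (labels +1 / -1).
     * A constant feature of 1 is appended implicitly, so the last weight acts as a bias
     * (and, in this simplified version, is regularised like the other weights).
     */
    static double[] trainPegasos(double[][] X, int[] y, double lambda, int iters, Random rnd) {
        int d = X[0].length + 1;
        double[] w = new double[d];
        for (int t = 1; t <= iters; t++) {
            int i = rnd.nextInt(X.length);
            double eta = 1.0 / (lambda * t);
            double margin = y[i] * dot(w, X[i]);               // margin under the current weights
            for (int j = 0; j < d; j++) w[j] *= 1 - eta * lambda; // regularisation shrink
            if (margin < 1) {                                     // hinge loss active: step toward y_i * x_i
                for (int j = 0; j < d - 1; j++) w[j] += eta * y[i] * X[i][j];
                w[d - 1] += eta * y[i];
            }
        }
        return w;
    }

    /** Linear score w . [x, 1], with the bias weight stored last. */
    static double dot(double[] w, double[] x) {
        double s = w[w.length - 1];
        for (int j = 0; j < x.length; j++) s += w[j] * x[j];
        return s;
    }

    /** Bagging: each member is trained on a bootstrap sample (n draws with replacement). */
    static double[][] bag(double[][] X, int[] y, int members, long seed) {
        Random rnd = new Random(seed);
        double[][] models = new double[members][];
        for (int m = 0; m < members; m++) {
            double[][] bx = new double[X.length][];
            int[] by = new int[X.length];
            for (int i = 0; i < X.length; i++) {
                int k = rnd.nextInt(X.length);
                bx[i] = X[k];
                by[i] = y[k];
            }
            models[m] = trainPegasos(bx, by, 0.01, 2000, rnd);
        }
        return models;
    }

    /** Majority vote over the members' predicted signs; ties resolve to +1. */
    static int predict(double[][] models, double[] x) {
        int vote = 0;
        for (double[] w : models) vote += dot(w, x) >= 0 ? 1 : -1;
        return vote >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Toy, linearly separable data (not from the paper).
        double[][] X = {{0, 0}, {1, 0}, {0, 1}, {4, 4}, {5, 4}, {4, 5}};
        int[] y = {-1, -1, -1, 1, 1, 1};
        double[][] models = bag(X, y, 5, 42L);
        System.out.println(predict(models, new double[] {0.5, 0.5}) + " " + predict(models, new double[] {4.5, 4.5}));
    }
}
```

A kernel SVM, as used in the paper, would need a different solver (for example, SMO) in place of `trainPegasos`; the bootstrap-and-vote structure would stay the same.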