Open Access · Journal Article

A Comparative Performance Analysis of Data Resampling Methods on Imbalance Medical Data

TLDR
This article compares the performance of 23 class imbalance methods (resampling and hybrid systems) combined with three classical classifiers (logistic regression, random forest, and LinearSVC) to identify the imbalance techniques best suited to medical datasets.
Abstract
Medical datasets are usually imbalanced: negative cases severely outnumber positive cases. It is therefore essential to address this data skew when training machine learning algorithms. This study uses two representative lung cancer datasets, PLCO and NLST, with imbalance ratios (the ratio of the number of samples in the majority class to the number in the minority class) of 24.7 and 25.0, respectively, to predict lung cancer incidence. It compares the performance of 23 class imbalance methods (resampling and hybrid systems) combined with three classical classifiers (logistic regression, random forest, and LinearSVC) to identify the imbalance techniques best suited to medical datasets. The resampling methods comprise ten under-sampling methods (RUS, etc.), seven over-sampling methods (SMOTE, etc.), and two integrated sampling methods (SMOTEENN and SMOTE-Tomek). The hybrid systems include Balanced Bagging, among others. The results show that class imbalance learning can improve the classification ability of the model. Compared with the other imbalance techniques, under-sampling techniques have the highest standard deviation (SD) and over-sampling techniques have the lowest. Over-sampling is a stable approach, and its AUC is generally higher than that of the other methods. Random forest combined with ROS achieves the best predictive performance and is better suited to the lung cancer datasets used in this study. The code is available at https://mkhushi.github.io/
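
To make the imbalance ratio and the two simplest resampling strategies named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' code (which is linked above); the function names, the toy data, and the 250-to-10 class split are assumptions chosen only so that the ratio comes out at 25.0.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_over_sample(rows, labels, seed=0):
    """ROS sketch: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """RUS sketch: keep a random subset of each class, sized to match
    the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Hypothetical toy data: 250 negative and 10 positive cases.
labels = [0] * 250 + [1] * 10
rows = [[i] for i in range(len(labels))]

print(imbalance_ratio(labels))                            # ratio before resampling
print(Counter(random_over_sample(rows, labels)[1]))       # class counts after ROS
print(Counter(random_under_sample(rows, labels)[1]))      # class counts after RUS
```

In practice, resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class distribution. The sketch also shows why under-sampling may vary more between runs: it discards most majority-class rows, so the result depends heavily on which few are kept.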


Citations
Journal Article

An Efficient Deep Learning-Based Skin Cancer Classifier for an Imbalanced Dataset

TL;DR: A novel deep learning-based skin cancer detector using an imbalanced dataset was proposed and RegNetY-320 outperformed InceptionV3 and AlexNet in terms of the accuracy, F1-score, and receiver operating characteristic (ROC) curve both on the imbalanced and balanced datasets.
Journal Article

A Novel Method for Performance Measurement of Public Educational Institutions Using Machine Learning Models

TL;DR: In this paper, the authors proposed a model to measure institutional performance based on key performance indicators through data mining techniques, such as J48 decision tree, support vector machines, random forest, rotation forest, and artificial neural networks.
Journal Article

An Efficient Deep Learning Model to Detect COVID-19 Using Chest X-ray Images

TL;DR: A Deep Learning Method (DLM) is used to detect COVID-19 using chest X-ray (CXR) images and shows that ML approaches may be used for rapid analysis of CXR images and thus enable radiologists to filter potential candidates in a time-effective manner to detect the disease.
Journal Article

Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets

TL;DR: This review discusses the essential strategies for addressing and mitigating the class imbalance problems for different medical data types in the oncologic domain and proposes a set of guidelines aimed at limiting and addressing data and algorithmic bias.
Journal Article

Wearable IMU-Based Human Activity Recognition Algorithm for Clinical Balance Assessment Using 1D-CNN and GRU Ensemble Model.

TL;DR: In this article, a wearable inertial measurement unit system was introduced to assess patients via the Berg balance scale (BBS), a clinical test for balance assessment, and an automatic scoring algorithm was developed.
References
Posted Content

Class Imbalance Problem in Data Mining Review

TL;DR: The methods available for classifying imbalanced data sets fall into three main categories: the algorithmic approach, the data-preprocessing approach, and the feature selection approach.
Proceedings Article

On the Class Imbalance Problem

TL;DR: This paper reviews academic work specific to the class imbalance problem, investigates various remedies at four different levels according to learning phase, and closes with some future directions.
Journal Article

An instance level analysis of data complexity

TL;DR: This paper identifies instances that are hard to classify correctly (instance hardness) by classifying over 190,000 instances from 64 data sets with 9 learning algorithms and finds that class overlap is a principal contributor to instance hardness.

The effect of class distribution on classifier learning: an empirical study

TL;DR: This study shows that the naturally occurring class distribution often is not best for learning, and often substantially better performance can be obtained by using a different class distribution.
Journal Article

Evaluation Measures for Models Assessment over Imbalanced Data Sets

TL;DR: This article presents a set of alternatives for assessing learning on imbalanced data, using combined measures (G-means, likelihood ratios, discriminant power, F-measure, balanced accuracy, Youden index, Matthews correlation coefficient) and graphical performance assessment, which aim to provide a more credible evaluation.