Journal Article

Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets

TLDR
A new oversampling method called Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) is presented for imbalanced binary dataset classification; it aims to identify hard-to-learn instances by considering the minority instances in each sub-cluster that lie closer to the borderline.
Abstract
Highlights: A new oversampling method for imbalanced dataset classification is presented. It clusters the minority class and identifies borderline minority instances. Considering the majority class during minority class clustering improves oversampling. The size of each sub-cluster after oversampling should depend on its misclassification error. The generated synthetic instances improved subsequent classification.

In many applications, the dataset for classification may be highly imbalanced: most of the instances in the training set belong to one class (the majority class), while only a few instances are from the other class (the minority class). Conventional classifiers will strongly favor the majority class and ignore the minority instances. In this paper, we present a new oversampling method called Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) for imbalanced binary dataset classification. The proposed method clusters the minority instances using a semi-unsupervised hierarchical clustering approach and adaptively determines the size to which each sub-cluster is oversampled using its classification complexity and cross-validation. The minority instances are then oversampled depending on their Euclidean distance to the majority class. A-SUWO aims to identify hard-to-learn instances by considering the minority instances in each sub-cluster that are closer to the borderline. It also avoids generating synthetic minority instances that overlap with the majority class by taking the majority class into account during the clustering and oversampling stages. Results demonstrate that the proposed method achieves significantly better results on most datasets compared with other sampling methods.
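To make the mechanism concrete, here is a minimal Python sketch of distance-weighted oversampling in the spirit of A-SUWO; it is not the authors' exact algorithm. The function name weighted_oversample is illustrative, and sub-cluster labels are assumed to be given (in the paper they come from semi-unsupervised hierarchical clustering):

```python
# Illustrative sketch only, not the published A-SUWO algorithm: minority
# instances nearer the majority class get larger sampling weights, and
# synthetic points are interpolated within the same sub-cluster.
import numpy as np

def weighted_oversample(X_min, X_maj, sub_labels, n_new, seed=0):
    rng = np.random.default_rng(seed)
    X_min, X_maj = np.asarray(X_min), np.asarray(X_maj)
    sub_labels = np.asarray(sub_labels)
    # Distance from each minority instance to its nearest majority instance.
    d = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    w = 1.0 / (d + 1e-12)          # closer to the borderline -> higher weight
    w /= w.sum()
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(len(X_min), p=w)                  # weighted seed instance
        same = np.flatnonzero(sub_labels == sub_labels[i])
        same = same[same != i]
        if same.size == 0:                               # singleton sub-cluster
            synthetic.append(X_min[i])
            continue
        j = rng.choice(same)                             # partner in the same sub-cluster
        lam = rng.random()                               # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```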


Citations
Journal Article

Learning from class-imbalanced data

TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.
Journal Article

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of its simplicity of design and its robustness when applied to different types of problems.
Journal Article

Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE

TL;DR: This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes.
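For reference, the imbalanced-learn package provides KMeansSMOTE, an implementation of this k-means plus SMOTE idea; a minimal usage sketch follows (the dataset is synthetic and the parameter choices are illustrative; k_neighbors and cluster_balance_threshold may need tuning on real data):

```python
# Minimal usage sketch of imbalanced-learn's KMeansSMOTE, which follows
# the k-means + SMOTE idea; default settings may need tuning on real data.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import KMeansSMOTE

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_res, y_res = KMeansSMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))  # minority class resampled toward balance
```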
Journal Article

Effective data generation for imbalanced learning using conditional generative adversarial networks

TL;DR: The conditional version of Generative Adversarial Networks (cGAN) is used to approximate the true data distribution and generate data for the minority class of various imbalanced datasets, and is compared against multiple standard oversampling algorithms.
Journal Article

An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets

TL;DR: A detailed empirical comparison of 85 variants of minority oversampling techniques on 104 imbalanced datasets is presented and discussed, in order to set a new baseline in the field, determine the oversampling principles that lead to the best results under general circumstances, and give practitioners guidance on which techniques to use with certain types of datasets.
References
Journal Article

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
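LIBSVM underlies scikit-learn's SVC, so a quick way to see the probability estimates mentioned above is the sketch below (synthetic data; parameters are illustrative):

```python
# Sketch: scikit-learn's SVC wraps LIBSVM; probability estimates are
# produced by LIBSVM's internal (Platt-style) calibration.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
clf = SVC(kernel="rbf", C=1.0, probability=True).fit(X, y)
print(clf.predict(X[:3]), clf.predict_proba(X[:3]))
```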
Book

Generalized Linear Models

TL;DR: In this work, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and Gamma (variance components).
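As a small illustration of fitting several exponential-family GLMs through one log-likelihood-based interface, here is a sketch assuming the statsmodels package (synthetic data; coefficient values are arbitrary):

```python
# Sketch: fitting GLMs from different exponential-family distributions
# with one interface; statsmodels maximizes the log-likelihood in each case.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))          # design matrix with intercept
y_pois = rng.poisson(lam=np.exp(X @ [0.2, 0.5, -0.3]))  # Poisson counts
y_bin = rng.binomial(1, 1 / (1 + np.exp(-(X @ [0.1, 1.0, -1.0]))))  # binary outcomes

pois_fit = sm.GLM(y_pois, X, family=sm.families.Poisson()).fit()
bin_fit = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()
print(pois_fit.llf, bin_fit.llf)  # maximized log-likelihoods
```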
Journal Article

SMOTE: Synthetic Minority Over-sampling Technique

TL;DR: In this article, the minority class is over-sampled by creating synthetic minority class examples; the approach is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
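The core of SMOTE is interpolation between minority neighbors: a synthetic example is x_new = x + lambda * (x_nn - x), with lambda drawn uniformly from [0, 1) and x_nn one of the k nearest minority neighbors of x. A minimal sketch (simplified to continuous features; the full algorithm also decides how many samples to generate per instance):

```python
# Minimal sketch of SMOTE's interpolation step (simplified, continuous
# features only): x_new = x + lam * (x_neighbor - x), lam ~ U[0, 1).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                    # column 0 is the point itself
    seeds = rng.integers(0, len(X_min), size=n_new)  # random minority seeds
    neigh = idx[seeds, rng.integers(1, k + 1, size=n_new)]  # skip self column
    lam = rng.random((n_new, 1))                     # interpolation factors
    return X_min[seeds] + lam * (X_min[neigh] - X_min[seeds])
```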
Journal Article

Nearest neighbor pattern classification

TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
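A one-line illustration of the 1-NN rule using scikit-learn (synthetic data):

```python
# Sketch: the 1-NN rule assigns each query the label of its nearest
# previously classified point.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(random_state=0)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict(X[:5]))  # each prediction copies the nearest stored label
```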