Proceedings ArticleDOI

SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An enhancement strategy to handle imbalance in data level

Fajri Koto
pp. 280-284
TLDR
The results show that the proposed SMOTE variants yield improvements in B-ACC and F1-score, covering cases not already handled by SMOTE.
Abstract
An imbalanced dataset often becomes an obstacle in supervised learning. Imbalance is the case in which the training examples belonging to one class heavily outnumber the examples in the other class. Applying a classifier to such a dataset results in the classifier failing to learn the minority class. Synthetic Minority Oversampling Technique (SMOTE) is a well-known over-sampling method that tackles imbalance at the data level. SMOTE creates a synthetic example between two nearby minority-class vectors. Our study considers three improvements of SMOTE, called SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE, in order to cover cases not already handled by SMOTE. To investigate the proposed methods, we conducted experiments on eighteen different datasets. The results show that our proposed SMOTE variants yield improvements in B-ACC and F1-score.
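For readers unfamiliar with the base method, the following is a minimal Python sketch of the standard SMOTE interpolation step that the proposed variants build on; the function name and parameters are illustrative, not taken from the paper.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_synthetic, k=5, seed=0):
    # X_min: feature vectors of the minority class only.
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))        # pick a random minority example
        j = idx[i, rng.integers(1, k + 1)]  # pick one of its k minority neighbors
        gap = rng.uniform()                 # random position on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

Here B-ACC denotes balanced accuracy, the mean of the true positive and true negative rates, and F1 is the harmonic mean of precision and recall.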


Citations
Journal ArticleDOI

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the de facto standard in the framework of learning from imbalanced data because of its simplicity of design, as well as its robustness when applied to different types of problems.
Journal ArticleDOI

An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling

TL;DR: An ensemble classification method based on model dynamic selection driven by data-partition hybrid sampling is proposed for imbalanced data; it outperforms typical imbalanced classification methods in terms of F-measure and G-mean.
Journal ArticleDOI

The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art

TL;DR: A plethora of conventional and recent techniques are surveyed that address the problem of imbalanced class distribution through intelligent representations of the majority- and minority-class samples given as input to the learning module.
Journal ArticleDOI

Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of Flaws and Benefits when Applying Over-sampling

TL;DR: This work evaluates the actual impact of over-sampling on predictive performance when it is applied prior to data partitioning, using the same methodologies as related studies, to give a realistic view of those methodologies' generalization capabilities.
Journal ArticleDOI

Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets

TL;DR: An under-sampling approach is proposed that leverages a Naive Bayes classifier to select the most informative instances from the available training set, starting from a random initial selection.
References
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal ArticleDOI

SMOTE: synthetic minority over-sampling technique

TL;DR: A method of over-sampling the minority class by creating synthetic minority-class examples is presented and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Journal ArticleDOI

A study of the behavior of several methods for balancing machine learning training data

TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC).
Book ChapterDOI

Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning

TL;DR: Two new minority over-sampling methods are presented, borderline-SMOTE1 and borderline-SMOTE2, in which only the minority examples near the borderline are over-sampled; they achieve better TP rate and F-value than SMOTE and random over-sampling methods.
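As a rough illustration of the borderline selection idea summarized above, the sketch below marks a minority example as borderline when at least half, but not all, of its m nearest neighbors in the full training set belong to the majority class; the names and the scikit-learn helper are assumptions for illustration, not the authors' code.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def borderline_minority(X, y, minority_label, m=5):
    # Select minority examples whose neighborhood is dominated, but not
    # fully occupied, by the majority class (the "DANGER" set).
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min = X[y == minority_label]
    nn = NearestNeighbors(n_neighbors=m + 1).fit(X)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    n_maj = (y[idx[:, 1:]] != minority_label).sum(axis=1)
    return X_min[(n_maj >= m / 2) & (n_maj < m)]

Only these borderline examples are then passed to the SMOTE interpolation step.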