Proceedings ArticleDOI

SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An enhancement strategy to handle imbalance in data level

Fajri Koto
pp. 280-284
TLDR
The results show that the proposed SMOTE variants yield improvements in B-ACC and F1-score, covering cases not already handled by SMOTE.
Abstract
An imbalanced dataset often becomes an obstacle in supervised learning. Imbalance is the case in which the training examples belonging to one class heavily outnumber the examples in the other class. Applying a classifier to such a dataset results in the classifier failing to learn the minority class. Synthetic Minority Oversampling Technique (SMOTE) is a well-known over-sampling method that tackles imbalance at the data level. SMOTE creates a synthetic example between two nearby minority-class vectors. Our study considers three improvements of SMOTE, called SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE, in order to cover cases not already handled by SMOTE. To investigate the proposed methods, we conducted experiments on eighteen different datasets. The results show that our proposed SMOTE variants yield improvements in B-ACC and F1-score.
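For readers unfamiliar with the base method, the following is a minimal Python sketch of the standard SMOTE interpolation step that the proposed variants build on; the function name and parameters are illustrative, not taken from the paper.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_synthetic, k=5, seed=0):
    # X_min: feature vectors of the minority class only.
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))        # pick a random minority example
        j = idx[i, rng.integers(1, k + 1)]  # pick one of its k minority neighbors
        gap = rng.uniform()                 # random position on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

Here B-ACC denotes balanced accuracy, the mean of the true positive and true negative rates, and F1 is the harmonic mean of precision and recall.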


Citations
Journal ArticleDOI

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the de facto standard in the framework of learning from imbalanced data because of its simplicity of design, as well as its robustness when applied to different types of problems.
Journal ArticleDOI

An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling

TL;DR: An ensemble classification method based on model dynamic selection driven by data-partition hybrid sampling is proposed for imbalanced data; it outperforms typical imbalanced classification methods in terms of F-measure and G-mean.
Journal ArticleDOI

The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art

TL;DR: A plethora of conventional and recent techniques are surveyed that address the problem of imbalanced class distribution through intelligent representations of the majority- and minority-class samples given as input to the learning module.
Journal ArticleDOI

Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of Flaws and Benefits when Applying Over-sampling

TL;DR: This work evaluates the actual impact of over-sampling on predictive performance when it is applied prior to data partitioning, using the same methodologies as related studies, to give a realistic view of those methodologies' generalization capabilities.
Journal ArticleDOI

Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets

TL;DR: An under-sampling approach is proposed that leverages a Naive Bayes classifier to select the most informative instances from the available training set, starting from a random initial selection.
References
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal ArticleDOI

SMOTE: synthetic minority over-sampling technique

TL;DR: A method of over-sampling the minority class by creating synthetic minority-class examples is presented and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Journal ArticleDOI

A study of the behavior of several methods for balancing machine learning training data

TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC).
Book ChapterDOI

Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning

TL;DR: Two new minority over-sampling methods are presented, borderline-SMOTE1 and borderline-SMOTE2, in which only the minority examples near the borderline are over-sampled; they achieve better TP rate and F-value than SMOTE and random over-sampling methods.
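As a rough illustration of the borderline selection idea summarized above, the sketch below marks a minority example as borderline when at least half, but not all, of its m nearest neighbors in the full training set belong to the majority class; the names and the scikit-learn helper are assumptions for illustration, not the authors' code.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def borderline_minority(X, y, minority_label, m=5):
    # Select minority examples whose neighborhood is dominated, but not
    # fully occupied, by the majority class (the "DANGER" set).
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_min = X[y == minority_label]
    nn = NearestNeighbors(n_neighbors=m + 1).fit(X)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    n_maj = (y[idx[:, 1:]] != minority_label).sum(axis=1)
    return X_min[(n_maj >= m / 2) & (n_maj < m)]

Only these borderline examples are then passed to the SMOTE interpolation step.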