Proceedings ArticleDOI
SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An enhancement strategy to handle imbalance in data level
Fajri Koto
pp. 280-284
TL;DR: The proposed SMOTE variants give some improvement in B-ACC and F1-score, covering cases not already handled by SMOTE.
Abstract: An imbalanced dataset often becomes an obstacle in the supervised learning process. Imbalance is the case in which the examples in the training data belonging to one class heavily outnumber the examples in the other class. Applying a classifier to such a dataset results in the classifier failing to learn the minority class. Synthetic Minority Oversampling Technique (SMOTE) is a well-known over-sampling method that tackles imbalance at the data level. SMOTE creates a synthetic example between two close vectors that lie together. Our study considers three improvements of SMOTE, called SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE, in order to cover cases which are not already handled by SMOTE. To investigate the proposed methods, our experiments were conducted on eighteen different datasets. The results show that the proposed SMOTE variants give some improvement in B-ACC and F1-score.
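The abstract describes SMOTE's core mechanism: synthesizing a new minority example between two close minority vectors. A minimal sketch of plain SMOTE (not the paper's SMOTE-Out, SMOTE-Cosine, or Selected-SMOTE variants), assuming the minority class is given as a NumPy array of feature vectors, could look like this:

```python
import numpy as np

def smote_sample(minority, k=5, n_new=100, rng=None):
    """Generate synthetic minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours.
    Illustrative sketch of plain SMOTE, not the paper's variants."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(minority[:, None] - minority[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    # indices of the k nearest minority neighbours of each minority point
    nn = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))  # random minority seed point
        j = nn[i, rng.integers(k)]       # one of its k nearest neighbours
        gap = rng.random()               # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```

Because each synthetic point lies on the segment between two existing minority points, the over-sampled class stays within the region spanned by the original minority examples; the paper's variants modify exactly this interpolation step.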
Citations
Journal ArticleDOI
SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary
TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of its simplicity in design, as well as its robustness when applied to different types of problems.
Journal ArticleDOI
An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling
TL;DR: An ensemble classification method based on model dynamic selection driven by data partition hybrid sampling for imbalanced data that outperforms typical imbalanced classification methods in terms of F-measure and G-mean.
Journal ArticleDOI
The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art
Seba Susan, Amitesh Kumar, et al.
TL;DR: A plethora of conventional and recent techniques that address the problem of imbalanced class distribution through intelligent representations of samples from the majority and minority classes, that are given as input to the learning module are surveyed.
Journal ArticleDOI
Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of Flaws and Benefits when Applying Over-sampling
Gilles Vandewiele, Isabelle Dehaene, György Kovács, Lucas Sterckx, Olivier Janssens, Femke Ongenae, Femke De Backere, Filip De Turck, Kristien Roelens, Johan Decruyenaere, Sofie Van Hoecke, Thomas Demeester
TL;DR: This work evaluates the actual impact of over-sampling on predictive performance when applied prior to data partitioning, reproducing the methodologies of related studies in order to provide a realistic view of their generalization capabilities.
Journal ArticleDOI
Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets
TL;DR: An under-sampling approach is proposed, which leverages the usage of a Naive Bayes classifier in order to select the most informative instances from the available training set, based on a random initial selection.
References
Journal ArticleDOI
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal ArticleDOI
SMOTE: synthetic minority over-sampling technique
TL;DR: In this article, a method of over-sampling the minority class by creating synthetic minority class examples is proposed, and is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Journal ArticleDOI
A study of the behavior of several methods for balancing machine learning training data
TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC).
Book ChapterDOI
Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning
TL;DR: Two new minority over-sampling methods are presented, Borderline-SMOTE1 and Borderline-SMOTE2, in which only the minority examples near the borderline are over-sampled; these achieve a better TP rate and F-value than SMOTE and random over-sampling methods.
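The selection step that distinguishes Borderline-SMOTE from plain SMOTE can be sketched as follows: a minority point counts as "borderline" (in danger) when at least half, but not all, of its k nearest neighbours in the full training set belong to the majority class, and only these points are used as seeds for interpolation. A minimal illustrative sketch of that selection rule, assuming array inputs and Euclidean distance:

```python
import numpy as np

def danger_minority(X, y, minority_label, k=5):
    """Return indices of 'borderline' minority points: those whose k
    nearest neighbours in the full data set are at least half majority,
    but not all majority (all-majority points are treated as noise).
    Illustrative sketch of Borderline-SMOTE's selection step."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # pairwise Euclidean distances over the whole training set
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    danger = []
    for i in np.where(y == minority_label)[0]:
        nn = np.argsort(d[i])[:k]        # k nearest neighbours of point i
        n_maj = np.sum(y[nn] != minority_label)
        if k / 2 <= n_maj < k:           # half or more, but not all, majority
            danger.append(i)
    return danger
```

Interior minority points (neighbourhoods dominated by the minority class) and isolated noise points (all-majority neighbourhoods) are skipped, so synthesis concentrates where the decision boundary is contested.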