SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling
Citations
Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data
Spruce budworm tree host species distribution and abundance mapping using multi-temporal Sentinel-1 and Sentinel-2 satellite imagery
Using Information on Class Interrelations to Improve Classification of Multiclass Imbalanced Data: A New Resampling Algorithm
What makes multi-class imbalanced problems difficult? An experimental study
A clustering based ensemble of weighted kernelized extreme learning machine for class imbalance learning
References
Maximum likelihood from incomplete data via the EM algorithm
SMOTE: Synthetic Minority Over-sampling Technique
Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning
KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework
Frequently Asked Questions (12)
Q2. What future work is mentioned in the paper "SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling"?
Designing a multi-class cost-sensitive learning approach for inconsistent costs without transforming the problem into a binary-class problem will be the focus of their future work.
Q3. How does the SCUT method help to reduce between class imbalance?
The combination of cluster-based undersampling and SMOTE helps reduce between-class imbalance without excessive use of sampling.
Q4. What is the focus of the future work?
Designing a multi-class cost-sensitive learning approach for inconsistent costs without transforming the problem into a binary-class problem will be the focus of their future work.
Q5. what is the cost-sensitive learning approach for binary class problems?
A popular cost-sensitive learning approach for binary-class problems can be applied directly on multi-class datasets to obtain good performance only when the costs are consistent (Zhou and Liu, 2010).
Q6. How do you plan to extend your approach to large datasets?
The authors also intend to extend their approach to very large datasets with extreme levels of imbalance, since their early results indicate that the SCUT approach would potentially outperform undersampling-only techniques in such a setting.
Q7. How does SCUT improve the classification performance on multi-class imbalanced datasets?
In this paper, the authors have proposed a hybrid sampling method called SCUT which combines SMOTE and cluster-based undersampling to improve the classification performance on multi-class imbalanced datasets.
Q8. What is the main idea behind the hybrid sampling method?
When used in conjunction with SMOTE, cluster-based undersampling helps ensure that between-class imbalance is reduced without excessive use of oversampling or undersampling.
Q9. What is the common approach for a multiclass dataset?
For instance, the One-versus-one (OVO) approach employs multiple classifiers for each possible pair of classes, discarding the remaining instances that do not belong to the pair under consideration.
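The pairwise decomposition described above can be sketched with scikit-learn's OneVsOneClassifier (this is an illustration of the OVO scheme in general, not the paper's code; the iris data and base classifier are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)

# OVO trains one binary classifier per pair of classes; each pairwise
# model sees only the instances of its two classes, discarding the rest.
# For 3 classes that gives 3 * (3 - 1) / 2 = 3 pairwise models.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovo.estimators_))    # number of pairwise classifiers
print(ovo.predict(X[:2]))
```

At prediction time, each pairwise model votes and the class with the most votes wins.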
Q10. How many instances are obtained from class 2?
For class 2, the number of examples is 81, so EM is applied and 3 clusters are obtained, with the numbers of instances equal to 29, 17 and 35 respectively.
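The worked example above (EM splitting a class of 81 instances into 3 clusters, then drawing instances from each cluster) can be sketched using scikit-learn's GaussianMixture as the EM implementation. The synthetic data, the target size m = 30, and the equal per-cluster share are illustrative assumptions, not the paper's exact figures:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical majority class with 81 instances, as in the worked example;
# m = 30 is an assumed target size for illustration.
X_class = rng.normal(size=(81, 4))
m = 30

# EM clustering via a 3-component Gaussian mixture.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_class)
labels = gmm.predict(X_class)

# Draw roughly m/3 instances from each cluster, so every sub-concept
# of the class stays represented after undersampling.
per_cluster = m // 3
keep = np.concatenate([
    rng.choice(np.flatnonzero(labels == c),
               size=min(per_cluster, int(np.sum(labels == c))),
               replace=False)
    for c in range(3)
])
X_reduced = X_class[keep]
print(X_reduced.shape)
```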
Q11. How many instances are selected from the class?
Input: Dataset D with n classes
Output: Dataset D' with all classes having m instances, where m is the mean number of instances over all classes
Split D into D1, D2, D3, ..., Dn, where Di contains a single class
Calculate m
Undersampling: For each Di, i = 1, 2, ..., n, where the number of instances > m:
    Cluster Di using the EM algorithm
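The algorithm above can be given as a minimal, self-contained sketch. It assumes scikit-learn's GaussianMixture as the EM clusterer and a hand-rolled SMOTE-style interpolation (the paper uses the standard SMOTE; the function names and the fixed cluster count here are hypothetical simplifications):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

def smote_like(X, n_new, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between a point and one of
    its k nearest neighbours within the same class."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X) - 1)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        j = idx[i][rng.integers(1, k + 1)]   # position 0 is the point itself
        synth.append(X[i] + rng.random() * (X[j] - X[i]))
    return np.vstack([X, np.asarray(synth)])

def cluster_undersample(X, m, n_clusters=3, seed=0):
    """EM-based undersampling: cluster with a Gaussian mixture, then keep
    an (approximately) equal share of instances from each cluster."""
    rng = np.random.default_rng(seed)
    labels = GaussianMixture(n_components=n_clusters,
                             random_state=seed).fit_predict(X)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        take = min(len(members), int(np.ceil(m / n_clusters)))
        keep.extend(rng.choice(members, size=take, replace=False))
    return X[np.asarray(keep, dtype=int)[:m]]

def scut(X, y, n_clusters=3):
    """Balance every class toward m = mean class size (the SCUT idea)."""
    classes, counts = np.unique(y, return_counts=True)
    m = int(counts.mean())
    Xs, ys = [], []
    for c, cnt in zip(classes, counts):
        Xc = X[y == c]
        if cnt > m:                      # majority class: cluster + undersample
            Xc = cluster_undersample(Xc, m, n_clusters)
        elif cnt < m:                    # minority class: SMOTE up to m
            Xc = smote_like(Xc, m - cnt)
        Xs.append(Xc)
        ys.append(np.full(len(Xc), c))
    return np.vstack(Xs), np.concatenate(ys)
```

Minority classes are oversampled exactly to m, while majority classes are reduced to at most m while preserving each cluster's sub-concepts.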
Q12. How many clusters are created in the EM experiment?
In order to determine the number of clusters, cross-validation is performed as follows:
1. Initially, the number of clusters is set to one (1).