ADASYN: Adaptive synthetic sampling approach for imbalanced learning
Haibo He, Yang Bai, E. A. Garcia, Shutao Li
pp. 1322–1328
TL;DR
Simulation analyses on several machine learning data sets show the effectiveness of the ADASYN sampling approach across five evaluation metrics.

Abstract
This paper presents a novel adaptive synthetic (ADASYN) sampling approach for learning from imbalanced data sets. The essential idea of ADASYN is to use a weighted distribution over the minority class examples according to their level of difficulty in learning: more synthetic data is generated for minority class examples that are harder to learn than for those that are easier to learn. As a result, the ADASYN approach improves learning with respect to the data distributions in two ways: (1) it reduces the bias introduced by the class imbalance, and (2) it adaptively shifts the classification decision boundary toward the difficult examples. Simulation analyses on several machine learning data sets show the effectiveness of this method across five evaluation metrics.
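The weighting scheme the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the authors' reference implementation: the function name `adasyn`, the parameter names (`beta`, `k`), and the brute-force nearest-neighbour search are all assumptions.

```python
import numpy as np

def adasyn(X_min, X_maj, beta=1.0, k=5, rng=None):
    """Sketch of ADASYN oversampling (names and defaults assumed).

    X_min: minority samples (n_min, d); X_maj: majority samples (n_maj, d).
    beta: desired balance level after sampling (1.0 -> fully balanced).
    """
    rng = np.random.default_rng(rng)
    X_all = np.vstack([X_min, X_maj])
    n_min = len(X_min)

    # G: total number of synthetic samples to generate
    G = int((len(X_maj) - n_min) * beta)

    # r_i: fraction of majority points among the k nearest neighbours of
    # each minority sample in the full data set -- its "difficulty"
    d_all = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]   # column 0 is the sample itself
    r = (nn_all >= n_min).mean(axis=1)               # indices >= n_min are majority

    if r.sum() == 0:                                  # no hard-to-learn samples
        return np.empty((0, X_min.shape[1]))
    g = np.rint(r / r.sum() * G).astype(int)          # per-sample synthetic budget

    # k nearest *minority* neighbours, used for SMOTE-style interpolation
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]

    synth = []
    for i, gi in enumerate(g):
        for _ in range(gi):
            j = rng.choice(nn_min[i])
            lam = rng.random()
            synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth) if synth else np.empty((0, X_min.shape[1]))
```

Because each synthetic point is a convex combination of two minority samples, the generated data stays inside the minority class's bounding box, concentrated near samples with many majority neighbours.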
Citations
Journal Article
Learning from Imbalanced Data
Haibo He, E. A. Garcia
TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
Journal Article
Learning from imbalanced data: open challenges and future directions
TL;DR: Seven vital areas of research in this topic are identified, covering the full spectrum of learning from imbalanced data: classification, regression, clustering, data streams, big data analytics and applications, e.g., in social media and computer vision.
Proceedings Article
Class-Balanced Loss Based on Effective Number of Samples
TL;DR: This work designs a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, yielding a class-balanced loss, and introduces a novel theoretical framework that measures data overlap by associating with each sample a small neighboring region rather than a single point.
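The "effective number of samples" re-weighting summarized above reduces to a short formula: E_n = (1 − β^n) / (1 − β) for a class with n samples, with the class weight proportional to its inverse. A minimal sketch, where the function name and the normalization convention (weights summing to the number of classes) are assumptions:

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the effective number of samples.

    E_n = (1 - beta**n) / (1 - beta); the weight is its inverse,
    normalized so the weights sum to the number of classes (assumed).
    """
    counts = np.asarray(counts, dtype=float)
    eff_num = (1.0 - beta ** counts) / (1.0 - beta)  # effective sample count
    w = 1.0 / eff_num                                 # rarer class -> larger weight
    return w / w.sum() * len(counts)
```

As β approaches 1, E_n approaches n and the scheme recovers plain inverse-frequency weighting; at β = 0 every class gets equal weight.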
Journal Article
An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
TL;DR: This work carries out a thorough discussion on the main issues related to using data intrinsic characteristics in this classification problem, and introduces several approaches and recommendations to address these problems in conjunction with imbalanced data.
Journal Article
SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary
TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of its simplicity of design, as well as its robustness when applied to different types of problems.
References
Journal Article
SMOTE: synthetic minority over-sampling technique
TL;DR: In this article, a method of over-sampling the minority class by creating synthetic minority class examples is proposed; it is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
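The synthetic-example generation this TL;DR refers to, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched briefly. The function name, parameter names, and brute-force neighbour search are assumptions, not the original implementation:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch (names assumed): each new point lies on the
    segment between a random minority sample and one of its k nearest
    minority neighbours."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]           # column 0 is the sample itself
    i = rng.integers(0, len(X_min), n_new)           # pick base samples uniformly
    j = nn[i, rng.integers(0, k, n_new)]             # pick one neighbour each
    lam = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[i] + lam * (X_min[j] - X_min[i])
```

Unlike ADASYN, this sketch allocates new points uniformly over the minority class rather than concentrating them on hard-to-learn samples.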
Journal Article
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Yoav Freund, Robert E. Schapire
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Proceedings Article
Experiments with a new boosting algorithm
Yoav Freund, Robert E. Schapire
TL;DR: This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.