Open Access Journal Article

Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning

TLDR
imbalanced-learn is an open-source Python toolbox providing a wide range of methods to cope with imbalanced datasets, a problem frequently encountered in machine learning and pattern recognition.
Abstract
imbalanced-learn is an open-source Python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods can be categorized into four groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported projects. Documentation, unit tests, and integration tests are provided to ease usage and contribution. Source code, binaries, and documentation can be downloaded from https://github.com/scikit-learn-contrib/imbalanced-learn.
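To illustrate the simplest member of the over-sampling family the abstract mentions, the sketch below duplicates minority-class samples at random until classes are balanced. This is a minimal stand-alone illustration of the idea, not imbalanced-learn's implementation, and `random_oversample` is a name invented here:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Naive random over-sampling sketch: duplicate randomly chosen
    minority-class samples until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # resample an existing minority point
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res

X = [[0.0], [0.1], [0.2], [1.0]]
y = [0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
# both classes now contain 3 samples
```

In the toolbox itself this role is played by a resampler object with a scikit-learn-compatible interface, which is what makes the library usable inside scikit-learn pipelines.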


Citations
Journal Article (DOI)

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of the simplicity of its design, as well as its robustness when applied to different types of problems.
Journal Article (DOI)

A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

TL;DR: This survey delineates the limitations of, gives insights into, and identifies research challenges and future opportunities for advancing ML in networking, and jointly presents the application of diverse ML techniques in various key areas of networking across different network technologies.
Journal Article (DOI)

Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE

TL;DR: This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority over-sampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes.
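The gist of combining clustering with over-sampling can be sketched as below. This is a hypothetical illustration, not the authors' algorithm: `kmeans_smote_budget` is a name invented here, and it uses the minority count per cluster as a crude stand-in for the paper's sparsity measure when distributing the synthetic-sample budget:

```python
def kmeans_smote_budget(clusters, n_synthetic):
    """Sketch of cluster-aware over-sampling: only clusters dominated by
    the minority class receive synthetic samples, and clusters with fewer
    minority samples (a rough sparsity proxy) receive a larger share."""
    # clusters: list of (n_minority, n_majority) counts per k-means cluster
    eligible = [i for i, (mn, mj) in enumerate(clusters) if mn > mj]
    weights = {i: 1.0 / clusters[i][0] for i in eligible}  # sparser -> heavier
    total = sum(weights.values())
    return {i: round(n_synthetic * w / total) for i, w in weights.items()}

# cluster 0 is majority-dominated and gets nothing; cluster 2 is sparsest
budget = kmeans_smote_budget([(2, 10), (4, 1), (1, 0)], 10)
```

Restricting generation to minority-dominated clusters is what avoids creating synthetic points in majority regions (i.e., noise).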
Journal Article (DOI)

BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches

TL;DR: Experimental results showed that the predictors generated by BioSeq-Analysis even outperformed some state-of-the-art methods, suggesting the platform will become a useful tool for biological sequence analysis.
Journal Article (DOI)

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Trygve E. Bakken, +121 more · 01 Oct 2021
TL;DR: The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals. This paper characterizes M1 using high-throughput transcriptomic and epigenomic profiling of more than 450k single nuclei in humans, marmoset monkeys and mice.
References
Journal Article (DOI)

SMOTE: synthetic minority over-sampling technique

TL;DR: In this article, a method of over-sampling the minority class by creating synthetic minority-class examples is proposed, and is evaluated using the area under the receiver operating characteristic curve (AUC) and the ROC convex hull strategy.
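The core of SMOTE is interpolation: each synthetic example lies on the segment between a minority sample and one of its minority-class nearest neighbors. The sketch below is a simplified illustration with a brute-force neighbor search, not the paper's or imbalanced-learn's implementation; `smote_sample` and `smote` are names invented here:

```python
import random

def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between a minority
    sample and one of its minority-class neighbors."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

def smote(minority, n_synthetic, k=2, seed=0):
    """For each synthetic sample: pick a random minority point, find its
    k nearest minority neighbors (squared Euclidean distance), and
    interpolate toward a randomly chosen one."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        out.append(smote_sample(x, rng.choice(neighbors), rng))
    return out

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_synthetic=4)
```

Because every synthetic point is a convex combination of two minority samples, the new points stay inside the region spanned by the minority class.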
Journal Article (DOI)

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
Book Chapter (DOI)

Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning

TL;DR: Two new minority over-sampling methods, borderline-SMOTE1 and borderline-SMOTE2, are presented, in which only the minority examples near the borderline are over-sampled; they achieve a better TP rate and F-value than SMOTE and random over-sampling methods.
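Borderline-SMOTE first splits minority samples by how many of their nearest neighbors, taken over the whole data set, belong to the majority class, and over-samples only the borderline ("danger") group. The helper below is a simplified illustration of that labeling step, not the paper's implementation; `borderline_label` is a name invented here:

```python
def borderline_label(x, minority, majority, k=5):
    """Label a minority sample by its k nearest neighbors over all data:
    all majority -> 'noise'; at least half majority -> 'danger' (the only
    points borderline-SMOTE over-samples); otherwise -> 'safe'."""
    pool = [(p, 1) for p in majority] + [(p, 0) for p in minority if p is not x]
    pool.sort(key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0])))
    m = sum(lab for _, lab in pool[:k])  # majority neighbors among the k
    if m == k:
        return "noise"
    if m >= k / 2:
        return "danger"
    return "safe"

minority = [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]]
majority = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 4.0], [4.0, 5.0]]
labels = [borderline_label(x, minority, majority, k=4) for x in minority]
```

Skipping "noise" points (fully surrounded by the majority class) is what keeps the method from amplifying mislabeled or outlying samples.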
Proceedings Article (DOI)

ADASYN: Adaptive synthetic sampling approach for imbalanced learning

TL;DR: Simulation analyses on several machine learning data sets show the effectiveness of the ADASYN sampling approach across five evaluation metrics.
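ADASYN's adaptive step can be sketched as follows: each minority point receives a share of the synthetic-sample budget proportional to the fraction of majority points among its k nearest neighbors, so harder-to-learn points get more synthetic samples. This is a minimal illustration under those assumptions, not the paper's implementation; `adasyn_budget` is a name invented here:

```python
def adasyn_budget(minority, majority, n_synthetic, k=3):
    """Distribute n_synthetic samples across minority points in proportion
    to the majority fraction among each point's k nearest neighbors."""
    ratios = []
    for x in minority:
        pool = [(p, 1) for p in majority] + [(p, 0) for p in minority if p is not x]
        pool.sort(key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0])))
        ratios.append(sum(lab for _, lab in pool[:k]) / k)  # majority fraction
    total = sum(ratios) or 1.0
    return [round(n_synthetic * r / total) for r in ratios]

minority = [[0.0, 0.0], [5.0, 5.0]]
majority = [[5.0, 4.0], [4.0, 5.0], [6.0, 5.0]]
budget = adasyn_budget(minority, majority, 10, k=3)
```

The second minority point sits entirely inside the majority region, so it receives the larger share of the budget; the actual sample generation then proceeds by SMOTE-style interpolation.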
Proceedings Article

Addressing the Curse of Imbalanced Training Sets: One-Sided Selection.

TL;DR: Criteria to evaluate the utility of classifiers induced from such imbalanced training sets are discussed, an explanation of the poor behavior of some learners under these circumstances is given, and a simple technique called one-sided selection of examples is suggested.
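One-sided selection is an under-sampling approach: the minority class is kept intact while redundant majority samples are discarded. The sketch below illustrates only its condensing step (keep a majority seed, then retain majority samples a 1-NN rule on the kept set misclassifies); the Tomek-link cleaning step is omitted, and `one_sided_selection` is a name invented here:

```python
def one_sided_selection(minority, majority):
    """CNN-style condensing sketch: keep all minority samples plus one
    majority seed, then keep each remaining majority sample only if a
    1-NN classifier on the current kept set gets it wrong."""
    def nn_label(x, kept):
        return min(kept, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, t[0])))[1]
    kept = [(p, 0) for p in minority] + [(majority[0], 1)]
    for p in majority[1:]:
        if nn_label(p, kept) != 1:  # misclassified -> informative, keep it
            kept.append((p, 1))
    return kept

minority = [[0.0, 0.0]]
majority = [[5.0, 5.0], [5.0, 6.0], [0.5, 0.0]]
kept = one_sided_selection(minority, majority)
```

Majority points deep inside their own region (here `[5.0, 6.0]`) are dropped as redundant, while those near the minority class survive, shrinking the majority class "one-sidedly".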